Slide 3
Evolutionary DNN
• Usually used to decide the DNN structure
– Number of layers, number of nodes, …
• Can also be used to decide weight values
– Flappy Bird example
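As a sketch of the weight-evolution idea, the snippet below evolves the weight vector of a fixed-topology network with truncation selection and Gaussian mutation. The fitness function is a hypothetical stand-in (closeness to a fixed target vector) rather than an actual game score.

```python
import random

random.seed(0)

# Hypothetical stand-in for "play the game and return a score":
# fitness here is just closeness of the weights to a target vector.
TARGET = [0.5, -1.0, 2.0]

def fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def mutate(weights, sigma=0.1):
    # Gaussian perturbation of every weight
    return [w + random.gauss(0.0, sigma) for w in weights]

def evolve(pop_size=20, generations=100):
    population = [[random.uniform(-3, 3) for _ in range(3)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]        # truncation selection
        children = [mutate(random.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
```

The same loop works for real weight evolution once `fitness` is replaced by an actual game rollout.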
Slide 5
Biological Basis
• Biological systems adapt themselves to a new environment by evolution.
• Biological evolution
– Production of descendants changed from their parents
– Selective survival of some of these descendants to produce more descendants
• Survival of the fittest
Slide 10
GA (Genetic Algorithms)
• A method for obtaining fit hypotheses by imitating heredity in nature
• Characteristics
– Evolution is a successful and sophisticated adaptation method in nature
– Applicable even to complex problems that are hard to model
– Parallelizable, so it can benefit from H/W performance
• A general optimization procedure that searches a large search space for the solution with the best fitness
• Cannot guarantee finding the optimal solution, but can obtain solutions with high fitness
Slide 11
Basic GA Terminology (1/2)
• Chromosome / Individual
– A possible solution or hypothesis for the given problem
– Usually represented as a string
– The string's elements are integers, real numbers, etc., chosen as needed
• Population
– A set of individuals (hypotheses)
– Example bit-string chromosome: 1 1 0 1 0 0 1 1
Slide 12
Basic GA Terminology (2/2)
• Fitness
– Expresses the quality of a hypothesis in numerical terms
– A value evaluating how well each individual's genes fit the environment
– For optimization problems, it is typically the objective-function value, or a penalty-function value when constraints are considered
• Fitness function
– The criterion used to compute fitness
Slide 13
GA Operators (1/5)
• Selection operator
– Selects individuals to serve as parents
– To generate many good offspring (and thus find a solution), individuals with better fitness are given a relatively higher probability of being selected
– Proportional (roulette-wheel) selection
– Tournament selection
– Ranking-based selection
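The two most common selection operators above can be sketched in a few lines; the population and fitness values here are illustrative, and roulette-wheel selection assumes positive fitness.

```python
import random

random.seed(1)

def roulette_select(population, fitnesses):
    # Proportional selection: selection probability ∝ fitness.
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for ind, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return ind
    return population[-1]

def tournament_select(population, fitnesses, k=2):
    # Pick k individuals at random, return the fittest of them.
    contestants = random.sample(range(len(population)), k)
    return population[max(contestants, key=lambda i: fitnesses[i])]

pop = ["00", "01", "10", "11"]
fits = [1.0, 2.0, 3.0, 10.0]
picks = [roulette_select(pop, fits) for _ in range(1000)]
# "11" holds 10/16 of the wheel, so it should dominate the picks.
```

Tournament selection only compares fitnesses, never sums them, which is why it also works with negative or rescaled fitness values.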
Slide 14
GA Operators (2/5)
• Crossover operator
– Produces offspring by crossing the parents' chromosomes, as living organisms do in reproduction
– Whether crossover is performed is decided by a probability called the crossover rate
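A minimal sketch of single-point crossover with a crossover rate, assuming equal-length list chromosomes:

```python
import random

random.seed(0)

def crossover(p1, p2, rate=0.8):
    # With probability `rate`, cut both parents at the same random point
    # and swap the tails; otherwise the children are plain copies.
    if random.random() >= rate or len(p1) < 2:
        return p1[:], p2[:]
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

c1, c2 = crossover([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0])
```

Note that whether or not the crossover fires, the two children together contain exactly the parents' genes.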
Slide 18
Hypothesis Space Search
• Comparison with other search methods
– Less likely to get stuck in local minima (large jumps are possible)
• Crowding
– Similar individuals come to occupy a large fraction of the population
– Reduces diversity
Slide 19
Crowding
• Remedies for crowding
– Change the selection method
• Tournament selection, ranking selection
– "Fitness sharing"
• Reduce an individual's fitness when many similar individuals exist
– Restrict which individuals can mate
• Let the most similar individuals mate with each other, forming clusters or multiple subspecies
• Distribute individuals spatially and allow mating only between neighbors
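Fitness sharing can be sketched as dividing each individual's raw fitness by a "niche count" that grows with the number of similar neighbors. The 1-D genotype, triangular sharing kernel, and radius below are illustrative assumptions.

```python
def shared_fitness(population, raw_fitness, radius=1.0):
    # Fitness sharing: divide raw fitness by the niche count --
    # how many similar individuals surround each one.
    def distance(a, b):
        return abs(a - b)          # 1-D genotype for illustration

    shared = []
    for ind in population:
        niche = sum(max(0.0, 1.0 - distance(ind, other) / radius)
                    for other in population)
        shared.append(raw_fitness(ind) / niche)
    return shared

pop = [0.0, 0.05, 0.1, 5.0]        # three crowded individuals + one loner
shared = shared_fitness(pop, lambda x: 1.0)
```

Although all four individuals have equal raw fitness, the loner at 5.0 keeps its full fitness while the crowded cluster near 0 is penalized, which is exactly the diversity-preserving effect the slide describes.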
Slide 20
Typical behavior of an EA
• Phases in optimizing on a 1-dimensional fitness landscape
– Early phase: quasi-random population distribution
– Mid-phase: population arranged around/on hills
– Late phase: population concentrated on high hills
Slide 22
Typical run: progression of fitness
• A typical run of an EA shows so-called "anytime behavior"
(Plot: best fitness in population vs. time in number of generations)
Slide 23
(Plot: best fitness in population vs. time in number of generations, comparing progress in the 1st half with progress in the 2nd half)
• Are long runs beneficial?
• Answer:
– It depends how much you want the last bit of progress
– It may be better to do more shorter runs
Slide 24
EC as problem solvers: Goldberg's 1989 view
(Plot: performance of methods over the scale of "all" problems, comparing random search, a special problem-tailored method, and an evolutionary algorithm)
Slide 25
Advantages of EC
• No presumptions w.r.t. problem space
• Widely applicable
• Low development & application costs
• Easy to incorporate other methods
• Solutions are interpretable (unlike NN)
• Can be run interactively, accommodate user-proposed solutions
• Provide many alternative solutions
Slide 26
Disadvantages of EC
• No guarantee of an optimal solution within finite time
• Weak theoretical basis
• May need parameter tuning
• Often computationally expensive, i.e. slow
Slide 28
Genetic Programming
• Genetic programming uses variable-size tree representations rather than fixed-length strings of binary values.
• Program tree = S-expression = LISP parse tree
• Tree = Functions (nonterminals) + Terminals
Slide 29
GP Tree: An Example
• Function set: internal nodes
– Functions, predicates, or actions which take one or more arguments
• Terminal set: leaf nodes
– Program constants, actions, or functions which take no arguments
• S-expression: (+ 3 (/ (* 5 4) 7))
– Terminals = {3, 4, 5, 7}
– Functions = {+, *, /}
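A GP program tree can be represented as nested tuples and evaluated recursively; the sketch below encodes the slide's S-expression, reading its dropped operator as multiplication (an assumption, since the original symbol was lost in extraction).

```python
import operator

# Internal nodes are function symbols; leaves are terminals (constants).
FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}

def evaluate(tree):
    if not isinstance(tree, tuple):      # terminal node
        return tree
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left), evaluate(right))

# The slide's S-expression (+ 3 (/ (* 5 4) 7)) as a tree:
tree = ("+", 3, ("/", ("*", 5, 4), 7))
result = evaluate(tree)                  # 3 + (5 * 4) / 7
```

GP variation operators then act on these tuples directly, e.g. by swapping subtrees between two parents.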
Slide 30
Tree based representation
• Trees are a universal form, e.g. consider:
• Arithmetic formula: 2·π + ((x + 3) − y/(5 + 1))
• Logical formula: (x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y)))
• Program:
i = 1;
while (i < 20)
{
  i = i + 1;
}
Slide 31
Tree based representation
• In GA, ES, and EP, chromosomes are linear structures (bit strings, integer strings, real-valued vectors, permutations).
• Tree-shaped chromosomes are non-linear structures.
• In GA, ES, and EP the size of the chromosomes is fixed.
• Trees in GP may vary in depth and width.
Slide 35
ES quick overview
• Developed: Germany in the 1970s
• Early names: I. Rechenberg, H.-P. Schwefel
• Typically applied to:
– numerical optimisation
• Attributed features:
– fast
– good optimizer for real-valued optimisation
– relatively much theory
• Special:
– self-adaptation of (mutation) parameters is standard
Slide 36
ES technical summary
• Representation: real-valued vectors
• Recombination: discrete or intermediary
• Mutation: Gaussian perturbation
• Parent selection: uniform random
• Survivor selection: (μ,λ) or (μ+λ)
• Specialty: self-adaptation of mutation step sizes
Slide 37
Introductory example
• Task: minimise f : ℝⁿ → ℝ
• Algorithm: "two-membered ES" using
– Vectors from ℝⁿ directly as chromosomes
– Population size 1
– Only mutation, creating one child
– Greedy selection
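The two-membered (1+1)-ES above can be sketched in a few lines; as a nod to the self-adaptation mentioned earlier, the step size is adapted with the classic 1/5 success rule. The sphere function, step counts, and adaptation constants are illustrative choices.

```python
import random

random.seed(3)

# A minimal "two-membered" (1+1)-ES minimising f(x) = sum(x_i^2).
def f(x):
    return sum(v * v for v in x)

def one_plus_one_es(n=5, sigma=1.0, steps=2000):
    x = [random.uniform(-5, 5) for _ in range(n)]
    successes = 0
    for step in range(1, steps + 1):
        child = [v + random.gauss(0.0, sigma) for v in x]
        if f(child) < f(x):              # greedy (elitist) selection
            x = child
            successes += 1
        if step % 100 == 0:              # 1/5 success rule: adapt step size
            sigma *= 1.5 if successes / 100 > 0.2 else 0.6
            successes = 0
    return x

x_best = one_plus_one_es()
```

With a population of one parent and one child per step, selection reduces to keeping whichever of the two is better.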
Slide 38
Parent selection
• Parents are selected by uniform random distribution whenever an operator needs one/some
• Thus: ES parent selection is unbiased; every individual has the same probability of being selected
• Note that in ES "parent" means a population member (in GAs: a population member selected to undergo variation)
Slide 39
Survivor selection
• Applied after creating children from the parents by mutation and recombination
• Deterministically chops off the "bad stuff"
• The basis of selection is either:
– The set of children only: (μ,λ)-selection
– The set of parents and children: (μ+λ)-selection
Slide 40
Survivor selection cont'd
• (μ+λ)-selection is an elitist strategy
• (μ,λ)-selection can "forget"
• Often (μ,λ)-selection is preferred:
– Better at leaving local optima
– Better at following moving optima
– Using the + strategy, bad σ values can survive in ⟨x, σ⟩ too long if their host x is very fit
• Selective pressure in ES is very high (λ ≈ 7·μ is the common setting)
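The difference between the two survivor-selection schemes is easy to see in code. The toy fitness and individuals below are illustrative; both functions assume fitness is to be maximised.

```python
def comma_selection(parents, children, mu, fitness):
    # (mu, lambda)-selection: survivors come from the children only,
    # so a good parent can be "forgotten".
    return sorted(children, key=fitness, reverse=True)[:mu]

def plus_selection(parents, children, mu, fitness):
    # (mu + lambda)-selection: parents compete too (elitist).
    return sorted(parents + children, key=fitness, reverse=True)[:mu]

fitness = lambda x: -abs(x)          # best individual is the one nearest 0
parents = [0.0, -1.0]
children = [2.0, -3.0, 0.5, -4.0]
comma = comma_selection(parents, children, 2, fitness)
plus = plus_selection(parents, children, 2, fitness)
```

Here the best individual overall is the parent 0.0: plus-selection keeps it, while comma-selection discards it even though every child is worse, which is exactly the "forgetting" that helps escape local optima.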
Slide 42
EP quick overview
• Developed: USA in the 1960s
• Early names: D. Fogel
• Typically applied to:
– traditional EP: machine learning tasks by finite state machines
– contemporary EP: (numerical) optimization
• Attributed features:
– very open framework: any representation and mutation op's OK
– crossbred with ES (contemporary EP)
– consequently: hard to say what "standard" EP is
• Special:
– no recombination
– self-adaptation of parameters is standard (contemporary EP)
Slide 43
EP technical summary tableau
• Representation: real-valued vectors
• Recombination: none
• Mutation: Gaussian perturbation
• Parent selection: deterministic
• Survivor selection: probabilistic (μ+μ)
• Specialty: self-adaptation of mutation step sizes (in meta-EP)
Slide 45
ENN
• The back-propagation learning algorithm cannot guarantee an optimal solution.
• In real-world applications, the back-propagation algorithm might converge to a set of sub-optimal weights from which it cannot escape.
• As a result, the neural network is often unable to find a desirable solution to the problem at hand.
Slide 46
ENN
• Another difficulty is selecting an optimal topology for the neural network.
– The "right" network architecture for a particular problem is often chosen by means of heuristics, and designing a neural network topology is still more art than engineering.
• Genetic algorithms are an effective optimization technique that can guide both weight optimization and topology selection.
Slide 48
Fitness function
• The second step is to define a fitness function for evaluating the chromosome's performance.
– This function must estimate the performance of a given neural network.
– A simple choice is a function of the sum of squared errors.
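A sketch of such a fitness function: the chromosome is the weight vector of a single sigmoid neuron, and fitness is the negated sum of squared errors on a toy dataset. The network shape and the OR-function data are illustrative assumptions, not from the slides.

```python
import math

# Toy dataset: the OR function
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def predict(weights, inputs):
    # Single neuron: sigmoid(w0 + w1*x1 + w2*x2)
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1.0 / (1.0 + math.exp(-s))

def fitness(chromosome):
    sse = sum((predict(chromosome, x) - y) ** 2 for x, y in DATA)
    return -sse          # GAs maximise fitness, so negate the error

good = fitness([-2.0, 5.0, 5.0])   # weights that roughly implement OR
bad = fitness([0.0, 0.0, 0.0])     # outputs 0.5 everywhere
```

Negating the error turns the GA's maximisation of fitness into minimisation of the network's training error.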
Slide 51
Architecture Selection
• The architecture of the network (i.e. the number of neurons and their interconnections) often determines the success or failure of the application.
• Usually the network architecture is decided by trial and error; there is a great need for a method of automatically designing the architecture for a particular application.
– Genetic algorithms may well be suited for this task.
Slide 56
Evolving Deep Neural Networks
• https://arxiv.org/pdf/1703.00548.pdf
• CoDeepNEAT
– For optimizing deep learning architectures through evolution
– Evolving DNNs for CIFAR-10
– Evolving LSTM architectures
– The experimental comparison is not entirely clear
Slide 57
Large-Scale Evolution of Image Classifiers
• https://arxiv.org/abs/1703.01041
• Individual
– A trained architecture
• Fitness
– The individual's accuracy on a validation set
• Selection (tournament selection)
– Randomly choose two individuals
– Select the better one (the parent)
Slide 58
Large-Scale Evolution of Image Classifiers
• Mutation
– Pick a mutation from a predetermined set
• Train the child
• Repeat
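The loop described on the last two slides can be sketched as repeated pairwise tournaments with a mutation drawn from a fixed set. Everything here is a stand-in: "architectures" are just lists of layer widths, and `accuracy` is a toy function rather than actual trained-model validation accuracy.

```python
import random

random.seed(7)

# Predetermined mutation set, acting on a list of layer widths.
MUTATIONS = [
    lambda arch: arch + [random.choice([16, 32, 64])],   # add a layer
    lambda arch: arch[:-1] if len(arch) > 1 else arch,   # remove a layer
    lambda arch: [max(1, w // 2) for w in arch],         # shrink widths
]

def accuracy(arch):
    # Stand-in for "train the model, measure validation accuracy":
    # rewards ~4 layers with mean width near 32.
    return (1.0 - abs(len(arch) - 4) * 0.1
            - abs(sum(arch) / len(arch) - 32) * 0.001)

def evolve(pop_size=10, steps=200):
    population = [[random.choice([16, 32, 64])] for _ in range(pop_size)]
    for _ in range(steps):
        a, b = random.sample(range(pop_size), 2)   # random pair
        loser, winner = sorted((a, b),
                               key=lambda i: accuracy(population[i]))
        child = random.choice(MUTATIONS)(population[winner])
        population[loser] = child                  # child replaces the loser
    return max(population, key=accuracy)

best = evolve()
```

In the paper the expensive step is evaluating `accuracy` (a full training run per individual), which is why the tournament scheme, needing only pairwise comparisons, is attractive at scale.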
Slide 60
Convolution by Evolution
• https://arxiv.org/pdf/1606.02580.pdf
• GECCO '16 paper
• Differentiable Pattern Producing Network (DPPN), a differentiable version of the Compositional Pattern Producing Network
– The topology is evolved but the weights are learned
– Compressed the weights of a denoising autoencoder from 157,684 to roughly 200 parameters with comparable image reconstruction accuracy