Slide 3
Evolutionary DNN
• Usually used to decide the DNN structure
– Number of layers, number of nodes, …
• Can also be used to decide weight values
– Flappy Bird example
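As a sketch of the weight-evolution idea, the snippet below evolves the weight vector of a fixed-topology network with truncation selection and Gaussian mutation. The fitness function is a hypothetical stand-in (closeness to a fixed target vector) rather than an actual game score.

```python
import random

random.seed(0)

# Hypothetical stand-in for "play the game and return a score":
# fitness here is just closeness of the weights to a target vector.
TARGET = [0.5, -1.0, 2.0]

def fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def mutate(weights, sigma=0.1):
    # Gaussian perturbation of every weight
    return [w + random.gauss(0.0, sigma) for w in weights]

def evolve(pop_size=20, generations=100):
    population = [[random.uniform(-3, 3) for _ in range(3)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]        # truncation selection
        children = [mutate(random.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
```

The same loop works for real weight evolution once `fitness` is replaced by an actual game rollout.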
Slide 5
Biological Basis
• Biological systems adapt themselves to a new environment by evolution.
• Biological evolution
– Production of descendants changed from their parents
– Selective survival of some of these descendants to produce more descendants
• Survival of the fittest
Slide 10
GA (Genetic Algorithms)
• A method for obtaining fit hypotheses by imitating heredity in nature
• Characteristics
– Evolution is a successful and sophisticated adaptation method in nature
– Applicable even to complex problems that are hard to model
– Parallelizable, so it can benefit from H/W performance
• A general optimization procedure that searches a large search space for the solution with the best fitness
• Cannot guarantee finding the optimal solution, but can obtain solutions with high fitness
Slide 11
Basic GA Terminology (1/2)
• Chromosome / Individual
– A possible solution or hypothesis for the given problem
– Usually represented as a string
– The string's elements are integers, real numbers, etc., chosen as needed
• Population
– A set of individuals (hypotheses)
– Example bit-string chromosome: 1 1 0 1 0 0 1 1
Slide 12
Basic GA Terminology (2/2)
• Fitness
– Expresses the quality of a hypothesis in numerical terms
– A value evaluating how well each individual's genes fit the environment
– For optimization problems, it is typically the objective-function value, or a penalty-function value when constraints are considered
• Fitness function
– The criterion used to compute fitness
Slide 13
GA Operators (1/5)
• Selection operator
– Selects individuals to serve as parents
– To generate many good offspring (and thus find a solution), individuals with better fitness are given a relatively higher probability of being selected
– Proportional (roulette-wheel) selection
– Tournament selection
– Ranking-based selection
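The two most common selection operators above can be sketched in a few lines; the population and fitness values here are illustrative, and roulette-wheel selection assumes positive fitness.

```python
import random

random.seed(1)

def roulette_select(population, fitnesses):
    # Proportional selection: selection probability ∝ fitness.
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for ind, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return ind
    return population[-1]

def tournament_select(population, fitnesses, k=2):
    # Pick k individuals at random, return the fittest of them.
    contestants = random.sample(range(len(population)), k)
    return population[max(contestants, key=lambda i: fitnesses[i])]

pop = ["00", "01", "10", "11"]
fits = [1.0, 2.0, 3.0, 10.0]
picks = [roulette_select(pop, fits) for _ in range(1000)]
# "11" holds 10/16 of the wheel, so it should dominate the picks.
```

Tournament selection only compares fitnesses, never sums them, which is why it also works with negative or rescaled fitness values.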
Slide 14
GA Operators (2/5)
• Crossover operator
– Produces offspring by crossing the parents' chromosomes, as living organisms do in reproduction
– Whether crossover is performed is decided by a probability called the crossover rate
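A minimal sketch of single-point crossover with a crossover rate, assuming equal-length list chromosomes:

```python
import random

random.seed(0)

def crossover(p1, p2, rate=0.8):
    # With probability `rate`, cut both parents at the same random point
    # and swap the tails; otherwise the children are plain copies.
    if random.random() >= rate or len(p1) < 2:
        return p1[:], p2[:]
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

c1, c2 = crossover([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0])
```

Note that whether or not the crossover fires, the two children together contain exactly the parents' genes.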
Slide 18
Hypothesis Space Search
• Comparison with other search methods
– Less likely to get stuck in local minima (large jumps are possible)
• Crowding
– Similar individuals come to occupy a large fraction of the population
– Reduces diversity
Slide 19
Crowding
• Remedies for crowding
– Change the selection method
• Tournament selection, ranking selection
– "Fitness sharing"
• Reduce an individual's fitness when many similar individuals exist
– Restrict which individuals can mate
• Let the most similar individuals mate with each other, forming clusters or multiple subspecies
• Distribute individuals spatially and allow mating only between neighbors
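Fitness sharing can be sketched as dividing each individual's raw fitness by a "niche count" that grows with the number of similar neighbors. The 1-D genotype, triangular sharing kernel, and radius below are illustrative assumptions.

```python
def shared_fitness(population, raw_fitness, radius=1.0):
    # Fitness sharing: divide raw fitness by the niche count --
    # how many similar individuals surround each one.
    def distance(a, b):
        return abs(a - b)          # 1-D genotype for illustration

    shared = []
    for ind in population:
        niche = sum(max(0.0, 1.0 - distance(ind, other) / radius)
                    for other in population)
        shared.append(raw_fitness(ind) / niche)
    return shared

pop = [0.0, 0.05, 0.1, 5.0]        # three crowded individuals + one loner
shared = shared_fitness(pop, lambda x: 1.0)
```

Although all four individuals have equal raw fitness, the loner at 5.0 keeps its full fitness while the crowded cluster near 0 is penalized, which is exactly the diversity-preserving effect the slide describes.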
Slide 20
Typical behavior of an EA
• Phases in optimizing on a 1-dimensional fitness landscape
– Early phase: quasi-random population distribution
– Mid-phase: population arranged around/on hills
– Late phase: population concentrated on high hills
Slide 22
Typical run: progression of fitness
• A typical run of an EA shows so-called "anytime behavior"
(Plot: best fitness in population vs. time in number of generations)
Slide 23
(Plot: best fitness in population vs. time in number of generations, comparing progress in the 1st half with progress in the 2nd half)
• Are long runs beneficial?
• Answer:
– It depends how much you want the last bit of progress
– It may be better to do more shorter runs
Slide 24
EC as problem solvers: Goldberg's 1989 view
(Plot: performance of methods over the scale of "all" problems, comparing random search, a special problem-tailored method, and an evolutionary algorithm)
Slide 25
Advantages of EC
• No presumptions w.r.t. problem space
• Widely applicable
• Low development & application costs
• Easy to incorporate other methods
• Solutions are interpretable (unlike NN)
• Can be run interactively, accommodate user-proposed solutions
• Provide many alternative solutions
Slide 26
Disadvantages of EC
• No guarantee of an optimal solution within finite time
• Weak theoretical basis
• May need parameter tuning
• Often computationally expensive, i.e. slow
Slide 28
Genetic Programming
• Genetic programming uses variable-size tree representations rather than fixed-length strings of binary values.
• Program tree = S-expression = LISP parse tree
• Tree = Functions (nonterminals) + Terminals
Slide 29
GP Tree: An Example
• Function set: internal nodes
– Functions, predicates, or actions which take one or more arguments
• Terminal set: leaf nodes
– Program constants, actions, or functions which take no arguments
• S-expression: (+ 3 (/ (* 5 4) 7))
– Terminals = {3, 4, 5, 7}
– Functions = {+, *, /}
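A GP program tree can be represented as nested tuples and evaluated recursively; the sketch below encodes the slide's S-expression, reading its dropped operator as multiplication (an assumption, since the original symbol was lost in extraction).

```python
import operator

# Internal nodes are function symbols; leaves are terminals (constants).
FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}

def evaluate(tree):
    if not isinstance(tree, tuple):      # terminal node
        return tree
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left), evaluate(right))

# The slide's S-expression (+ 3 (/ (* 5 4) 7)) as a tree:
tree = ("+", 3, ("/", ("*", 5, 4), 7))
result = evaluate(tree)                  # 3 + (5 * 4) / 7
```

GP variation operators then act on these tuples directly, e.g. by swapping subtrees between two parents.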
Slide 30
Tree based representation
• Trees are a universal form, e.g. consider:
• Arithmetic formula: 2·π + ((x + 3) − y/(5 + 1))
• Logical formula: (x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y)))
• Program:
i = 1;
while (i < 20)
{
  i = i + 1;
}
Slide 31
Tree based representation
• In GA, ES, and EP, chromosomes are linear structures (bit strings, integer strings, real-valued vectors, permutations).
• Tree-shaped chromosomes are non-linear structures.
• In GA, ES, and EP the size of the chromosomes is fixed.
• Trees in GP may vary in depth and width.
Slide 35
ES quick overview
• Developed: Germany in the 1970s
• Early names: I. Rechenberg, H.-P. Schwefel
• Typically applied to:
– numerical optimisation
• Attributed features:
– fast
– good optimizer for real-valued optimisation
– relatively much theory
• Special:
– self-adaptation of (mutation) parameters is standard
Slide 36
ES technical summary
• Representation: real-valued vectors
• Recombination: discrete or intermediary
• Mutation: Gaussian perturbation
• Parent selection: uniform random
• Survivor selection: (μ,λ) or (μ+λ)
• Specialty: self-adaptation of mutation step sizes
Slide 37
Introductory example
• Task: minimise f : ℝⁿ → ℝ
• Algorithm: "two-membered ES" using
– Vectors from ℝⁿ directly as chromosomes
– Population size 1
– Only mutation, creating one child
– Greedy selection
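The two-membered (1+1)-ES above can be sketched in a few lines; as a nod to the self-adaptation mentioned earlier, the step size is adapted with the classic 1/5 success rule. The sphere function, step counts, and adaptation constants are illustrative choices.

```python
import random

random.seed(3)

# A minimal "two-membered" (1+1)-ES minimising f(x) = sum(x_i^2).
def f(x):
    return sum(v * v for v in x)

def one_plus_one_es(n=5, sigma=1.0, steps=2000):
    x = [random.uniform(-5, 5) for _ in range(n)]
    successes = 0
    for step in range(1, steps + 1):
        child = [v + random.gauss(0.0, sigma) for v in x]
        if f(child) < f(x):              # greedy (elitist) selection
            x = child
            successes += 1
        if step % 100 == 0:              # 1/5 success rule: adapt step size
            sigma *= 1.5 if successes / 100 > 0.2 else 0.6
            successes = 0
    return x

x_best = one_plus_one_es()
```

With a population of one parent and one child per step, selection reduces to keeping whichever of the two is better.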
Slide 38
Parent selection
• Parents are selected by uniform random distribution whenever an operator needs one/some
• Thus: ES parent selection is unbiased; every individual has the same probability of being selected
• Note that in ES "parent" means a population member (in GAs: a population member selected to undergo variation)
Slide 39
Survivor selection
• Applied after creating children from the parents by mutation and recombination
• Deterministically chops off the "bad stuff"
• The basis of selection is either:
– The set of children only: (μ,λ)-selection
– The set of parents and children: (μ+λ)-selection
Slide 40
Survivor selection cont'd
• (μ+λ)-selection is an elitist strategy
• (μ,λ)-selection can "forget"
• Often (μ,λ)-selection is preferred:
– Better at leaving local optima
– Better at following moving optima
– Using the + strategy, bad σ values can survive in ⟨x, σ⟩ too long if their host x is very fit
• Selective pressure in ES is very high (λ ≈ 7·μ is the common setting)
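The difference between the two survivor-selection schemes is easy to see in code. The toy fitness and individuals below are illustrative; both functions assume fitness is to be maximised.

```python
def comma_selection(parents, children, mu, fitness):
    # (mu, lambda)-selection: survivors come from the children only,
    # so a good parent can be "forgotten".
    return sorted(children, key=fitness, reverse=True)[:mu]

def plus_selection(parents, children, mu, fitness):
    # (mu + lambda)-selection: parents compete too (elitist).
    return sorted(parents + children, key=fitness, reverse=True)[:mu]

fitness = lambda x: -abs(x)          # best individual is the one nearest 0
parents = [0.0, -1.0]
children = [2.0, -3.0, 0.5, -4.0]
comma = comma_selection(parents, children, 2, fitness)
plus = plus_selection(parents, children, 2, fitness)
```

Here the best individual overall is the parent 0.0: plus-selection keeps it, while comma-selection discards it even though every child is worse, which is exactly the "forgetting" that helps escape local optima.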
Slide 42
EP quick overview
• Developed: USA in the 1960s
• Early names: D. Fogel
• Typically applied to:
– traditional EP: machine learning tasks by finite state machines
– contemporary EP: (numerical) optimization
• Attributed features:
– very open framework: any representation and mutation op's OK
– crossbred with ES (contemporary EP)
– consequently: hard to say what "standard" EP is
• Special:
– no recombination
– self-adaptation of parameters is standard (contemporary EP)
Slide 43
EP technical summary tableau
• Representation: real-valued vectors
• Recombination: none
• Mutation: Gaussian perturbation
• Parent selection: deterministic
• Survivor selection: probabilistic (μ+μ)
• Specialty: self-adaptation of mutation step sizes (in meta-EP)
Slide 45
ENN
• The back-propagation learning algorithm cannot guarantee an optimal solution.
• In real-world applications, the back-propagation algorithm might converge to a set of sub-optimal weights from which it cannot escape.
• As a result, the neural network is often unable to find a desirable solution to the problem at hand.
Slide 46
ENN
• Another difficulty is selecting an optimal topology for the neural network.
– The "right" network architecture for a particular problem is often chosen by means of heuristics, and designing a neural network topology is still more art than engineering.
• Genetic algorithms are an effective optimization technique that can guide both weight optimization and topology selection.
Slide 48
Fitness function
• The second step is to define a fitness function for evaluating the chromosome's performance.
– This function must estimate the performance of a given neural network.
– A simple choice is a function of the sum of squared errors.
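A sketch of such a fitness function: the chromosome is the weight vector of a single sigmoid neuron, and fitness is the negated sum of squared errors on a toy dataset. The network shape and the OR-function data are illustrative assumptions, not from the slides.

```python
import math

# Toy dataset: the OR function
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def predict(weights, inputs):
    # Single neuron: sigmoid(w0 + w1*x1 + w2*x2)
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1.0 / (1.0 + math.exp(-s))

def fitness(chromosome):
    sse = sum((predict(chromosome, x) - y) ** 2 for x, y in DATA)
    return -sse          # GAs maximise fitness, so negate the error

good = fitness([-2.0, 5.0, 5.0])   # weights that roughly implement OR
bad = fitness([0.0, 0.0, 0.0])     # outputs 0.5 everywhere
```

Negating the error turns the GA's maximisation of fitness into minimisation of the network's training error.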
Slide 51
Architecture Selection
• The architecture of the network (i.e. the number of neurons and their interconnections) often determines the success or failure of the application.
• Usually the network architecture is decided by trial and error; there is a great need for a method of automatically designing the architecture for a particular application.
– Genetic algorithms may well be suited for this task.
Slide 56
Evolving Deep Neural Networks
• https://arxiv.org/pdf/1703.00548.pdf
• CoDeepNEAT
– For optimizing deep learning architectures through evolution
– Evolving DNNs for CIFAR-10
– Evolving LSTM architectures
– The experimental comparison is not entirely clear
Slide 57
Large-Scale Evolution of Image Classifiers
• https://arxiv.org/abs/1703.01041
• Individual
– A trained architecture
• Fitness
– The individual's accuracy on a validation set
• Selection (tournament selection)
– Randomly choose two individuals
– Select the better one (the parent)
Slide 58
Large-Scale Evolution of Image Classifiers
• Mutation
– Pick a mutation from a predetermined set
• Train the child
• Repeat
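The loop described on the last two slides can be sketched as repeated pairwise tournaments with a mutation drawn from a fixed set. Everything here is a stand-in: "architectures" are just lists of layer widths, and `accuracy` is a toy function rather than actual trained-model validation accuracy.

```python
import random

random.seed(7)

# Predetermined mutation set, acting on a list of layer widths.
MUTATIONS = [
    lambda arch: arch + [random.choice([16, 32, 64])],   # add a layer
    lambda arch: arch[:-1] if len(arch) > 1 else arch,   # remove a layer
    lambda arch: [max(1, w // 2) for w in arch],         # shrink widths
]

def accuracy(arch):
    # Stand-in for "train the model, measure validation accuracy":
    # rewards ~4 layers with mean width near 32.
    return (1.0 - abs(len(arch) - 4) * 0.1
            - abs(sum(arch) / len(arch) - 32) * 0.001)

def evolve(pop_size=10, steps=200):
    population = [[random.choice([16, 32, 64])] for _ in range(pop_size)]
    for _ in range(steps):
        a, b = random.sample(range(pop_size), 2)   # random pair
        loser, winner = sorted((a, b),
                               key=lambda i: accuracy(population[i]))
        child = random.choice(MUTATIONS)(population[winner])
        population[loser] = child                  # child replaces the loser
    return max(population, key=accuracy)

best = evolve()
```

In the paper the expensive step is evaluating `accuracy` (a full training run per individual), which is why the tournament scheme, needing only pairwise comparisons, is attractive at scale.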
Slide 60
Convolution by Evolution
• https://arxiv.org/pdf/1606.02580.pdf
• GECCO '16 paper
• Differentiable Pattern Producing Network (DPPN), a differentiable version of the Compositional Pattern Producing Network
– The topology is evolved but the weights are learned
– Compressed the weights of a denoising autoencoder from 157,684 to roughly 200 parameters with comparable image reconstruction accuracy