Machine Learning on Big Data
Lessons Learned from Google Projects

Max Lin
Software Engineer | Google Research

Massively Parallel Computing | Harvard CS 264
Guest Lecture | March 29th, 2011
Outline

• Machine Learning intro
• Scaling machine learning algorithms up
• Design choices of large scale ML systems
“Machine Learning is a study
of computer algorithms that
   improve automatically
    through experience.”
Training   Input X                                            Output Y
           The quick brown fox jumped over the lazy dog.      English
           To err is human, but to really foul things up
           you need a computer.                               English
           No hay mal que por bien no venga.                  Spanish
           ("Every cloud has a silver lining.")
           La tercera es la vencida.                          Spanish
           ("Third time's the charm.")

                            Model f(x)

Testing    To be or not to be -- that is the question         ?
                            f(x') = y'
           La fe mueve montañas. ("Faith moves mountains.")   ?
Linear Classifier
       The quick brown fox jumped over the lazy dog.

    ‘a’ ... ‘aardvark’ ... ‘dog’ ... ‘the’ ... ‘montañas’ ...
x [ 0,  ...    0,      ...  1,  ...  1,  ...     0,      ... ]

w [ 0.1, ...  132,     ... 150, ... 200, ...   -153,     ... ]

$$f(x) = w \cdot x = \sum_{p=1}^{P} w_p x_p$$
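Because P is huge but each example has only a few nonzero features, x is naturally stored sparsely and f(x) only touches the words that actually occur. A minimal sketch, assuming lowercase word tokens and binary features (the slide's x marks ‘the’ with 1 even though it appears twice); `featurize` and `score` are hypothetical names, and the weights are just the slide's illustrative values.

```python
import re


def featurize(text: str) -> dict:
    """Binary bag of words: feature p is 1.0 if word p occurs in the text."""
    return {token: 1.0 for token in re.findall(r"\w+", text.lower())}


def score(w: dict, x: dict) -> float:
    """f(x) = w . x, summing only over the nonzero entries of the sparse x."""
    return sum(w.get(p, 0.0) * xp for p, xp in x.items())


# Illustrative weights taken from the slide; every weight not listed is 0.
w = {"a": 0.1, "aardvark": 132.0, "dog": 150.0, "the": 200.0, "montañas": -153.0}
x = featurize("The quick brown fox jumped over the lazy dog.")
print(score(w, x))
```

Scoring costs time proportional to the number of distinct words in the example, not to P.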
Training Data
[Figure: the input X as an N × P matrix (N examples, P features) next to the output Y, one label per example]
Typical machine learning
data at Google

N: 100 billion / 1 billion
P: 1 billion / 10 million
(mean / median)




                              http://www.flickr.com/photos/mr_t_in_dc/5469563053
Classifier Training


• Training: Given {(x, y)} and f, minimize the following objective function:
  $$\arg\min_{w} \sum_{i=1}^{N} L(y_i, f(x_i; w)) + R(w)$$
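The slide leaves the loss L and the regularizer R abstract. As one concrete, assumed choice, the sketch below evaluates the objective with the logistic loss for labels y ∈ {0, 1} and an L2 penalty; all function names are hypothetical.

```python
import math


def f(w: dict, x: dict) -> float:
    """Linear model f(x; w) = w . x over sparse feature dicts."""
    return sum(w.get(p, 0.0) * v for p, v in x.items())


def logistic_loss(y: int, score: float) -> float:
    """L(y, f) = log(1 + e^f) - y*f for y in {0, 1}, computed without overflow."""
    return max(score, 0.0) + math.log1p(math.exp(-abs(score))) - y * score


def l2_regularizer(w: dict, lam: float) -> float:
    """R(w) = (lam / 2) * ||w||^2 (an assumed choice of R)."""
    return 0.5 * lam * sum(v * v for v in w.values())


def objective(w: dict, data: list, lam: float = 1.0) -> float:
    """sum_i L(y_i, f(x_i; w)) + R(w)."""
    return sum(logistic_loss(y, f(w, x)) for x, y in data) + l2_regularizer(w, lam)


data = [({"the": 1.0, "dog": 1.0}, 1), ({"la": 1.0, "montañas": 1.0}, 0)]
print(objective({}, data))
print(objective({"the": 2.0, "la": -2.0}, data))
```

At w = 0 every example costs log 2, so the first call returns N log 2 with N = 2.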
Use Newton’s method?
$$w^{t+1} \leftarrow w^{t} - H(w^{t})^{-1} \nabla J(w^{t})$$




                     http://www.flickr.com/photos/visitfinland/5424369765/
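The catch is that the Hessian H is P × P. A back-of-the-envelope sketch, assuming a dense H stored in 8-byte doubles and roughly P^3 operations to invert it: for the median P of 10 million, H alone has 10^14 entries, about 800 TB. The helper name is hypothetical.

```python
def newton_hessian_cost(P: int, bytes_per_entry: int = 8) -> tuple:
    """Bytes to store a dense P x P Hessian, and the ~P^3 flops to invert it."""
    return P * P * bytes_per_entry, P ** 3


# Median and mean feature counts from the "Typical machine learning data at Google" slide.
for P in (10_000_000, 1_000_000_000):
    memory_bytes, flops = newton_hessian_cost(P)
    print(f"P = {P:,}: {memory_bytes / 1e12:,.0f} TB for H, ~{flops:.0e} flops to invert it")
```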
Outline

• Machine Learning intro
• Scaling machine learning algorithms up
• Design choices of large scale ML systems
Scaling Up

• Why big data?
• Parallelize machine learning algorithms
 • Embarrassingly parallel
 • Parallelize sub-routines
 • Distributed learning
Subsampling
                      Big Data
   Shard 1   Shard 2   Shard 3   ...   Shard M

Reduce N:  Shard 1  →  Machine  →  Model
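A minimal sketch of "Reduce N" by training on a single shard. It assumes, hypothetically, that examples are assigned to shards by hashing an example id; the slide only shows that Shard 1 alone goes to one machine. Function and field names are illustrative.

```python
import hashlib


def shard_of(example_id: str, num_shards: int) -> int:
    """Deterministically map an example id to one of num_shards shards."""
    digest = hashlib.sha256(example_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def subsample(examples: list, num_shards: int, keep: int = 0) -> list:
    """Reduce N: keep only the examples whose id hashes to shard `keep` (about N / num_shards)."""
    return [ex for ex in examples if shard_of(ex["id"], num_shards) == keep]


examples = [{"id": f"doc-{i}", "text": "..."} for i in range(10_000)]
shard_1 = subsample(examples, num_shards=10)
print(len(shard_1))
```

Hashing keeps the choice reproducible across runs, but the model still only ever sees about N / M examples.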
Why not Small Data?




                [Banko and Brill, 2001]
Scaling Up

• Why big data?
• Parallelize machine learning algorithms
 • Embarrassingly parallel
 • Parallelize sub-routines
 • Distributed learning
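Parallelize Estimates
• Naive Bayes Classifier
  $$\arg\min_{w} \; -\prod_{i=1}^{N} P(y_i; w) \prod_{p=1}^{P} P(x^i_p \mid y_i; w)$$

• Maximum Likelihood Estimates
  $$w_{\text{the}|EN} = \frac{\sum_{i=1}^{N} 1_{EN,\text{the}}(x^i)}{\sum_{i=1}^{N} 1_{EN}(x^i)}$$
  where $1_{EN,\text{the}}(x^i)$ is 1 if example i is English and contains “the”, and $1_{EN}(x^i)$ is 1 if example i is English.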
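Each estimate is just a ratio of counts, which is what makes Naive Bayes so easy to parallelize. A minimal sketch, assuming the indicators are document-level (an example either contains the word or not) and reusing the training sentences from the earlier slide; `contains` and `mle` are hypothetical names.

```python
import re


def contains(text: str, word: str) -> bool:
    return word in re.findall(r"\w+", text.lower())


def mle(docs: list, word: str, label: str) -> float:
    """w_{word|label} = sum_i 1_{label,word}(x^i) / sum_i 1_{label}(x^i)."""
    numerator = sum(1 for text, y in docs if y == label and contains(text, word))
    denominator = sum(1 for _, y in docs if y == label)
    return numerator / denominator if denominator else 0.0


docs = [
    ("The quick brown fox jumped over the lazy dog.", "EN"),
    ("To err is human, but to really foul things up you need a computer.", "EN"),
    ("No hay mal que por bien no venga.", "ES"),
    ("La tercera es la vencida.", "ES"),
]
print(mle(docs, "the", "EN"))
```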
Word Counting
Map      X: “The quick brown fox ...”      (‘the|EN’, 1)
         Y: EN                             (‘quick|EN’, 1)
                                           (‘brown|EN’, 1)

Reduce   [ (‘the|EN’, 1), (‘the|EN’, 1), (‘the|EN’, 1) ]
         C(‘the’|EN) = SUM of values = 3
         $$w_{\text{the}|EN} = \frac{C(\text{the}|EN)}{C(EN)}$$
Word Counting
                                      Big Data

             Mapper 1   Mapper 2    Mapper 3             Mapper M

 Map          Shard 1    Shard 2      Shard 3      ...    Shard M

         (‘the|EN’, 1) (‘fox|EN’, 1) ... (‘montañas|ES’, 1)

                                     Reducer
Reduce                              Tally counts
                                   and update w

                                      Model
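The Map, shuffle, and Reduce steps above can be simulated in one process. In this sketch (names hypothetical), each mapper emits a word at most once per example, plus one record per label, so the ratio C(word|label) / C(label) matches the indicator-based estimate on the previous slide; emitting once per token, as in the single-document Reduce example, would count occurrences instead.

```python
import re
from collections import defaultdict


def mapper(text: str, label: str):
    """Map: emit ('word|label', 1) once per distinct word, and (label, 1) once per example."""
    for word in set(re.findall(r"\w+", text.lower())):
        yield f"{word}|{label}", 1
    yield label, 1


def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(values: list) -> int:
    """Reduce: tally the counts for one key."""
    return sum(values)


def word_count_train(shards: list) -> dict:
    # Each mapper processes one shard; here they simply run one after another.
    pairs = [kv for shard in shards for text, label in shard for kv in mapper(text, label)]
    counts = {key: reducer(values) for key, values in shuffle(pairs).items()}
    # w_{word|label} = C(word|label) / C(label)
    return {key: c / counts[key.rsplit("|", 1)[1]] for key, c in counts.items() if "|" in key}


shards = [
    [("The quick brown fox jumped over the lazy dog.", "EN")],
    [("To err is human, but to really foul things up you need a computer.", "EN")],
    [("No hay mal que por bien no venga.", "ES"), ("La tercera es la vencida.", "ES")],
]
w = word_count_train(shards)
print(w["the|EN"], w["la|ES"])
```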
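Parallelize Optimization
• Maximum Entropy Classifiers (labels $y_i \in \{0, 1\}$)
  $$\arg\min_{w} \; -\prod_{i=1}^{N} \frac{\exp\left(y_i \sum_{p=1}^{P} w_p x^i_p\right)}{1 + \exp\left(\sum_{p=1}^{P} w_p x^i_p\right)}$$

• Good: the log-likelihood is concave, so J(w), the negative log-likelihood, is convex and has no spurious local minima
• Bad: no closed-form solution like NB
• Ugly: Large N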
Gradient Descent




        http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture7.pdf
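Gradient Descent
• w is initialized as zero
• for t in 1 to T
  • Calculate gradients $\nabla J(w^{t})$
  • $w^{t+1} \leftarrow w^{t} - \eta \nabla J(w^{t})$

$$\nabla J(w) = \sum_{i=1}^{N} \nabla P(w, x_i, y_i)$$
where P(w, x_i, y_i) is the term of J(w) contributed by example i, so the gradient is a sum over examples.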
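A minimal sketch of the loop above, assuming J is the MaxEnt (logistic) negative log-likelihood from the previous slide with y ∈ {0, 1}, so each example contributes (σ(w · x_i) − y_i) x_i to the gradient. The step size, iteration count, and the "bias" feature are hypothetical choices.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: dict, x: dict) -> float:
    return sum(w.get(p, 0.0) * v for p, v in x.items())


def gradient(w: dict, data: list) -> dict:
    """grad J(w) = sum_i (sigmoid(w . x_i) - y_i) x_i: one term per example."""
    g = {}
    for x, y in data:
        residual = sigmoid(dot(w, x)) - y
        for p, v in x.items():
            g[p] = g.get(p, 0.0) + residual * v
    return g


def gradient_descent(data: list, eta: float = 0.5, T: int = 100) -> dict:
    w = {}  # w is initialized as zero (missing keys read as 0)
    for _ in range(T):
        g = gradient(w, data)  # evaluated at the current w^t
        for p, gp in g.items():
            w[p] = w.get(p, 0.0) - eta * gp
    return w


data = [({"bias": 1.0, "the": 1.0}, 1), ({"bias": 1.0, "la": 1.0}, 0)]
w = gradient_descent(data)
print(sigmoid(dot(w, data[0][0])), sigmoid(dot(w, data[1][0])))
```

Every iteration touches all N examples, which is the O(TPN) training cost quoted on the next slide.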
Distribute Gradient
• w is initialized as zero
• for t in 1 to T
  • Calculate gradients in parallel
  • $w^{t+1} \leftarrow w^{t} - \eta \nabla J(w^{t})$

• Training CPU: O(TPN) to O(TPN / M)
Distribute Gradient
                                      Big Data

          Machine 1     Machine 2   Machine 3          Machine M

 Map       Shard 1       Shard 2     Shard 3     ...    Shard M

                     (dummy key, partial gradient sum)

Reduce                               Sum and
                                     Update w

           Repeat M/R
        until convergence              Model
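A single-process simulation of this MapReduce loop: each "machine" sums the per-example gradients over its own shard, and the reducer adds the partial sums (all sent under one dummy key) and updates w. It assumes the same logistic J as a plain gradient-descent sketch would; names are hypothetical. Because ∇J is a sum over examples, the sharded run matches the single-machine run up to floating-point rounding.

```python
import math


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partial_gradient(w: dict, shard: list) -> dict:
    """Map (one machine): sum of per-example logistic gradients over its own shard."""
    g = {}
    for x, y in shard:
        residual = sigmoid(sum(w.get(p, 0.0) * v for p, v in x.items())) - y
        for p, v in x.items():
            g[p] = g.get(p, 0.0) + residual * v
    return g


def distributed_gradient_descent(shards: list, eta: float = 0.5, T: int = 100) -> dict:
    w = {}
    for _ in range(T):  # one MapReduce round per iteration ("Repeat M/R")
        # Map: in a real job each call runs on its own machine; here they run in turn.
        partials = [partial_gradient(w, shard) for shard in shards]
        # Reduce: sum the partial gradient sums and update w.
        total = {}
        for g in partials:
            for p, v in g.items():
                total[p] = total.get(p, 0.0) + v
        for p, v in total.items():
            w[p] = w.get(p, 0.0) - eta * v
    return w


data = [({"bias": 1.0, "the": 1.0}, 1), ({"bias": 1.0, "la": 1.0}, 0),
        ({"bias": 1.0, "fox": 1.0}, 1), ({"bias": 1.0, "montañas": 1.0}, 0)]
w_single = distributed_gradient_descent([data])                 # M = 1
w_sharded = distributed_gradient_descent([data[:2], data[2:]])  # M = 2
print(max(abs(w_single[p] - w_sharded[p]) for p in w_single))
```

The price is one full round of communication per iteration, which the parameter-mixture slides below try to avoid.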
Scaling Up

• Why big data?
• Parallelize machine learning algorithms
 • Embarrassingly parallel
 • Parallelize sub-routines
 • Distributed learning
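Parallelize Subroutines
• Support Vector Machines
  $$\arg\min_{w,b,\zeta} \; \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \zeta_i$$
  $$\text{s.t.} \quad 1 - y_i\,(w \cdot \phi(x_i) + b) \le \zeta_i, \quad \zeta_i \ge 0$$
• Solve the dual problem
  $$\arg\min_{\alpha} \; \frac{1}{2}\alpha^{T} Q \alpha - \alpha^{T}\mathbf{1}$$
  $$\text{s.t.} \quad 0 \le \alpha \le C, \quad y^{T}\alpha = 0$$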
The computational
cost for the Primal-
Dual Interior Point
Method is O(n^3) in
time and O(n^2) in
      memory




http://www.flickr.com/photos/sea-turtle/198445204/
Parallel SVM                [Chang et al, 2007]

•   Parallel, row-wise incomplete Cholesky Factorization (ICF) for Q
•   Parallel interior point method
    •   Time O(n^3) becomes O(n^2 / M)
    •   Memory O(n^2) becomes O(n√N / M)
•   Parallel Support Vector Machines (psvm): http://code.google.com/p/psvm/
    •   Implemented in MPI
Parallel ICF
• Distribute Q by row into M machines
    Machine 1     Machine 2   Machine 3
      row 1        row 3       row 5      ...
      row 2        row 4       row 6

• For each dimension n < √N
  • Send local pivots to master
  • Master selects the largest of the local pivots and broadcasts the global pivot to workers
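A single-process sketch of this pivot protocol, assuming Q is symmetric positive semidefinite and using the standard pivoted (incomplete) Cholesky update Q ≈ H Hᵀ, where p plays the role of the target rank (√N on the slide). It is not the psvm code; `parallel_icf` and `reconstruct` are hypothetical names. Each simulated worker owns a contiguous block of rows, as in the diagram, and only the pivot index and the pivot's row of H need to be broadcast, since each worker already holds the needed entries of Q in its own rows.

```python
import math


def parallel_icf(Q: list, p: int, num_workers: int) -> list:
    """Pivoted incomplete Cholesky Q ~= H H^T (H is n x p), rows split across workers."""
    n = len(Q)
    block = math.ceil(n / num_workers)
    workers = [range(m * block, min(n, (m + 1) * block)) for m in range(num_workers)]
    H = [[0.0] * p for _ in range(n)]
    residual = [Q[r][r] for r in range(n)]  # diagonal of Q - H H^T, kept with its row
    pivoted = [False] * n
    for k in range(p):
        # Each worker sends its local pivot: its row with the largest residual diagonal.
        local = [max(((residual[r], r) for r in rows if not pivoted[r]), default=(0.0, -1))
                 for rows in workers]
        # The master selects the largest local pivot and broadcasts it with its row of H.
        value, i = max(local)
        if value <= 1e-12:
            break  # the remaining residual is (numerically) zero
        pivoted[i] = True
        H[i][k] = math.sqrt(value)
        pivot_row = H[i][:k]
        # Every worker fills column k for its own non-pivot rows.
        for rows in workers:
            for j in rows:
                if not pivoted[j]:
                    s = Q[j][i] - sum(a * b for a, b in zip(H[j][:k], pivot_row))
                    H[j][k] = s / H[i][k]
                    residual[j] -= H[j][k] ** 2
    return H


def reconstruct(H: list) -> list:
    return [[sum(a * b for a, b in zip(hi, hj)) for hj in H] for hi in H]


# A rank-2 PSD matrix Q = a a^T + b b^T, so p = 2 should recover it exactly.
a, b = [1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 1.0, 1.0]
Q = [[a[r] * a[c] + b[r] * b[c] for c in range(4)] for r in range(4)]
H = parallel_icf(Q, p=2, num_workers=2)
print(max(abs(x - y) for rq, rh in zip(Q, reconstruct(H)) for x, y in zip(rq, rh)))
```

Storing H (n × p) instead of Q (n × n) is where the memory saving on the previous slide comes from.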
Scaling Up

• Why big data?
• Parallelize machine learning algorithms
 • Embarrassingly parallel
 • Parallelize sub-routines
 • Distributed learning
Majority Vote
                                Big Data

      Machine 1   Machine 2   Machine 3          Machine M

Map    Shard 1     Shard 2     Shard 3     ...    Shard M

      Model 1     Model 2      Model 3     ...    Model M
Majority Vote

• Train individual classifiers independently
• Predict by taking majority votes
• Training CPU: O(TPN) to O(TPN / M)
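Prediction then asks every model and returns the most common label. A minimal sketch, assuming each shard produced a binary linear classifier w (labels +1 / −1); the weight vectors below are made-up stand-ins and the function names are hypothetical.

```python
from collections import Counter


def predict(w: dict, x: dict) -> int:
    """Binary linear classifier: +1 if w . x > 0, else -1."""
    return 1 if sum(w.get(p, 0.0) * v for p, v in x.items()) > 0 else -1


def majority_vote(models: list, x: dict) -> int:
    """Ask each of the M independently trained models and return the most common label."""
    votes = Counter(predict(w, x) for w in models)
    return votes.most_common(1)[0][0]


# Three weight vectors standing in for models trained on three different shards.
models = [{"the": 1.0}, {"the": -0.5, "dog": 0.2}, {"the": 2.0}]
print(majority_vote(models, {"the": 1.0, "dog": 1.0}))
```

Training needs no communication, but every prediction has to evaluate all M models; an odd M avoids ties for binary labels.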
Parameter Mixture                          [Mann et al, 2009]

                                   Big Data

         Machine 1   Machine 2   Machine 3                   Machine M

 Map      Shard 1     Shard 2     Shard 3     ...             Shard M




             (dummy key, w1) (dummy key, w2) ...

Reduce                            Average w




                                    Model
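Much less network usage than distributed gradient descent:
O(MP) vs. O(MPT), since each of the M machines sends one P-dimensional vector in total rather than one per iteration

http://www.flickr.com/photos/annamatic3000/127945652/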
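A minimal sketch of parameter mixture: every machine trains on its own shard with no communication, then the reducer averages the M weight vectors once. The per-shard learner (logistic regression fit by full-batch gradient descent) and the uniform average are assumptions for illustration; weighted mixtures are also possible. Names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_shard(shard: list, eta: float = 0.5, T: int = 50) -> dict:
    """Map (one machine): fit a logistic-regression w on its shard only."""
    w = {}
    for _ in range(T):
        g = {}
        for x, y in shard:
            residual = sigmoid(sum(w.get(p, 0.0) * v for p, v in x.items())) - y
            for p, v in x.items():
                g[p] = g.get(p, 0.0) + residual * v
        for p, gp in g.items():
            w[p] = w.get(p, 0.0) - eta * gp
    return w


def average(ws: list) -> dict:
    """Reduce: uniform average of the M weight vectors (missing entries count as 0)."""
    keys = set().union(*ws)
    return {p: sum(w.get(p, 0.0) for w in ws) / len(ws) for p in keys}


def parameter_mixture(shards: list) -> dict:
    # Each machine sends its final w exactly once, so traffic does not grow with T.
    return average([train_shard(shard) for shard in shards])


shards = [[({"the": 1.0}, 1), ({"la": 1.0}, 0)], [({"fox": 1.0}, 1), ({"montañas": 1.0}, 0)]]
w = parameter_mixture(shards)
print(sorted(w.items()))
```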
Iterative Param Mixture                       [McDonald et al., 2010]

                                       Big Data

             Machine 1   Machine 2   Machine 3                Machine M

  Map         Shard 1     Shard 2     Shard 3     ...           Shard M




                 (dummy key, w1) (dummy key, w2) ...
 Reduce after each epoch              Average w

                                        Model
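In the iterative variant, machines restart each epoch from the averaged weights instead of training to completion in isolation. McDonald et al. study this for the structured perceptron; the sketch below uses a plain binary perceptron (labels +1 / −1) and uniform averaging for brevity, both assumptions, with hypothetical names.

```python
def perceptron_epoch(w: dict, shard: list) -> dict:
    """One perceptron pass over a shard, starting from a copy of w; labels are +1 / -1."""
    w = dict(w)
    for x, y in shard:
        if y * sum(w.get(p, 0.0) * v for p, v in x.items()) <= 0:  # mistake (or tie)
            for p, v in x.items():
                w[p] = w.get(p, 0.0) + y * v
    return w


def average(ws: list) -> dict:
    keys = set().union(*ws)
    return {p: sum(w.get(p, 0.0) for w in ws) / len(ws) for p in keys}


def iterative_parameter_mixing(shards: list, epochs: int = 10) -> dict:
    w = {}
    for _ in range(epochs):
        # Map: every machine runs one epoch on its own shard, starting from the shared w.
        local = [perceptron_epoch(w, shard) for shard in shards]
        # Reduce after each epoch: average the M vectors and send the result back out.
        w = average(local)
    return w


shards = [
    [({"bias": 1.0, "the": 1.0}, 1), ({"bias": 1.0, "la": 1.0}, -1)],
    [({"bias": 1.0, "fox": 1.0}, 1), ({"bias": 1.0, "montañas": 1.0}, -1)],
]
w = iterative_parameter_mixing(shards)
print(sorted(w.items()))
```

Network traffic is one round per epoch: more than one-shot mixing, but typically far fewer rounds than gradient descent needs iterations.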
Outline

• Machine Learning intro
• Scaling machine learning algorithms up
• Design choices of large scale ML systems
Scalable



           http://www.flickr.com/photos/mr_t_in_dc/5469563053
Parallel



http://www.flickr.com/photos/aloshbennett/3209564747/
Accuracy
http://www.flickr.com/photos/wanderlinse/4367261825/
http://www.flickr.com/photos/imagelink/4006753760/
Binary Classification
http://www.flickr.com/photos/brenderous/4532934181/
Automatic
 Feature
Discovery


   http://www.flickr.com/photos/mararie/2340572508/
Fast Response

http://www.flickr.com/photos/prunejuice/3687192643/
Memory is the new hard disk.




http://www.flickr.com/photos/jepoirrier/840415676/
Algorithm + Infrastructure

http://www.flickr.com/photos/neubie/854242030/
Design for
Multicores
             http://www.flickr.com/photos/geektechnique/2344029370/
Combiner
Multi-shard Combiner




[Chandra et al., 2010]
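The slide gives only the title and a citation, so the sketch below reflects an assumed reading of a multi-shard combiner: when one worker (for example, one multicore machine) processes several shards, it merges their map output in a single in-memory table before sending anything over the network, instead of combining each shard separately. It illustrates the general combiner idea and is not the design in Chandra et al., 2010; names are hypothetical.

```python
from collections import Counter


def map_shard(shard: list):
    """Map: emit (word, 1) for every word of every example in one shard."""
    for text in shard:
        for word in text.lower().split():
            yield word, 1


def combine(pairs) -> list:
    """Pre-sum values per key in an in-memory table."""
    table = Counter()
    for key, value in pairs:
        table[key] += value
    return list(table.items())


def per_shard_combiner(shards: list) -> list:
    """Classic combiner: one table per shard, so one record per key per shard."""
    records = []
    for shard in shards:
        records.extend(combine(map_shard(shard)))
    return records


def multi_shard_combiner(shards: list) -> list:
    """One table shared by all shards on this worker: one record per key per worker."""
    return combine(pair for shard in shards for pair in map_shard(shard))


shards = [["the fox", "the dog"], ["the cat"]]  # two shards handled by the same worker
print(len(per_shard_combiner(shards)), len(multi_shard_combiner(shards)))
```

Both variants send the same totals; the shared table just sends fewer records when keys repeat across shards.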
Machine
Learning on
 Big Data
Parallelize ML
         Algorithms

• Embarrassingly parallel
• Parallelize sub-routines
• Distributed learning
Parallel   Accuracy


  Fast
Response
Google APIs
•   Prediction API
    •   machine learning service on the cloud
    •   http://code.google.com/apis/predict


•   BigQuery
    •   interactive analysis of massive data on the cloud
    •   http://code.google.com/apis/bigquery

 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Dernier (20)

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

En vedette

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

En vedette (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Machine Learning on Big Data

  • 1. Machine Learning on Big Data: Lessons Learned from Google Projects. Max Lin, Software Engineer | Google Research. Massively Parallel Computing | Harvard CS 264 Guest Lecture | March 29th, 2011
  • 2. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  • 3. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  • 4. “Machine Learning is a study of computer algorithms that improve automatically through experience.”
  • 31. Training: pairs of input X and output Y are used to learn a model f(x). "The quick brown fox jumped over the lazy dog." → English; "To err is human, but to really foul things up you need a computer." → English; "No hay mal que por bien no venga." → Spanish; "La tercera es la vencida." → Spanish. Testing: the model labels unseen inputs, f(x′) = y′. "To be or not to be -- that is the question" → ?; "La fe mueve montañas." → ?
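  • 59. Linear Classifier: the sentence "The quick brown fox jumped over the lazy dog." becomes a vector x with one dimension per vocabulary word (‘a’, ..., ‘aardvark’, ..., ‘dog’, ..., ‘the’, ..., ‘montañas’, ...), set to 1 where the word occurs: x = [ 0, ..., 0, ..., 1, ..., 1, ..., 0, ... ]. Each dimension has a weight, w = [ 0.1, ..., 132, ..., 150, ..., 200, ..., -153, ... ], and the classifier scores the input as $f(x) = w \cdot x = \sum_{p=1}^{P} w_p x_p$.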
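The bag-of-words encoding and linear score can be sketched as follows. This is a minimal illustration, assuming a hypothetical five-word vocabulary that maps each word to an index p and reusing the slide's example weights. The vector x is stored sparsely (only nonzero entries), since one sentence touches a handful of the P dimensions.

```python
import re


def featurize(text, vocab):
    """Binary bag-of-words: x_p = 1.0 if vocabulary word p occurs in the text.

    Only nonzero entries are stored, as a {p: x_p} dict (a sparse vector).
    """
    words = set(re.findall(r"\w+", text.lower()))
    return {vocab[word]: 1.0 for word in words if word in vocab}


def score(w, x):
    """f(x) = w . x = sum_p w_p * x_p, touching only the nonzero features."""
    return sum(w[p] * xp for p, xp in x.items())


# Hypothetical vocabulary (word -> index p) and the example weights from the slide.
vocab = {"a": 0, "aardvark": 1, "dog": 2, "the": 3, "montañas": 4}
w = [0.1, 132.0, 150.0, 200.0, -153.0]

x = featurize("The quick brown fox jumped over the lazy dog.", vocab)
print(sorted(x.items()))  # [(2, 1.0), (3, 1.0)]: only 'dog' and 'the' are set
print(score(w, x))        # 350.0
```

Storing x as a dict keeps the cost of f(x) proportional to the number of words in the sentence rather than to P.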
  • 60. Training Data: N examples, each a row of the input X with P features, paired with the corresponding entry of the output Y.
  • 61. Typical machine learning data at Google: N: 100 billion / 1 billion; P: 1 billion / 10 million (mean / median). http://www.flickr.com/photos/mr_t_in_dc/5469563053
  • 62. Classifier Training • Training: given {(x, y)} and f, minimize the objective function $\arg\min_{w} \sum_{i=1}^{N} L(y_i, f(x_i; w)) + R(w)$, where L is the per-example loss and R is a regularizer.
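  • 63. Use Newton’s method? $w^{t+1} \leftarrow w^{t} - H(w^{t})^{-1} \nabla J(w^{t})$. The Hessian H is P × P, so with P up to 1 billion it has about 10^18 entries, far too many to store or invert. http://www.flickr.com/photos/visitfinland/5424369765/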
  • 64. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  • 70. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
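  • 77. Subsampling: Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M; reduce N by keeping only Shard 1 and train on a single Machine to produce the Model.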
  • 78. Why not Small Data? [Banko and Brill, 2001]
  • 79. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
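  • 80. Parallelize Estimates • Naive Bayes Classifier: $\arg\min_{w} -\prod_{i=1}^{N} P(y_i; w) \prod_{p=1}^{P} P(x^{i}_{p} \mid y_i; w)$ • Maximum Likelihood Estimates: $w_{\text{the}|EN} = \frac{\sum_{i=1}^{N} \mathbf{1}_{EN,\text{the}}(x^{i})}{\sum_{i=1}^{N} \mathbf{1}_{EN}(x^{i})}$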
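To make the estimates concrete, here is a small sketch that trains a naive Bayes language classifier on the four training sentences and labels the two test sentences. It swaps in two assumptions not on the slide: it counts word occurrences (multinomial naive Bayes) rather than the indicator counts above, and it applies add-one (Laplace) smoothing, because the unsmoothed maximum likelihood estimate gives zero probability to any word never seen with a label, such as ‘mueve’ or ‘montañas’.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def train_nb(corpus):
    """Count word occurrences per label and documents per label."""
    word_counts = defaultdict(Counter)  # label -> Counter of word occurrences
    doc_counts = Counter()              # label -> number of training documents
    for text, label in corpus:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Return the label maximizing log P(y) + sum_p log P(x_p | y)."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())

    def log_posterior(label):
        total = sum(word_counts[label].values())
        s = math.log(doc_counts[label] / n_docs)
        for word in tokenize(text):
            # Add-one smoothing (an assumption; the slide shows the unsmoothed MLE).
            s += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        return s

    return max(doc_counts, key=log_posterior)


corpus = [
    ("The quick brown fox jumped over the lazy dog.", "English"),
    ("To err is human, but to really foul things up you need a computer.", "English"),
    ("No hay mal que por bien no venga.", "Spanish"),
    ("La tercera es la vencida.", "Spanish"),
]
model = train_nb(corpus)
print(predict_nb(model, "To be or not to be -- that is the question"))  # English
print(predict_nb(model, "La fe mueve montañas."))                        # Spanish
```

Training is nothing but counting, which is why the estimates parallelize as easily as word counting.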
  • 90. Word Counting • Map: for X: “The quick brown fox ...” with Y: EN, emit (‘the|EN’, 1), (‘quick|EN’, 1), (‘brown|EN’, 1), ... • Reduce: [ (‘the|EN’, 1), (‘the|EN’, 1), (‘the|EN’, 1) ] gives C(‘the’|EN) = SUM of values = 3 • $w_{\text{the}|EN} = \frac{C(\text{the}|EN)}{C(EN)}$
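The map and reduce steps can be simulated in a single process. This sketch assumes that C(EN) is the total number of English tokens, so the mapper also emits a hypothetical ‘*|EN’ key once per token; the in-memory `shuffle` stands in for the grouping the MapReduce framework performs between the two phases.

```python
import re
from collections import defaultdict


def mapper(text, lang):
    """Map: emit ('word|LANG', 1) for every token, plus ('*|LANG', 1) for the total."""
    for word in re.findall(r"\w+", text.lower()):
        yield f"{word}|{lang}", 1
        yield f"*|{lang}", 1  # hypothetical key so the reducer also produces C(LANG)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce: C(key) = SUM of values."""
    return key, sum(values)


def estimate(corpus):
    pairs = [pair for text, lang in corpus for pair in mapper(text, lang)]
    counts = dict(reducer(key, values) for key, values in shuffle(pairs).items())
    # w_{word|LANG} = C(word|LANG) / C(LANG)
    return {key: c / counts["*|" + key.rsplit("|", 1)[1]]
            for key, c in counts.items() if not key.startswith("*|")}


corpus = [
    ("The quick brown fox jumped over the lazy dog.", "EN"),
    ("To err is human, but to really foul things up you need a computer.", "EN"),
    ("No hay mal que por bien no venga.", "ES"),
    ("La tercera es la vencida.", "ES"),
]
w = estimate(corpus)
print(w["the|EN"])  # 2/23: 'the' is 2 of the 23 English tokens
```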
  • 96. Word Counting Big Data • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Mapper 1, Mapper 2, Mapper 3, ..., Mapper M each read one shard and emit (‘the’ | EN, 1), (‘fox’ | EN, 1), ..., (‘montañas’ | ES, 1) • Reduce: the Reducer tallies counts and updates w to produce the Model
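With M shards, each mapper can pre-sum its own pairs before sending them, which shrinks the traffic into the reducer from one pair per token to one pair per distinct key per shard. That local pre-aggregation (a combiner) is an assumption of this sketch rather than something the slide shows, and the ‘*|LANG’ total key is likewise hypothetical.

```python
import re
from collections import Counter


def map_shard(shard):
    """One mapper: pre-sum 'word|LANG' counts over its shard (a combiner, assumed here)."""
    local = Counter()
    for text, lang in shard:
        tokens = re.findall(r"\w+", text.lower())
        local.update(f"{word}|{lang}" for word in tokens)
        local[f"*|{lang}"] += len(tokens)  # hypothetical key holding C(LANG)
    return local


def reduce_and_update(partials):
    """Reducer: tally counts from every mapper, then set w_{word|LANG} = C(word|LANG) / C(LANG)."""
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing them
    return {key: c / total["*|" + key.rsplit("|", 1)[1]]
            for key, c in total.items() if not key.startswith("*|")}


corpus = [
    ("The quick brown fox jumped over the lazy dog.", "EN"),
    ("To err is human, but to really foul things up you need a computer.", "EN"),
    ("No hay mal que por bien no venga.", "ES"),
    ("La tercera es la vencida.", "ES"),
]
shards = [corpus[0::2], corpus[1::2]]  # M = 2 shards, each with one EN and one ES document
w = reduce_and_update(map_shard(shard) for shard in shards)
print(w["the|EN"])  # 2/23, the same as with a single shard
```

Because the counts are integers and addition is associative, the sharded result matches the single-machine one exactly.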
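  • 103. Parallelize Optimization • Maximum Entropy Classifiers: with $y_i \in \{0, 1\}$, find $\arg\min_{w} J(w)$, where $J(w) = -\log \prod_{i=1}^{N} \frac{\exp(y_i \sum_{p=1}^{P} w_p x^{i}_{p})}{1 + \exp(\sum_{p=1}^{P} w_p x^{i}_{p})}$ is the negative log-likelihood • Good: J(w) is convex • Bad: no closed-form solution like NB • Ugly: large N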
  • 104. Gradient Descent http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture7.pdf
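  • 109. Gradient Descent • w is initialized as zero • for t in 1 to T: • Calculate gradients $\nabla J(w^{t})$ • $w^{t+1} \leftarrow w^{t} - \eta \nabla J(w^{t})$ • The gradient is a sum of per-example terms: $\nabla J(w) = \sum_{i=1}^{N} \nabla \ell(w; x_i, y_i)$, where ℓ is the contribution of example i to J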
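A minimal single-machine sketch of this loop, applied to the maximum entropy (logistic regression) objective J(w) with labels in {0, 1}. For that J, the per-example gradient is (σ(w · x_i) - y_i) x_i, where σ is the logistic function. The toy data, the step size η = 0.1 and T = 200 are arbitrary choices for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wp * xp for wp, xp in zip(w, x))


def objective(w, data):
    """J(w) = sum_i [log(1 + exp(w.x_i)) - y_i * w.x_i], the negative log-likelihood."""
    return sum(math.log1p(math.exp(dot(w, x))) - y * dot(w, x) for x, y in data)


def gradient(w, data):
    """grad J(w) = sum_i (sigmoid(w.x_i) - y_i) * x_i: a sum of per-example terms."""
    g = [0.0] * len(w)
    for x, y in data:
        err = sigmoid(dot(w, x)) - y
        for p, xp in enumerate(x):
            g[p] += err * xp
    return g


def gradient_descent(data, num_features, eta=0.1, T=200):
    w = [0.0] * num_features                          # w is initialized as zero
    for _ in range(T):                                # for t in 1 to T
        g = gradient(w, data)                         # calculate gradients
        w = [wp - eta * gp for wp, gp in zip(w, g)]   # w <- w - eta * grad J(w)
    return w


# Toy data: features [x, 1] (the 1 acts as a bias), label 1 when x > 0.
data = [([-2.0, 1.0], 0), ([-1.0, 1.0], 0), ([1.0, 1.0], 1), ([2.0, 1.0], 1)]
w = gradient_descent(data, num_features=2)
print([int(sigmoid(dot(w, x)) > 0.5) for x, _ in data])  # [0, 0, 1, 1]
```

Each iteration touches every example once, which is the O(TPN) training cost that the distributed version divides by M.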
  • 111. Distribute Gradient • w is initialized as zero • for t in 1 to T: • Calculate gradients in parallel • $w^{t+1} \leftarrow w^{t} - \eta \nabla J(w^{t})$ • Training CPU: O(TPN) to O(TPN / M)
  • 117. Distribute Gradient • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each emit (dummy key, partial gradient sum) for their shard • Reduce: sum the partial gradients and update w • Repeat M/R until convergence to produce the Model
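The MapReduce version of the loop can be sketched in one process. Each shard acts as a mapper that returns its partial gradient sum under a single dummy key, and a reducer adds the partial sums and takes the step. The sketch assumes a logistic-regression gradient and replaces "until convergence" with a fixed number of rounds T; a real job would run the map phase on M machines, while here the shards are processed one after another. Because the gradient is a sum over examples, sharding changes only the order of the additions.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partial_gradient(w, shard):
    """Map: sum of (sigmoid(w.x) - y) * x over one shard's examples."""
    g = [0.0] * len(w)
    for x, y in shard:
        err = sigmoid(sum(wp * xp for wp, xp in zip(w, x))) - y
        for p, xp in enumerate(x):
            g[p] += err * xp
    return ("dummy", g)  # every mapper uses the same key, so one reducer sees all sums


def reduce_and_update(w, pairs, eta):
    """Reduce: add the partial gradient sums and take one gradient step."""
    total = [0.0] * len(w)
    for _, g in pairs:
        total = [t + gp for t, gp in zip(total, g)]
    return [wp - eta * tp for wp, tp in zip(w, total)]


def distributed_gd(shards, num_features, eta=0.1, T=200):
    w = [0.0] * num_features
    for _ in range(T):  # one MapReduce round per iteration
        pairs = [partial_gradient(w, shard) for shard in shards]  # map phase
        w = reduce_and_update(w, pairs, eta)                      # reduce phase
    return w


data = [([-2.0, 1.0], 0), ([-1.0, 1.0], 0), ([1.0, 1.0], 1), ([2.0, 1.0], 1)]
shards = [data[0::2], data[1::2]]                 # M = 2 shards
w_dist = distributed_gd(shards, num_features=2)
w_single = distributed_gd([data], num_features=2)  # M = 1 is plain gradient descent
print(max(abs(a - b) for a, b in zip(w_dist, w_single)))  # tiny: only rounding differs
```

The cost of this design is one full MapReduce round, with P-dimensional partial sums crossing the network, for every one of the T iterations.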
  • 118. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
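  • 119. Parallelize Subroutines • Support Vector Machines: $\arg\min_{w,b,\zeta} \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\zeta_i$ s.t. $1 - y_i(w \cdot \phi(x_i) + b) \le \zeta_i$, $\zeta_i \ge 0$ • Solve the dual problem: $\arg\min_{\alpha} \frac{1}{2}\alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $0 \le \alpha \le C$, $y^T \alpha = 0$, where Q is the n × n matrix with $Q_{ij} = y_i y_j\, \phi(x_i) \cdot \phi(x_j)$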
  • 120. The computational cost of the primal-dual interior point method is O(n^3) in time and O(n^2) in memory. http://www.flickr.com/photos/sea-turtle/198445204/
  • 124. Parallel SVM [Chang et al, 2007] • Parallel, row-wise incomplete Cholesky factorization (ICF) for Q • Parallel interior point method • Time O(n^3) becomes O(n^2 / M) • Memory O(n^2) becomes O(n√N / M) • Parallel Support Vector Machines (psvm): http://code.google.com/p/psvm/ • Implemented in MPI
  • 125. Parallel ICF • Distribute Q by row into M machines (Machine 1: rows 1, 2; Machine 2: rows 3, 4; Machine 3: rows 5, 6; ...) • For each dimension n < √N: • Send local pivots to the master • The master selects the largest local pivot and broadcasts the global pivot to the workers
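The pivot exchange can be simulated in one process. This sketch uses the standard pivoted incomplete Cholesky update, Q ≈ H H^T with H of rank p, and assigns contiguous blocks of rows to workers as in the slide's layout. The function name, the stopping tolerance and the simulated "broadcast" (a shared list) are assumptions, and this is not the psvm implementation. Each worker needs only its own rows of Q plus the broadcast pivot row of H, which is what lets the factorization proceed row-wise.

```python
import math


def parallel_icf(Q, p, num_workers, tol=1e-12):
    """Simulate row-distributed pivoted incomplete Cholesky: Q ~= H H^T, H is n x p."""
    n = len(Q)
    # Worker k owns a contiguous block of rows (rows 1-2, 3-4, 5-6 on the slide).
    workers = [[r for r in range(n) if r * num_workers // n == k] for k in range(num_workers)]
    H = [[0.0] * p for _ in range(n)]
    d = [float(Q[r][r]) for r in range(n)]  # residual diagonal, kept by each row's owner
    pivoted = [False] * n
    for k in range(p):
        # Each worker sends its local pivot: its largest residual diagonal entry.
        local = []
        for rows in workers:
            candidates = [r for r in rows if not pivoted[r]]
            if candidates:
                best = max(candidates, key=lambda r: d[r])
                local.append((d[best], best))
        if not local:
            break
        # The master selects the largest local pivot and broadcasts it.
        d_max, i = max(local)
        if d_max <= tol:
            break  # residual is numerically zero: Q is already reproduced
        pivoted[i] = True
        H[i][k] = math.sqrt(d_max)
        pivot_row = H[i][:k]  # broadcast along with the pivot index
        # Each worker updates its own rows from local Q entries and the broadcast row.
        for rows in workers:
            for j in rows:
                if not pivoted[j]:
                    dot = sum(a * b for a, b in zip(H[j][:k], pivot_row))
                    H[j][k] = (Q[j][i] - dot) / H[i][k]
                    d[j] -= H[j][k] ** 2
    return H


X = [[1, 2, 0], [0, 1, 3], [2, 0, 1], [1, 1, 1], [3, 1, 0], [0, 2, 2]]
Q = [[sum(a * b for a, b in zip(u, v)) for v in X] for u in X]  # a rank-3 Gram matrix
H = parallel_icf(Q, p=3, num_workers=3)
err = max(abs(sum(a * b for a, b in zip(H[r], H[c])) - Q[r][c])
          for r in range(6) for c in range(6))
print(err)  # close to 0: three columns reproduce this rank-3 Q
```

The plain Gram matrix here stands in for the SVM's Q; with p much smaller than n, H needs O(np) memory instead of O(n^2).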
  • 127. Scaling Up • Why big data? • Parallelize machine learning algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
  • 132. Majority Vote • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train a model on their shard (Model 1, Model 2, Model 3, Model 4, ...)
  • 133. Majority Vote • Train individual classifiers independently • Predict by majority vote • Training CPU cost drops from O(TPN) to O(TPN / M)
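A minimal sketch of the majority-vote scheme, assuming a simple perceptron as the per-shard classifier (the slides do not name one) and labels in {-1, +1}. Training on each shard is independent, so the loop over shards is the part that would run on M machines; the function names are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {-1, +1})


def train_perceptron(shard: List[Example], epochs: int = 10) -> List[float]:
    """Train one classifier on one shard; this is the per-machine Map work."""
    w = [0.0] * len(shard[0][0])
    for _ in range(epochs):
        for x, y in shard:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:  # mistake: update
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def predict_one(w: List[float], x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def majority_vote(models: List[List[float]], x: Sequence[float]) -> int:
    """Ask every shard model for a label and return the most common one."""
    votes = Counter(predict_one(w, x) for w in models)
    return votes.most_common(1)[0][0]
```

With an even number of models a vote can tie; using an odd M, or an explicit tie-breaking rule, avoids that.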
  • 139. Parameter Mixture [Mann et al, 2009] • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train on their shard and emit (dummy key, w1), (dummy key, w2), ... • Reduce: average the w's to produce the Model
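Parameter mixture replaces the per-iteration gradient exchange with a single averaging step. This sketch simulates the Map phase (train a full model per shard and emit it under a dummy key) and the Reduce phase (average the weight vectors). It uses a perceptron as the per-shard learner purely for illustration; Mann et al. (2009) study conditional maximum-entropy models, so treat the learner, the uniform averaging weights, and the function names as assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {-1, +1})


def train_shard(shard: List[Example], epochs: int = 10) -> List[float]:
    """Map work on one machine: fit a perceptron to this shard only (illustrative learner)."""
    w = [0.0] * len(shard[0][0])
    for _ in range(epochs):
        for x, y in shard:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def map_phase(shards: List[List[Example]]) -> List[Tuple[str, List[float]]]:
    # Every mapper emits its full weight vector under the same dummy key.
    return [("dummy", train_shard(s)) for s in shards]


def reduce_phase(pairs: List[Tuple[str, List[float]]]) -> List[float]:
    """Single reducer: uniform average of the M weight vectors."""
    grouped: Dict[str, List[List[float]]] = {}
    for key, w in pairs:
        grouped.setdefault(key, []).append(w)
    ws = grouped["dummy"]
    return [sum(col) / len(ws) for col in zip(*ws)]
```

Each machine sends its weight vector once, rather than once per iteration as in distributed gradient descent, which is where the network savings on the next slide come from.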
  • 140. Much less network usage than distributed gradient descent: O(MN) vs. O(MNT) http://www.flickr.com/photos/annamatic3000/127945652/
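  • 149. Iterative Param Mixture [McDonald et al., 2010] • Big Data is split into Shard 1, Shard 2, Shard 3, ..., Shard M • Map: Machine 1, Machine 2, Machine 3, ..., Machine M each train on their shard and emit (dummy key, w1), (dummy key, w2), ... • Reduce: average w after each epoch; the averaged w starts the next epoch • Model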
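Iterative parameter mixture repeats the Map/Reduce pair once per epoch: each shard runs one training pass starting from the current averaged weights, and the reducer averages the results. McDonald et al. (2010) apply this to the structured perceptron; the binary perceptron, uniform mixing weights, fixed epoch count, and function names below are simplifying assumptions for a minimal sketch.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label in {-1, +1})


def one_epoch(shard: List[Example], w: List[float]) -> List[float]:
    """Map: one perceptron pass over a shard, starting from the shared weights w."""
    w = list(w)  # local copy; the shared model is not mutated
    for x, y in shard:
        if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def average(ws: List[List[float]]) -> List[float]:
    """Reduce: uniform average of the per-shard weight vectors."""
    return [sum(col) / len(ws) for col in zip(*ws)]


def iterative_parameter_mixture(shards: List[List[Example]], epochs: int = 5) -> List[float]:
    w = [0.0] * len(shards[0][0][0])
    for _ in range(epochs):
        # Every shard starts the epoch from the same averaged model.
        w = average([one_epoch(s, w) for s in shards])
    return w
```

Compared with one-shot parameter mixture, the extra communication is one weight vector per machine per epoch, still far less than one gradient exchange per update.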
  • 151. Outline • Machine Learning intro • Scaling machine learning algorithms up • Design choices of large scale ML systems
  • 152. Scalable http://www.flickr.com/photos/mr_t_in_dc/5469563053
  • 156. Binary Classification http://www.flickr.com/photos/brenderous/4532934181/
  • 157. Automatic Feature Discovery http://www.flickr.com/photos/mararie/2340572508/
  • 158. Fast Response http://www.flickr.com/photos/prunejuice/3687192643/
  • 159. Memory is the new hard disk. http://www.flickr.com/photos/jepoirrier/840415676/
  • 160. Algorithm + Infrastructure http://www.flickr.com/photos/neubie/854242030/
  • 161. Design for Multicores http://www.flickr.com/photos/geektechnique/2344029370/
  • 169. Parallelize ML Algorithms • Embarrassingly parallel • Parallelize sub-routines • Distributed learning
  • 173. Parallel Accuracy Fast Response
  • 177. Google APIs • Prediction API • machine learning service on the cloud • http://code.google.com/apis/predict • BigQuery • interactive analysis of massive data on the cloud • http://code.google.com/apis/bigquery

Speaker notes

  82. $\arg\min_{\mathbf{w}} \sum_{i=1}^N L(y_i, f(x_i; \mathbf{w})) + R(\mathbf{w})$
  107. $\arg\min_{\mathbf{w}} -\prod_{i=1}^N P(y_i; \mathbf{w}) \prod_{p=1}^P P(x^i_p \mid y_i; \mathbf{w})$
       $f(x) = \sum_{p:\, x_p = 1} \log \frac{w_{p|EN}}{w_{p|ES}} + \log \frac{w_{EN}}{w_{ES}}$
       $w_{the|EN} = \frac{\sum_{i=1}^N \mathbf{1}_{EN,the}(x^i)}{\sum_{i=1}^N \mathbf{1}_{EN}(x^i)}$
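The Naive Bayes language classifier in note 107 can be tried on the four example sentences from the opening slides. This minimal sketch estimates w_{p|EN} and w_{p|ES} as the fraction of each language's documents that contain word p, mirroring the w_{the|EN} formula, sums log(w_{p|EN} / w_{p|ES}) over the words present in a sentence, adds the prior ratio log(w_EN / w_ES), and reads f(x) > 0 as English. The add-one smoothing (so unseen words do not produce log(0)), the lowercase word tokenizer, and the function names are assumptions, not part of the notes.

```python
import math
import re
from typing import Dict, List, Set, Tuple

# The four example sentences from the opening slides, labeled EN or ES.
TRAIN: List[Tuple[str, str]] = [
    ("The quick brown fox jumped over the lazy dog.", "EN"),
    ("To err is human, but to really foul things up you need a computer.", "EN"),
    ("No hay mal que por bien no venga.", "ES"),
    ("La tercera es la vencida.", "ES"),
]


def words(text: str) -> Set[str]:
    """Binary features: the set of lowercase words in the text (assumed tokenizer)."""
    return set(re.findall(r"[a-záéíóúñü]+", text.lower()))


def fit(data: List[Tuple[str, str]]) -> Tuple[Dict[str, int], Dict[str, Dict[str, int]]]:
    """Count documents per language and how many of them contain each word."""
    docs: Dict[str, int] = {}
    df: Dict[str, Dict[str, int]] = {}
    for text, lang in data:
        docs[lang] = docs.get(lang, 0) + 1
        counts = df.setdefault(lang, {})
        for word in words(text):
            counts[word] = counts.get(word, 0) + 1
    return docs, df


def f(text: str, docs: Dict[str, int], df: Dict[str, Dict[str, int]]) -> float:
    """Log-odds of EN vs. ES; f > 0 means English."""

    def w_given(lang: str, word: str) -> float:
        # Add-one smoothed fraction of `lang` documents containing `word` (smoothing is an assumption).
        return (df[lang].get(word, 0) + 1) / (docs[lang] + 2)

    score = math.log(docs["EN"] / docs["ES"])  # log(w_EN / w_ES), the prior ratio
    for word in words(text):  # only words present in x contribute
        score += math.log(w_given("EN", word) / w_given("ES", word))
    return score
```

For example, "the dog" scores 2·log 2 (English), and "La vencida" scores -2·log 2 (Spanish).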
  135. The log-likelihood is concave in w, and there is no closed-form solution: $\arg\max_{\mathbf{w}} \prod_{i=1}^N \frac{\exp(\sum_{p=1}^P w_p x^i_p)^{y_i}}{1 + \exp(\sum_{p=1}^P w_p x^i_p)}$
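  142. $\mathbf{w}^{t+1} \leftarrow \mathbf{w}^{t} - \eta \nabla J(\mathbf{w}^{t})$, where $J(\mathbf{w}) = -\sum_{i=1}^N \left[ y_i \sum_{q=1}^P w_q x^i_q - \ln\left(1 + \exp\left(\sum_{q=1}^P w_q x^i_q\right)\right) \right]$ is the negative log-likelihood, so $\frac{\partial}{\partial w_p} J(\mathbf{w}) = \sum_{i=1}^N \left( \frac{\exp(\sum_{q} w_q x^i_q)}{1 + \exp(\sum_{q} w_q x^i_q)} - y_i \right) x^i_p$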
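A quick way to sanity-check the logistic-regression gradient ∂J/∂w_p = Σ_i (σ(w·x^i) − y_i) x^i_p, with J the negative log-likelihood, is to compare it with a central finite difference of J. This sketch assumes NumPy, labels y_i in {0, 1}, and random toy data; the helper names are illustrative.

```python
import numpy as np


def nll(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """J(w) = -sum_i [ y_i * (w . x_i) - ln(1 + exp(w . x_i)) ]."""
    z = X @ w
    return float(-np.sum(y * z - np.logaddexp(0.0, z)))  # logaddexp(0, z) = ln(1 + e^z), stably


def grad_nll(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """dJ/dw_p = sum_i (sigmoid(w . x_i) - y_i) * x_ip."""
    z = X @ w
    return X.T @ (1.0 / (1.0 + np.exp(-z)) - y)


def numerical_grad(w: np.ndarray, X: np.ndarray, y: np.ndarray, h: float = 1e-6) -> np.ndarray:
    """Central finite-difference approximation of the gradient of J."""
    g = np.zeros_like(w)
    for p in range(w.size):
        e = np.zeros_like(w)
        e[p] = h
        g[p] = (nll(w + e, X, y) - nll(w - e, X, y)) / (2 * h)
    return g
```

With a small enough step size η, one update w − η ∇J(w) decreases J, which is the descent step in note 142.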
  164. $\arg\min_{\mathbf{w},b,\mathbf{\zeta}} \frac{1}{2} \|\mathbf{w}\|_2^2 + C \sum_{i=1}^n \zeta_i$ s.t. $1 - y_i(\mathbf{w} \cdot \phi(x_i) + b) \le \zeta_i,\ \zeta_i \ge 0$; dual: $\arg\min_{\mathbf{\alpha}} \frac{1}{2} \mathbf{\alpha}^T \mathbf{Q} \mathbf{\alpha} - \mathbf{\alpha}^T \mathbf{1}$
  170. Amdahl's law; communication cost: choose M
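Note 170 ties the choice of M to Amdahl's law and communication cost. Amdahl's law bounds the speedup on M machines when a fraction f of the work parallelizes: S(M) = 1 / ((1 - f) + f / M). The sketch below adds a hypothetical communication term c·M (cost growing linearly with the number of machines, measured in units of the single-machine run time) to show why the best M is finite. The linear cost model and the example numbers are assumptions for illustration only.

```python
def amdahl_speedup(f: float, m: int) -> float:
    """Classic Amdahl's law: speedup with parallel fraction f on m machines."""
    return 1.0 / ((1.0 - f) + f / m)


def speedup_with_comm(f: float, m: int, c: float) -> float:
    """Amdahl's law plus a hypothetical communication cost c * m."""
    return 1.0 / ((1.0 - f) + f / m + c * m)


def best_m(f: float, c: float, max_m: int = 10_000) -> int:
    """Number of machines in 1..max_m that maximizes the modeled speedup."""
    return max(range(1, max_m + 1), key=lambda m: speedup_with_comm(f, m, c))
```

With f = 0.9 and c = 0.001, the modeled optimum is M = 30 (the minimizer of f/M + cM is √(f/c)), giving a speedup of 6.25; adding machines beyond that point makes the job slower, and even with free communication the speedup can never exceed 1 / (1 − f) = 10.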
  226. Sub-sampling gives inferior performance. Parameter mixture improves on it, but is not as good as training on all the data. Iterative parameter mixture performs as well as training on all the data. Distributed algorithms return better classifiers more quickly.
  228. Billions of instances, millions of features, within reasonable resources
  230. Guarantees, state-of-the-art
  231. Easy to use: simple adoption and setup. Easy to maintain: reliable, production-ready, works with batch systems.
  234. New features every day; feedback changes.
  235. Iterative training needs fast retrieval of data and model parameters.
  236. Fault-tolerant pieces: MapReduce (scalable), multi-cores, GFS for data.