MACHINE LEARNING

             Hidden Markov Models
                         VU H. Pham
                     phvu@fit.hcmus.edu.vn


                 Department of Computer Science

                      December 6th, 2010




08/12/2010             Hidden Markov Models       1
Contents
• Introduction

• Markov Chain

• Hidden Markov Models




Introduction
• Markov processes were first proposed by the
   Russian mathematician Andrei Markov
    – He used these processes to analyze Pushkin’s poetry.
• Nowadays, the Markov property and HMMs are
   widely used in many domains:
    – Natural Language Processing
    – Speech Recognition
    – Bioinformatics
    – Image/video processing
    – ...
Markov Chain
• Has N states, called s1, s2, ..., sN
• There are discrete timesteps, t=0, t=1, ...
• On the t’th timestep the system is in exactly one of the
  available states. Call it qt ∈ {s1, s2, ..., sN}
• Between each timestep, the next state is chosen randomly.
• The current state determines the probability distribution
  for the next state.
    – Often notated with arcs between states

  [Figure: a 3-state chain (N=3); at t=1 the current state is q1 = s2.
   Transition probabilities:
     p(s1 | s1) = 0,   p(s2 | s1) = 0,   p(s3 | s1) = 1
     p(s1 | s2) = 1/2, p(s2 | s2) = 1/2, p(s3 | s2) = 0
     p(s1 | s3) = 1/3, p(s2 | s3) = 2/3, p(s3 | s3) = 0]
Markov Property
• qt+1 is conditionally independent of {qt-1, qt-2, ..., q0}
  given qt.
• In other words:

      p(qt+1 | qt, qt-1, ..., q0) = p(qt+1 | qt)

  The state at timestep t+1 depends only on the state at
  timestep t
• How to represent the joint distribution of (q0, q1, q2, ...)
  using graphical models?
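The conditional-independence claim above can be checked empirically. Below is a minimal sketch (not part of the slides), using the 3-state transition probabilities from the figure: the estimate of p(qt+1 = s1 | qt = s2) should come out the same whether or not we also condition on the previous state qt-1.

```python
import random

# Transition probabilities of the 3-state chain from the figure:
# outer key = current state, inner key = next state.
P = {
    "s1": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
    "s2": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
    "s3": {"s1": 1 / 3, "s2": 2 / 3, "s3": 0.0},
}

def simulate(start, steps, rng):
    """Run the chain for `steps` transitions, returning the whole path."""
    path = [start]
    for _ in range(steps):
        dist = P[path[-1]]
        path.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return path

path = simulate("s3", 200_000, random.Random(42))

def cond_prob(prev):
    """Estimate p(q_{t+1} = s1 | q_t = s2, q_{t-1} = prev)."""
    nxt = [path[t + 1] for t in range(1, len(path) - 1)
           if path[t] == "s2" and path[t - 1] == prev]
    return nxt.count("s1") / len(nxt)

# s2 can be entered from s2 or from s3; the extra conditioning on
# q_{t-1} should not change the estimate (both come out near 1/2).
print(cond_prob("s2"), cond_prob("s3"))
```

Both estimates land near p(s1 | s2) = 1/2, which is exactly what the Markov property predicts.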
Markov chain
• So, the chain {qt} is called a Markov chain

           q0 → q1 → q2 → q3 → ...

• Each qt takes a value from the finite state-space {s1, s2, s3}
• Each qt is observed at a discrete timestep t
• {qt} satisfies the Markov property:
      p(qt+1 | qt, qt-1, ..., q0) = p(qt+1 | qt)
• The transition from qt to qt+1 is governed by the transition
  probability matrix

                  s1     s2     s3
            s1    0      0      1
            s2    1/2    1/2    0
            s3    1/3    2/3    0

            Transition probabilities
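The transition matrix can be put to work directly: a quick sketch (my own, not from the slides) that checks each row sums to 1 and propagates the state distribution forward, assuming the chain starts in q0 = s3 as in the earlier figure.

```python
# The transition probability matrix from the slide:
# row = current state, column = next state.
T = [
    [0.0, 0.0, 1.0],      # from s1
    [0.5, 0.5, 0.0],      # from s2
    [1 / 3, 2 / 3, 0.0],  # from s3
]

# Every row of a valid transition matrix sums to 1.
for row in T:
    assert abs(sum(row) - 1.0) < 1e-12

def evolve(dist, T):
    """One timestep: new[j] = sum_i dist[i] * T[i][j]."""
    n = len(T)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

dist = [0.0, 0.0, 1.0]  # q0 = s3 with certainty, as in the figure
for t in range(1, 4):
    dist = evolve(dist, T)
    print(f"t={t}: p(q_t = s1/s2/s3) = {dist}")
```

After three steps the distribution is [5/18, 7/18, 6/18], so even a deterministic start spreads out across the states.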
Markov Chain – Important property
• In a Markov chain, the joint distribution is

      p(q0, q1, ..., qm) = p(q0) ∏_{j=1}^{m} p(qj | qj-1)

• Why?

      p(q0, q1, ..., qm) = p(q0) ∏_{j=1}^{m} p(qj | qj-1, previous states)
                         = p(q0) ∏_{j=1}^{m} p(qj | qj-1)

  The second step is due to the Markov property
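This factorization makes the probability of any concrete path a simple product. A sketch under one assumption of my own: the slides do not specify p(q0), so here the chain starts in s3 with probability 1, matching the earlier figure.

```python
# Joint probability of a concrete state path under the factorization
#   p(q0, ..., qm) = p(q0) * prod_{j=1..m} p(qj | q_{j-1}),
# using the transition probabilities of the 3-state example.
p0 = {"s1": 0.0, "s2": 0.0, "s3": 1.0}  # assumed initial distribution
T = {
    "s1": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
    "s2": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
    "s3": {"s1": 1 / 3, "s2": 2 / 3, "s3": 0.0},
}

def joint(path):
    """p(q0, ..., qm) for a concrete sequence of state names."""
    p = p0[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= T[prev][cur]
    return p

# 1 * p(s2|s3) * p(s2|s2) * p(s1|s2) = 1 * 2/3 * 1/2 * 1/2 = 1/6
print(joint(["s3", "s2", "s2", "s1"]))
```

A path using any zero-probability transition, e.g. s3 → s3, gets joint probability 0, as expected.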
Markov Chain: e.g.
• The state-space of weather: {rain, cloud, wind}

                 Rain    Cloud   Wind
        Rain     ½       0       ½
        Cloud    1/3     0       2/3
        Wind     0       1       0

  [Figure: rain → rain (1/2), rain → wind (1/2), cloud → rain (1/3),
   cloud → wind (2/3), wind → cloud (1)]

• Markov assumption: the weather on the t+1’th day depends
  only on the t’th day.
• We have observed the weather in a week:

        rain    wind    rain    rain    cloud      ← Markov Chain
Day:     0       1       2       3       4
Contents
• Introduction

• Markov Chain

• Hidden Markov Models




Modeling pairs of sequences
• In many applications, we have to model pairs of sequences
• Examples:
    – POS tagging in Natural Language Processing (assign each word in a
      sentence to Noun, Adj, Verb, ...)
    – Speech recognition (map acoustic sequences to sequences of words)
    – Computational biology (recover gene boundaries in DNA sequences)
    – Video tracking (estimate the underlying model states from the
      observation sequences)
    – And many others...
Probabilistic models for sequence pairs
• We have two sequences of random variables:
  X1, X2, ..., Xm and S1, S2, ..., Sm
• Intuitively, in a practical system, each Xi corresponds to an observation
  and each Si corresponds to a state that generated the observation.
• Let each Si be in {1, 2, ..., k} and each Xi be in {1, 2, ..., o}
• How do we model the joint distribution:

      p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
Hidden Markov Models (HMMs)
• In HMMs, we assume that

      p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
      = p(S1 = s1) ∏_{j=2}^{m} p(Sj = sj | Sj-1 = sj-1) ∏_{j=1}^{m} p(Xj = xj | Sj = sj)

• This factorization rests on the independence assumptions in
  HMMs
• We will derive it in the next slides
Independence Assumptions in HMMs [1]

      p(ABC) = p(A | BC) p(BC) = p(A | BC) p(B | C) p(C)

• By the chain rule, the following equality is exact:

      p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)
      = p(S1 = s1, ..., Sm = sm) ×
        p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm)

• Assumption 1: the state sequence forms a Markov chain

      p(S1 = s1, ..., Sm = sm) = p(S1 = s1) ∏_{j=2}^{m} p(Sj = sj | Sj-1 = sj-1)
Independence Assumptions in HMMs [2]
• By the chain rule, the following equality is exact:

      p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm)
      = ∏_{j=1}^{m} p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj-1 = xj-1)

• Assumption 2: each observation depends only on the underlying
  state

      p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj-1 = xj-1)
      = p(Xj = xj | Sj = sj)

• These two assumptions are often called the independence
  assumptions in HMMs
The Model form for HMMs
• The model takes the following form:

      p(x1, ..., xm, s1, ..., sm; θ) = π(s1) ∏_{j=2}^{m} t(sj | sj-1) ∏_{j=1}^{m} e(xj | sj)

• Parameters in the model:
    – Initial probabilities π(s) for s ∈ {1, 2, ..., k}
    – Transition probabilities t(s | s′) for s, s′ ∈ {1, 2, ..., k}
    – Emission probabilities e(x | s) for s ∈ {1, 2, ..., k}
      and x ∈ {1, 2, ..., o}
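The model form translates directly into code. The sketch below (variable names are my own) uses as its π / t / e tables the example HMM that appears on the "Here's an HMM" slide later in the deck; the function works for any tables of this shape.

```python
# The HMM joint probability in the model form above:
#   p(x1..xm, s1..sm; theta)
#   = pi(s1) * prod_{j=2..m} t(sj | s_{j-1}) * prod_{j=1..m} e(xj | sj)
PI = {"s1": 0.3, "s2": 0.3, "s3": 0.4}
T = {"s1": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
     "s2": {"s1": 0.4, "s2": 0.0, "s3": 0.6},
     "s3": {"s1": 0.2, "s2": 0.8, "s3": 0.0}}
E = {"s1": {"x1": 0.3, "x2": 0.0, "x3": 0.7},
     "s2": {"x1": 0.0, "x2": 0.1, "x3": 0.9},
     "s3": {"x1": 0.2, "x2": 0.0, "x3": 0.8}}

def hmm_joint(xs, ss):
    """p(x1..xm, s1..sm) for concrete observation and state sequences."""
    p = PI[ss[0]] * E[ss[0]][xs[0]]           # pi(s1) * e(x1 | s1)
    for j in range(1, len(ss)):
        p *= T[ss[j - 1]][ss[j]] * E[ss[j]][xs[j]]  # t(sj|sj-1) * e(xj|sj)
    return p

# pi(s3) * e(x3|s3) * t(s1|s3) * e(x3|s1) = 0.4 * 0.8 * 0.2 * 0.7
print(hmm_joint(["x3", "x3"], ["s3", "s1"]))
```

Note how every factor of the formula appears once: one initial term, m-1 transition terms, and m emission terms.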
6 components of HMMs
• Discrete timesteps: 1, 2, ...
• Finite state space: {si}
• Events {xi}
• Vector of initial probabilities {πi}, where πi = p(q0 = si)
• Matrix of transition probabilities T = {tij} = { p(qt+1 = sj | qt = si) }
• Matrix of emission probabilities E = {eij} = { p(ot = xj | qt = si) }

  [Figure: three states s1, s2, s3 with initial probabilities πi from a
   start node, transition arcs tij between the states, and emission
   arcs eij down to the events x1, x2, x3]

  The observations at consecutive timesteps form an observation sequence
  {o1, o2, ..., ot}, where oi ∈ {x1, x2, ..., xo}
6 components of HMMs
• Given a specific HMM and an observation sequence, the
  corresponding sequence of states is generally not deterministic
• Example:
  Given the observation sequence: {x1, x3, x3, x2}
  The corresponding states can be any of the following sequences:
  {s1, s1, s2, s2}
  {s1, s2, s3, s2}
  {s1, s1, s1, s2}
  ...
Here’s an HMM

  [Figure: states s1, s2, s3 with transition arcs (s1→s1: 0.5,
   s1→s2: 0.5, s2→s1: 0.4, s2→s3: 0.6, s3→s1: 0.2, s3→s2: 0.8)
   and emission arcs to the events x1, x2, x3]

      T    s1    s2    s3       E    x1    x2    x3       π    s1    s2    s3
      s1   0.5   0.5   0        s1   0.3   0     0.7           0.3   0.3   0.4
      s2   0.4   0     0.6      s2   0     0.1   0.9
      s3   0.2   0.8   0        s3   0.2   0     0.8
Here’s a HMM
                                                  0.2
0.5                                                                  • Start randomly in state 1, 2
                    0.5                    0.6
       s1                       s2                      s3             or 3.
                    0.4                     0.8
                                                                     • Choose a output at each
     0.3            0.7                     0.9                        state in random.
              0.2                                       0.8
                                0.1                                  • Let’s generate a sequence
                                                                       of observations:
      x1                       x2                 x3
                                                                                 0.3 - 0.3 - 0.4
π      s1      s2         s3                                                   randomply choice
                                                                               between S1, S2, S3
       0.3     0.3        0.4

T      s1      s2         s3          E     x1     x2         x3
s1     0.5     0.5        0           s1    0.3    0          0.7         q1              o1
s2     0.4     0          0.6         s2    0      0.1        0.9         q2              o2
s3     0.2     0.8        0           s3    0.2    0          0.8         q3              o3

 08/12/2010                                       Hidden Markov Models                              33
Here’s a HMM
                                                  0.2
0.5                                                                  • Start randomly in state 1, 2
                    0.5                    0.6
       s1                       s2                      s3             or 3.
                    0.4                     0.8
                                                                     • Choose a output at each
     0.3            0.7                     0.9                        state in random.
              0.2                                       0.8
                                0.1                                  • Let’s generate a sequence
                                                                       of observations:
      x1                       x2                 x3
                                                                                    0.2 - 0.8
π      s1      s2         s3                                                   choice between X1
                                                                                     and X3
       0.3     0.3        0.4

T      s1      s2         s3          E     x1     x2         x3
s1     0.5     0.5        0           s1    0.3    0          0.7         q1      S3     o1
s2     0.4     0          0.6         s2    0      0.1        0.9         q2             o2
s3     0.2     0.8        0           s3    0.2    0          0.8         q3             o3

 08/12/2010                                       Hidden Markov Models                             34
Here’s a HMM
[Same state diagram and π, T, E tables as above]

• Start randomly in state 1, 2 or 3.
• Choose an output at random at each state.
• Let’s generate a sequence of observations:
     (go to S2 with probability 0.8 or S1 with probability 0.2)

q1 = S3     o1 = X3
q2 =        o2 =
q3 =        o3 =

08/12/2010             Hidden Markov Models             35
Here’s a HMM
[Same state diagram and π, T, E tables as above]

• Start randomly in state 1, 2 or 3.
• Choose an output at random at each state.
• Let’s generate a sequence of observations:
     (0.3 vs. 0.7 choice between X1 and X3)

q1 = S3     o1 = X3
q2 = S1     o2 =
q3 =        o3 =

08/12/2010             Hidden Markov Models             36
Here’s a HMM
[Same state diagram and π, T, E tables as above]

• Start randomly in state 1, 2 or 3.
• Choose an output at random at each state.
• Let’s generate a sequence of observations:
     (go to S2 with probability 0.5 or S1 with probability 0.5)

q1 = S3     o1 = X3
q2 = S1     o2 = X1
q3 =        o3 =

08/12/2010             Hidden Markov Models             37
Here’s a HMM
[Same state diagram and π, T, E tables as above]

• Start randomly in state 1, 2 or 3.
• Choose an output at random at each state.
• Let’s generate a sequence of observations:
     (0.3 vs. 0.7 choice between X1 and X3)

q1 = S3     o1 = X3
q2 = S1     o2 = X1
q3 = S1     o3 =

08/12/2010             Hidden Markov Models             38
Here’s a HMM
[Same state diagram and π, T, E tables as above]

• Start randomly in state 1, 2 or 3.
• Choose an output at random at each state.
• Let’s generate a sequence of observations:
     We got a sequence of states and corresponding observations!

q1 = S3     o1 = X3
q2 = S1     o2 = X1
q3 = S1     o3 = X3

08/12/2010             Hidden Markov Models             39
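The generation procedure walked through on these slides can be sketched in Python. The tables T, E and π are taken from the slides; the function and variable names are illustrative, not part of the original deck.

```python
import random

# Model parameters from the slides (states s1, s2, s3; outputs x1, x2, x3)
PI = [0.3, 0.3, 0.4]                  # initial state distribution
T = [[0.5, 0.5, 0.0],                 # T[i][j] = p(q_{t+1} = s_j | q_t = s_i)
     [0.4, 0.0, 0.6],
     [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7],                 # E[i][k] = p(o_t = x_k | q_t = s_i)
     [0.0, 0.1, 0.9],
     [0.2, 0.0, 0.8]]

def sample_sequence(length, rng=random):
    """Generate (states, observations) by simulating the HMM."""
    states, obs = [], []
    q = rng.choices(range(3), weights=PI)[0]       # start randomly in s1, s2 or s3
    for _ in range(length):
        states.append(q)
        obs.append(rng.choices(range(3), weights=E[q])[0])  # emit an output at random
        q = rng.choices(range(3), weights=T[q])[0]          # move to the next state
    return states, obs

states, obs = sample_sequence(3)
print([f"S{i + 1}" for i in states], [f"X{k + 1}" for k in obs])
```

Running it repeatedly produces traces like the slides' q = S3 S1 S1, o = X3 X1 X3, with the corresponding probabilities.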
Three famous HMM tasks
• Given a HMM Φ = (T, E, π), the three famous HMM tasks are:
• Probability of an observation sequence (state estimation)
    – Given: Φ, observation O = {o1, o2, ..., ot}
    – Goal: p(O|Φ), or equivalently p(st = Si|O)
• Most likely explanation (inference)
    – Given: Φ, observation O = {o1, o2, ..., ot}
    – Goal: Q* = argmaxQ p(Q|O)
• Learning the HMM
    – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence
    – Goal: estimate the parameters of the HMM Φ = (T, E, π)

08/12/2010                       Hidden Markov Models                          40
Three famous HMM tasks
• Given a HMM Φ = (T, E, π), the three famous HMM tasks are:
• Probability of an observation sequence (state estimation)
    – Given: Φ, observation O = {o1, o2, ..., ot}
    – Goal: p(O|Φ), or equivalently p(st = Si|O)
      [Callout: calculating the probability of observing the sequence O over all
       possible state sequences.]
• Most likely explanation (inference)
    – Given: Φ, observation O = {o1, o2, ..., ot}
    – Goal: Q* = argmaxQ p(Q|O)
• Learning the HMM
    – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence
    – Goal: estimate the parameters of the HMM Φ = (T, E, π)

08/12/2010                       Hidden Markov Models                          41
Three famous HMM tasks
• Given a HMM Φ = (T, E, π), the three famous HMM tasks are:
• Probability of an observation sequence (state estimation)
    – Given: Φ, observation O = {o1, o2, ..., ot}
    – Goal: p(O|Φ), or equivalently p(st = Si|O)
• Most likely explanation (inference)
    – Given: Φ, observation O = {o1, o2, ..., ot}
    – Goal: Q* = argmaxQ p(Q|O)
      [Callout: calculating the best corresponding state sequence, given an
       observation sequence.]
• Learning the HMM
    – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence
    – Goal: estimate the parameters of the HMM Φ = (T, E, π)

08/12/2010                       Hidden Markov Models                          42
Three famous HMM tasks
• Given a HMM Φ = (T, E, π), the three famous HMM tasks are:
• Probability of an observation sequence (state estimation)
    – Given: Φ, observation O = {o1, o2, ..., ot}
    – Goal: p(O|Φ), or equivalently p(st = Si|O)
• Most likely explanation (inference)
    – Given: Φ, observation O = {o1, o2, ..., ot}
    – Goal: Q* = argmaxQ p(Q|O)
• Learning the HMM
    – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence
    – Goal: estimate the parameters of the HMM Φ = (T, E, π)
      [Callout: given an observation sequence (or a set of them) and the
       corresponding state sequence(s), estimate the transition matrix, emission
       matrix and initial probabilities of the HMM.]

08/12/2010                       Hidden Markov Models                          43
Three famous HMM tasks

Problem                                        Algorithm           Complexity
State estimation: p(O|Φ)                       Forward-Backward    O(TN²)
Inference: Q* = argmaxQ p(Q|O)                 Viterbi decoding    O(TN²)
Learning: Φ* = argmaxΦ p(O|Φ)                  Baum-Welch (EM)     O(TN²)

T: number of timesteps
N: number of states

08/12/2010                         Hidden Markov Models                      44
The Forward-Backward Algorithm
• Given: Φ = (T, E, π), observation O = {o1, o2, ..., ot}

• Goal: what is p(o1o2...ot)?

• We can do this in a slow, stupid way
   – As shown in the next slide...

08/12/2010              Hidden Markov Models         45
Here’s a HMM
[Same state diagram as above]

• What is p(O) = p(o1o2o3) = p(o1=X3 ∧ o2=X1 ∧ o3=X3)?
• Slow, stupid way:

      p(O) =       Σ        p(O ∧ Q)
             Q ∈ paths of length 3

           =       Σ        p(O|Q) p(Q)
             Q ∈ paths of length 3

• How to compute p(Q) for an arbitrary path Q?
• How to compute p(O|Q) for an arbitrary path Q?

08/12/2010              Hidden Markov Models         46
Here’s a HMM
[Same state diagram as above;  π:  s1 0.3,  s2 0.3,  s3 0.4]

• What is p(O) = p(o1o2o3) = p(o1=X3 ∧ o2=X1 ∧ o3=X3)?
• Slow, stupid way: sum p(O|Q) p(Q) over all paths Q of length 3.
• How to compute p(Q) for an arbitrary path Q?

p(Q) = p(q1q2q3)
     = p(q1) p(q2|q1) p(q3|q2, q1)   (chain rule)
     = p(q1) p(q2|q1) p(q3|q2)       (why?)

Example, in the case Q = S3 S1 S1:
p(Q) = 0.4 × 0.2 × 0.5 = 0.04

08/12/2010              Hidden Markov Models         47
Here’s a HMM
[Same state diagram as above;  π:  s1 0.3,  s2 0.3,  s3 0.4]

• What is p(O) = p(o1o2o3) = p(o1=X3 ∧ o2=X1 ∧ o3=X3)?
• Slow, stupid way: sum p(O|Q) p(Q) over all paths Q of length 3.
• How to compute p(O|Q) for an arbitrary path Q?

p(O|Q) = p(o1o2o3 | q1q2q3)
       = p(o1|q1) p(o2|q2) p(o3|q3)   (why?)

Example, in the case Q = S3 S1 S1:
p(O|Q) = p(X3|S3) p(X1|S1) p(X3|S1) = 0.8 × 0.3 × 0.7 = 0.168

08/12/2010              Hidden Markov Models         48
Here’s a HMM
[Same state diagram as above;  π:  s1 0.3,  s2 0.3,  s3 0.4]

• What is p(O) = p(o1o2o3) = p(o1=X3 ∧ o2=X1 ∧ o3=X3)?
• Slow, stupid way: sum p(O|Q) p(Q) over all paths Q of length 3.

p(O|Q) = p(o1o2o3 | q1q2q3) = p(o1|q1) p(o2|q2) p(o3|q3)

Example, in the case Q = S3 S1 S1:
p(O|Q) = p(X3|S3) p(X1|S1) p(X3|S1) = 0.8 × 0.3 × 0.7 = 0.168

p(O) needs 27 p(Q) computations and 27 p(O|Q) computations.
What if the sequence has 20 observations?

So let’s be smarter...

08/12/2010              Hidden Markov Models         49
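The slow, stupid way can be written down directly: enumerate all 3³ = 27 length-3 state paths and sum p(O|Q) p(Q). This sketch uses the π, T, E tables from the slides (function names are illustrative), and reproduces the worked examples p(Q) = 0.04 and p(O|Q) = 0.168 for Q = S3 S1 S1.

```python
from itertools import product

PI = [0.3, 0.3, 0.4]
T = [[0.5, 0.5, 0.0], [0.4, 0.0, 0.6], [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7], [0.0, 0.1, 0.9], [0.2, 0.0, 0.8]]

def p_path(Q):
    """p(Q) = p(q1) * prod_t p(q_{t+1}|q_t), using the Markov property."""
    p = PI[Q[0]]
    for a, b in zip(Q, Q[1:]):
        p *= T[a][b]
    return p

def p_obs_given_path(O, Q):
    """p(O|Q) = prod_t p(o_t|q_t): outputs are independent given the states."""
    p = 1.0
    for q, o in zip(Q, O):
        p *= E[q][o]
    return p

O = (2, 0, 2)                      # observation indices for X3, X1, X3
# Sum over all 27 length-3 state paths
p_O = sum(p_path(Q) * p_obs_given_path(O, Q)
          for Q in product(range(3), repeat=3))
print(p_O)                         # ≈ 0.0943
```

With 20 observations this sum would range over 3²⁰ ≈ 3.5 billion paths, which is exactly why the next slide introduces the forward recursion.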
The Forward algorithm
• Given observation o1o2...oT

• Define:

  αt(i) = p(o1o2...ot ∧ qt = Si | Φ)               where 1 ≤ t ≤ T

  αt(i) = probability that, in a random trial:
   – We’d have seen the first t observations

   – We’d have ended up in Si as the t’th state visited.

• In our example, what is α2(3) ?

 08/12/2010                 Hidden Markov Models                     50
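The definition above leads to the recursion α₁(i) = πᵢ Eᵢ(o₁) and α_{t+1}(j) = [Σᵢ α_t(i) T_{ij}] E_j(o_{t+1}), which can be sketched as follows (tables from the slides; names illustrative). It yields the same p(O) as the brute-force sum, but in O(TN²) time, and answers the slide's question: α₂(3) = p(o₁o₂ ∧ q₂ = S3) = 0.0324.

```python
PI = [0.3, 0.3, 0.4]
T = [[0.5, 0.5, 0.0], [0.4, 0.0, 0.6], [0.2, 0.8, 0.0]]
E = [[0.3, 0.0, 0.7], [0.0, 0.1, 0.9], [0.2, 0.0, 0.8]]

def forward(O):
    """Return alpha, where alpha[t][i] = p(o1...o_{t+1} ^ q_{t+1} = s_i)."""
    alpha = [[PI[i] * E[i][O[0]] for i in range(3)]]   # base case: alpha_1
    for o in O[1:]:                                    # induction over timesteps
        prev = alpha[-1]
        alpha.append([sum(prev[i] * T[i][j] for i in range(3)) * E[j][o]
                      for j in range(3)])
    return alpha

O = (2, 0, 2)                      # X3, X1, X3
alpha = forward(O)
print(alpha[1][2])                 # alpha_2(3) = 0.0324
print(sum(alpha[-1]))              # p(O) = sum_i alpha_T(i) ≈ 0.0943
```

Marginalizing the last column, p(O) = Σᵢ α_T(i), agrees with the 27-path enumeration.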

Introduction to AI in Higher Education_draft.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

Hidden Markov Models

  • 1. MACHINE LEARNING: Hidden Markov Models. VU H. Pham, phvu@fit.hcmus.edu.vn, Department of Computer Science. December 6th, 2010. 08/12/2010 Hidden Markov Models 1
  • 2. Contents: Introduction • Markov Chain • Hidden Markov Models
  • 3. Introduction
    • Markov processes were first proposed by the Russian mathematician Andrei Markov, who used them to investigate Pushkin's poetry.
    • Nowadays, the Markov property and HMMs are widely used in many domains: Natural Language Processing, Speech Recognition, Bioinformatics, Image/video processing, ...
  • 4. Markov Chain
    • Has N states, called s1, s2, ..., sN.
    • There are discrete timesteps, t = 0, t = 1, ...
    • On the t'th timestep the system is in exactly one of the available states; call it qt ∈ {s1, s2, ..., sN}.
    • (Figure: a three-state chain with N = 3; at t = 0 the current state is qt = q0 = s3.)
  • 5. Markov Chain (cont.)
    • Between each timestep, the next state is chosen randomly.
    • (Figure: same chain; at t = 1 the current state is qt = q1 = s2.)
  • 6. Markov Chain (cont.)
    • The current state determines the probability distribution for the next state.
    • In the example: p(qt+1 = s1 | qt = s1) = 0, p(s2 | s1) = 0, p(s3 | s1) = 1; p(s1 | s2) = 1/2, p(s2 | s2) = 1/2, p(s3 | s2) = 0; p(s1 | s3) = 1/3, p(s2 | s3) = 2/3, p(s3 | s3) = 0.
  • 7. Markov Chain (cont.)
    • These transition probabilities are often notated with arcs between states: s1 → s3 with probability 1; s2 → s1 and s2 → s2 with probability 1/2 each; s3 → s1 with probability 1/3 and s3 → s2 with probability 2/3.
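The arc-labelled chain above translates directly into code. A minimal sketch (the dictionary layout and the `sample_next` helper are my own, not from the slides), encoding the example's transition probabilities and sampling one step:

```python
import random

# Transition probabilities of the three-state example chain:
# outer key = current state, inner dict = distribution over the next state.
TRANSITIONS = {
    "s1": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
    "s2": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
    "s3": {"s1": 1/3, "s2": 2/3, "s3": 0.0},
}

def sample_next(state, rng=random):
    """Randomly choose the next state given only the current one."""
    states = list(TRANSITIONS[state])
    weights = [TRANSITIONS[state][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

# From s1 the chain always moves to s3, since p(s3 | s1) = 1.
print(sample_next("s1"))  # "s3"
```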
  • 8. Markov Property
    • qt+1 is conditionally independent of {qt−1, qt−2, ..., q0} given qt.
    • In other words: p(qt+1 | qt, qt−1, ..., q0) = p(qt+1 | qt).
  • 9. Markov Property (cont.)
    • The state at timestep t+1 depends only on the state at timestep t.
  • 10. Markov Property (cont.)
    • How do we represent the joint distribution of (q0, q1, q2, ...) using graphical models?
  • 11. Markov Property (cont.)
    • As a graphical model, the sequence of states forms a chain: q0 → q1 → q2 → q3 → ...
  • 12. Markov chain
    • So, the chain of {qt} is called a Markov chain: q0 → q1 → q2 → q3.
  • 13. Markov chain (cont.)
    • Each qt takes a value from the finite state-space {s1, s2, s3}.
    • Each qt is observed at a discrete timestep t.
    • {qt} satisfies the Markov property: p(qt+1 | qt, qt−1, ..., q0) = p(qt+1 | qt).
  • 14. Markov chain (cont.)
    • The transition from qt to qt+1 is governed by the transition probability matrix:
            to s1   to s2   to s3
      s1      0       0       1
      s2      1/2     1/2     0
      s3      1/3     2/3     0
  • 16. Markov Chain – Important property
    • In a Markov chain, the joint distribution is p(q0, q1, ..., qm) = p(q0) ∏_{j=1..m} p(qj | qj−1).
  • 17. Markov Chain – Important property (cont.)
    • Why? By the chain rule, p(q0, q1, ..., qm) = p(q0) ∏_{j=1..m} p(qj | qj−1, previous states) = p(q0) ∏_{j=1..m} p(qj | qj−1), where the second step is due to the Markov property.
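The product formula above is easy to evaluate in code. A small sketch (the function name `chain_joint` is mine; the slides give no initial distribution for the example chain, so the uniform start below is an assumption):

```python
# Joint probability of a state path under a Markov chain:
# p(q0, ..., qm) = p(q0) * prod_j p(qj | q_{j-1}).
def chain_joint(path, initial, transitions):
    """path: list of states; initial: dict p(q0); transitions: dict of dicts p(next | cur)."""
    prob = initial[path[0]]
    for prev, cur in zip(path, path[1:]):
        prob *= transitions[prev][cur]
    return prob

# The three-state example chain from the earlier slides,
# with an assumed uniform initial distribution.
initial = {"s1": 1/3, "s2": 1/3, "s3": 1/3}
transitions = {
    "s1": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
    "s2": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
    "s3": {"s1": 1/3, "s2": 2/3, "s3": 0.0},
}

# (1/3) * 1 * (2/3) * (1/2) = 1/9
print(chain_joint(["s1", "s3", "s2", "s2"], initial, transitions))
```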
  • 18. Markov Chain: e.g.
    • The state-space of weather: rain, wind, cloud.
  • 19. Markov Chain: e.g. (cont.)
    • Transition probabilities for the weather example (rows: today, columns: tomorrow):
              Rain    Cloud   Wind
      Rain     1/2     0       1/2
      Cloud    1/3     0       2/3
      Wind     0       1       0
  • 20. Markov Chain: e.g. (cont.)
    • Markov assumption: the weather on the (t+1)'th day depends only on the t'th day.
  • 21. Markov Chain: e.g. (cont.)
    • We have observed the weather over five days: rain, wind, rain, rain, cloud (days 0–4).
  • 22. Markov Chain: e.g. (cont.)
    • The observed sequence of weather states itself forms a Markov chain: rain → wind → rain → rain → cloud.
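The weather chain can be simulated directly from the table on slide 19. A sketch (the `simulate` helper is mine) that draws each day's weather from the previous day's row:

```python
import random

# Weather transition table from the slides (rows: today, columns: tomorrow).
WEATHER = {
    "rain":  {"rain": 0.5, "cloud": 0.0, "wind": 0.5},
    "cloud": {"rain": 1/3, "cloud": 0.0, "wind": 2/3},
    "wind":  {"rain": 0.0, "cloud": 1.0, "wind": 0.0},
}

def simulate(start, days, rng=random):
    """Simulate `days` further days of weather starting from `start`."""
    seq = [start]
    for _ in range(days):
        today = seq[-1]
        nxt = rng.choices(list(WEATHER[today]),
                          weights=list(WEATHER[today].values()), k=1)[0]
        seq.append(nxt)
    return seq

print(simulate("rain", 4))
```

Note that under this particular table, wind is always followed by cloud, so the slide's observed sequence (rain, wind, rain, ...) actually has probability 0; the table and the observation are best read as independent illustrations.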
  • 23. Contents: Introduction • Markov Chain • Hidden Markov Models
  • 24. Modeling pairs of sequences
    • In many applications, we have to model a pair of sequences. Examples:
      – POS tagging in Natural Language Processing (assign each word in a sentence to Noun, Adj, Verb, ...)
      – Speech recognition (map acoustic sequences to sequences of words)
      – Computational biology (recover gene boundaries in DNA sequences)
      – Video tracking (estimate the underlying model states from the observation sequences)
      – And many others...
  • 25. Probabilistic models for sequence pairs
    • We have two sequences of random variables: X1, X2, ..., Xm and S1, S2, ..., Sm.
    • Intuitively, in a practical system, each Xi corresponds to an observation and each Si corresponds to a state that generated the observation.
    • Let each Si be in {1, 2, ..., k} and each Xi be in {1, 2, ..., o}.
    • How do we model the joint distribution p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm)?
  • 26. Hidden Markov Models (HMMs)
    • In HMMs, we assume that p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm) = p(S1 = s1) ∏_{j=2..m} p(Sj = sj | Sj−1 = sj−1) ∏_{j=1..m} p(Xj = xj | Sj = sj).
    • This factorization follows from the independence assumptions in HMMs, which we derive in the next slides.
  • 27. Independence Assumptions in HMMs [1]
    • Recall that p(ABC) = p(A | BC) p(BC) = p(A | BC) p(B | C) p(C).
    • By the chain rule, the following equality is exact: p(X1 = x1, ..., Xm = xm, S1 = s1, ..., Sm = sm) = p(S1 = s1, ..., Sm = sm) × p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm).
    • Assumption 1: the state sequence forms a Markov chain: p(S1 = s1, ..., Sm = sm) = p(S1 = s1) ∏_{j=2..m} p(Sj = sj | Sj−1 = sj−1).
  • 28. Independence Assumptions in HMMs [2]
    • By the chain rule, the following equality is exact: p(X1 = x1, ..., Xm = xm | S1 = s1, ..., Sm = sm) = ∏_{j=1..m} p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj−1 = xj−1).
    • Assumption 2: each observation depends only on the underlying state: p(Xj = xj | S1 = s1, ..., Sm = sm, X1 = x1, ..., Xj−1 = xj−1) = p(Xj = xj | Sj = sj).
    • These two assumptions are often called the independence assumptions in HMMs.
  • 29. The Model form for HMMs
    • The model takes the following form: p(x1, ..., xm, s1, ..., sm; θ) = π(s1) ∏_{j=2..m} t(sj | sj−1) ∏_{j=1..m} e(xj | sj).
    • Parameters in the model:
      – Initial probabilities π(s) for s ∈ {1, 2, ..., k}
      – Transition probabilities t(s | s′) for s, s′ ∈ {1, 2, ..., k}
      – Emission probabilities e(x | s) for s ∈ {1, 2, ..., k} and x ∈ {1, 2, ..., o}
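The model form above is just a product of π, t, and e factors, so it translates into a few lines of code. A sketch (the function name `hmm_joint` and the tiny two-state parameters are made up for illustration, not taken from the slides):

```python
# p(x1..xm, s1..sm; theta) = pi(s1) * prod_{j>=2} t(sj | s_{j-1}) * prod_j e(xj | sj)
def hmm_joint(xs, ss, pi, t, e):
    """Joint probability of observations xs and states ss under parameters (pi, t, e)."""
    prob = pi[ss[0]] * e[ss[0]][xs[0]]
    for j in range(1, len(ss)):
        prob *= t[ss[j - 1]][ss[j]] * e[ss[j]][xs[j]]
    return prob

# Hypothetical two-state, two-symbol parameters (illustration only).
pi = {1: 0.6, 2: 0.4}
t = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.4, 2: 0.6}}
e = {1: {"a": 0.9, "b": 0.1}, 2: {"a": 0.2, "b": 0.8}}

# pi(1) * e(a|1) * t(2|1) * e(b|2) = 0.6 * 0.9 * 0.3 * 0.8 = 0.1296
print(hmm_joint(["a", "b"], [1, 2], pi, t, e))
```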
  • 30. 6 components of HMMs
    • Discrete timesteps: 1, 2, ...
    • Finite state space: {si}
    • Events {xi}
    • Vector of initial probabilities {πi}, where πi = p(q0 = si)
    • Matrix of transition probabilities T = {tij} = {p(qt+1 = sj | qt = si)}
    • Matrix of emission probabilities E = {eij} = {p(ot = xj | qt = si)}
    • The observations at consecutive timesteps form an observation sequence {o1, o2, ..., ot}, where oi ∈ {x1, x2, ..., xo}.
    • (Figure: a start node with initial probabilities πi into states s1..s3, transition arcs tij among the states, and emission arcs eij to outputs x1..x3.)
  • 31. 6 components of HMMs (cont.)
    • Given a specific HMM and an observation sequence, the corresponding sequence of states is generally not deterministic.
    • Example: given the observation sequence {x1, x3, x3, x2}, the corresponding states can be any of the following sequences: {s1, s1, s2, s2}, {s1, s2, s3, s2}, {s1, s1, s1, s2}, ...
  • 32. Here’s an HMM
    • Transition matrix T:         • Emission matrix E:        • Initial probabilities π:
              s1    s2    s3              x1    x2    x3            s1 = 0.3
      s1      0.5   0.5   0       s1      0.3   0     0.7           s2 = 0.3
      s2      0.4   0     0.6     s2      0     0.1   0.9           s3 = 0.4
      s3      0.2   0.8   0       s3      0.2   0     0.8
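The three tables can be written down and sanity-checked in a few lines. A quick sketch (1-based integer indices for states and symbols are my own encoding choice): every row of T and E, and the vector π, must be a probability distribution.

```python
# The HMM from this slide: transition matrix T, emission matrix E,
# and initial probabilities pi, keyed by state / symbol number.
T = {1: {1: 0.5, 2: 0.5, 3: 0.0},
     2: {1: 0.4, 2: 0.0, 3: 0.6},
     3: {1: 0.2, 2: 0.8, 3: 0.0}}
E = {1: {1: 0.3, 2: 0.0, 3: 0.7},
     2: {1: 0.0, 2: 0.1, 3: 0.9},
     3: {1: 0.2, 2: 0.0, 3: 0.8}}
pi = {1: 0.3, 2: 0.3, 3: 0.4}

# Sanity check: every row sums to 1.
for s in (1, 2, 3):
    assert abs(sum(T[s].values()) - 1.0) < 1e-12
    assert abs(sum(E[s].values()) - 1.0) < 1e-12
assert abs(sum(pi.values()) - 1.0) < 1e-12
print("all rows sum to 1")
```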
  • 33. Here’s an HMM (cont.)
    • Start randomly in state 1, 2 or 3; choose an output at each state at random. Let's generate a sequence of observations.
    • Step 1: choose q1 randomly among S1, S2, S3 with probabilities 0.3 / 0.3 / 0.4.
  • 34. Here’s an HMM (cont.)
    • Suppose we got q1 = S3. Now choose o1 between X1 and X3 with probabilities 0.2 / 0.8.
  • 35. Here’s an HMM (cont.)
    • Suppose o1 = X3. Next, go to S2 with probability 0.8 or S1 with probability 0.2.
  • 36. Here’s an HMM (cont.)
    • Suppose q2 = S1. Choose o2 between X1 and X3 with probabilities 0.3 / 0.7.
  • 37. Here’s an HMM (cont.)
    • Suppose o2 = X1. Go to S2 with probability 0.5 or S1 with probability 0.5.
  • 38. Here’s an HMM (cont.)
    • Suppose q3 = S1. Choose o3 between X1 and X3 with probabilities 0.3 / 0.7.
  • 39. Here’s an HMM (cont.)
    • Suppose o3 = X3. We got a sequence of states and corresponding observations: states (S3, S1, S1), observations (X3, X1, X3).
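The walkthrough above is ancestral sampling: draw the start state from π, emit from E, transition via T, and repeat. A sketch that automates it for this HMM (the `draw` and `generate` helpers are mine):

```python
import random

# The HMM of slide 32, with 1-based integer indices for states and symbols.
T = {1: {1: 0.5, 2: 0.5, 3: 0.0},
     2: {1: 0.4, 2: 0.0, 3: 0.6},
     3: {1: 0.2, 2: 0.8, 3: 0.0}}
E = {1: {1: 0.3, 2: 0.0, 3: 0.7},
     2: {1: 0.0, 2: 0.1, 3: 0.9},
     3: {1: 0.2, 2: 0.0, 3: 0.8}}
pi = {1: 0.3, 2: 0.3, 3: 0.4}

def draw(dist, rng):
    """Sample one key from a {value: probability} dict."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

def generate(length, rng=random):
    """Sample (states, observations) of the given length from the HMM."""
    states, obs = [], []
    s = draw(pi, rng)                # start state ~ pi
    for _ in range(length):
        states.append(s)
        obs.append(draw(E[s], rng))  # emit from the current state
        s = draw(T[s], rng)          # then transition
    return states, obs

states, obs = generate(3)
print(states, obs)  # one possible draw is the slides' run: [3, 1, 1], [3, 1, 3]
```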
  • 40. Three famous HMM tasks
    • Given an HMM Φ = (T, E, π), the three famous HMM tasks are:
    • Probability of an observation sequence (state estimation)
      – Given: Φ, observation O = {o1, o2, ..., ot}
      – Goal: p(O | Φ), or equivalently p(st = Si | O)
    • Most likely explanation (inference)
      – Given: Φ, the observation O = {o1, o2, ..., ot}
      – Goal: Q* = argmaxQ p(Q | O)
    • Learning the HMM
      – Given: observation O = {o1, o2, ..., ot} and the corresponding state sequence
      – Goal: estimate the parameters of the HMM Φ = (T, E, π)
  • 41. Three famous HMM tasks (cont.)
    • State estimation means calculating the probability of observing the sequence O, summed over all possible state sequences.
  • 42. Three famous HMM tasks (cont.)
    • Inference means calculating the best corresponding state sequence, given an observation sequence.
  • 43. Three famous HMM tasks (cont.)
    • Learning means: given one (or a set of) observation sequence(s) and the corresponding state sequence(s), estimate the transition matrix, emission matrix and initial probabilities of the HMM.
  • 44. Three famous HMM tasks
      Problem                                        Algorithm           Complexity
      State estimation: p(O | Φ)                     Forward-Backward    O(TN²)
      Inference: Q* = argmaxQ p(Q | O)               Viterbi decoding    O(TN²)
      Learning: Φ* = argmaxΦ p(O | Φ)                Baum-Welch (EM)     O(TN²)
    (T: number of timesteps; N: number of states)
  • 45. The Forward-Backward Algorithm
    • Given: Φ = (T, E, π), observation O = {o1, o2, ..., ot}
    • Goal: what is p(o1 o2 ... ot)?
    • We can do this in a slow, stupid way, as shown in the next slide...
  • 46. Here’s an HMM (cont.)
    • What is p(O) = p(o1 o2 o3) = p(o1 = X3 ∧ o2 = X1 ∧ o3 = X3)?
    • Slow, stupid way: p(O) = Σ_{Q ∈ paths of length 3} p(O, Q) = Σ_{Q ∈ paths of length 3} p(O | Q) p(Q)
    • How do we compute p(Q) for an arbitrary path Q?
    • How do we compute p(O | Q) for an arbitrary path Q?
  • 47. Here’s an HMM (cont.)
    • p(Q) = p(q1 q2 q3) = p(q1) p(q2 | q1) p(q3 | q2, q1) (chain rule) = p(q1) p(q2 | q1) p(q3 | q2) (by the Markov property).
    • Example, for Q = S3 S1 S1: p(Q) = 0.4 × 0.2 × 0.5 = 0.04.
  • 48. Here’s an HMM (cont.)
    • p(O | Q) = p(o1 o2 o3 | q1 q2 q3) = p(o1 | q1) p(o2 | q2) p(o3 | q3) (each observation depends only on its underlying state).
    • Example, for Q = S3 S1 S1: p(O | Q) = p(X3 | S3) p(X1 | S1) p(X3 | S1) = 0.8 × 0.3 × 0.7 = 0.168.
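The "slow, stupid" enumeration is short to write down. A sketch that sums p(O | Q) p(Q) over all 27 length-3 paths of this HMM, reproducing the slide's numbers for Q = S3 S1 S1 (helper names are mine):

```python
from itertools import product

T = {1: {1: 0.5, 2: 0.5, 3: 0.0},
     2: {1: 0.4, 2: 0.0, 3: 0.6},
     3: {1: 0.2, 2: 0.8, 3: 0.0}}
E = {1: {1: 0.3, 2: 0.0, 3: 0.7},
     2: {1: 0.0, 2: 0.1, 3: 0.9},
     3: {1: 0.2, 2: 0.0, 3: 0.8}}
pi = {1: 0.3, 2: 0.3, 3: 0.4}

def p_path(q):
    """p(Q) = p(q1) p(q2 | q1) p(q3 | q2)."""
    prob = pi[q[0]]
    for a, b in zip(q, q[1:]):
        prob *= T[a][b]
    return prob

def p_obs_given_path(o, q):
    """p(O | Q) = prod_j p(oj | qj)."""
    prob = 1.0
    for oj, qj in zip(o, q):
        prob *= E[qj][oj]
    return prob

O = (3, 1, 3)  # observations X3, X1, X3
p_O = sum(p_obs_given_path(O, Q) * p_path(Q)
          for Q in product((1, 2, 3), repeat=3))  # 27 paths

print(p_path((3, 1, 1)))               # 0.4 * 0.2 * 0.5 = 0.04
print(p_obs_given_path(O, (3, 1, 1)))  # 0.8 * 0.3 * 0.7 = 0.168
print(p_O)
```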
  • 49. Here’s an HMM (cont.)
    • Computing p(O) this way needs 27 p(Q) computations and 27 p(O | Q) computations. What if the sequence has 20 observations? So let's be smarter...
  • 50. The Forward algorithm
    • Given observations o1 o2 ... oT, define αt(i) = p(o1 o2 ... ot ∧ qt = Si | Φ), where 1 ≤ t ≤ T.
    • αt(i) is the probability that, in a random trial, we'd have seen the first t observations and ended up in Si as the t'th state visited.
    • In our example, what is α2(3)?
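The standard forward recursion (the deck defines only αt(i); the recursion α1(i) = πi e_i(o1), αt+1(j) = [Σ_i αt(i) t_ij] e_j(ot+1) is the usual way it is computed) can be sketched for this HMM, and it also answers the slide's question about α2(3):

```python
# The HMM of slide 32 again, with 1-based integer indices.
T = {1: {1: 0.5, 2: 0.5, 3: 0.0},
     2: {1: 0.4, 2: 0.0, 3: 0.6},
     3: {1: 0.2, 2: 0.8, 3: 0.0}}
E = {1: {1: 0.3, 2: 0.0, 3: 0.7},
     2: {1: 0.0, 2: 0.1, 3: 0.9},
     3: {1: 0.2, 2: 0.0, 3: 0.8}}
pi = {1: 0.3, 2: 0.3, 3: 0.4}
STATES = (1, 2, 3)

def forward(obs):
    """Return the list of alpha_t dicts; alpha[t-1][i] = p(o1..ot, qt = Si)."""
    alpha = [{i: pi[i] * E[i][obs[0]] for i in STATES}]   # base case alpha_1
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * T[i][j] for i in STATES) * E[j][o]
                      for j in STATES})                   # recursion step
    return alpha

obs = (3, 1, 3)                 # X3, X1, X3 as in the earlier slides
alpha = forward(obs)
print(alpha[1][3])              # alpha_2(3) = (0.3 * 0.9) * 0.6 * 0.2 = 0.0324
print(sum(alpha[-1].values()))  # p(O), in O(T N^2) instead of enumerating N^T paths
```

Summing the final α row gives the same p(O) as the 27-path enumeration, which is the point of the algorithm.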