Markov Models

From Colettapedia
 
Revision as of 17:58, 26 October 2019

General

  • The future is independent of the past given the present
  • If you know the state of the world right now, then knowing the state of the world in the past is not going to help you predict the future
  • There's clearly some dependency between points that are nearby in time; we can't treat them as i.i.d.
  • If everything depends on everything, the problem is totally intractable
  • The most accurate predictor of what's going to happen in the near future is what's happening right now
  • When predicting x_{n+1}, just look at the most recent data; don't look at data from the distant past
  • The recent past tells you more than the distant past
  • "A Markov chain makes a very strong assumption that if we want to predict the future in the sequence,all that matters is the current state." - Jurafsky
  • Uses: temporal or sequential data, e.g., weather, economics, language, speech recognition, automatically generated music
  • "Mark V. Shaney" - parody usenet user, a play on the words "markov chain"

Examples

  • CO2 levels in atmosphere: y axis time sequence data with periodicity and some randomness
  • position of a robot via GPS, with noisy measurements of position; what is the actual position at time t?
  • fill in the blank language: what is the word at the end of this ___________?
  • handwriting recognition - x is observed scribble (the strokes of the letter), z is the actual letter.

Definitions

  • Q = a set of N states, Q = q_1 q_2 ... q_N
    • Discrete, e.g., 26 letters
    • could be hidden
  • A = transition probability matrix, where each entry a_ij represents the probability of moving from state i to state j; each row sums to 1
    • transition probabilities T(ij) = p(z_{k+1} = j | z_k = i)
  • Pi = an initial probability distribution over states, where pi_i is the probability that the Markov Chain will start in state i
    • Some states may have pi_i = 0, meaning they cannot be initial states
  • O = a sequence of observations, O = o_1, o_2, ..., o_n
  • B = observation likelihoods, expressing the probability of an observation o_t being generated from a state q_i
    • emission probabilities e_i(x) = p(x | z_k = i)
  • V = vocabulary from which the observations can be drawn
  • X_t = random variable that depends on X_{t-1}, X_{t-2}, ..., X_{t-m} (fixed m)
  • Simplifying assumptions
    • discrete time and discrete space, i.e., x_i is a discrete variable observed at discrete times
    • Simplest case: m = 1 (first-order Markov model)
    • Output independence: other states don't affect the observation
  • Discrete random variables X_1, ..., X_n form a discrete-time Markov Chain
  • Markov property: p(x_t | x_1, ..., x_{t-1}) = p(x_t | x_{t-1})
  • Ergo the joint distribution p(x_1, ..., x_n) = p(x_1) p(x_2|x_1) p(x_3|x_2) ... p(x_n|x_{n-1})
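The definitions above (the state set Q, a row-stochastic transition matrix A, an initial distribution pi, and the chain-rule factorization of the joint) can be sketched in a few lines of Python. The two weather states and all the numbers here are made-up toy values, not anything from these notes:

```python
import random

# Toy two-state weather chain; all probabilities are made-up illustrative values.
states = ["sunny", "rainy"]                   # Q, the set of states
A = {"sunny": {"sunny": 0.8, "rainy": 0.2},   # transition matrix: each row sums to 1
     "rainy": {"sunny": 0.4, "rainy": 0.6}}
pi = {"sunny": 0.7, "rainy": 0.3}             # initial distribution over states

def sequence_probability(seq):
    """Chain-rule factorization: p(x_1,...,x_n) = p(x_1) * prod_k p(x_k | x_{k-1})."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

def sample(n, rng=random):
    """Draw a length-n state path: x_1 from pi, then each x_k from row A[x_{k-1}]."""
    x = rng.choices(states, weights=[pi[s] for s in states])[0]
    path = [x]
    for _ in range(n - 1):
        x = rng.choices(states, weights=[A[x][s] for s in states])[0]
        path.append(x)
    return path

print(sequence_probability(["sunny", "sunny", "rainy"]))  # 0.7 * 0.8 * 0.2 ≈ 0.112
```

Note how the probability of a whole path needs only the initial distribution and one transition factor per step, which is exactly the first-order Markov assumption.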

Generalizations

  • can also have a second-order Markov chain, where m = 2
  • can also have a continuous-time Markov chain
    • Poisson process
    • Brownian motion (continuous time)
      • e.g., modelling stock prices; Brownian motion in 2-D to model a particle of pollen in a glass of water
  • e.g., taking a random walk along the integers: discrete time, discrete space, p(moving up) = 1/2, p(moving down) = 1/2
  • e.g., four states of weather, with transition probabilities between any two states
  • discrete-time, continuous space = "state-space"
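The random walk on the integers mentioned above (discrete time, discrete space, p(up) = p(down) = 1/2) is easy to simulate. A minimal sketch, with a fixed seed only so runs are reproducible:

```python
import random

def random_walk(n_steps, seed=0):
    """Symmetric random walk on the integers (discrete time, discrete space):
    start at 0, then move +1 or -1 with probability 1/2 at each step."""
    rng = random.Random(seed)  # arbitrary seed, just for reproducibility
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += rng.choice([-1, 1])
        path.append(position)
    return path

print(random_walk(10))
```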


Hidden Markov Model

  • Can't expect to observe perfect information about the true state of the world (system): observations and measurements are noisy
  • Acknowledge the fact that there's hidden information that we're not seeing
  • Break up system into observed and hidden parts of the state
  • Model with hidden (latent) variables
  • HMM is a sequence model/classifier whose job is to assign a label or class to each unit in a sequence.
  • A state machine, but you don't know the state; all you have is a probability distribution over states
  • Go-to baseline model for sequential data
  • Parameters (can be estimated from data):
    • transition probabilities T(ij) = p(z_{k+1} = j | z_k = i)
    • emission probabilities e_i(x) = p(x | z_k = i); each e_i is a probability distribution on the observation space X
    • initial distribution
  • Hidden variables z_1, ..., z_n take values in some finite discrete set (like 26 letters); observed variables x_1, ..., x_n take values in some set X
  • The trellis diagram: x's are observed data D = (x_1, ..., x_n), z's are hidden/latent variables
  • Joint distribution of all the random variables factors as p(x_1, ..., x_n, z_1, ..., z_n) = p(z_1) p(x_1|z_1) PRODUCT(k = 2 to n: p(z_k|z_{k-1}) p(x_k|z_k))
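As a sketch of how the pieces of an HMM fit together, here is a toy two-hidden-state model in the spirit of the handwriting example (hidden letters z emitting observed stroke symbols x). The states, symbols, and probabilities are all invented for illustration. The function computes the factored joint p(x_1,...,x_n, z_1,...,z_n) = p(z_1) p(x_1|z_1) * prod over k >= 2 of p(z_k|z_{k-1}) p(x_k|z_k):

```python
# Toy two-hidden-state HMM; states, symbols, and probabilities are invented.
pi = {"A": 0.6, "B": 0.4}                    # initial distribution over hidden states
T = {"A": {"A": 0.7, "B": 0.3},              # transitions: T(ij) = p(z_{k+1} = j | z_k = i)
     "B": {"A": 0.4, "B": 0.6}}
E = {"A": {"/": 0.5, "\\": 0.4, "|": 0.1},   # emissions: e_i(x) = p(x | z_k = i)
     "B": {"/": 0.1, "\\": 0.2, "|": 0.7}}

def joint_probability(xs, zs):
    """p(x_1..n, z_1..n) = p(z_1) p(x_1|z_1) * prod_{k>=2} p(z_k|z_{k-1}) p(x_k|z_k)."""
    p = pi[zs[0]] * E[zs[0]][xs[0]]
    for k in range(1, len(xs)):
        p *= T[zs[k - 1]][zs[k]] * E[zs[k]][xs[k]]
    return p

print(joint_probability(["/", "|"], ["A", "B"]))  # 0.6 * 0.5 * 0.3 * 0.7 ≈ 0.063
```

Each step of the trellis contributes one transition factor and one emission factor, which is the whole model.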

HMM NLP considerations

  • Only information is the order of the sentence
  • Captures probability of moving to part of speech
  • Transition probability matrix: for a given row, the sum of probabilities has to be 1, because that covers all the possible transitions out of that state
  • Incoming probabilities do not have to sum to 1
  • observation is a word emission
  • observation likelihood = how many times you see that word emitted by that state, divided by how many times the state occurs in the corpus
  • E.g., when I start the sentence, what is the probability that the part of speech will be an article?
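The count-based estimates described above can be sketched as a small maximum-likelihood routine: transition rows are normalized per previous tag, and emissions are normalized per tag (the standard HMM convention). The two-sentence tagged corpus is a made-up toy:

```python
from collections import Counter

# Made-up two-sentence tagged corpus, just to show the counting.
corpus = [[("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
          [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]]

trans, emit, tag_counts, starts = Counter(), Counter(), Counter(), Counter()
for sentence in corpus:
    starts[sentence[0][1]] += 1                    # initial-distribution counts
    for i, (word, tag) in enumerate(sentence):
        tag_counts[tag] += 1
        emit[(tag, word)] += 1                     # emission counts
        if i > 0:
            trans[(sentence[i - 1][1], tag)] += 1  # transition counts

def p_trans(prev, cur):
    """MLE transition estimate; each row sums to 1 over observed next tags."""
    total = sum(c for (p, _), c in trans.items() if p == prev)
    return trans[(prev, cur)] / total if total else 0.0

def p_emit(tag, word):
    """p(word | tag) = count(tag emits word) / count(tag)."""
    return emit[(tag, word)] / tag_counts[tag]

print(p_trans("DET", "NOUN"))  # 1.0
print(p_emit("NOUN", "dog"))   # 0.5
```

In this toy corpus every DET is followed by a NOUN, so that transition estimate is 1.0; "dog" is one of two NOUN tokens, so its emission estimate is 0.5.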