Markov Models
- The future is independent of the past given the present
- if you know the state of the world right now, then knowing the state of the world in the past will not help you predict the future
- uses: temporal data, or some sequence of data
- weather
- economic
- language
- speech recognition
- Mark V. Shaney
- music
- automatically generated music
- etc.
example data
e.g., CO2 levels in the atmosphere
- x-axis is time
- y-axis is the CO2 level: time-series data with periodicity and some randomness
position of a robot from GPS
- x- and y-axes are latitude and longitude
- noisy measurements of position
- what is the actual position at time t?
fill-in-the-blank language example (see the sketch below)
- what is the word at the end of this ___________?
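A minimal sketch of the fill-in-the-blank idea under a first-order Markov view of language: predict the next word from only the previous word, using bigram counts. The toy text and the fill_in_blank helper are invented for illustration.

```python
# A minimal sketch (toy text invented for illustration): fill in the blank with
# the word that most often follows the previous word - a first-order Markov
# view of language.
from collections import Counter, defaultdict

text = "the end of this sentence is the end of this story".split()

# count bigrams: how often each word follows each other word
following = defaultdict(Counter)
for prev_word, next_word in zip(text, text[1:]):
    following[prev_word][next_word] += 1

def fill_in_blank(prev_word):
    """Most frequent next word given only the previous word (hypothetical helper)."""
    return following[prev_word].most_common(1)[0][0]

print(fill_in_blank("this"))   # -> "sentence" (ties broken by insertion order)
```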
formal def'n
- sequential data D=(x1, ..., xn)
- take random variables X1, ..., Xn
- there's clearly some dependency between points that are nearby in time, so we can't treat them as i.i.d.
- if everything depends on everything else, the problem is totally intractable
- the most accurate predictor of what's going to happen in the near future is what's happening right now
- when predicting xn+1, look at the more recent data; don't look at data from the distant past
- the recent past tells you more than the distant past
- Xt depends on Xt-1, Xt-2, ... , Xt-m (fixed m)
- Simplest case: m = 1
- simplifying assumptions: discrete time (duh) and discrete space, i.e., each xi is a discrete variable observed at discrete time steps
- discrete r.v.s X1, ..., Xn form a discrete-time Markov chain
- i.e., the Markov property: p(xt | x1, ..., xt-1) = p(xt | xt-1)
- AND the joint factors: p(x1,...,xn) = p(x1)*p(x2|x1)*p(x3|x2)*...*p(xn|xn-1), as sketched below
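A minimal sketch of evaluating that factorized joint for a two-state chain; the initial distribution and transition matrix below are made-up illustrative values, not from the notes.

```python
# A minimal sketch: p(x1,...,xn) = p(x1) * p(x2|x1) * ... * p(xn|xn-1)
# for a discrete-time, discrete-space chain with two states {0, 1}.
import numpy as np

initial = np.array([0.6, 0.4])        # p(x1): made-up distribution over states
transition = np.array([[0.7, 0.3],    # row i gives p(xt = j | xt-1 = i);
                       [0.2, 0.8]])   # each row sums to 1

def chain_joint_prob(states):
    """Probability of an observed state sequence under the first-order chain."""
    prob = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= transition[prev, cur]  # Markov property: only xt-1 matters
    return prob

print(chain_joint_prob([0, 0, 1, 1])) # 0.6 * 0.7 * 0.3 * 0.8 = 0.1008
```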
different types (generalizations)
- can also have a second-order Markov chain, where m = 2
- can also have a continuous-time Markov chain
- Poisson process
- Brownian motion - continuous time, continuous space
- e.g., modelling stock prices; Brownian motion in 2-D to model a particle of pollen in a glass of water
- e.g., taking a random walk along the integers - discrete time, discrete space, p(moving up) = 1/2, p(moving down) = 1/2 (see the sketch after this list)
- e.g., four states of weather, with transition probabilities between any two states
- discrete-time, continuous space = "state-space" model
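A minimal sketch of the integer random-walk example above (discrete time, discrete space, p(up) = p(down) = 1/2):

```python
# A minimal sketch: simulate a random walk along the integers,
# moving up or down with probability 1/2 at each discrete time step.
import random

def random_walk(n_steps, start=0):
    position = start
    path = [position]
    for _ in range(n_steps):
        position += random.choice([+1, -1])  # p(up) = p(down) = 1/2
        path.append(position)
    return path

print(random_walk(10))  # e.g., [0, 1, 0, -1, 0, 1, 2, 1, 2, 3, 2]
```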
- BUT: we can't expect to observe perfect information about the true state of the world (the system); observations and measurements are noisy
- Answer: acknowledge the fact that there's hidden information that we're not seeing
- can break up system into observed and hidden parts of the state
- can model with hidden (latent) variables
Hidden Markov Model
- A state machine, but you don't know the state
- but there is a probability distribution over which state it's in
- NLP
- the only information is the order of the words in the sentence
- captures the probability of moving from one part of speech to another
- can estimate parameters
- the go-to baseline model for sequential data
- some random variables z1, ..., zn taking values in some finite set (like the 26 letters)
- observed vars x1, ..., xn in some set X
- the trellis diagram: x's are observed data D = (x1, ..., xn), and z's are hidden/latent vars
- joint distribution of all random variables factors as p(x1,...,xn, z1,...,zn) = p(z1) * p(x1|z1) * PRODUCT(from k = 2 to n: p(zk|zk-1) * p(xk|zk)), as sketched below
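A minimal sketch of evaluating this joint factorization for a toy two-state HMM; the initial, transition, and emission values below are invented for illustration.

```python
# A minimal sketch: p(x1..xn, z1..zn) = p(z1) p(x1|z1) * prod_{k=2..n} p(zk|zk-1) p(xk|zk)
import numpy as np

pi = np.array([0.5, 0.5])      # initial distribution p(z1) (made-up values)
T = np.array([[0.9, 0.1],      # transition: T[i, j] = p(zk+1 = j | zk = i)
              [0.4, 0.6]])
E = np.array([[0.8, 0.2],      # emission: E[i, x] = p(x | zk = i)
              [0.3, 0.7]])

def hmm_joint_prob(xs, zs):
    """Joint probability of observations xs and a hidden path zs."""
    prob = pi[zs[0]] * E[zs[0], xs[0]]
    for k in range(1, len(xs)):
        prob *= T[zs[k - 1], zs[k]] * E[zs[k], xs[k]]
    return prob

print(hmm_joint_prob(xs=[0, 1, 0], zs=[0, 0, 1]))  # 0.5*0.8 * 0.9*0.2 * 0.1*0.3 = 0.00216
```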
NLP considerations
- the only information is the order of the sentence
- captures the probability of moving from one part of speech to another
- transition probability matrix - for a given row, the sum of the probabilities has to be 1 because that row covers all the possible transitions out of that state
- incoming probabilities (down a column) do not have to sum to 1
- an observation is a word emission
- observation likelihood = how many times that part of speech emits the word, divided by how many times that part of speech appears in the corpus (see the counting sketch after this list)
- e.g., when I start the sentence, what is the probability that the first part of speech will be an article?
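A minimal sketch of estimating these quantities by counting, on an invented two-sentence toy corpus: each transition row is normalized by the outgoing counts so it sums to 1, and each emission likelihood divides a word's count by the count of the part of speech that emits it.

```python
# A minimal sketch (toy corpus invented for illustration): estimating HMM
# parameters for part-of-speech tagging by counting.
from collections import Counter

corpus = [  # each sentence is a list of (word, part-of-speech) pairs
    [("the", "ART"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "ART"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

trans, emit, tag_counts, starts = Counter(), Counter(), Counter(), Counter()
for sentence in corpus:
    starts[sentence[0][1]] += 1                      # tag that starts the sentence
    for word, tag in sentence:
        tag_counts[tag] += 1
        emit[(tag, word)] += 1
    for (_, t1), (_, t2) in zip(sentence, sentence[1:]):
        trans[(t1, t2)] += 1

def p_trans(t1, t2):   # p(next tag = t2 | current tag = t1): row sums to 1 over t2
    return trans[(t1, t2)] / sum(c for (a, _), c in trans.items() if a == t1)

def p_emit(tag, word): # p(word | tag): word count divided by the tag's count
    return emit[(tag, word)] / tag_counts[tag]

print(starts["ART"] / len(corpus))  # p(sentence starts with an article) = 1.0 here
print(p_trans("ART", "NOUN"))       # 1.0 in this toy corpus
print(p_emit("NOUN", "dog"))        # 0.5
```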
uses
- handwriting recognition - x is observed scribble (the strokes of the letter), z is the actual letter.
- parameters:
- transition probabilities - Tij = p(zk+1 = j | zk = i)
- emission probabilities - ei(x) = p(x | zk = i) for each state i and each x in X; each ei is a probability distribution on the space X
- initial distribution - pi(i) = p(z1 = i) (a decoding sketch follows)
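A minimal sketch of the standard Viterbi algorithm for the decoding problem above - recovering the most likely hidden sequence (e.g., the actual letters) from the observations (e.g., the scribbles). The parameter values are invented toy numbers in the notation above.

```python
# A minimal sketch: Viterbi decoding of the most likely hidden path z1..zn
# given observations x1..xn, under made-up toy parameters.
import numpy as np

def viterbi(xs, pi, T, E):
    """Most likely hidden state sequence for the observation sequence xs."""
    n, n_states = len(xs), len(pi)
    best = np.zeros((n, n_states))            # best[k, i]: max prob of any path ending in state i at step k
    back = np.zeros((n, n_states), dtype=int) # back[k, j]: best predecessor of state j at step k
    best[0] = pi * E[:, xs[0]]
    for k in range(1, n):
        for j in range(n_states):
            scores = best[k - 1] * T[:, j]    # extend every path by one transition into j
            back[k, j] = np.argmax(scores)
            best[k, j] = scores[back[k, j]] * E[j, xs[k]]
    path = [int(np.argmax(best[-1]))]         # trace the best path backwards
    for k in range(n - 1, 0, -1):
        path.append(int(back[k, path[-1]]))
    return path[::-1]

pi = np.array([0.5, 0.5])               # initial distribution p(z1)
T = np.array([[0.9, 0.1], [0.4, 0.6]])  # T[i, j] = p(zk+1 = j | zk = i)
E = np.array([[0.8, 0.2], [0.3, 0.7]])  # E[i, x] = p(x | zk = i)
print(viterbi([0, 1, 1], pi, T, E))     # -> [1, 1, 1] for these toy values
```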