Difference between revisions of "Markov Models"
Jump to navigation
Jump to search
Line 15: | Line 15: | ||
* position of robot GPS, noisy measurements of position. what is the actual position in time t? | * position of robot GPS, noisy measurements of position. what is the actual position in time t? | ||
* fill in the blank language: what is the word at the end of this ___________? | * fill in the blank language: what is the word at the end of this ___________? | ||
+ | * handwriting recognition - x is observed scribble (the strokes of the letter), z is the actual letter. | ||
+ | |||
==Definitions== | ==Definitions== | ||
* Q = a set of N states <math>Q=q_1 q_2 ... q_N</math> | * Q = a set of N states <math>Q=q_1 q_2 ... q_N</math> | ||
+ | ** Discrete, e.g., 26 letters | ||
+ | ** could be hidden | ||
* A = transition probability matrix A, where each state a_ij represents the probability of moving from state i to state j, and the sum of all state transitions sums to 1. | * A = transition probability matrix A, where each state a_ij represents the probability of moving from state i to state j, and the sum of all state transitions sums to 1. | ||
+ | ** transition probabilities <math>T(ij)= p( z_{k+1} = j | z_k = i)</math> | ||
* Pi = An initial probability distribution over states, where pi_i is the probability that the Markov Chain will start in state i | * Pi = An initial probability distribution over states, where pi_i is the probability that the Markov Chain will start in state i | ||
** Some states may have pi=0, meaning cannot be initial state. | ** Some states may have pi=0, meaning cannot be initial state. | ||
* O = observations <math>O = o_1, o_2,..., o_n)</math> | * O = observations <math>O = o_1, o_2,..., o_n)</math> | ||
+ | * B observation likelihoods expressing the probability of an observation o_t being generated from a state q_i | ||
+ | ** emission probabilities <math>e_i(x) = p( x| Z_k = i )</math> | ||
* V = vocabulary from which the observations can be drawn | * V = vocabulary from which the observations can be drawn | ||
* <math>X_t</math> = Random variable that depends on <math>X_{t-1}, X_{t-2}, ... , X{t-m}</math> (fixed m) | * <math>X_t</math> = Random variable that depends on <math>X_{t-1}, X_{t-2}, ... , X{t-m}</math> (fixed m) | ||
− | ** | + | * Simplifying assumptions |
− | * | + | ** discrete time and discrete space, i.e., xi is discrete variable that happens at discrete times |
+ | ** Simplest case: m = 1 (First-order markov model) | ||
+ | ** Output independence: other states don't effect observation | ||
* Discrete random variables X1, ..., xn form a discrete time Markov Chain | * Discrete random variables X1, ..., xn form a discrete time Markov Chain | ||
* Joint distribution <math>p(x_t|x_1, ..., x_{t-1}) = p(x_t|x_{t-1})</math> | * Joint distribution <math>p(x_t|x_1, ..., x_{t-1}) = p(x_t|x_{t-1})</math> | ||
* Ergo <math>p(x_1, ..., x_n) = p(x_1) p(x_2|x_1) p(x_3|x_2) ... p(x_n|x_{n-1})</math> | * Ergo <math>p(x_1, ..., x_n) = p(x_1) p(x_2|x_1) p(x_3|x_2) ... p(x_n|x_{n-1})</math> | ||
− | == | + | |
+ | ==Generalizations== | ||
* can also have second order markov chain where m = 2 | * can also have second order markov chain where m = 2 | ||
* can also have continuous time markov chain | * can also have continuous time markov chain | ||
Line 37: | Line 47: | ||
* e.g., four states of weather. transition probabilities between any two states | * e.g., four states of weather. transition probabilities between any two states | ||
* discrete-time, continuous space = "state-space" | * discrete-time, continuous space = "state-space" | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | ==Hidden Markov Model== | + | == Hidden Markov Model == |
+ | * Can't expect to observe the perfect information about true state of the world (system) ... noisy observations, measurements | ||
+ | * Acknowledge fact that there's hidden information that we're not seeing | ||
+ | * Break up system into observed and hidden parts of the state | ||
+ | * Model with hidden (latent) variables | ||
* HMM is a sequence model/classifier whose job is to assign a label or class to each unit in a sequence. | * HMM is a sequence model/classifier whose job is to assign a label or class to each unit in a sequence. | ||
* A state machine, but you don't know the state | * A state machine, but you don't know the state | ||
− | + | ||
− | + | == HMM NLP considerations== | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
* Only information is the order of hte sentence | * Only information is the order of hte sentence | ||
* Captures probability of moving to part of speech | * Captures probability of moving to part of speech |
Revision as of 17:58, 26 October 2019
Contents
General
- The future is independent of the past given the present
- if know state of the world right now, then knowing state of the world in the past is not going to help you predict the future
- There's clearly some dependency on the points that are nearby in time, can't call them iid's.
- If everything is dependent, totally intractable problem
- For the most accurate prediction of what's gonna happen in the near future is what's happening right now.
- If looking xn + 1, just look at more recent data, don't look at data from distant past
- Recent past tells you more than distant past.
- "A Markov chain makes a very strong assumption that if we want to predict the future in the sequence,all that matters is the current state." - Jurafsky
- uses: temporal data, or some sequence of data. weather, economic, language, speech recognition, automatically generated music
- "Mark V. Shaney" - parody usenet user, a play on the words "markov chain"
Examples
- CO2 levels in atmosphere: y axis time sequence data with periodicity and some randomness
- position of robot GPS, noisy measurements of position. what is the actual position in time t?
- fill in the blank language: what is the word at the end of this ___________?
- handwriting recognition - x is observed scribble (the strokes of the letter), z is the actual letter.
Definitions
- Q = a set of N states
- Discrete, e.g., 26 letters
- could be hidden
- A = transition probability matrix A, where each state a_ij represents the probability of moving from state i to state j, and the sum of all state transitions sums to 1.
- transition probabilities
- Pi = An initial probability distribution over states, where pi_i is the probability that the Markov Chain will start in state i
- Some states may have pi=0, meaning cannot be initial state.
- O = observations
- B observation likelihoods expressing the probability of an observation o_t being generated from a state q_i
- emission probabilities
- V = vocabulary from which the observations can be drawn
- = Random variable that depends on (fixed m)
- Simplifying assumptions
- discrete time and discrete space, i.e., xi is discrete variable that happens at discrete times
- Simplest case: m = 1 (First-order markov model)
- Output independence: other states don't effect observation
- Discrete random variables X1, ..., xn form a discrete time Markov Chain
- Joint distribution
- Ergo
Generalizations
- can also have second order markov chain where m = 2
- can also have continuous time markov chain
- poisson process
- brownian motion continuous time
- e.g., modelling stock prices, brownian motion in 2-D to model a particle of pollen in a glass of water
- e.g. taking a random walk along the integer - discrete time, discrete space, p(moving up) = 1/2, p(movingdown) = 1/2
- e.g., four states of weather. transition probabilities between any two states
- discrete-time, continuous space = "state-space"
Hidden Markov Model
- Can't expect to observe the perfect information about true state of the world (system) ... noisy observations, measurements
- Acknowledge fact that there's hidden information that we're not seeing
- Break up system into observed and hidden parts of the state
- Model with hidden (latent) variables
- HMM is a sequence model/classifier whose job is to assign a label or class to each unit in a sequence.
- A state machine, but you don't know the state
HMM NLP considerations
- Only information is the order of hte sentence
- Captures probability of moving to part of speech
- Transition probaility matrix - for that particular row, the sum of probabilities has to be 1 because that covers all the possibilities of transition
- Incoming probabilities does not have to sum to 1
- observation is a word emission
- observation liklihood = how many times you see that word, divided by how many words in the corpus
- E.g., when I start the sentence, what is the probabiltity that the part of speech will be an article.