Markov Models

From Colettapedia
 
Revision as of 17:58, 26 October 2019

General

  • The future is independent of the past given the present
  • If you know the state of the world right now, then knowing the state of the world in the past is not going to help you predict the future
  • There's clearly some dependency between points that are nearby in time; we can't treat them as i.i.d.
  • If everything depends on everything, the problem is totally intractable
  • The most accurate predictor of what's going to happen in the near future is what's happening right now
  • When predicting x_{n+1}, just look at the most recent data; don't look at data from the distant past
  • The recent past tells you more than the distant past
  • "A Markov chain makes a very strong assumption that if we want to predict the future in the sequence,all that matters is the current state." - Jurafsky
  • Uses: temporal or sequential data, e.g., weather, economics, language, speech recognition, automatically generated music
  • "Mark V. Shaney" - parody usenet user, a play on the words "markov chain"

Examples

  • CO2 levels in atmosphere: y axis time sequence data with periodicity and some randomness
  • position of a robot via GPS, with noisy measurements of position; what is the actual position at time t?
  • fill in the blank language: what is the word at the end of this ___________?
  • handwriting recognition - x is observed scribble (the strokes of the letter), z is the actual letter.

Definitions

  • Q = a set of N states, Q = q_1 q_2 ... q_N
    • Discrete, e.g., 26 letters
    • could be hidden
  • A = transition probability matrix, where each entry a_ij represents the probability of moving from state i to state j; each row sums to 1
    • transition probabilities T(ij) = p(z_{k+1} = j | z_k = i)
  • Pi = an initial probability distribution over states, where pi_i is the probability that the Markov Chain will start in state i
    • Some states may have pi_i = 0, meaning they cannot be initial states
  • O = a sequence of observations, O = o_1, o_2, ..., o_n
  • B = observation likelihoods, expressing the probability of an observation o_t being generated from a state q_i
    • emission probabilities e_i(x) = p(x | z_k = i)
  • V = vocabulary from which the observations can be drawn
  • X_t = random variable that depends on X_{t-1}, X_{t-2}, ..., X_{t-m} (fixed m)
  • Simplifying assumptions
    • discrete time and discrete space, i.e., x_i is a discrete variable observed at discrete times
    • Simplest case: m = 1 (first-order Markov model)
    • Output independence: other states don't affect the observation
  • Discrete random variables X_1, ..., X_n form a discrete-time Markov Chain
  • Markov property: p(x_t | x_1, ..., x_{t-1}) = p(x_t | x_{t-1})
  • Ergo the joint distribution p(x_1, ..., x_n) = p(x_1) p(x_2|x_1) p(x_3|x_2) ... p(x_n|x_{n-1})
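The definitions above (the state set Q, a row-stochastic transition matrix A, an initial distribution pi, and the chain-rule factorization of the joint) can be sketched in a few lines of Python. The two weather states and all the numbers here are made-up toy values, not anything from these notes:

```python
import random

# Toy two-state weather chain; all probabilities are made-up illustrative values.
states = ["sunny", "rainy"]                   # Q, the set of states
A = {"sunny": {"sunny": 0.8, "rainy": 0.2},   # transition matrix: each row sums to 1
     "rainy": {"sunny": 0.4, "rainy": 0.6}}
pi = {"sunny": 0.7, "rainy": 0.3}             # initial distribution over states

def sequence_probability(seq):
    """Chain-rule factorization: p(x_1,...,x_n) = p(x_1) * prod_k p(x_k | x_{k-1})."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

def sample(n, rng=random):
    """Draw a length-n state path: x_1 from pi, then each x_k from row A[x_{k-1}]."""
    x = rng.choices(states, weights=[pi[s] for s in states])[0]
    path = [x]
    for _ in range(n - 1):
        x = rng.choices(states, weights=[A[x][s] for s in states])[0]
        path.append(x)
    return path

print(sequence_probability(["sunny", "sunny", "rainy"]))  # 0.7 * 0.8 * 0.2 ≈ 0.112
```

Note how the probability of a whole path needs only the initial distribution and one transition factor per step, which is exactly the first-order Markov assumption.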

Generalizations

  • can also have a second-order Markov chain, where m = 2
  • can also have a continuous-time Markov chain
    • Poisson process
    • Brownian motion (continuous time)
      • e.g., modelling stock prices; Brownian motion in 2-D to model a particle of pollen in a glass of water
  • e.g., taking a random walk along the integers: discrete time, discrete space, p(moving up) = 1/2, p(moving down) = 1/2
  • e.g., four states of weather, with transition probabilities between any two states
  • discrete-time, continuous space = "state-space"
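The random walk on the integers mentioned above (discrete time, discrete space, p(up) = p(down) = 1/2) is easy to simulate. A minimal sketch, with a fixed seed only so runs are reproducible:

```python
import random

def random_walk(n_steps, seed=0):
    """Symmetric random walk on the integers (discrete time, discrete space):
    start at 0, then move +1 or -1 with probability 1/2 at each step."""
    rng = random.Random(seed)  # arbitrary seed, just for reproducibility
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += rng.choice([-1, 1])
        path.append(position)
    return path

print(random_walk(10))
```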


Hidden Markov Model

  • Can't expect to observe perfect information about the true state of the world (system): observations and measurements are noisy
  • Acknowledge the fact that there's hidden information that we're not seeing
  • Break up system into observed and hidden parts of the state
  • Model with hidden (latent) variables
  • HMM is a sequence model/classifier whose job is to assign a label or class to each unit in a sequence.
  • A state machine, but you don't know the state; all you have is a probability distribution over states
  • Go-to baseline model for sequential data
  • Parameters (can be estimated from data):
    • transition probabilities T(ij) = p(z_{k+1} = j | z_k = i)
    • emission probabilities e_i(x) = p(x | z_k = i); each e_i is a probability distribution on the observation space X
    • initial distribution
  • Hidden variables z_1, ..., z_n take values in some finite discrete set (like 26 letters); observed variables x_1, ..., x_n take values in some set X
  • The trellis diagram: x's are observed data D = (x_1, ..., x_n), z's are hidden/latent variables
  • Joint distribution of all the random variables factors as p(x_1, ..., x_n, z_1, ..., z_n) = p(z_1) p(x_1|z_1) PRODUCT(k = 2 to n: p(z_k|z_{k-1}) p(x_k|z_k))
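As a sketch of how the pieces of an HMM fit together, here is a toy two-hidden-state model in the spirit of the handwriting example (hidden letters z emitting observed stroke symbols x). The states, symbols, and probabilities are all invented for illustration. The function computes the factored joint p(x_1,...,x_n, z_1,...,z_n) = p(z_1) p(x_1|z_1) * prod over k >= 2 of p(z_k|z_{k-1}) p(x_k|z_k):

```python
# Toy two-hidden-state HMM; states, symbols, and probabilities are invented.
pi = {"A": 0.6, "B": 0.4}                    # initial distribution over hidden states
T = {"A": {"A": 0.7, "B": 0.3},              # transitions: T(ij) = p(z_{k+1} = j | z_k = i)
     "B": {"A": 0.4, "B": 0.6}}
E = {"A": {"/": 0.5, "\\": 0.4, "|": 0.1},   # emissions: e_i(x) = p(x | z_k = i)
     "B": {"/": 0.1, "\\": 0.2, "|": 0.7}}

def joint_probability(xs, zs):
    """p(x_1..n, z_1..n) = p(z_1) p(x_1|z_1) * prod_{k>=2} p(z_k|z_{k-1}) p(x_k|z_k)."""
    p = pi[zs[0]] * E[zs[0]][xs[0]]
    for k in range(1, len(xs)):
        p *= T[zs[k - 1]][zs[k]] * E[zs[k]][xs[k]]
    return p

print(joint_probability(["/", "|"], ["A", "B"]))  # 0.6 * 0.5 * 0.3 * 0.7 ≈ 0.063
```

Each step of the trellis contributes one transition factor and one emission factor, which is the whole model.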

HMM NLP considerations

  • Only information is the order of the sentence
  • Captures probability of moving to part of speech
  • Transition probability matrix: for a given row, the sum of probabilities has to be 1, because that covers all the possible transitions out of that state
  • Incoming probabilities do not have to sum to 1
  • observation is a word emission
  • observation likelihood = how many times you see that word emitted by that state, divided by how many times the state occurs in the corpus
  • E.g., when I start the sentence, what is the probability that the part of speech will be an article?
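The count-based estimates described above can be sketched as a small maximum-likelihood routine: transition rows are normalized per previous tag, and emissions are normalized per tag (the standard HMM convention). The two-sentence tagged corpus is a made-up toy:

```python
from collections import Counter

# Made-up two-sentence tagged corpus, just to show the counting.
corpus = [[("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
          [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]]

trans, emit, tag_counts, starts = Counter(), Counter(), Counter(), Counter()
for sentence in corpus:
    starts[sentence[0][1]] += 1                    # initial-distribution counts
    for i, (word, tag) in enumerate(sentence):
        tag_counts[tag] += 1
        emit[(tag, word)] += 1                     # emission counts
        if i > 0:
            trans[(sentence[i - 1][1], tag)] += 1  # transition counts

def p_trans(prev, cur):
    """MLE transition estimate; each row sums to 1 over observed next tags."""
    total = sum(c for (p, _), c in trans.items() if p == prev)
    return trans[(prev, cur)] / total if total else 0.0

def p_emit(tag, word):
    """p(word | tag) = count(tag emits word) / count(tag)."""
    return emit[(tag, word)] / tag_counts[tag]

print(p_trans("DET", "NOUN"))  # 1.0
print(p_emit("NOUN", "dog"))   # 0.5
```

In this toy corpus every DET is followed by a NOUN, so that transition estimate is 1.0; "dog" is one of two NOUN tokens, so its emission estimate is 0.5.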