Markov Models

The future is independent of the past given the present
if know state of the world right now, then knowing state of the world in the past is not going to help you predict the future
uses: temporal data, or some sequence of data
- weather
- economic
- language
  - speech recognition
  - Mark V. Shaney
- music
  - automatically generated music
- etc.

formal def'n

sequential data D=(x1, ..., xn)
take random variables X1, .. ,Xn
there's clearly some dependency on the points that are nearby in time, can't call them iid's.
if everything is dependent, totally intractable problem
for the most accurate prediction of what's gonna happen in the near future is what's happening right now.
if looking xn + 1, just look at more recent data, don't look at data from distant past
recent past tells you more than distant past.
Xt depends on Xt-1, Xt-2, ... , Xt-m (fixed m)
Simplest case: m = 1
simplifying assumptions: discrete time (duh) discrete space, i.e., xi is discrete variable that happens at discrete times
discrete r.v.s X1, ..., xn form a discrete time Markov Chain
i.e., joint distribution p(xt|x1, ..., xt-1) = p(xt|xt-1)
AND p(x1,...,xn) = p(x1)*p(x2|x1)*p(x3|x2)*...*p(xn|xn-1)

can also have second order markov chain where m = 2
can also have continuous time markov chain
- poisson process
- brownian motion continuous time
  - e.g., modelling stock prices, brownian motion in 2-D to model a particle of pollen in a glass of water
e.g. taking a random walk along the integer - discrete time, discrete space, p(moving up) = 1/2, p(movingdown) = 1/2
e.g., four states of weather. transition probabilities between any two states
discrete-time, continuous space = "state-space"
BUT: Can't expect to observe the perfect information about true state of the world (system) ... noisy observations, measurements
Answer - to acknowledge fact that there's hidden information that we're not seeing
can break up system into observed and hidden parts of the state
can model with hidden (latent) variables

A state machine, but you don't know the state
But there's a probability
NLP
- Only information is the order of hte sentence
- Capures probabl
can estimate parameters
goto baseline model for sequential data
some rand vars z1, ... , zn in discrete integers in some finite set (like 26 letters)
hidden vars x1, ..., xn in some set capital X
the trelis diagram: x's are observed data D = (x1, ..., xn), and z's are hidden /latent vars
joint distribution of all random variables p(x1,..,xn,z1,...,zn) factors = p(z1)*p(x1|z1)* PRODUCT( from k = 2 to n: p(zk|zk-1)*p(xk|zk) )

Only information is the order of hte sentence
Captures probability of moving to part of speech
Transition probaility matrix - for that particular row, the sum of probabilities has to be 1 because that covers all the possibilities of transition
Incoming probabilities does not have to sum to 1
observation is a word emission
observation liklihood = how many times you see that word, divided by how many words in the corpus
E.g., when I start the sentence, what is the probabiltity that the part of speech will be an article.

handwriting recognition - x is observed scribble (the strokes of the letter), z is the actual letter.
parameters:

transition probabilities - T(ij)= p(zk+1= j| zk = i)
emission probabilities ei(x)= p(x|Zk = i) for i in set and x in its set .. e sub i is a prob dist. on space capitol X
initial distribution