Markov Models


General

  • The future is independent of the past given the present (see the sketch after this list).
  • If you know the state of the world right now, knowing the state of the world in the past is not going to help you predict the future.
  • There's clearly some dependency between points that are nearby in time, so they can't be treated as i.i.d.
  • If everything is dependent on everything else, the problem is totally intractable.
  • The most accurate predictor of what's going to happen in the near future is what's happening right now.
  • When predicting x_{n+1}, look at the most recent data, not data from the distant past.
  • The recent past tells you more than the distant past.
  • "A Markov chain makes a very strong assumption that if we want to predict the future in the sequence, all that matters is the current state." - Jurafsky
  • Uses: temporal data, or any sequence of data: weather, economics, language, speech recognition, automatically generated music.
  • "Mark V. Shaney" - parody Usenet user, a play on the words "Markov chain".

Examples

  • CO2 levels in the atmosphere: y-axis time-sequence data with periodicity and some randomness
  • Position of a robot from GPS: noisy measurements of position. What is the actual position at time t?
  • Fill-in-the-blank language: what is the word at the end of this ___________?
  • Handwriting recognition: x is the observed scribble (the strokes of the letter), z is the actual letter.

Definitions

  • Q = a set of N states
    • Discrete, e.g., 26 letters
    • could be hidden
  • A = transition probability matrix, where each entry a_ij represents the probability of moving from state i to state j, and each row sums to 1: sum_j a_ij = 1.
    • transition probabilities
  • pi = an initial probability distribution over states, where pi_i is the probability that the Markov chain will start in state i
    • Some states j may have pi_j = 0, meaning they cannot be initial states.
  • O = observations
  • B = observation likelihoods, expressing the probability of an observation o_t being generated from a state q_i
    • emission probabilities
  • V = vocabulary from which the observations can be drawn
  • X_{n+1} = random variable that depends only on the m most recent states X_{n-m+1}, ..., X_n (fixed m)
  • Simplifying assumptions
    • discrete time and discrete space, i.e., X_i is a discrete variable observed at discrete times
    • Simplest case: m = 1 (first-order Markov model)
    • Output independence: the observation at time t depends only on the current state, not on past states or past observations.
  • Discrete random variables X_1, ..., X_n form a discrete-time Markov chain if P(X_{i+1} | X_1, ..., X_i) = P(X_{i+1} | X_i)
  • Joint distribution: P(X_1, ..., X_n) = P(X_1) P(X_2 | X_1) ... P(X_n | X_{n-1})
  • Ergo, the chain is fully specified by the initial distribution pi and the transition matrix A (see the sketch after this list).
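
To tie the (Q, A, pi, B, V) pieces together, here is a minimal sketch with made-up numbers (the hot/cold values are illustrative, not from the source); it scores the joint probability of a hidden state sequence and an observation sequence under a first-order HMM:

```python
# Minimal HMM specification with made-up numbers.
Q = ["hot", "cold"]                     # N = 2 states
pi = {"hot": 0.6, "cold": 0.4}          # initial probability distribution
A = {                                   # transition probabilities, each row sums to 1
    "hot":  {"hot": 0.7, "cold": 0.3},
    "cold": {"hot": 0.4, "cold": 0.6},
}
V = [1, 2, 3]                           # vocabulary the observations are drawn from
B = {                                   # emission probabilities P(o_t | q_i)
    "hot":  {1: 0.2, 2: 0.4, 3: 0.4},
    "cold": {1: 0.5, 2: 0.4, 3: 0.1},
}

def joint_probability(states, observations):
    """P(states, observations) = pi * product of transitions * product of emissions."""
    p = pi[states[0]] * B[states[0]][observations[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][observations[t]]
    return p

print(joint_probability(["hot", "hot", "cold"], [3, 1, 1]))
```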

Generalizations

  • Can also have a second-order Markov chain, where m = 2
  • Can also have a continuous-time Markov chain
    • Poisson process
    • Brownian motion (continuous time)
      • e.g., modelling stock prices, or Brownian motion in 2-D to model a particle of pollen in a glass of water
  • e.g., taking a random walk along the integers: discrete time, discrete space, p(moving up) = 1/2, p(moving down) = 1/2 (see the sketch after this list)
  • e.g., four states of weather, with transition probabilities between any two states
  • Discrete time, continuous space = "state-space" model
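
The integer random walk above is simple to simulate; a minimal sketch:

```python
import random

def random_walk(steps):
    """Symmetric random walk on the integers: discrete time, discrete space."""
    position = 0
    path = [position]
    for _ in range(steps):
        position += 1 if random.random() < 0.5 else -1  # p(up) = p(down) = 1/2
        path.append(position)
    return path

print(random_walk(20))
```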


Hidden Markov Model

  • Can't expect to observe perfect information about the true state of the world (system): observations/measurements are noisy
  • Acknowledge the fact that there's hidden information that we're not seeing
  • Break the system up into observed and hidden parts of the state
  • Model with hidden (latent) variables
  • An HMM is a sequence model/classifier whose job is to assign a label or class to each unit in a sequence (see the sketch below).
  • A state machine, but you don't know the state
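
As a sketch of how an HMM assigns a label to each unit in a sequence, here is a minimal Viterbi decoder; the (Q, A, pi, B) values are the same made-up numbers as in the Definitions sketch, and the algorithm recovers the most likely hidden state sequence for a given observation sequence:

```python
# Made-up parameters, same shape as the Definitions sketch.
Q = ["hot", "cold"]
pi = {"hot": 0.6, "cold": 0.4}
A = {"hot": {"hot": 0.7, "cold": 0.3}, "cold": {"hot": 0.4, "cold": 0.6}}
B = {"hot": {1: 0.2, 2: 0.4, 3: 0.4}, "cold": {1: 0.5, 2: 0.4, 3: 0.1}}

def viterbi(observations):
    """Label each observation with its most likely hidden state."""
    # v[t][q] = probability of the best state path ending in state q at time t
    v = [{q: pi[q] * B[q][observations[0]] for q in Q}]
    backpointer = [{}]
    for t in range(1, len(observations)):
        v.append({})
        backpointer.append({})
        for q in Q:
            best_prev = max(Q, key=lambda p: v[t - 1][p] * A[p][q])
            v[t][q] = v[t - 1][best_prev] * A[best_prev][q] * B[q][observations[t]]
            backpointer[t][q] = best_prev
    # Trace the best path backwards from the most likely final state
    state = max(Q, key=lambda q: v[-1][q])
    path = [state]
    for t in range(len(observations) - 1, 0, -1):
        state = backpointer[t][state]
        path.insert(0, state)
    return path

print(viterbi([3, 1, 3]))  # hidden state label for each observation
```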