Markov Models

  • The future is independent of the past given the present
  • if you know the state of the world right now, then knowing the state of the world in the past is not going to help you predict the future
  • "A Markov chain makes a very strong assumption that if we want to predict the future in the sequence, all that matters is the current state." - Jurafsky
  • uses: temporal data, or any sequence of data, e.g., weather, economics, language, speech recognition, automatically generated music
  • "Mark V. Shaney" - a parody Usenet user whose name is a play on the words "Markov chain"

example data

ex CO2 levels in atmosphere

  • x-axis is time
  • y-axis is time-series data with periodicity and some randomness

position of robot GPS

  • x- and y-axes are latitude and longitude
  • noisy measurements of position
  • what is the actual position at time t?

fill in the blank language ex

  • what is the word at the end of this ___________?

formal def'n

  • sequential data D=(x1, ..., xn)
  • take random variables X1, ..., Xn
  • there's clearly some dependency between points that are nearby in time, so we can't call them i.i.d.
  • if everything depends on everything else, the problem is totally intractable
  • the most useful predictor of what's going to happen in the near future is what's happening right now
  • when predicting x_{n+1}, just look at the most recent data; don't look at data from the distant past
  • the recent past tells you more than the distant past
  • X_t depends on X_{t-1}, X_{t-2}, ..., X_{t-m} (fixed m)
  • Simplest case: m = 1
  • simplifying assumptions: discrete time (duh) and discrete space, i.e., x_i is a discrete variable observed at discrete times
  • discrete r.v.s X_1, ..., X_n form a discrete-time Markov chain
  • i.e., the Markov property: p(x_t | x_1, ..., x_{t-1}) = p(x_t | x_{t-1})
  • AND the joint distribution factors: p(x_1, ..., x_n) = p(x_1) * p(x_2|x_1) * p(x_3|x_2) * ... * p(x_n|x_{n-1}) (see the sketch after this list)
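
A minimal Python sketch of that factorization, assuming a made-up two-state weather chain (the states and probabilities below are purely illustrative, not from these notes):

# Sketch: joint probability of a sequence under a first-order Markov chain,
# p(x_1, ..., x_n) = p(x_1) * p(x_2|x_1) * ... * p(x_n|x_{n-1}).
# The two "rain"/"sun" states and all numbers are invented for illustration.
initial = {"rain": 0.3, "sun": 0.7}            # p(x_1)
transition = {                                 # p(x_t | x_{t-1})
    "rain": {"rain": 0.6, "sun": 0.4},
    "sun":  {"rain": 0.2, "sun": 0.8},
}

def chain_probability(sequence):
    """Multiply p(x_1) by p(x_t | x_{t-1}) for each later step."""
    prob = initial[sequence[0]]
    for prev, curr in zip(sequence, sequence[1:]):
        prob *= transition[prev][curr]
    return prob

print(chain_probability(["sun", "sun", "rain", "rain"]))  # 0.7 * 0.8 * 0.2 * 0.6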

different types (generalizations)

  • can also have a second-order Markov chain where m = 2
  • can also have a continuous-time Markov chain
    • Poisson process
    • Brownian motion (continuous time)
      • e.g., modelling stock prices; Brownian motion in 2-D to model a particle of pollen in a glass of water
  • e.g., taking a random walk along the integers: discrete time, discrete space, p(moving up) = 1/2, p(moving down) = 1/2 (see the sketch after this list)
  • e.g., four states of weather, with transition probabilities between any two states
  • discrete-time, continuous space = "state-space" model
  • BUT: can't expect to observe perfect information about the true state of the world (system); observations and measurements are noisy
  • Answer: acknowledge the fact that there's hidden information that we're not seeing
  • can break up system into observed and hidden parts of the state
  • can model with hidden (latent) variables
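
A concrete Python version of the integer random walk above (step count and random seed are arbitrary choices for illustration):

import random

# Sketch of the random-walk example: discrete time, discrete space,
# p(moving up) = 1/2, p(moving down) = 1/2.
def random_walk(steps, start=0, seed=0):
    random.seed(seed)          # fixed seed so the sketch is reproducible
    position = start
    path = [position]
    for _ in range(steps):
        position += 1 if random.random() < 0.5 else -1
        path.append(position)
    return path

print(random_walk(10))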

uses

  • handwriting recognition - x is observed scribble (the strokes of the letter), z is the actual letter.
  • parameters:
  1. transition probabilities: T_{ij} = p(z_{k+1} = j | z_k = i)
  2. emission probabilities: e_i(x) = p(x | z_k = i) for each state i and each observation x; each e_i is a probability distribution on the observation space X
  3. initial distribution: the distribution of the first hidden state, p(z_1 = i) (see the sketch after this list)
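
A sketch of how those three parameter sets might be laid out for a toy two-letter handwriting model (every state, symbol, and number below is invented for illustration):

# Toy HMM parameters (all values invented).
# Hidden states z: the actual letters; observations x: crude stroke labels.
states = ["a", "b"]

initial = {"a": 0.6, "b": 0.4}            # p(z_1 = i)
transition = {                            # T_{ij} = p(z_{k+1} = j | z_k = i)
    "a": {"a": 0.7, "b": 0.3},
    "b": {"a": 0.4, "b": 0.6},
}
emission = {                              # e_i(x) = p(x | z_k = i)
    "a": {"loop": 0.8, "line": 0.2},
    "b": {"loop": 0.1, "line": 0.9},
}

# Each row is a probability distribution, so it must sum to 1.
assert abs(sum(initial.values()) - 1.0) < 1e-9
for i in states:
    assert abs(sum(transition[i].values()) - 1.0) < 1e-9
    assert abs(sum(emission[i].values()) - 1.0) < 1e-9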

Hidden Markov Model

  • HMM is a sequence model/classifier whose job is to assign a label or class to each unit in a sequence.
  • A state machine, but you don't know the state
  • But there's a probability distribution over which state you're in
  • NLP
    • Only information is the order of the sentence
    • Captures the probability of moving to each part of speech
  • can estimate parameters
  • go-to baseline model for sequential data
  • hidden random variables z_1, ..., z_n taking discrete values in some finite set (like the 26 letters)
  • observed variables x_1, ..., x_n taking values in some set X
  • the trellis diagram: x's are observed data D = (x_1, ..., x_n), and z's are hidden/latent vars
  • joint distribution of all random variables: p(x_1, ..., x_n, z_1, ..., z_n) = p(z_1) * p(x_1|z_1) * PRODUCT from k = 2 to n of [ p(z_k|z_{k-1}) * p(x_k|z_k) ] (see the sketch below)
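
A Python sketch of that joint factorization, reusing the same invented toy parameters as the handwriting sketch above (nothing here comes from a real model):

# Sketch: p(x_1..x_n, z_1..z_n) = p(z_1) * p(x_1|z_1) * prod_{k=2..n} p(z_k|z_{k-1}) * p(x_k|z_k)
initial = {"a": 0.6, "b": 0.4}
transition = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}}
emission = {"a": {"loop": 0.8, "line": 0.2}, "b": {"loop": 0.1, "line": 0.9}}

def joint_probability(xs, zs):
    """Start with p(z_1) * p(x_1|z_1), then multiply in p(z_k|z_{k-1}) * p(x_k|z_k)."""
    prob = initial[zs[0]] * emission[zs[0]][xs[0]]
    for k in range(1, len(xs)):
        prob *= transition[zs[k - 1]][zs[k]] * emission[zs[k]][xs[k]]
    return prob

print(joint_probability(["loop", "line", "line"], ["a", "b", "b"]))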

NLP considerations

  • Only information is the order of the sentence
  • Captures probability of moving to part of speech
  • Transition probability matrix - for that particular row, the sum of probabilities has to be 1 because that covers all the possibilities of transition
  • Incoming probabilities do not have to sum to 1
  • observation is a word emission
  • observation likelihood = how many times you see that word with that part of speech, divided by how many times that part of speech appears in the corpus (see the counting sketch after this list)
  • E.g., when I start the sentence, what is the probability that the part of speech will be an article?
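
A sketch of those counting estimates on a tiny invented tagged corpus (the sentences, tags, and counts are made up; real taggers use large corpora and smoothing):

from collections import Counter

# Tiny invented tagged corpus: (word, part-of-speech) pairs per sentence.
corpus = [
    [("the", "ART"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "ART"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

tag_counts = Counter()
tag_word_counts = Counter()
tag_bigram_counts = Counter()
start_counts = Counter()

for sentence in corpus:
    start_counts[sentence[0][1]] += 1          # which part of speech starts the sentence
    for word, tag in sentence:
        tag_counts[tag] += 1
        tag_word_counts[(tag, word)] += 1
    for (_, prev_tag), (_, tag) in zip(sentence, sentence[1:]):
        tag_bigram_counts[(prev_tag, tag)] += 1

def transition(prev_tag, tag):
    """Transition probability: count(prev_tag -> tag) / count(prev_tag)."""
    return tag_bigram_counts[(prev_tag, tag)] / tag_counts[prev_tag]

def emission(tag, word):
    """Observation likelihood: count(tag, word) / count(tag)."""
    return tag_word_counts[(tag, word)] / tag_counts[tag]

def initial(tag):
    """How often a sentence starts with this part of speech."""
    return start_counts[tag] / len(corpus)

print(transition("ART", "NOUN"))   # 1.0 in this toy corpus
print(emission("NOUN", "dog"))     # 0.5
print(initial("ART"))              # 1.0: these toy sentences always start with an article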