Maximum Likelihood Estimation
 
==General==
* Obtain an estimate for an unknown parameter theta using the data that we obtained from our sample.
* Choose the value of theta that maximizes the likelihood of getting the data we observed.
* Joint probability mass function: if the observations are independent, you can just multiply the PMFs (or PDFs) of the individual observations.
** <math>L(\theta)=\prod_{i=1}^n f(x_i;\theta)</math> (general formulation; illustrated in the sketch below)
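A minimal Python sketch of this product formulation. The <code>bernoulli_pmf</code> and the data values here are made up for illustration; any density function of the same shape would do:

<syntaxhighlight lang="python">
import numpy as np

def likelihood(theta, data, pmf):
    # General formulation: L(theta) = product over i of f(x_i; theta)
    return np.prod([pmf(x, theta) for x in data])

# Illustrative pmf: Bernoulli, f(x; theta) = theta^x * (1 - theta)^(1 - x)
def bernoulli_pmf(x, theta):
    return theta**x * (1 - theta)**(1 - x)

data = [1, 0, 1, 1, 0]
print(likelihood(0.6, data, bernoulli_pmf))  # 0.6^3 * 0.4^2 = 0.03456
</syntaxhighlight>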
  
 
==Bernoulli Distribution==
* <math>f(x_i;p)=p^{x_i}(1-p)^{1-x_i}</math> for <math>x_i</math> = 0 or 1 and 0 < p < 1.
* If the <math>X_i</math> are independent Bernoulli random variables with unknown parameter p, replace the general notation with the Bernoulli notation:
* <math>L(p)=p^{\sum x_i}(1-p)^{n-\sum x_i}</math>
* <math>\log L(p) = \left( \sum x_i \right) \log(p) + \left( n - \sum x_i \right) \log(1-p)</math>
* E.g., what is the estimate of the mortality rate at a given hospital? Say each patient comes from a Bernoulli distribution.
* <math>Y_i \sim B( \theta )</math>, where theta is the unknown parameter (hence the Greek letter).
* <math>P( Y_i = 1 ) = \theta </math> for a single given person.
* <math> P( \mathbf{Y} = \mathbf{y} | \theta ) = P( Y_1 = y_1, Y_2=y_2, \ldots , Y_n= y_n | \theta )</math> using vector form (bold for vector notation)
* <math> P( \mathbf{Y} = \mathbf{y} | \theta ) = P( Y_1 = y_1 | \theta ) \cdots P(Y_n= y_n | \theta ) = \prod_{i=1}^n P( Y_i = y_i | \theta )</math> because they are independent
* <math> P( \mathbf{Y} = \mathbf{y} | \theta ) = \prod_{i=1}^n \theta^{y_i} (1-\theta)^{1-y_i}</math> using what we know from the Bernoulli distribution
** "The probability of observing the actual data we collected, conditioned on the value of the parameter theta."
** The concept of likelihood means thinking about this density function as a function of theta.
* <math>L( \theta | \mathbf{ y } ) = \prod_{i=1}^n \theta^{y_i} (1-\theta)^{1-y_i}</math>
** The two functions look the same, but the one above is a function of y given theta, whereas the likelihood is a function of theta given y. It is no longer a probability distribution, but it is still a function of theta.
* MLE: <math>\hat{ \theta } = \operatorname{argmax}_{\theta}\, L( \theta | \mathbf{ y } )</math>
** To estimate theta, choose the theta that gives us the largest value of the likelihood. It makes the data we observed the most likely to occur.
* <math> \ell ( \theta ) = \log L ( \theta | \mathbf{y} )</math>
** Since the logarithm is a monotonic function, if we maximize the logarithm of the function, we also maximize the original function.
** We can drop the "conditioned on y" notation here.
* <math> \ell ( \theta ) = \log \left[ \prod \theta^{y_i} (1-\theta)^{1-y_i} \right] = \sum \log \left[ \theta^{y_i} (1-\theta)^{1-y_i} \right] = \sum \left[ y_i \log \theta + (1-y_i) \log (1-\theta) \right] </math>
* <math> \ell ( \theta ) = \left( \sum y_i \right) \log \theta + \left( \sum (1-y_i) \right) \log (1-\theta)</math>
* <math> \ell '( \theta ) = \frac{1}{ \theta } \sum y_i - \frac{ 1 } {1- \theta } \sum (1-y_i) = 0</math>
** Here we take the derivative and set it equal to 0.
* <math>0 = \frac{ \sum y_i }{ \hat{ \theta } } - \frac{ \sum (1-y_i) }{ 1- \hat{ \theta } }</math>
** The hat denotes a parameter estimate.
* <math>\hat{ \theta } = \frac{1}{ n } \sum y_i </math>
* Approximate 95% confidence interval: <math>\hat{ \theta } \pm 1.96 \sqrt{ \frac{ \hat{ \theta } (1- \hat{ \theta } ) } {n} } </math> (see the sketch after this list)
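A short Python sketch of the whole recipe above, using hypothetical 0/1 outcomes (the data are made up). The closed-form estimate <math>\hat{ \theta } = \frac{1}{n} \sum y_i</math> is checked against a brute-force grid search of the log-likelihood:

<syntaxhighlight lang="python">
import numpy as np

y = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])  # hypothetical patient outcomes
n = len(y)

# Closed-form MLE: the sample mean
theta_hat = y.mean()

# Numerical check: maximize l(theta) over a grid on (0, 1)
grid = np.linspace(0.001, 0.999, 999)
loglik = y.sum() * np.log(grid) + (n - y.sum()) * np.log(1 - grid)
assert abs(grid[np.argmax(loglik)] - theta_hat) < 1e-3

# Approximate 95% confidence interval
se = np.sqrt(theta_hat * (1 - theta_hat) / n)
print(theta_hat, (theta_hat - 1.96 * se, theta_hat + 1.96 * se))
</syntaxhighlight>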
  
  
==Exponential Distribution==
* Suppose we have samples from an exponential distribution with parameter lambda:
** <math>X_i \sim \textrm{Exp}( \lambda ) </math>, assuming i.i.d.
* Recall that the joint density is the product of the individual densities: <math>f( \mathbf{x} | \lambda ) = \prod_{i=1}^n \lambda e^{- \lambda x_i } = \lambda^n e ^{-\lambda \sum x_i}</math>
* <math>L( \lambda | \mathbf{x} ) =  \lambda^n e ^{-\lambda \sum x_i}</math>
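* Following the same steps as the Bernoulli case (take the log, differentiate, set equal to zero):
** <math>\ell( \lambda ) = n \log \lambda - \lambda \sum x_i</math>
** <math>\ell'( \lambda ) = \frac{n}{\lambda} - \sum x_i = 0 \implies \hat{ \lambda } = \frac{n}{\sum x_i} = \frac{1}{\bar{x}}</math>

A minimal numerical check of this result in Python (the sample values are made up):

<syntaxhighlight lang="python">
import numpy as np

x = np.array([0.8, 2.1, 0.3, 1.7, 0.5, 1.1])  # hypothetical exponential samples
lambda_hat = 1 / x.mean()                      # closed-form MLE: 1 / sample mean

# Grid-search the log-likelihood l(lambda) = n*log(lambda) - lambda*sum(x)
grid = np.linspace(0.01, 5.0, 5000)
loglik = len(x) * np.log(grid) - grid * x.sum()
print(lambda_hat, grid[np.argmax(loglik)])     # the two values should nearly agree
</syntaxhighlight>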
