Maximum Likelihood Estimation
==General==
+ | |||
* Obtain an estimate for an unknown parameter theta using the data obtained from our sample.
* Choose the value of theta that maximizes the likelihood of observing the data we actually collected.
* Joint probability mass function: if the observations are independent, you can just multiply the PDFs of the individual observations.
** <math>L(\theta)=\prod_{i=1}^n f(x_i;\theta)</math> (general formulation)
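As a minimal numerical sketch of this general formulation (the helper names and data below are made up for illustration, not from the article): for i.i.d. observations, the likelihood is just the product of the individual densities evaluated at the data.

```python
import numpy as np

# Sketch: L(theta) = prod_i f(x_i; theta) for i.i.d. observations.
# `likelihood`, `bern`, and the data are illustrative stand-ins.
def likelihood(f, xs, theta):
    return np.prod([f(x, theta) for x in xs])

# Example density: the Bernoulli PMF f(y; theta) = theta^y * (1 - theta)^(1 - y)
def bern(y, theta):
    return theta ** y * (1 - theta) ** (1 - y)

L = likelihood(bern, [1, 0, 1, 1, 0], 0.6)  # = 0.6^3 * 0.4^2
```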
==Bernoulli Distribution==
* E.g., what is the estimate of the mortality rate at a given hospital? Say each patient's outcome comes from a Bernoulli distribution.
* <math>Y_i \sim B( \theta )</math>, where theta is the unknown parameter (hence the Greek letter)
* <math>P( Y_i = 1 ) = \theta </math> for a single given person
* <math> P( \mathbf{Y} = \mathbf{y} | \theta ) = P( Y_1 = y_1, Y_2=y_2, \ldots , Y_n= y_n | \theta )</math> using vector form (bold denotes a vector)
* <math> P( \mathbf{Y} = \mathbf{y} | \theta ) = P( Y_1 = y_1 | \theta ) \cdots P(Y_n= y_n | \theta ) = \prod_{i=1}^n P( Y_i = y_i | \theta )</math> because the observations are independent
* <math> P( \mathbf{Y} = \mathbf{y} | \theta ) = \prod_{i=1}^n \theta^{y_i} (1-\theta)^{1-y_i}</math> using what we know from Bernoulli distributions
** "The probability of observing the actual data we collected, conditioned on the value of the parameter theta."
** The concept of likelihood means thinking of this density as a function of theta
* <math>L( \theta | \mathbf{ y } ) = \prod_{i=1}^n \theta^{y_i} (1-\theta)^{1-y_i}</math>
** The two functions look identical, but the one above is a function of y given theta, while the likelihood is a function of theta given y. It is no longer a probability distribution, but it is still a function of theta.
* MLE: <math>\hat{ \theta } = \underset{\theta}{\operatorname{argmax}} \, L( \theta | \mathbf{ y } )</math>
** To estimate theta, choose the value of theta that gives the largest value of the likelihood, i.e., the value that makes the data we observed most likely to occur.
* <math> l ( \theta ) = \log L ( \theta | \mathbf{y} )</math>
** Since the logarithm is a monotonic function, maximizing the logarithm of the function also maximizes the original function
** We can drop the "conditioned on y" notation here
* <math> l ( \theta ) = \log \left[ \prod \theta^{y_i} (1-\theta)^{1-y_i} \right] = \sum \log \left[ \theta^{y_i} (1-\theta)^{1-y_i} \right] = \sum \left[ y_i \log \theta + (1-y_i) \log (1-\theta) \right] </math>
* <math> l ( \theta ) = \left( \sum y_i \right) \log \theta + \left( \sum (1-y_i) \right) \log (1-\theta)</math>
* <math> l '( \theta ) = \frac{1}{ \theta } \sum y_i - \frac{ 1 } {1- \theta } \sum (1-y_i) = 0</math>
** Take the derivative and set it equal to 0.
* <math>0 = \frac{ \sum y_i }{ \hat{ \theta } } - \frac{ \sum (1-y_i) }{ 1- \hat{ \theta } }</math>
** The hat denotes a parameter estimate
* <math>\hat{ \theta } = \frac{ 1 }{ n } \sum y_i </math>
* Approximate 95% CI: <math>\hat{ \theta } \pm 1.96 \sqrt{ \frac{ \hat{ \theta } (1- \hat{ \theta } ) } {n} } </math>
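A small numerical sketch of the closed-form results above: the MLE is the sample proportion, with the approximate 95% confidence interval built from it. The 0/1 outcomes below are made-up illustrative data (say, 1 = death, 0 = survival), not from the article.

```python
import numpy as np

# Made-up Bernoulli outcomes for n = 10 patients
y = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])
n = len(y)

theta_hat = y.mean()                           # MLE: (1/n) * sum(y_i)
se = np.sqrt(theta_hat * (1 - theta_hat) / n)  # standard error of theta_hat
ci_low = theta_hat - 1.96 * se                 # approximate 95% CI
ci_high = theta_hat + 1.96 * se
```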
==Exponential Distribution==
* Suppose we have samples from an exponential distribution with parameter lambda:
** <math>X_i \sim \textrm{Exp}( \lambda ) </math>, assuming i.i.d.
* Recall that for i.i.d. samples the joint density is the product of the individual densities: <math>f( \mathbf{x} | \lambda ) = \prod_{i=1}^n \lambda e^{- \lambda x_i } = \lambda^n e ^{-\lambda \sum x_i}</math>
* <math>L( \lambda | \mathbf{x} ) = \lambda^n e ^{-\lambda \sum x_i}</math>
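As a rough numerical check of this likelihood (the derivation is cut off in this section, so the closed form <math>\hat{\lambda} = n / \sum x_i = 1/\bar{x}</math> is stated here as an assumption, being the standard result): maximize the log-likelihood <math>n \log \lambda - \lambda \sum x_i</math> over a grid and compare. The data are made up for illustration.

```python
import numpy as np

# Made-up exponential samples
x = np.array([0.4, 1.2, 0.7, 2.5, 0.9])
n = len(x)

# Grid search over lambda for log L(lambda) = n*log(lambda) - lambda*sum(x_i)
grid = np.linspace(0.01, 5.0, 100_000)
loglik = n * np.log(grid) - grid * x.sum()
lam_grid = grid[np.argmax(loglik)]   # grid-search maximizer

lam_closed = 1 / x.mean()            # assumed closed-form MLE: 1 / sample mean
```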
Revision as of 17:59, 11 May 2020