Maximum Likelihood Estimation

From Colettapedia

Revision as of 17:59, 11 May 2020 by Colettace (talk | contribs)

General

Obtain an estimate of an unknown parameter theta from the data in our sample.
Choose the value of theta that maximizes the likelihood of the data we observed.
Joint probability function: if the observations are independent, the joint PMF (discrete case) or PDF (continuous case) is simply the product of the PMFs or PDFs of the individual observations.
- $L(\theta )=\prod _{i=1}^{n}f(x_{i};\theta )$ (General formulation)
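The general formulation can be sketched in a few lines of Python. This is an illustration only: the function names `likelihood` and `bernoulli_pmf` and the sample `ys` are hypothetical, and the Bernoulli PMF is borrowed from the next section as a concrete choice of f.

```python
import math
from typing import Callable, Sequence


def likelihood(f: Callable[[float, float], float],
               xs: Sequence[float],
               theta: float) -> float:
    """L(theta) = product over i of f(x_i; theta), assuming independence."""
    return math.prod(f(x, theta) for x in xs)


def bernoulli_pmf(y: float, theta: float) -> float:
    """f(y; theta) = theta^y * (1 - theta)^(1 - y) for y in {0, 1}."""
    return theta**y * (1 - theta) ** (1 - y)


# Hypothetical sample: two successes and three failures.
ys = [1, 0, 0, 1, 0]

# Equals 0.4^2 * 0.6^3 up to floating-point rounding.
print(likelihood(bernoulli_pmf, ys, 0.4))
```

Passing the density as an argument keeps the product formula separate from any particular distribution.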

Bernoulli Distribution

E.g., what is the estimated mortality rate at a given hospital? Say each patient's outcome is drawn from a Bernoulli distribution:
$Y_{i}\sim B(\theta )$ , where theta is the unknown parameter (by convention, unknown parameters are written with Greek letters)
$P(Y_{i}=1)=\theta$ for a single given person
$P(\mathbf {Y} =\mathbf {y} |\theta )=P(Y_{1}=y_{1},Y_{2}=y_{2},...,Y_{n}=y_{n}|\theta )$ in vector form (bold denotes a vector)
$P(\mathbf {Y} =\mathbf {y} |\theta )=P(Y_{1}=y_{1}|\theta )\cdots P(Y_{n}=y_{n}|\theta )=\prod _{i=1}^{n}P(Y_{i}=y_{i}|\theta )$ because the observations are independent
$P(\mathbf {Y} =\mathbf {y} |\theta )=\prod _{i=1}^{n}\theta ^{y_{i}}(1-\theta )^{1-y_{i}}$ using the Bernoulli PMF
- "The probability of observing the actual data we collected, conditioned on the value of the parameter theta."
- The concept of likelihood means treating this probability function as a function of theta.
$L(\theta |\mathbf {y} )=\prod _{i=1}^{n}\theta ^{y_{i}}(1-\theta )^{1-y_{i}}$
- The two functions look the same, but the one above is a function of y given theta, whereas the likelihood is a function of theta given y. It is no longer a probability distribution, but it is still a function of theta.
$MLE:{\hat {\theta }}={\textrm {argmax}}_{\theta }\,L(\theta |\mathbf {y} )$
- To estimate theta, choose the value of theta that gives the largest likelihood, i.e., the value under which the data we actually observed are most probable.
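The argmax can be illustrated numerically with a brute-force grid search. This is only a sketch: the data (3 deaths among 10 patients) and the grid resolution are made up, and a grid search is a stand-in for the calculus-based solution, not a recommended method.

```python
import math


def bernoulli_likelihood(theta, ys):
    """L(theta | y) = product of theta^y_i * (1 - theta)^(1 - y_i)."""
    return math.prod(theta**y * (1 - theta) ** (1 - y) for y in ys)


# Hypothetical data: 1 = patient died, 0 = patient survived.
ys = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Candidate values 0.01, 0.02, ..., 0.99 (endpoints excluded).
grid = [k / 100 for k in range(1, 100)]

# Pick the candidate with the largest likelihood.
theta_hat = max(grid, key=lambda t: bernoulli_likelihood(t, ys))
print(theta_hat)
```

With this sample the grid maximum lands on the sample proportion, 3/10.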
$l(\theta )={\textrm {log}}L(\theta |\mathbf {y} )$
- Since the logarithm is a monotonically increasing function, the theta that maximizes the log-likelihood also maximizes the likelihood itself.
- We can drop the "conditioned on y" notation here.
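Monotonicity is the reason given above for working with the log. A practical side benefit, not stated in the notes, is numerical: a product of many probabilities can underflow to zero in floating point, while the sum of logs stays finite. A minimal sketch with a hypothetical sample of 2000 observations:

```python
import math

theta = 0.5
ys = [1, 0] * 1000  # hypothetical sample, n = 2000

# Every factor equals 0.5, so the product is 0.5^2000,
# which is too small for a double and underflows to 0.0.
lik = math.prod(theta**y * (1 - theta) ** (1 - y) for y in ys)

# The log-likelihood is 2000 * log(0.5), a perfectly ordinary number.
loglik = sum(y * math.log(theta) + (1 - y) * math.log(1 - theta) for y in ys)

print(lik, loglik)
```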
$l(\theta )={\textrm {log}}\left[\prod \theta ^{y_{i}}(1-\theta )^{1-y_{i}}\right]=\sum {\textrm {log}}\left[\theta ^{y_{i}}(1-\theta )^{1-y_{i}}\right]=\sum \left[y_{i}{\textrm {log}}\theta +(1-y_{i}){\textrm {log}}(1-\theta )\right]$
$l(\theta )=\left(\sum y_{i}\right){\textrm {log}}\theta +\left(\sum (1-y_{i})\right){\textrm {log}}(1-\theta )$
$l'(\theta )={\frac {1}{\theta }}\sum y_{i}-{\frac {1}{1-\theta }}\sum (1-y_{i})=0$
- Here we take the derivative and set it equal to 0.
$0={\frac {\sum y_{i}}{\hat {\theta }}}-{\frac {\sum (1-y_{i})}{1-{\hat {\theta }}}}$
- The hat denotes a parameter estimate.
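The algebra that solves this equation for the estimate can be sketched as follows, using $\sum (1-y_{i})=n-\sum y_{i}$:
$(1-{\hat {\theta }})\sum y_{i}={\hat {\theta }}\left(n-\sum y_{i}\right)$
- Multiply both sides by ${\hat {\theta }}(1-{\hat {\theta }})$ and move the second term across.
$\sum y_{i}-{\hat {\theta }}\sum y_{i}=n{\hat {\theta }}-{\hat {\theta }}\sum y_{i}$
$\sum y_{i}=n{\hat {\theta }}$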
${\hat {\theta }}={\frac {1}{n}}\sum y_{i}$
Approximate 95% CI: ${\hat {\theta }}\pm 1.96{\sqrt {\frac {{\hat {\theta }}(1-{\hat {\theta }})}{n}}}$
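The estimate and its approximate interval are straightforward to compute. The sketch below assumes a hypothetical sample of 100 patients with 30 deaths; the function name `bernoulli_mle_ci` is made up for illustration. The interval is a normal approximation, so it should be read as rough when n is small or the estimate is near 0 or 1.

```python
import math


def bernoulli_mle_ci(ys, z=1.96):
    """Return the MLE (sample mean) and the approximate CI theta_hat +/- z * SE."""
    n = len(ys)
    theta_hat = sum(ys) / n
    se = math.sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat, (theta_hat - z * se, theta_hat + z * se)


# Hypothetical data: 30 deaths (1) and 70 survivals (0).
ys = [1] * 30 + [0] * 70

theta_hat, (lower, upper) = bernoulli_mle_ci(ys)
print(f"theta_hat = {theta_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```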

Exponential Distribution

Suppose we have samples from an exponential distribution with parameter lambda:
- $X_{i}\sim {\textrm {Exp}}(\lambda )$ , assuming i.i.d.
Recall that the joint density is the product of the individual densities: $f(\mathbf {x} |\lambda )=\prod _{i=1}^{n}\lambda e^{-\lambda x_{i}}=\lambda ^{n}e^{-\lambda \sum x_{i}}$
$L(\lambda |\mathbf {x} )=\lambda ^{n}e^{-\lambda \sum x_{i}}$
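The notes stop at the likelihood. Following the same steps as in the Bernoulli case (take the log, differentiate, set to zero), the remaining derivation would presumably run:
$l(\lambda )=n\,{\textrm {log}}\lambda -\lambda \sum x_{i}$
$l'(\lambda )={\frac {n}{\lambda }}-\sum x_{i}=0$
${\hat {\lambda }}={\frac {n}{\sum x_{i}}}={\frac {1}{\bar {x}}}$

A quick numerical check of this closed form against a grid search, with hypothetical data and a hypothetical helper `exp_log_likelihood`:

```python
import math


def exp_log_likelihood(lam, xs):
    """l(lambda) = n * log(lambda) - lambda * sum(x_i)."""
    return len(xs) * math.log(lam) - lam * sum(xs)


# Hypothetical waiting times; n = 4 and sum = 8.
xs = [0.5, 1.5, 2.0, 4.0]

# Closed form: lambda_hat = n / sum(x_i), the reciprocal of the sample mean.
lam_closed = len(xs) / sum(xs)

# Brute-force maximization over lambda = 0.001, 0.002, ..., 5.000.
grid = [k / 1000 for k in range(1, 5001)]
lam_grid = max(grid, key=lambda lam: exp_log_likelihood(lam, xs))

print(lam_closed, lam_grid)
```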
