Maximum Likelihood Estimation
General
 Goal: estimate an unknown parameter theta using the data obtained from our sample.
 Choose the value of theta that maximizes the likelihood function, i.e., the value under which the data we observed are most probable.
 Maximum likelihood principle: $\hat{\theta} = \arg\max_{\theta} L(\theta \mid \mathbf{y})$
 The likelihood function is viewed as a function of the parameter theta, with the data y held fixed. Thus, when we take the derivative of the log-likelihood function, it is with respect to theta only.
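 Joint probability function: if the observations are independent, the joint PMF (or PDF, for continuous data) is just the product of the individual PMFs (or PDFs).
 General formulation: $f(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta)$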
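The principle can be sketched numerically: for IID data, take the log of the product of densities (a sum of log densities) and search for the parameter value that maximizes it. The helper names (`log_likelihood`, `golden_section_argmax`, `poisson_log_pmf`), the choice of golden-section search as the optimizer, and the Poisson counts are illustrative assumptions, not part of the original notes; any one-dimensional optimizer would serve.

```python
import math

def log_likelihood(log_density, theta, data):
    """Joint log-likelihood of IID data: log of a product = sum of the logs."""
    return sum(log_density(x, theta) for x in data)

def golden_section_argmax(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2

def poisson_log_pmf(x, lam):
    # log of lam^x * e^(-lam) / x!
    return x * math.log(lam) - lam - math.lgamma(x + 1)

counts = [2, 4, 3, 0, 5, 1, 3]  # hypothetical count data
lam_hat = golden_section_argmax(
    lambda lam: log_likelihood(poisson_log_pmf, lam, counts), 1e-6, 20.0)
print(round(lam_hat, 4))  # 2.5714, the sample mean 18/7
```

The numerical maximizer lands on the sample mean, which anticipates the closed-form result stated next for the Poisson model.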
 For IID data, the sample mean appears in the MLE for several common models:
 For Bernoulli(p), Poisson(lambda), and Normal(mu, sigma^2), the sample mean x bar is the MLE of p, lambda, and mu, respectively.
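A quick numerical sanity check of this claim (an illustration, not a proof): for each model, plugging in x bar gives a higher log-likelihood than nearby parameter values on either side. The function names and data sets are hypothetical, and the Normal variance is held fixed at 1 as an assumption; the maximizing mu does not depend on sigma^2.

```python
import math

def bernoulli_loglik(p, xs):
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

def poisson_loglik(lam, xs):
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

def normal_loglik(mu, xs, sigma2=1.0):
    # sigma^2 held fixed (assumed known); the maximizing mu does not depend on it
    return sum(-0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)
               for x in xs)

# Made-up illustrative samples for each model
models = {
    "bernoulli": (bernoulli_loglik, [1, 0, 0, 1, 1, 0, 1, 1]),
    "poisson": (poisson_loglik, [2, 4, 3, 0, 5, 1, 3]),
    "normal": (normal_loglik, [4.1, 5.3, 3.8, 6.0, 4.9]),
}

results = {}
for name, (loglik, xs) in models.items():
    x_bar = sum(xs) / len(xs)
    # x_bar should beat nearby parameter values on both sides
    results[name] = all(loglik(x_bar, xs) > loglik(x_bar + h, xs)
                        for h in (-0.05, -0.01, 0.01, 0.05))
    print(name, round(x_bar, 4), results[name])
```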
Bernoulli Distribution
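 E.g., what is the mortality rate at a given hospital? Suppose each patient's outcome (1 = died, 0 = survived) comes from a Bernoulli distribution:
 $Y_i \sim \text{Bernoulli}(\theta)$, where theta is the unknown parameter (hence the Greek letter)
 For a single given person: $f(y_i \mid \theta) = \theta^{y_i}(1-\theta)^{1-y_i}$
 In vector form (bold denotes a vector): $f(\mathbf{y} \mid \theta) = f(y_1, \dots, y_n \mid \theta)$
 $= \prod_{i=1}^{n} f(y_i \mid \theta)$, because the observations are independent
 $= \prod_{i=1}^{n} \theta^{y_i}(1-\theta)^{1-y_i}$, using the Bernoulli PMF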
 "The probability of observing the actual data we collected, conditioned on the value of the parameter theta."
 The concept of likelihood means thinking about this density function as a function of theta: $L(\theta \mid \mathbf{y}) = \prod_{i=1}^{n} \theta^{y_i}(1-\theta)^{1-y_i}$

 The two functions look the same, but the one above is a function of y given theta, whereas the likelihood is a function of theta given y. It is no longer a probability distribution (it need not integrate to 1 over theta), but it is still a function of theta.
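To make the distinction concrete, the sketch below evaluates the same Bernoulli formula two ways: summed over every possible 0/1 vector y with theta fixed (it totals 1), and integrated over theta with y fixed (it does not). The function name `bernoulli_joint`, the value theta = 0.3, and the observed vector (1, 0, 0) are illustrative assumptions; the integral uses a simple midpoint rule.

```python
from itertools import product

def bernoulli_joint(y, theta):
    """f(y | theta) = prod of theta^y_i * (1 - theta)^(1 - y_i) for a 0/1 vector y."""
    out = 1.0
    for yi in y:
        out *= theta ** yi * (1 - theta) ** (1 - yi)
    return out

n, theta = 3, 0.3
# As a function of y (theta fixed): sums to 1 over all 2^n outcomes.
total_over_y = sum(bernoulli_joint(y, theta) for y in product([0, 1], repeat=n))

# As a function of theta (y fixed): midpoint-rule integral over (0, 1).
y_obs = (1, 0, 0)
m = 20_000
width = 1.0 / m
area_over_theta = sum(bernoulli_joint(y_obs, (k + 0.5) * width) * width
                      for k in range(m))

print(round(total_over_y, 10))    # 1.0
print(round(area_over_theta, 6))  # 0.083333, i.e. 1/12, not 1
```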

 To estimate theta, choose the value of theta that gives the largest likelihood, i.e., the value that makes the data we actually observed most likely to occur.

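 Since the logarithm is a monotonically increasing function, the value of theta that maximizes the log-likelihood also maximizes the likelihood itself:
 $\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} y_i \log\theta + \left(n - \sum_{i=1}^{n} y_i\right)\log(1-\theta)$
 We can drop the "conditioned on y" notation here.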
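A small grid search illustrates the monotonicity argument: maximizing L(theta) and maximizing log L(theta) over the same grid pick the same theta. The 0/1 outcomes and the grid spacing of 0.001 are illustrative assumptions.

```python
import math

y = [1, 0, 0, 1, 1, 0, 1, 1]  # hypothetical 0/1 outcomes
s, n = sum(y), len(y)

def likelihood(theta):
    return theta ** s * (1 - theta) ** (n - s)

def log_likelihood(theta):
    return s * math.log(theta) + (n - s) * math.log(1 - theta)

grid = [k / 1000 for k in range(1, 1000)]  # theta values strictly inside (0, 1)
best_L = max(grid, key=likelihood)
best_logL = max(grid, key=log_likelihood)
print(best_L, best_logL)  # 0.625 0.625, the sample proportion 5/8
```

In practice the log form is preferred because products of many small probabilities underflow in floating point, while sums of logs do not.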

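 Here we take the derivative and set it equal to 0:
 $\ell'(\theta) = \frac{\sum_{i=1}^{n} y_i}{\theta} - \frac{n - \sum_{i=1}^{n} y_i}{1-\theta} = 0 \;\Rightarrow\; \hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$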
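The derivative above (often called the score) can be checked directly: it is zero at the sample proportion, positive to the left of it, and negative to the right, so y bar is a maximum. The function name `score` and the data are illustrative.

```python
def score(theta, y):
    """Derivative of the Bernoulli log-likelihood with respect to theta."""
    s, n = sum(y), len(y)
    return s / theta - (n - s) / (1 - theta)

y = [1, 0, 0, 1, 1, 0, 1, 1]  # hypothetical data: 5 ones out of 8
theta_hat = sum(y) / len(y)
print(theta_hat, score(theta_hat, y))  # 0.625 0.0
print(score(0.5, y) > 0, score(0.75, y) < 0)  # True True
```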

 The hat denotes a parameter estimate.
 Approximate 95% CI: $\hat{\theta} \pm 1.96\sqrt{\frac{\hat{\theta}(1-\hat{\theta})}{n}}$
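A minimal sketch of this interval, assuming the normal approximation: the standard error sqrt(theta hat (1 - theta hat) / n) is the square root of one over the Fisher information of n Bernoulli observations evaluated at theta hat, and 1.96 is the 97.5th percentile of the standard normal. The function name and the "20 deaths out of 100 patients" data are hypothetical.

```python
import math

def bernoulli_wald_ci(y, z=1.96):
    """Approximate CI: theta_hat +/- z * sqrt(theta_hat * (1 - theta_hat) / n)."""
    n = len(y)
    theta_hat = sum(y) / n
    se = math.sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat - z * se, theta_hat + z * se

# Hypothetical hospital data: 20 deaths (1) among 100 patients
y = [1] * 20 + [0] * 80
lo, hi = bernoulli_wald_ci(y)
print(round(lo, 4), round(hi, 4))  # 0.1216 0.2784
```

This approximation is known to behave poorly when theta hat is very close to 0 or 1 or when n is small.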

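 Under certain regularity conditions, the MLE is approximately normally distributed, with mean equal to the true value of theta and variance equal to one over the Fisher information of the whole sample, n I(theta), evaluated at theta hat: $\hat{\theta} \approx N\left(\theta, \frac{1}{n\, I(\hat{\theta})}\right)$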
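 The Fisher information I(theta) measures how much information about theta each data point carries. It is a function of theta: $I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2} \log f(Y \mid \theta)\right]$
 For a Bernoulli random variable: $I(\theta) = \frac{1}{\theta(1-\theta)}$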
 Information is larger when theta is near zero or one, and it's smallest when theta is near one half.
 "This makes sense, because if you're flipping a coin, and you're getting a mix of heads and tails, that tells you a little bit less than if you're getting nearly all heads or nearly all tails. That's a lot more informative about the value of theta. "
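The sketch below computes the Bernoulli Fisher information two ways: from the closed form 1 / (theta (1 - theta)), and from the definition as minus the expected second derivative of log f(Y | theta), where the expectation is over Y in {0, 1}. It then prints values across theta to show the minimum at one half. Function names are illustrative.

```python
def bernoulli_fisher_info(theta):
    """Closed form: I(theta) = 1 / (theta * (1 - theta))."""
    return 1.0 / (theta * (1.0 - theta))

def fisher_info_from_definition(theta):
    """-E[d^2/dtheta^2 log f(Y | theta)], with Y = 1 w.p. theta and Y = 0 otherwise."""
    def second_deriv(y):
        # d^2/dtheta^2 of y*log(theta) + (1 - y)*log(1 - theta)
        return -y / theta ** 2 - (1 - y) / (1 - theta) ** 2
    return -(theta * second_deriv(1) + (1 - theta) * second_deriv(0))

for theta in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(theta, round(bernoulli_fisher_info(theta), 2),
          round(fisher_info_from_definition(theta), 2))
```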
Exponential Distribution
 Suppose we have i.i.d. samples from an exponential distribution with parameter lambda: $X_1, \dots, X_n \overset{\text{iid}}{\sim} \text{Exp}(\lambda)$

 Step 1: state the density function: $f(x \mid \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$

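 Step 2: turn it into a (non-log) likelihood function: $L(\lambda \mid \mathbf{x}) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_i}$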

 Take the log-likelihood and drop the "conditioned on x" notation: $\ell(\lambda) = n\log\lambda - \lambda\sum_{i=1}^{n} x_i$
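As a check on the algebra, the closed-form log-likelihood n log(lambda) - lambda times the sum of x should equal the sum of the individual log densities log(lambda) - lambda x_i. The function names and observations below are illustrative assumptions.

```python
import math

def exp_log_density(x, lam):
    """log f(x | lam) = log(lam) - lam * x, for x >= 0."""
    return math.log(lam) - lam * x

def exp_log_likelihood(lam, xs):
    """Closed form: n * log(lam) - lam * sum(x)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

xs = [0.8, 2.1, 0.3, 1.6, 0.7]  # hypothetical observations
for lam in (0.5, 1.0, 2.0):
    summed = sum(exp_log_density(x, lam) for x in xs)
    print(lam, round(summed, 6), round(exp_log_likelihood(lam, xs), 6))
```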

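 Take the derivative and set it equal to 0: $\ell'(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}$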
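The derivative step can be verified numerically: the score n / lambda minus the sum of x vanishes at n / sum(x), and since the second derivative -n / lambda^2 is negative, that root is a maximum. Function names and data are hypothetical.

```python
def exp_score(lam, xs):
    """Derivative of n * log(lam) - lam * sum(x) with respect to lam."""
    return len(xs) / lam - sum(xs)

def exp_mle(xs):
    """Root of the score: lam_hat = n / sum(x) = 1 / x_bar."""
    return len(xs) / sum(xs)

xs = [0.8, 2.1, 0.3, 1.6, 0.7]  # hypothetical observations, sum = 5.5
lam_hat = exp_mle(xs)
print(round(lam_hat, 4), round(exp_score(lam_hat, xs), 10))
# Score is positive below lam_hat and negative above it.
print(exp_score(0.5, xs) > 0, exp_score(2.0, xs) < 0)  # True True
```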

 The MLE for lambda is one over the sample average, which makes sense because the mean of an exponential distribution is one over lambda.
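A small simulation supports this intuition: draw many exponential samples with a known rate using Python's `random.expovariate` (whose argument is the rate lambda, giving mean 1/lambda) and check that 1/x bar recovers it. The true rate of 2.0, the sample size, and the seed are arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
true_lam = 2.0
xs = [random.expovariate(true_lam) for _ in range(10_000)]

x_bar = sum(xs) / len(xs)  # should be near 1 / true_lam = 0.5
lam_hat = 1 / x_bar        # MLE of lambda
print(round(x_bar, 3), round(lam_hat, 3))
```

With n = 10,000 the approximate standard error of lambda hat is about lambda / sqrt(n) = 0.02, so the estimate should land close to 2.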