Class 08 - 2016-10-31


General

  • Two characteristics of data:
  1. Measure of central tendency
  2. Measure of dispersion around that central tendency
  • Two types of error can arise:
  1. Bias: systematic error from sampling nonrandomly
  2. Sampling error: error from the chance that a randomly selected sample is not representative of the population
  • Central Limit Theorem: If the sample size is sufficiently large (rule of thumb: n >= 30), then the sampling distribution of the sample mean x-bar is approximately bell shaped (normal), regardless of the shape of the population distribution.
  • Standard error of the sampling distribution of x-bar is sigma/sqrt(n)
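
A small simulation can make the Central Limit Theorem and the sigma/sqrt(n) formula concrete. The sketch below is only an illustration: the exponential population, the sample size of 40, and the 5000 repetitions are all assumptions chosen for the demo, not values from the class.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 40              # sample size (assumed; above the n >= 30 rule of thumb)
num_samples = 5000  # number of repeated samples (assumed)
sigma = 1.0         # an exponential population with rate 1 has mean 1 and std dev 1

# Draw many samples from a right-skewed population and record each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
observed_se = statistics.pstdev(sample_means)  # spread of the sample means
theoretical_se = sigma / n ** 0.5              # sigma / sqrt(n)

print(f"mean of sample means:    {mean_of_means:.3f}")
print(f"observed standard error: {observed_se:.3f}")
print(f"sigma / sqrt(n):         {theoretical_se:.3f}")
```

Even though the population is strongly skewed, the observed spread of the sample means should land close to sigma/sqrt(n), and a histogram of `sample_means` would look roughly bell shaped.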

Chapter 5

  • Discrete distribution => values of x are countable

5.1: Probability Distribution for a discrete variable

  • A mutually exclusive list of all the possible numerical outcomes, along with the probability of occurrence of each outcome
  • Expected value of a discrete variable (mu): the measure of central tendency; multiply each possible outcome by its probability and sum the products
  • Variance of a discrete variable (lowercase sigma squared): multiply each possible squared difference (x - mu)^2 by its corresponding probability and sum the products
    • std deviation of a discrete variable is the sqrt of variance
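
As a minimal sketch of these three definitions, the code below computes the expected value, variance, and standard deviation of a discrete distribution. The distribution itself (outcomes 0 to 5 with made-up probabilities) is hypothetical and only there to have numbers to work with.

```python
import math

# Hypothetical distribution: outcome -> probability (probabilities must sum to 1).
distribution = {0: 0.35, 1: 0.25, 2: 0.20, 3: 0.10, 4: 0.05, 5: 0.05}

def expected_value(dist):
    """mu = sum of x * P(x) over all outcomes."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """sigma^2 = sum of (x - mu)^2 * P(x) over all outcomes."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mu = expected_value(distribution)
var = variance(distribution)
sd = math.sqrt(var)  # standard deviation is the square root of the variance

print(f"expected value: {mu:.2f}")
print(f"variance:       {var:.2f}")
print(f"std deviation:  {sd:.3f}")
```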

5.2: Covariance of a Probability Distribution and its application in Finance

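  • Covariance of a probability distribution measures the direction and strength of the relationship between 2 variables: positive means they tend to move together, negative means they tend to move in opposite directions (its magnitude depends on the variables' units)
  • Do 2 stocks move together or not? An inverse relationship (negative covariance) between the stocks helps diversify a portfolio.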
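
The covariance of a joint probability distribution is the probability-weighted sum of the products of each variable's deviation from its own expected value. The sketch below applies that to two investments under three economic scenarios; the scenario probabilities and returns are hypothetical illustration values.

```python
# Each row: (probability of scenario, return of investment X, return of investment Y).
# All numbers are hypothetical.
scenarios = [
    (0.2, -100.0, 200.0),   # recession
    (0.5, 100.0, 50.0),     # stable economy
    (0.3, 250.0, -100.0),   # expanding economy
]

# Expected value of each investment.
mean_x = sum(p * x for p, x, _ in scenarios)
mean_y = sum(p * y for p, _, y in scenarios)

# Covariance: sum of (x - E[X]) * (y - E[Y]) * P(x, y).
covariance = sum(p * (x - mean_x) * (y - mean_y) for p, x, y in scenarios)

print(f"E[X] = {mean_x:.1f}, E[Y] = {mean_y:.1f}")
print(f"covariance = {covariance:.1f}")
```

A negative result here means the two investments tend to move in opposite directions, which is the situation the notes associate with a diversified portfolio.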

5.3 Binomial distribution

  • pass

5.4 Poisson Distribution

  • Area of opportunity - the time or space interval
  • Characteristic lambda = mean or expected number of events per unit
    • variance also = lambda, std dev = sqrt(lambda)
  • Examples:
    • surface defects on a new refrigerator
    • number of network failures in a day
    • number of people arriving at a bank
    • number of fleas on a dog
  • Use the Poisson distribution if these 4 conditions hold:
    • You are interested in counting the number of times a particular event occurs in a given area of opportunity (defined by time, length, surface area, etc)
    • The probability that an event occurs in a given area of opportunity is the same for all areas of opportunity
    • events are independent
    • probability that two or more events will occur in an area of opportunity approaches zero as the area of opportunity gets smaller.
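
The Poisson probability of exactly x events is P(X = x) = e^(-lambda) * lambda^x / x!. The sketch below evaluates that formula directly; the rate of 3 arrivals per minute is a hypothetical value for the bank example.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 3.0  # hypothetical: on average 3 customers arrive at the bank per minute

p_exactly_2 = poisson_pmf(2, lam)
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))  # P(0) + P(1) + P(2)
p_more_than_2 = 1 - p_at_most_2

# Numerical check that mean and variance both equal lambda
# (the tail beyond 60 events is negligible for lam = 3).
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in range(60))

print(f"P(X = 2)  = {p_exactly_2:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 2)  = {p_more_than_2:.4f}")
print(f"mean = {mean:.4f}, variance = {var:.4f}")
```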

5.5 Hypergeometric distribution

  • pass


Chapter 6: Normal Distribution and other distributions

  • Continuous distribution: values of x are not countable, but rather measurable
  • standard normal is z-score-ified: Z = (X - mu) / sigma, so Z has mean 0 and standard deviation 1
  • NORMSINV (Excel) is the inverse of the CDF of the standard normal distribution
    • scipy.stats.norm.ppf is the SciPy equivalent
      • Uses mean=0 and stddev=1 by default, which is the "standard" normal distribution.
      • Use a different mean and standard deviation by specifying the loc and scale arguments
      • scipy.stats.norm.cdf is the inverse of ppf
      • The acronym ppf stands for percent point function, which is another name for the quantile function.
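
To show the CDF and its inverse working as a pair without requiring SciPy, the sketch below uses `statistics.NormalDist` from the Python standard library, whose `inv_cdf` and `cdf` methods play the same roles as `scipy.stats.norm.ppf` and `scipy.stats.norm.cdf`. The mean of 100 and standard deviation of 15 in the second half are hypothetical values.

```python
from statistics import NormalDist

# Standard normal: mean 0, std dev 1.
standard = NormalDist()
z = standard.inv_cdf(0.975)  # like NORMSINV(0.975) or scipy.stats.norm.ppf(0.975)
p = standard.cdf(z)          # like scipy.stats.norm.cdf(z); recovers 0.975

# A non-standard normal (hypothetical mean and std dev).
# With SciPy this would be norm.ppf(0.975, loc=100, scale=15).
scores = NormalDist(mu=100, sigma=15)
x = scores.inv_cdf(0.975)

# Converting x back to a z-score gives the standard normal quantile again.
z_from_x = (x - scores.mean) / scores.stdev

print(f"z for 97.5th percentile: {z:.4f}")
print(f"cdf(z):                  {p:.4f}")
print(f"x for 97.5th percentile: {x:.2f}")
print(f"z-score of x:            {z_from_x:.4f}")
```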

6.2 Normal Distribution

  • Interquartile range is approximately 4/3 (1.33) standard deviations
  • Middle 50% of values is contained within approximately mu +/- 2/3 sigma
  • The range is approximately equal to 6 standard deviations
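
These rules of thumb can be checked against the standard normal distribution itself. The sketch below uses the standard library's `statistics.NormalDist` (a substitute for the SciPy functions mentioned earlier) to compute the exact multipliers that the fractions 2/3 and 4/3 approximate.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, std dev 1

# The third quartile sits z_q3 standard deviations above the mean.
z_q3 = std_normal.inv_cdf(0.75)

# By symmetry the first quartile is at -z_q3, so the IQR spans 2 * z_q3 sigmas.
iqr_in_sigmas = 2 * z_q3

# Share of values within mu +/- 3 sigma, i.e. a total width of 6 sigma.
within_3_sigma = std_normal.cdf(3) - std_normal.cdf(-3)

print(f"Q3 is at mu + {z_q3:.4f} sigma")
print(f"IQR is {iqr_in_sigmas:.4f} sigma")
print(f"share within mu +/- 3 sigma: {within_3_sigma:.4f}")
```

The exact values are slightly larger than 2/3 and 4/3, and a 6-sigma width covers nearly all values rather than all of them, which is why the observed range of real data only approximately equals 6 standard deviations.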

6.3: Evaluating Normality

  • To determine whether a set of data can be approximated by the normal distribution, you compare the characteristics of the data with the theoretical properties of the normal distribution or construct a normal probability plot
  • For some variables, the descriptive characteristics of the data are inconsistent with the properties of the normal distribution
  • Stem-and-leaf display or boxplot for small datasets, histogram for large datasets.
  • Are the mean and median equal?
  • Is interquartile range approximately 1.33 times the standard deviation? Is range 6 times stdev?
  • Evaluate how the values are distributed. Do about 2/3 of the values (68%) lie between mu +/- 1 sigma? Do about 4/5 of the values (80%) lie between mu +/- 1.28 sigma? Do about 19/20 of the values (95%) lie between mu +/- 2 sigma? Do about 99.7% of the values lie between mu +/- 3 sigma?
  • Normal probability plot: shows whether data are left-skewed, right-skewed, or approximately normal
  • Quantile-quantile plot: observed value on the y axis, corresponding z value on the x axis; approximately normal data fall close to a straight line
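
The checks in this section can be collected into a small helper. The sketch below is one possible way to do it, not a standard routine: the function names `normality_checks` and `qq_points` are made up for illustration, the data are simulated rather than real measurements, and the plotting position (i - 0.5) / n is just one common convention for quantile-quantile plots.

```python
import random
import statistics
from statistics import NormalDist

def normality_checks(data):
    """Compare descriptive statistics of data with normal-distribution benchmarks."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    q1, _median, q3 = statistics.quantiles(data, n=4)

    def share_within(k):
        # Fraction of values within k sample standard deviations of the mean.
        return sum(abs(x - mean) <= k * sd for x in data) / len(data)

    return {
        "mean": mean,
        "median": statistics.median(data),
        "iqr_over_sd": (q3 - q1) / sd,                  # benchmark: about 1.33
        "range_over_sd": (max(data) - min(data)) / sd,  # rule of thumb: about 6
        "within_1_sd": share_within(1),                 # benchmark: about 0.68
        "within_1.28_sd": share_within(1.28),           # benchmark: about 0.80
        "within_2_sd": share_within(2),                 # benchmark: about 0.95
        "within_3_sd": share_within(3),                 # benchmark: about 0.997
    }

def qq_points(data):
    """Pair each sorted value (y axis) with a standard normal quantile (x axis)."""
    n = len(data)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - 0.5) / n), value)
        for i, value in enumerate(sorted(data), start=1)
    ]

random.seed(1)
data = [random.gauss(50, 10) for _ in range(2000)]  # simulated, not real data

checks = normality_checks(data)
for name, value in checks.items():
    print(f"{name}: {value:.3f}")

points = qq_points(data)  # plot these as (x, y) to get the quantile-quantile plot
```

The observed range in standard deviations depends on the sample size, so the "range is about 6 standard deviations" check is the loosest of these benchmarks.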

Chapter 7: Sampling Distributions

  • Trying to make inferences that are based on statistics calculated from samples
  • Sample mean (statistic) estimates a population mean (parameter)
  • Sample proportion (statistic) estimates the population proportion (parameter)
  • Reach conclusion about the POPULATION not the sample
  • Sampling distribution: the distribution of results you would get if you actually selected all possible samples. The single result you obtain in practice is just one of the results in the sampling distribution.

7.2: Sampling distribution of the mean

  • The distribution of all possible sample means if you select all possible samples of a given size
  • The sample mean is an unbiased estimator of the population mean because the mean of all the possible sample means (of given size n) is equal to the population mean
  • Standard error of the mean (sigma/sqrt(n)) expresses how sample means vary from sample to sample.
  • As the sample size increases, the standard error of the mean decreases in proportion to the square root of the sample size (for example, quadrupling n halves the standard error)
  • Central Limit Theorem: A sample size of about 30 produces an approximately normal distribution of sample means for most population distributions, regardless of their shape.
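
For a tiny population it is possible to list every possible sample and check these claims exactly. The sketch below assumes a hypothetical population of six values and sampling with replacement (so that sigma/sqrt(n) applies without a finite-population correction).

```python
import itertools
import math
import statistics

population = [1, 2, 3, 4, 5, 6]  # hypothetical tiny population

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

se_by_n = {}
for n in (1, 2, 4):
    # Every possible ordered sample of size n, drawn with replacement.
    sample_means = [
        statistics.fmean(sample)
        for sample in itertools.product(population, repeat=n)
    ]
    mean_of_means = statistics.fmean(sample_means)
    se_by_n[n] = statistics.pstdev(sample_means)

    # Unbiasedness: the mean of all sample means equals the population mean.
    assert math.isclose(mean_of_means, mu)
    # Standard error: the spread of the sample means equals sigma / sqrt(n).
    assert math.isclose(se_by_n[n], sigma / math.sqrt(n))

    print(f"n={n}: mean of sample means={mean_of_means:.2f}, "
          f"standard error={se_by_n[n]:.4f}")
```

Going from n=1 to n=4 halves the standard error, which is the square-root relationship described above.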