Class 08 - 2016-10-31
General
- Two characteristics to data
- Measure of Central Tendency
- Measure of dispersion around that measure of central tendency
- Two types of error can arise:
- Bias: error from sampling nonrandomly
- Sampling error: error from the chance that a randomly selected sample is not representative of the population
- Central Limit Theorem: If the sample size is sufficiently large (rule of thumb: at least 30), then the sample mean (x-bar) has a sampling distribution that is approximately bell shaped, regardless of the shape of the population distribution.
- Standard error of the sampling distribution of x-bar is sigma/sqrt(n)
Chapter 5
- Discrete distribution => values of x are countable
5.1: Probability Distribution for a discrete variable
- A mutually exclusive list of all the possible numerical outcomes along with the probability of occurrence of each outcome
- Expected value of a discrete variable (mu): the measure of central tendency; multiply each possible outcome by its corresponding probability and sum the products
- Variance of a discrete variable (lowercase sigma squared): multiply each possible squared difference from the expected value by its corresponding probability and sum the products
- Standard deviation of a discrete variable is the square root of the variance
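The three definitions above can be checked with a short calculation. This is a minimal sketch; the outcomes and probabilities are made-up illustration values, not data from the class.

```python
import math

# Hypothetical discrete distribution (illustration only):
# number of interruptions per day and the probability of each count.
outcomes = [0, 1, 2, 3, 4, 5]
probabilities = [0.35, 0.25, 0.20, 0.10, 0.05, 0.05]

def expected_value(xs, ps):
    """Sum of each outcome times its probability."""
    return sum(x * p for x, p in zip(xs, ps))

def variance(xs, ps):
    """Sum of each squared difference from the expected value times its probability."""
    mu = expected_value(xs, ps)
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

mu = expected_value(outcomes, probabilities)
var = variance(outcomes, probabilities)
std_dev = math.sqrt(var)

print(f"expected value = {mu:.2f}")
print(f"variance       = {var:.2f}")
print(f"std deviation  = {std_dev:.3f}")
```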
5.2: Covariance of a Probability Distribution and its application in Finance
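- Covariance of a probability distribution measures the direction and strength of the relationship between 2 variables: positive when they tend to move together, negative when they tend to move in opposite directions
- Do 2 stocks move together or not? A negative covariance (inverse relationship) means that holding both diversifies the portfolio.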
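As a rough illustration of the covariance of a probability distribution, the sketch below weights the product of each pair of deviations by the probability of the scenario. The scenarios, probabilities, and returns are hypothetical values chosen for the example.

```python
# Hypothetical economic scenarios with their probabilities and the
# return (in dollars) of two investments X and Y under each scenario.
probabilities = [0.2, 0.5, 0.3]
returns_x = [-100, 100, 250]
returns_y = [200, 50, -100]

def expected_value(values, probs):
    return sum(v * p for v, p in zip(values, probs))

def covariance(xs, ys, probs):
    """Sum over scenarios of (x - E[X]) * (y - E[Y]) * P(scenario)."""
    mean_x = expected_value(xs, probs)
    mean_y = expected_value(ys, probs)
    return sum((x - mean_x) * (y - mean_y) * p
               for x, y, p in zip(xs, ys, probs))

cov_xy = covariance(returns_x, returns_y, probabilities)
print(f"E[X] = {expected_value(returns_x, probabilities):.1f}")
print(f"E[Y] = {expected_value(returns_y, probabilities):.1f}")
print(f"covariance = {cov_xy:.1f}")  # negative: X and Y move in opposite directions
```

In this made-up case the covariance is negative, so the two investments tend to offset each other.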
5.3 Binomial distribution
- pass
5.4 Poisson Distribution
- Area of opportunity - the time or space interval
- Characteristic lambda = mean or expected number of events per unit
- Variance also = lambda, so std dev = sqrt(lambda)
- Examples:
- surface defects on a new refrigerator
- number of network failures in a day
- number of people arriving at a bank
- number of fleas on a dog
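- Use the Poisson distribution if these 4 conditions hold:
- You are interested in counting the number of times a particular event occurs in a given area of opportunity (defined by time, length, surface area, etc)
- The probability that an event occurs in a given area of opportunity is the same for all areas of opportunity
- The number of events in one area of opportunity is independent of the number of events in any other area of opportunity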
- probability that two or more events will occur in an area of opportunity approaches zero as the area of opportunity gets smaller.
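A Poisson probability can be computed with scipy.stats.poisson. This is a small sketch that assumes a hypothetical rate of lambda = 3 arrivals per minute at a bank; the rate is not from the notes.

```python
from scipy.stats import poisson

# Assumed rate for illustration: an average of 3 arrivals per minute.
lam = 3.0

# P(X = 2): probability of exactly 2 arrivals in a given minute.
p_exactly_2 = poisson.pmf(2, lam)

# P(X <= 2): probability of at most 2 arrivals in a given minute.
p_at_most_2 = poisson.cdf(2, lam)

# For a Poisson distribution the mean and the variance both equal lambda,
# so the standard deviation is sqrt(lambda).
print(f"P(X = 2)  = {p_exactly_2:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"mean = {poisson.mean(lam)}, variance = {poisson.var(lam)}, "
      f"std dev = {poisson.std(lam):.4f}")
```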
5.5 Hypergeometric Distribution
- pass
Chapter 6: Normal Distribution and other distributions
- Continuous distribution: values of x are not countable, but rather measurable
- The standard normal distribution is the normal distribution after converting values to z-scores: z = (x - mu) / sigma
- NORMSINV is the inverse of the CDF of the standard normal distribution
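- Python equivalent: scipy.stats.norm.ppf
- Uses mean=0 and stddev=1 by default, which is the "standard" normal distribution.
- Use a different mean and standard deviation by specifying the loc and scale arguments
- scipy.stats.norm.cdf is the inverse of ppf: cdf maps a value to a cumulative probability, and ppf maps a cumulative probability back to a value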
- The acronym ppf stands for percent point function, which is another name for the quantile function.
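The relationship between ppf and cdf can be sketched as follows. The probability 0.975 and the mean/standard deviation of 100 and 15 are arbitrary example values.

```python
from scipy.stats import norm

# ppf (quantile function) on the standard normal: which z value has
# 97.5% of the distribution below it? Comparable to NORMSINV(0.975).
z = norm.ppf(0.975)

# cdf undoes ppf: it maps the z value back to the cumulative probability.
p = norm.cdf(z)

# A non-standard normal distribution via loc (mean) and scale (std dev).
# The mean of 100 and std dev of 15 are example values only.
x = norm.ppf(0.975, loc=100, scale=15)

print(f"z = {z:.4f}")
print(f"cdf(z) = {p:.4f}")
print(f"x = {x:.2f}  (equals 100 + 15 * z)")
```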
6.2 Normal Distribution
- The interquartile range is approximately 4/3 (1.33) standard deviations
- Middle 50% of values contained within approximately mu +/- 2/3 sigma
- The range is approximately equal to 6 standard deviations
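These rules of thumb can be compared against the exact values for the standard normal distribution. A brief sketch using scipy.stats.norm:

```python
from scipy.stats import norm

# Quartiles of the standard normal, in units of sigma.
q1 = norm.ppf(0.25)
q3 = norm.ppf(0.75)
iqr_in_sigmas = q3 - q1   # close to the 4/3 rule of thumb
half_width = q3           # middle 50% lies within about +/- 2/3 sigma

# Share of the distribution within +/- 3 sigma, i.e. a span of 6 sigma.
within_3_sigma = norm.cdf(3) - norm.cdf(-3)

print(f"IQR = {iqr_in_sigmas:.3f} sigma")
print(f"middle 50% within +/- {half_width:.3f} sigma")
print(f"share within +/- 3 sigma = {within_3_sigma:.4f}")
```

The exact figures (about 1.35 and 0.67) are slightly off the fractions 4/3 and 2/3, which is why the rules are stated as approximations.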
6.3: Evaluating Normality
- To determine whether a set of data can be approximated by the normal distribution, you compare the characteristics of the data with the theoretical properties of the normal distribution or construct a normal probability plot
- For some variables, the descriptive characteristics of the data are inconsistent with the properties of the normal distribution
- Stem-and-leaf display or boxplot for small datasets, histogram for large datasets.
- Are the mean and median equal?
- Is interquartile range approximately 1.33 times the standard deviation? Is range 6 times stdev?
- Evaluate how the values are distributed. Do approximately 2/3 of the values (68%) lie between mu +/- 1 sigma? Do approximately 4/5 of the values (80%) lie between mu +/- 1.28 sigma? Do approximately 19/20 of the values (95%) lie between mu +/- 2 sigma? Do 99.7% of the values lie between mu +/- 3 sigma?
- Normal probability plot: Shows whether data is left/right skewed or normal
- Quantile-quantile plot: observed value on the y axis, corresponding z value on the x axis
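The checks above can be sketched in code. This example assumes a simulated dataset (drawn from a normal distribution with an arbitrary mean of 50 and standard deviation of 10) in place of real data, and uses scipy.stats.probplot to get the points of a normal probability plot without drawing it.

```python
import numpy as np
from scipy import stats

# Simulated stand-in for a real dataset (assumption for illustration).
rng = np.random.default_rng(0)
data = rng.normal(loc=50.0, scale=10.0, size=2000)

mean = data.mean()
median = np.median(data)
std = data.std(ddof=1)  # sample standard deviation

# Is the interquartile range about 1.33 standard deviations?
q1, q3 = np.percentile(data, [25, 75])
iqr_ratio = (q3 - q1) / std

# Is the range about 6 standard deviations? (A rough rule; the ratio
# tends to grow as the sample gets larger.)
range_ratio = (data.max() - data.min()) / std

def fraction_within(k):
    """Share of values lying within k standard deviations of the mean."""
    return np.mean(np.abs(data - mean) <= k * std)

# Normal probability plot points: z values (x axis) against the ordered
# data values (y axis). r near 1 means the points fall close to a line.
(z_values, ordered_values), (slope, intercept, r) = stats.probplot(data, dist="norm")

print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"IQR / std = {iqr_ratio:.2f}, range / std = {range_ratio:.2f}")
for k in (1, 1.28, 2, 3):
    print(f"within +/- {k} sigma: {fraction_within(k):.3f}")
print(f"probability plot correlation r = {r:.4f}")
```

To draw the plot itself, the z_values and ordered_values arrays can be passed to any plotting library.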
Chapter 7: Sampling Distributions
- Trying to make inferences that are based on statistics calculated from samples
- Sample mean (statistic) estimates a population mean (parameter)
- Sample proportion (statistic) estimates the population proportion (parameter)
- Reach conclusion about the POPULATION not the sample
- Sampling distribution: the distribution of results you would get if you actually selected all possible samples. The single result you obtain in practice is just one of the results in the sampling distribution.
7.2: Sampling distribution of the mean
- The distribution of all possible sample means if you select all possible samples of a given size
- The sample mean is an unbiased estimator of the population mean because the mean of all the possible sample means (of a given size n) is equal to the population mean
- Standard error of the mean expresses how sample means vary from sample to sample.
- As the sample size increases, the standard error of the mean decreases by a factor equal to the square root of the sample size
- Central Limit Theorem: A sample size of at least 30 generally produces an approximately normal distribution of sample means, no matter what the population distribution is.
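A simulation can illustrate these points. The sketch below assumes a strongly right-skewed population (exponential with mean 2 and standard deviation 2, chosen only for illustration) and draws many samples of size 30 rather than enumerating every possible sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Assumed population: exponential with mean 2 and std dev 2 (skewed right).
pop_mean = 2.0
pop_sigma = 2.0
n = 30                # sample size
num_samples = 20000   # number of simulated samples

# Each row is one sample of size n; take the mean of each row.
samples = rng.exponential(scale=pop_mean, size=(num_samples, n))
sample_means = samples.mean(axis=1)

# Unbiasedness: the mean of the sample means is close to the population mean.
mean_of_means = sample_means.mean()

# Standard error: the spread of the sample means is close to sigma / sqrt(n).
observed_se = sample_means.std(ddof=1)
theoretical_se = pop_sigma / np.sqrt(n)

# Shape: individual values are heavily skewed, sample means much less so.
skew_values = stats.skew(samples.ravel())
skew_means = stats.skew(sample_means)

print(f"mean of sample means = {mean_of_means:.3f} (population mean {pop_mean})")
print(f"observed SE = {observed_se:.3f}, sigma/sqrt(n) = {theoretical_se:.3f}")
print(f"skewness: individual values = {skew_values:.2f}, sample means = {skew_means:.2f}")
```

With a population this skewed, the sample means at n = 30 are only approximately normal; some skewness remains and shrinks further as n grows.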