Class 08 - 2016-10-31

General

Two characteristics of data:

1. Measure of central tendency
2. Measure of dispersion with respect to 1 above.

Two types of error can arise:

Bias: error from sampling nonrandomly
Sampling error: error from the chance that a randomly selected sample is not representative of the population

Central Limit Theorem: If the sample size is sufficiently large (rule of thumb: over 30), then the sample mean x-bar has a sampling distribution that is approximately bell shaped (normal), regardless of the shape of the population distribution.
Standard error of the sampling distribution of x-bar is sigma/sqrt(n)
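
A minimal simulation sketch of the two statements above, using only the Python standard library. The exponential population, the sample size of 40, and the number of repeated samples are assumptions chosen for illustration; they are not from the class.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 40              # sample size (assumed; above the "over 30" rule of thumb)
num_samples = 5000  # number of repeated samples (assumed)

# Population: exponential with rate 1, which is strongly right-skewed.
# Its mean is 1 and its standard deviation (sigma) is 1.
sigma = 1.0

# Draw many samples of size n and keep the mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)   # spread of the sample means
theoretical_se = sigma / n ** 0.5              # sigma / sqrt(n)

print(f"mean of sample means:    {mean_of_means:.3f}")
print(f"observed standard error: {observed_se:.3f}")
print(f"sigma / sqrt(n):         {theoretical_se:.3f}")
```

The observed spread of the sample means should land close to sigma/sqrt(n) even though the population itself is far from bell shaped.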

Chapter 5

Discrete distribution => values of x are countable

5.1: Probability Distribution for a discrete variable

A mutually exclusive list of all the possible numerical outcomes along with the probability of occurrence of each outcome
Expected value of a discrete variable (mu): the measure of central tendency; multiply each possible outcome by its probability and sum the products
Variance of a discrete variable (lowercase sigma squared): multiply each possible squared difference from the expected value by its corresponding probability, then sum the products
- std deviation of a discrete variable is the sqrt of the variance
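
The three definitions above can be sketched in a few lines of Python. The distribution below (number of interruptions per day and their probabilities) is a made-up example, not data from the class.

```python
import math

# Hypothetical probability distribution: outcome -> probability.
distribution = {0: 0.35, 1: 0.25, 2: 0.20, 3: 0.10, 4: 0.05, 5: 0.05}

# A valid probability distribution must sum to 1.
total_probability = sum(distribution.values())

# Expected value: sum of each outcome times its probability.
expected_value = sum(x * p for x, p in distribution.items())

# Variance: sum of each squared difference from the expected value
# times the corresponding probability.
variance = sum((x - expected_value) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(f"E(X) = {expected_value:.2f}")
print(f"variance = {variance:.2f}")
print(f"std dev = {std_dev:.3f}")
```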

5.2: Covariance of a Probability Distribution and its application in Finance

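Covariance of a probability distribution measures how 2 variables vary together: positive when they tend to move in the same direction, negative when they tend to move in opposite directions
Do 2 stocks move together or not? An inverse relationship (negative covariance) means that holding both gives a diversified portfolio with lower risk.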
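
A small sketch of the covariance calculation and of why a negative covariance helps a portfolio. The three scenarios, their probabilities, the returns, and the 50/50 weighting are all hypothetical. The portfolio variance line uses the standard two-asset formula w^2 var(X) + (1-w)^2 var(Y) + 2w(1-w) cov(X,Y), which is assumed here rather than taken from the notes.

```python
import math

# Hypothetical scenarios: (probability, return of stock X, return of stock Y)
scenarios = [
    (0.2, -100.0, 200.0),
    (0.5, 100.0, 50.0),
    (0.3, 250.0, -100.0),
]

# Expected return of each stock.
mean_x = sum(p * x for p, x, _ in scenarios)
mean_y = sum(p * y for p, _, y in scenarios)

# Variances and covariance, each a probability-weighted sum.
var_x = sum(p * (x - mean_x) ** 2 for p, x, _ in scenarios)
var_y = sum(p * (y - mean_y) ** 2 for p, _, y in scenarios)
cov_xy = sum(p * (x - mean_x) * (y - mean_y) for p, x, y in scenarios)

# Portfolio with weight w in X and (1 - w) in Y.
w = 0.5
portfolio_mean = w * mean_x + (1 - w) * mean_y
portfolio_var = w**2 * var_x + (1 - w) ** 2 * var_y + 2 * w * (1 - w) * cov_xy
portfolio_std = math.sqrt(portfolio_var)

print(f"cov(X, Y) = {cov_xy:.1f}")
print(f"std X = {math.sqrt(var_x):.1f}, std Y = {math.sqrt(var_y):.1f}")
print(f"portfolio mean = {portfolio_mean:.1f}, portfolio std = {portfolio_std:.1f}")
```

With these made-up numbers the covariance is negative, and the portfolio standard deviation comes out far smaller than that of either stock alone.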

5.3 Binomial distribution

pass

5.4 Poisson Distribution

Area of opportunity - the time or space interval
Characteristic lambda = mean or expected number of events per unit
- variance also = lambda, so std dev = sqrt(lambda)
Examples:
- surface defects on a new refrigerator
- number of network failures in a day
- number of people arriving at a bank
- number of fleas on a dog
Use the Poisson distribution if these 4 conditions hold:
- You are interested in counting the number of times a particular event occurs in a given area of opportunity (defined by time, length, surface area, etc)
- The probability that an event occurs in a given area of opportunity is the same for all areas of opportunity
- events are independent
- probability that two or more events will occur in an area of opportunity approaches zero as the area of opportunity gets smaller.
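
A minimal sketch of Poisson probabilities using the standard formula P(X = k) = e^(-lambda) * lambda^k / k!. The helper name poisson_pmf and the rate of 3 arrivals per minute are assumptions for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected number is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 3.0  # hypothetical: 3 arrivals expected per minute

p_exactly_2 = poisson_pmf(2, lam)
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))  # k = 0, 1, 2
p_more_than_2 = 1 - p_at_most_2

# Numerical check that the mean and the variance both equal lambda.
# Terms beyond k = 59 are negligible for lam = 3.
ks = range(60)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(f"P(X = 2)  = {p_exactly_2:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 2)  = {p_more_than_2:.4f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```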

5.5 Hypergeometric

pass

Chapter 6: Normal Distribution and other distributions

Continuous distribution: values of x are not countable, but rather measurable
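The standard normal distribution is the normal distribution expressed in z-scores, Z = (X - mu)/sigma, so it has mean 0 and standard deviation 1
NORMSINV (Excel) is the inverse of the CDF of the standard normal distribution
- Python equivalent: scipy.stats.norm.ppf
  - Uses mean=0 and stddev=1 by default, which is the "standard" normal distribution.
  - Use a different mean and standard deviation by specifying the loc and scale arguments
  - scipy.stats.norm.cdf is the inverse of ppf: it maps a value back to its cumulative probability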
  - The acronym ppf stands for percent point function, which is another name for the quantile function.
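
A short sketch of the quantile function and the CDF undoing each other. To stay dependency-free it uses statistics.NormalDist from the Python standard library as a stand-in for scipy.stats.norm; the comments name the scipy call each line corresponds to. The mean of 100 and standard deviation of 15 are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults: mu=0, sigma=1 (standard normal)

# Quantile function: the z value with 97.5% of the area to its left.
# Corresponds to NORMSINV(0.975) or scipy.stats.norm.ppf(0.975).
z = std_normal.inv_cdf(0.975)

# CDF: maps the z value back to its cumulative probability.
# Corresponds to scipy.stats.norm.cdf(z).
p = std_normal.cdf(z)

# A non-standard normal distribution (hypothetical mean and std dev).
# In scipy this would be norm.ppf(0.90, loc=100, scale=15).
scores = NormalDist(mu=100, sigma=15)
q90 = scores.inv_cdf(0.90)

# Converting the quantile to a z-score gives the standard normal quantile.
z90 = (q90 - scores.mean) / scores.stdev

print(f"z for 0.975: {z:.4f}")
print(f"cdf(z):      {p:.4f}")
print(f"90th percentile of N(100, 15): {q90:.2f} (z = {z90:.4f})")
```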

6.2 Normal Distribution

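The interquartile range is approximately 4/3 (1.33) standard deviations
The middle 50% of values is contained within approximately mu +/- 2/3 sigma
The range is approximately equal to 6 standard deviations (nearly all values fall within mu +/- 3 sigma)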
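
These rules of thumb can be checked against the standard normal distribution itself. A minimal sketch using statistics.NormalDist from the Python standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu=0, sigma=1

# Third quartile of the standard normal, in units of sigma.
q3 = z.inv_cdf(0.75)

# By symmetry the first quartile is -q3, so the IQR is 2 * q3 sigmas.
iqr_in_sigmas = 2 * q3

# Share of values within mu +/- 3 sigma, i.e. a span of 6 sigmas.
within_3_sigma = z.cdf(3) - z.cdf(-3)

print(f"Q3 = mu + {q3:.4f} sigma")
print(f"IQR = {iqr_in_sigmas:.4f} sigma")
print(f"share within 3 sigma = {within_3_sigma:.4f}")
```

The exact quartile sits slightly above 2/3 of a standard deviation, which is why the 4/3 figure for the interquartile range is an approximation.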

6.3: Evaluating Normality

To determine whether a set of data can be approximated by the normal distribution, you compare the characteristics of the data with the theoretical properties of the normal distribution or construct a normal probability plot
For some variables, the descriptive characteristics of the data are inconsistent with the properties of the normal distribution
Stem-and-leaf display or boxplot for small datasets, histogram for large datasets.
Are the mean and median approximately equal?
Is the interquartile range approximately 1.33 times the standard deviation? Is the range approximately 6 times the standard deviation?
Evaluate how the values are distributed. Do about 2/3 of the values (68%) lie between mu +/- 1 sigma? Do about 4/5 of the values (80%) lie between mu +/- 1.28 sigma? Do about 19/20 of the values (roughly 95%) lie between mu +/- 2 sigma? Do about 99.7% of the values lie within mu +/- 3 sigma?
Normal probability plot: shows whether the data are left-skewed, right-skewed, or approximately normal
Quantile-quantile plot: observed data value on the y axis, corresponding z value on the x axis; points close to a straight line suggest an approximately normal distribution
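
The checks above can be sketched with the Python standard library. The data set is simulated (2000 draws from a normal distribution with mean 50 and standard deviation 10), so it is hypothetical, and the plotting position (i - 0.5)/n used for the quantile-quantile points is one common convention, assumed here.

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable
data = [random.gauss(50, 10) for _ in range(2000)]  # hypothetical data set

mean = statistics.fmean(data)
median = statistics.median(data)
stdev = statistics.stdev(data)

# Interquartile range and range, expressed in standard deviations.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr_ratio = (q3 - q1) / stdev
range_ratio = (max(data) - min(data)) / stdev


def share_within(k: float) -> float:
    """Fraction of values lying within mean +/- k standard deviations."""
    return sum(abs(x - mean) <= k * stdev for x in data) / len(data)


# Quantile-quantile points: (z value on x axis, sorted data value on y axis).
n = len(data)
z_values = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_points = list(zip(z_values, sorted(data)))

print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"IQR / stdev = {iqr_ratio:.2f}, range / stdev = {range_ratio:.2f}")
for k in (1, 1.28, 2, 3):
    print(f"share within {k} sigma: {share_within(k):.3f}")
print("first QQ point:", qq_points[0])
```

Plotting qq_points with any charting tool gives the quantile-quantile plot; the list is left unplotted here to avoid extra dependencies.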

Chapter 7: Sampling Distributions

Trying to make inferences that are based on statistics calculated from samples
Sample mean (statistic) estimates a population mean (parameter)
Sample proportion (statistic) estimates the population proportion (parameter)
Reach conclusions about the POPULATION, not the sample
Sampling distribution: the distribution of results if you actually selected all possible samples. The single result you obtain in practice is just one of the results in the sampling distribution.

7.2: Sampling distribution of the mean

The distribution of all possible sample means if you select all possible samples of a given size
The sample mean is an unbiased estimator of the population mean because the mean of all the possible sample means (of a given size n) is equal to the population mean
Standard error of the mean expresses how sample means vary from sample to sample.
The standard error of the mean is sigma/sqrt(n), so it shrinks in proportion to the square root of the sample size: quadrupling the sample size halves the standard error
Central Limit Theorem: with a sample size of about 30 or more, the sampling distribution of the mean is approximately normal for most population distributions, whatever their shape.
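
A minimal sketch that enumerates every possible sample to check the unbiasedness and standard error statements exactly. The four-value population and the sample size of 2 are hypothetical, and sampling is assumed to be with replacement, which is the case where sigma/sqrt(n) holds exactly.

```python
import itertools
import math
import statistics

population = [1, 2, 3, 4]  # hypothetical population
n = 2                      # sample size

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# Every possible ordered sample of size n, drawn with replacement.
all_samples = list(itertools.product(population, repeat=n))
sample_means = [statistics.fmean(sample) for sample in all_samples]

# Unbiasedness: the mean of all sample means equals the population mean.
mean_of_means = statistics.fmean(sample_means)

# Standard error: the standard deviation of all sample means.
se_observed = statistics.pstdev(sample_means)
se_formula = sigma / math.sqrt(n)

print(f"number of samples: {len(all_samples)}")
print(f"population mean = {mu}, mean of sample means = {mean_of_means}")
print(f"std dev of sample means = {se_observed:.4f}")
print(f"sigma / sqrt(n)         = {se_formula:.4f}")
```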
