Ensemble Learning Methods


Jay Hyer

3 sides to every story - yours, mine, and the truth.

Math: arbitrarily many sides to the story - and then there's the truth.

Notation:
- D - all possible observations: "the story", the whole world
- D' - what we observed: the observed data
- f - the target, or ground-truth, function
- H - the hypothesis space; narrow this down to fit the phenomenon we want to predict. Hopefully the hypothesis space overlaps the observed data.
- h_svm - a particular hypothesis, i.e. a statistical model

Learning is trying to narrow H down to a single function.

Come to conclusions with several candidate models, then pick the one we feel fits the training data best.
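In symbols (a minimal formalization of the above; the loss function L is my assumption, not from the notes):

\[
\hat{h} \;=\; \arg\min_{h \in H}\ \frac{1}{|D'|} \sum_{(x,\,y) \in D'} L\bigl(h(x),\, y\bigr),
\]

hoping that \(\hat{h} \approx f\) on all of D, not just on D'.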

Boosting = weighted average, learning the weights
Bagging = majority vote
Stacking = meta-learning

how does our model fit data that we find in the future?

Ensemble learning: take many different guesses - produce many models - to increase coverage of the hypothesis space.

reduce variance by averaging
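A worked version of that claim (a standard result, stated here as background): for m models with individual variance \(\sigma^2\) and pairwise correlation \(\rho\), the variance of their average is

\[
\operatorname{Var}\!\left(\frac{1}{m}\sum_{i=1}^{m} h_i\right) \;=\; \rho\,\sigma^2 + \frac{1-\rho}{m}\,\sigma^2 ,
\]

which tends to \(\rho\sigma^2\) as m grows - averaging only helps to the extent the models are decorrelated, hence the emphasis on diversity below.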

can't make chicken soup out of chicken litter

perhaps your model is biased in a certain way

"Overkill analytics" - throwing many models at a problem without understanding the inherent biases of those models.

The secret to ensemble learning is diversity:
- augment the input data (boosting)
- resampling (bagging)
- different input variables (random forests)

training error - "training error is for training wheels"

A learning algorithm produces a learner.

"Weak learner" slightly better than random, simple to produce "strong learner" closely predicts target function, can be expensive ITO CPU, feature engineering

Overfitting - like adding more polynomial terms.

take many weak learners and combine them to create a strong learner

BOOSTING. Ex: the AdaBoost algorithm (assume a classification problem):
1. A weight value for every observation; initial values are all equal.
2. Upweight misclassified observations each round - exaggerating the outliers biases the input.
Originally developed for binary classification. Weak learner: a decision tree (yes/no) - e.g., the Kaggle Titanic dataset.
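A minimal sketch of the above, assuming scikit-learn; synthetic data stands in for something like Titanic:

```python
# A minimal AdaBoost sketch, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weak learner: a decision stump (depth-1 yes/no tree).
stump = DecisionTreeClassifier(max_depth=1)

# Every observation starts with an equal weight; each round, AdaBoost
# upweights the observations the previous stump misclassified.
# (`estimator` in scikit-learn >= 1.2; older versions call it `base_estimator`.)
ada = AdaBoostClassifier(estimator=stump, n_estimators=100, random_state=0)
ada.fit(X_train, y_train)
print("test accuracy:", ada.score(X_test, y_test))
```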

- The outlier is taking a lot of attention - boosting is sensitive to outliers
- Class imbalance

- Overlearning - you teach to the test, not to the subject. The more complicated your model gets, the better it fits the training data, but the test error is parabolic (it falls, then rises again). A weak learner can't overfit a dataset; it overfits a data point by exaggerating the outliers.

BAGGING: bootstrapping and aggregating.
- Bootstrapping: take a sample of the observed data, with replacement - certain observations are repeated, certain observations are left out.
- Draw arbitrarily many bootstrap samples, then aggregate the models trained on them.


Iterative: (train the algorithm on a bootstrap sample) m times - then vote. Lots of overlap between the datasets. Unlike AdaBoost, one outcome doesn't affect the next.

Out-of-bag error: evaluate each model on the observations that didn't get picked for its bootstrap sample (like wndchrm -n100).
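A minimal bagging sketch with out-of-bag scoring, again assuming scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# m = 50 independent rounds: each tree gets its own bootstrap sample
# (drawn with replacement); predictions are combined by majority vote.
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # `base_estimator` in scikit-learn < 1.2
    n_estimators=50,
    bootstrap=True,
    oob_score=True,  # score each observation using only the trees that never saw it
    random_state=0,
)
bag.fit(X, y)
print("out-of-bag accuracy:", bag.oob_score_)
```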

Random forests: good for binary, multiclass, or even regression. An observation runs through each tree and ends at a leaf.
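A minimal random forest sketch (scikit-learn assumed; the multiclass data is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)

# Each split considers only a random subset of the input variables
# (max_features), which decorrelates the trees beyond plain bagging.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)
forest.fit(X, y)

# A new observation runs through every tree and ends at a leaf; the
# per-tree votes are combined (RandomForestRegressor averages instead).
print(forest.predict(X[:5]))
```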

STACKING - beyond averaging. How do you combine the work of different teams working independently? How do you combine different learning algorithms?

1. A set of base learners - don't change the input data by resampling; a meta-learner learns how to combine their predictions.
References: Hastie et al. 2009, chapter 9; Zhou 2012, chapter 2.
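A minimal stacking sketch, assuming scikit-learn's StackingClassifier; the choice of base learners is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Diverse base learners, all trained on the same (non-resampled) data.
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3)),
    ("svm", SVC()),
]

# The meta-learner is fit on out-of-fold predictions of the base learners,
# so it learns how to combine them rather than just averaging.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(), cv=5)
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```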

OUTLIERS

How often did a pair of observations end up at the same node in the tree? That's a measure of proximity. What's an outlier within my class? - ties back to class imbalance.
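A sketch of that proximity idea; computing it via forest.apply() is my assumption about how to do this in scikit-learn:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# leaves[i, t] = index of the leaf where observation i lands in tree t.
leaves = forest.apply(X)

# proximity[i, j] = fraction of trees in which i and j share a leaf.
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# An observation with low average proximity to the rest of the data
# is an outlier candidate.
avg_prox = proximity.mean(axis=1)
print("most isolated observation:", np.argmin(avg_prox))
```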

Most datasets are imbalanced - a very small population (~10%) for a specific class, and the cost of a false negative can be very high. If the model misses that class entirely: my hypothesis set didn't overlap.

An ensemble of models created from many downsampled training sets:
1. Downsample: from 100,000 observations, throw away most of the data to get down to 200 - 100 with the rare class, 100 without.
2. Draw different negative observations each time and pair them with the positives; use each sample to train a model. Train the models on balanced sets and combine them.
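A hand-rolled sketch of that procedure; the 100/100 split mirrors the notes, everything else here is an assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Imbalanced data: the positive class is rare (~1%).
X, y = make_classification(n_samples=100_000, n_features=10,
                           weights=[0.99, 0.01], random_state=0)
pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]

rng = np.random.default_rng(0)
models = []
for _ in range(25):
    # 100 positives paired with 100 freshly drawn negatives: a balanced set.
    sample = np.concatenate([rng.choice(pos, 100, replace=False),
                             rng.choice(neg, 100, replace=False)])
    models.append(DecisionTreeClassifier(max_depth=3).fit(X[sample], y[sample]))

# Combine the balanced models by majority vote.
votes = np.mean([m.predict(X) for m in models], axis=0)
y_pred = (votes >= 0.5).astype(int)
print("flagged positives:", y_pred.sum(), "actual positives:", y.sum())
```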

extreme imbalance -