Random Forests
Created:
Louppe, Gilles. "Understanding random forests: From theory to practice," PhD thesis, 2015. http://arxiv.org/abs/1407.7502. Louppe's work is implemented in scikit-learn.
The thesis offers "an in-depth analysis of random forests, consistently calling into question each and every part of the algorithm, in order to shed new light on its learning capabilities, inner workings and interpretability."
On subsampling: "we show that subsampling both samples and features simultaneously provides on par performance while lowering at the same time the memory requirements. Overall this paradigm highlights an intriguing practical fact: there is often no need to build single models over immensely large datasets. Good performance can often be achieved by building models on (very) small random parts of the data and then combining them all in an ensemble, thereby avoiding all practical burdens of making large data fit into memory."
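The idea of subsampling both samples and features for each ensemble member can be sketched with scikit-learn's `BaggingClassifier`, whose `max_samples` and `max_features` parameters control exactly those two axes. The dataset, subset fractions, and ensemble size below are illustrative choices, not values from the thesis:

```python
# Each tree in the ensemble is trained on a small random "patch" of the
# data: a random 10% of the rows and a random 25% of the columns.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in dataset (sizes are arbitrary for the sketch).
X, y = make_classification(n_samples=2000, n_features=40, random_state=0)

# BaggingClassifier defaults to decision-tree base learners.
patches = BaggingClassifier(
    n_estimators=100,
    max_samples=0.1,    # row subsampling: each tree sees 200 of 2000 samples
    max_features=0.25,  # column subsampling: each tree sees 10 of 40 features
    random_state=0,
)
score = cross_val_score(patches, X, y, cv=5).mean()
print(f"mean CV accuracy: {score:.3f}")
```

Even though no individual tree ever sees more than a small fraction of the data, the combined ensemble can perform competitively, which is the memory-saving point the quote makes.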