Random Forests
Created:
Louppe, Gilles. "Understanding random forests: From theory to practice," PhD thesis, 2015. http://arxiv.org/abs/1407.7502. Louppe's work is implemented in scikit-learn.
The thesis offers "an in-depth analysis of random forests, consistently calling into question each and every part of the algorithm, in order to shed new light on its learning capabilities, inner workings and interpretability."
On subsampling: "we show that subsampling both samples and features simultaneously provides on par performance while lowering at the same time the memory requirements. Overall this paradigm highlights an intriguing practical fact: there is often no need to build single models over immensely large datasets. Good performance can often be achieved by building models on (very) small random parts of the data and then combining them all in an ensemble, thereby avoiding all practical burdens of making large data fit into memory."
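The idea of subsampling both samples and features for each ensemble member can be sketched with scikit-learn's `BaggingClassifier`, whose `max_samples` and `max_features` parameters control exactly those two axes. The dataset, subset fractions, and ensemble size below are illustrative choices, not values from the thesis:

```python
# Each tree in the ensemble is trained on a small random "patch" of the
# data: a random 10% of the rows and a random 25% of the columns.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in dataset (sizes are arbitrary for the sketch).
X, y = make_classification(n_samples=2000, n_features=40, random_state=0)

# BaggingClassifier defaults to decision-tree base learners.
patches = BaggingClassifier(
    n_estimators=100,
    max_samples=0.1,    # row subsampling: each tree sees 200 of 2000 samples
    max_features=0.25,  # column subsampling: each tree sees 10 of 40 features
    random_state=0,
)
score = cross_val_score(patches, X, y, cv=5).mean()
print(f"mean CV accuracy: {score:.3f}")
```

Even though no individual tree ever sees more than a small fraction of the data, the combined ensemble can perform competitively, which is the memory-saving point the quote makes.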