Mombu the Microsoft Forum sponsored links

Go Back   Mombu the Microsoft Forum > Microsoft > structure chart of random forest
User Name
Password
REGISTER NOW! Mark Forums Read

sponsored links


Reply
 
1 1st August 20:10
marc brugger
External User
 
Posts: 1
Default structure chart of random forest



Hello,

I'm a student at the University of Applied Sciences Solothurn
Northwestern Switzerland.

I have to do some Work on Leo Breimans "Random Forests". I only found
information about that algoritm on Breiman's Website
(http://stat-www.berkeley.edu/users/breiman/).

Can anybody provide me with a structure chart that describes how this
algoritm works?

Which ich the formule that he uses to calculate the Gini value?

Thank you very much.

Marc
  Reply With Quote


  sponsored links


2 6th August 18:52
aleks jakulin
External User
 
Posts: 1
Default structure chart of random forest



MAIN:
for each i in [1..N]:
1. create an independent bootstrap resample B of the training data
2. fit a TREE for the resample B
3. add the tree into the ensemble

TREE:
1. if the outcome can be predicted exactly RETURN the outcome;
otherwise:
2. create a random sample S of size m from the set of the input
variables
2. pick the best split-point among these S variables (using gini
index, entropy, etc.)
3. fit a TREE for each branch
4. return the statement "if condition then return T1(input) else
T2(input)"

CLASSIFICATION:
1. classify with each tree in the ensemble
2. output the majority vote (classification) or average (regression)

The idea is that the trees are unbiased and uncorrelated. The
unbiasedness is achieved by not pruning and by growing the trees to
the end. The uncorrelatedness is achieved by random resampling of data
and random resampling of subsets of variables.


Gini index roughly corresponds to entropy. If you have a set
of k outcomes with probabilities [p_1,p_2,...,p_k], the Gini
index is:

G(X) := 1 - Sum[p_i^2, {i,1,k}]

For tree-splitting applications, you're interested in conditional
probabilities of the outcome O given the splitting of the tree
S: [s_1,...,s_l]:

G(O|S) = Sum[ P(s_i) * G(O|s_i), {i,1,l}]

But there are several definitions. The one I give is sometimes
referred to as "weighted".

--
mag. Aleks Jakulin
http://www.ailab.si/aleks/
Artificial Intelligence Laboratory,
Faculty of Computer and Information Science,
University of Ljubljana, Slovenia.
  Reply With Quote
Reply


Thread Tools
Display Modes




Copyright © 2006 SmartyDevil.com - Dies Mies Jeschet Boenedoesef Douvema Enitemaus -
666