TensorForest Estimator

TensorForest is a highly scalable implementation of random forests that combines a variety of online Hoeffding tree algorithms with the extremely randomized trees approach.
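
As background for the Hoeffding tree idea mentioned above, the following is a minimal sketch of the standard Hoeffding bound that such online trees generally use to decide when enough examples have been seen to commit to a split. The function names hoeffding_bound and can_split are hypothetical and chosen for illustration, and scores are assumed to be "higher is better" (for example, information gain). Whether and how TensorForest applies this test is described in the paper; this sketch is not taken from its implementation.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_score, second_score, value_range, delta, n):
    """Hypothetical helper: commit to the best split once its observed lead over
    the runner-up exceeds the Hoeffding bound for n observed examples."""
    return (best_score - second_score) > hoeffding_bound(value_range, delta, n)


# The bound shrinks as more examples arrive, so a small lead becomes decisive later.
for n in (100, 1000, 10000):
    print(n, hoeffding_bound(1.0, 1e-7, n))
```

The design point is that the tree never needs to store the examples themselves: only running statistics and the example count are required to apply the test.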

Google published the details of the TensorForest implementation in the paper TensorForest: Scalable Random Forests on TensorFlow by Thomas Colthurst, D. Sculley, Gilbert Hendry, and Zack Nado, presented at the Machine Learning Systems Workshop at the Conference on Neural Information Processing Systems (NIPS) 2016. The paper is available at the following link: https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxtbHN5c25pcHMyMDE2fGd4OjFlNTRiOWU2OGM2YzA4MjE

TensorForest estimators implement the following algorithm:

Initialize the variables and sets
    Tree = [root]
    Fertile = {root}
    Stats(root) = 0
    Splits[root] = []

Divide training data into batches.
For each batch of training data:
    Compute leaf assignment for each feature vector
    Update the leaf stats in Stats
    For each assigned leaf in the Fertile set:
        if |Splits[leaf]| < max_splits
            then add the split on a randomly selected feature to Splits[leaf]
        else if leaf is fertile and |Splits[leaf]| = max_splits
            then update the split stats for leaf
    Calculate the fertile leaves that are finished. 
    For every non-stale finished leaf:
        turn the leaf into an internal node with its best scoring split 
        remove the leaf from Fertile
        add the leaf's two children to Tree as leaves
    If |Fertile| < max_fertile
        then add the max_fertile - |Fertile| leaves with
        the highest weighted leaf scores to Fertile and
        initialize their Splits and split statistics.
Until |Tree| = max_nodes or |Tree| stays the same for max_batches_to_grow batches

More details on the implementation of this algorithm can be found in the TensorForest paper.
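
To make the pseudocode above more concrete, here is a simplified single-tree sketch in plain Python. It is not the TensorForest implementation, and it rests on several assumptions that are not in the original text: a leaf counts as finished after split_after examples have updated its split statistics, a candidate threshold is taken from the example that triggers its creation, splits and leaves are scored with Gini impurity, and the stale-leaf check is omitted. The names Node, find_leaf, train_tree, and split_after are hypothetical.

```python
import random
from collections import Counter


class Node:
    """A tree node; it is a leaf while `split` is None."""
    def __init__(self):
        self.split = None       # (feature index, threshold) once internal
        self.left = self.right = None
        self.stats = Counter()  # class counts seen while this node was a leaf
        self.seen = 0           # examples used to update its split statistics


def gini(counts):
    n = sum(counts.values())
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(counts):
    return sum(counts.values()) * gini(counts)


def find_leaf(node, x):
    """Route feature vector x from `node` down to a leaf."""
    while node.split is not None:
        feature, threshold = node.split
        node = node.left if x[feature] <= threshold else node.right
    return node


def train_tree(batches, n_features, max_splits=5, max_fertile=4, max_nodes=31,
               split_after=20, max_batches_to_grow=10, seed=0):
    rng = random.Random(seed)
    root = Node()
    tree, fertile, splits = [root], {root}, {root: []}
    idle_batches = 0
    for batch in batches:
        size_before = len(tree)
        for x, y in batch:
            leaf = find_leaf(root, x)   # leaf assignment
            leaf.stats[y] += 1          # update leaf stats
            if leaf not in fertile:
                continue
            if len(splits[leaf]) < max_splits:
                # Assumption: the threshold is this example's feature value.
                f = rng.randrange(n_features)
                splits[leaf].append((f, x[f], Counter(), Counter()))
            else:
                leaf.seen += 1
                for f, t, left, right in splits[leaf]:
                    (left if x[f] <= t else right)[y] += 1
        # Assumption: a fertile leaf is finished after `split_after` updates.
        finished = [n for n in tree if n in fertile and n.seen >= split_after]
        for leaf in finished:
            if len(tree) + 2 > max_nodes:
                break
            f, t, _, _ = min(splits[leaf],
                             key=lambda s: weighted_gini(s[2]) + weighted_gini(s[3]))
            leaf.split = (f, t)
            leaf.left, leaf.right = Node(), Node()
            fertile.discard(leaf)
            del splits[leaf]
            tree += [leaf.left, leaf.right]
        if len(fertile) < max_fertile:
            # Promote the non-fertile leaves with the highest weighted scores.
            candidates = [n for n in tree if n.split is None and n not in fertile]
            candidates.sort(key=lambda n: weighted_gini(n.stats), reverse=True)
            for leaf in candidates[:max_fertile - len(fertile)]:
                fertile.add(leaf)
                splits[leaf] = []
        idle_batches = idle_batches + 1 if len(tree) == size_before else 0
        if len(tree) >= max_nodes or idle_batches >= max_batches_to_grow:
            break
    return root, tree


# Usage on synthetic data: the label depends only on the first feature.
data_rng = random.Random(1)
points = [[data_rng.random(), data_rng.random()] for _ in range(2000)]
examples = [(x, int(x[0] > 0.5)) for x in points]
batches = [examples[i:i + 100] for i in range(0, len(examples), 100)]
root, tree = train_tree(batches, n_features=2)
print("nodes in tree:", len(tree))
```

Only the fertile leaves carry candidate splits and their statistics, which is what keeps memory bounded: max_fertile caps how many leaves are tracked in detail at any time, and max_nodes caps the overall tree size.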
