
Decision trees

Decision trees are a class of supervised learning algorithms that work like a flow chart: a sequence of nodes, where a sample's feature values determine which branch to follow to the next node.

The following example gives a good idea of how a decision tree makes its decisions:

As with most classification algorithms, there are two stages to using them:

  • The first stage is the training stage, where a tree is built using training data. While the nearest neighbor algorithm from the previous chapter did not have a training phase, it is needed for decision trees. In this way, the nearest neighbor algorithm is a lazy learner, only doing any work when it needs to make a prediction. In contrast, decision trees, like most classification methods, are eager learners, undertaking work at the training stage and therefore needing to do less in the predicting stage.
  • The second stage is the predicting stage, where the trained tree is used to predict the classification of new samples. Using the previous example tree, a data point of ["is raining", "very windy"] would be classed as bad weather.
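These two stages can be sketched with scikit-learn's `DecisionTreeClassifier`. The weather data below is hypothetical and hand-encoded as integers, since scikit-learn's trees expect numeric feature arrays; the encoding scheme is an assumption for illustration, not from the original example:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical weather samples as [raining, windiness], encoded as
# integers: raining 0 = "not raining", 1 = "is raining";
# windiness 0 = "calm", 1 = "windy", 2 = "very windy".
# Labels: 0 = good weather, 1 = bad weather.
X_train = [
    [0, 0],  # not raining, calm
    [0, 1],  # not raining, windy
    [1, 1],  # raining, windy
    [1, 2],  # raining, very windy
]
y_train = [0, 0, 1, 1]

# Stage 1: training -- build the tree from the training data
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Stage 2: predicting -- classify a new sample
new_sample = [[1, 2]]  # "is raining", "very windy"
print(clf.predict(new_sample))  # expected class: 1 (bad weather)
```

Note that all of the work of deciding which features to split on happens in the call to `fit`; `predict` only walks the already-built tree, which is what makes decision trees eager learners.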

There are many algorithms for creating decision trees, and most of them are iterative. They start at the root node and choose the best feature to use for the first decision, then move to each child node and choose the next best feature, and so on. This process stops at the point where it is decided that nothing more can be gained by extending the tree further.
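In scikit-learn, this stopping point is controlled by parameters such as `max_depth` and `min_samples_leaf`. A minimal sketch using the bundled Iris dataset (the dataset choice is an assumption for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# With no stopping criteria, the tree keeps splitting until
# every leaf is pure (or cannot be split further).
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(full_tree.get_depth())

# Stopping criteria halt growth early: this tree may not exceed
# depth 2, and every leaf must keep at least 5 training samples.
pruned_tree = DecisionTreeClassifier(
    max_depth=2, min_samples_leaf=5, random_state=0
).fit(X, y)
print(pruned_tree.get_depth())  # at most 2
```

Stopping early usually costs a little training accuracy but reduces overfitting, since very deep trees tend to memorize noise in the training data.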

The scikit-learn package implements the Classification and Regression Trees (CART) algorithm as its default decision tree class, which can use both categorical and continuous features.
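With a continuous feature, CART splits by learning a threshold at each internal node. This can be seen by printing a small trained tree with scikit-learn's `export_text` helper (the Iris dataset here is an assumption for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Train a small tree on continuous measurements (in centimeters)
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Each internal node tests "feature <= threshold"; leaves report a class
print(export_text(clf, feature_names=list(iris.feature_names)))
```

The printed rules show the thresholds CART chose for the continuous features, which is exactly the flow-chart structure described at the start of this section.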
