
Tree visualization

Let's take a look at the code to visualize the tree. First, extract the class label names:

In []: 
labels = df.label.astype('category').cat.categories 
labels = list(labels) 
labels 
Out[]: 
[u'platyhog', u'rabbosaurus']  

Define a variable to store all the names for the features:

In []: 
feature_names = map(lambda x: x.encode('utf-8'), features.columns.get_values()) 
feature_names 
Out[]: 
['length', 
 'fluffy', 
 'color_light black', 
 'color_pink gold', 
 'color_purple polka-dot', 
 'color_space gray'] 

Then, create the graph object using the export_graphviz function:

In []: 
import pydotplus  
dot_data = tree.export_graphviz(tree_model, out_file=None,  
                                feature_names=feature_names,   
                                 class_names=labels,   
                                 filled=True, rounded=True,   
                                 special_characters=True) 
dot_data 
Out[]: 
u'digraph Tree {\nnode [shape=box, style="filled, rounded", color="black", fontname=helvetica] ;\nedge [fontname=helvetica] ;\n0 [label=<length &le; 26.6917<br/>entropy = 0.9971<br/>samples = 700<br/>value = [372, ... 
In []: 
graph = pydotplus.graph_from_dot_data(dot_data.encode('utf-8')) 
graph.write_png('tree1.png') 
Out[]: 
True 

Add a Markdown cell with the following image link to display the newly created file:

![](tree1.png) 
Figure 2.5: Decision tree structure and a close-up of its fragment
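If Graphviz and pydotplus are not available, recent versions of scikit-learn (0.21+) can render a tree directly with matplotlib via `sklearn.tree.plot_tree`. This is a minimal sketch: it trains a small stand-in tree on the Iris dataset, since `tree_model`, `feature_names`, and `labels` from the cells above are assumed; in the notebook you would pass those instead.

```python
# Sketch: render a decision tree without Graphviz, using sklearn.tree.plot_tree.
# A small stand-in tree on the Iris data is trained here for illustration only.
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(8, 5))
plot_tree(model, feature_names=iris.feature_names,
          class_names=list(iris.target_names),
          filled=True, rounded=True, ax=ax)
fig.savefig('tree1_plot_tree.png')
```

The resulting PNG shows the same node boxes (condition, entropy, sample counts, class) as the Graphviz output, with no external binary dependency.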

The preceding diagram shows what our decision tree looks like. During training, it grows upside down: data (features) travels through it from the root (top) to the leaves (bottom). To predict the label for a sample from our dataset with this classifier, we start at the root and move down until we reach a leaf. In each node, one feature is compared to some value; for example, in the root node, the tree checks whether the length is ≤ 26.6917. If the condition is met, we move along the left branch; if not, along the right.
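The root-to-leaf walk described above can be sketched in code using the `tree_` attribute that scikit-learn exposes on a fitted `DecisionTreeClassifier`. A stand-in tree trained on Iris is used here, since `tree_model` from the cells above is assumed; the traversal logic is the same for any fitted tree.

```python
# Sketch of the root-to-leaf walk: at each node, compare one feature to a
# threshold and go left if the condition is met, right otherwise.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

def walk(model, sample):
    """Follow one sample from the root to a leaf, printing each comparison."""
    t = model.tree_
    node = 0                             # start at the root
    while t.children_left[node] != -1:   # -1 marks a leaf
        feat, thresh = t.feature[node], t.threshold[node]
        if sample[feat] <= thresh:       # condition met: left branch
            print('node %d: x[%d] = %.2f <= %.4f -> left' % (node, feat, sample[feat], thresh))
            node = t.children_left[node]
        else:                            # condition not met: right branch
            print('node %d: x[%d] = %.2f >  %.4f -> right' % (node, feat, sample[feat], thresh))
            node = t.children_right[node]
    return np.argmax(t.value[node])      # most frequent class in the leaf

pred = walk(model, iris.data[0])
assert pred == model.predict(iris.data[:1])[0]
```

The final assertion confirms that this manual walk agrees with `model.predict` for the same sample.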

Let's look closer at a part of the tree. In addition to the condition in each node, we have some useful information:

  • The entropy value
  • The number of training samples that reach this node
  • How many of those samples belong to each class
  • The most likely outcome at this stage
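The entropy value shown in each node can be reproduced from the class counts in that node. A minimal sketch, using the counts from the root node of the tree above (700 samples, split 372 versus 328 between the two classes):

```python
import math

def entropy(counts):
    """Shannon entropy of a class distribution, in bits."""
    total = sum(counts)
    probs = [c / float(total) for c in counts if c > 0]
    return -sum(p * math.log(p, 2) for p in probs)

# Root node of the tree above: 700 samples, 372 of one class, 328 of the other
print(round(entropy([372, 328]), 4))  # 0.9971, as shown in the root node
```

Entropy is 0 when a node is pure (all samples share one class) and reaches 1 bit for a perfect 50/50 split, which is why the near-balanced root node sits close to 1.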