Moving towards a standard workflow

Estimators in scikit-learn have two main methods: fit() and predict(). We train the algorithm using the fit() method on our training set. We evaluate it using the predict() method on our testing set.
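
As a minimal illustration of this two-method interface, the sketch below trains a nearest neighbor classifier on a tiny hand-made dataset and then asks it to classify two unseen points. The toy values and the names X_toy, y_toy, and toy_estimator are assumptions made up for this example; they are not the chapter's dataset.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data (an assumption for illustration): one feature, two classes
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]

# n_neighbors=1 so each prediction copies the label of the single closest training point
toy_estimator = KNeighborsClassifier(n_neighbors=1)

# Training step: the estimator stores the labeled samples
toy_estimator.fit(X_toy, y_toy)

# Prediction step: classify points the estimator has not seen before
toy_predictions = toy_estimator.predict([[0.2], [2.8]])
print(toy_predictions)  # [0 1]
```

The point 0.2 is closest to the training sample at 0.0 (class 0), and 2.8 is closest to the sample at 3.0 (class 1), which is why the two predictions differ.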

  1. First, we need to create these training and testing sets. As before, import and run the train_test_split function:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)
  2. Then, we import the nearest neighbor class and create an instance of it. We leave the parameters as defaults for now and will test other values later in this chapter. By default, the algorithm will choose the five nearest neighbors to predict the class of a testing sample:
from sklearn.neighbors import KNeighborsClassifier
estimator = KNeighborsClassifier()
  3. After creating our estimator, we must then fit it on our training dataset. For the nearest neighbor class, this training step simply records our dataset, allowing us to find the nearest neighbors of a new data point by comparing that point to the training dataset:
estimator.fit(X_train, y_train)
  4. We then use the trained estimator to predict the classes of our testing set and evaluate those predictions against the true classes:
import numpy as np
y_predicted = estimator.predict(X_test)
accuracy = np.mean(y_test == y_predicted) * 100
print("The accuracy is {0:.1f}%".format(accuracy))
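
The accuracy computed by hand in the last step can be cross-checked against the estimator's own score() method, which for scikit-learn classifiers returns the mean accuracy on the data it is given. The sketch below is self-contained and assumes a synthetic stand-in dataset from make_classification, because the chapter's X and y are loaded elsewhere; the names ending in _demo are hypothetical and chosen so they do not overwrite the variables used in the steps.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): not the dataset used in this chapter
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=14)
X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, random_state=14)

demo_estimator = KNeighborsClassifier()
demo_estimator.fit(X_demo_train, y_demo_train)

# Manual accuracy: fraction of test samples whose predicted class matches the true class
y_demo_predicted = demo_estimator.predict(X_demo_test)
manual_accuracy = np.mean(y_demo_test == y_demo_predicted)

# score() predicts on the test set internally and returns the same mean accuracy
builtin_accuracy = demo_estimator.score(X_demo_test, y_demo_test)

print("Manual: {0:.3f}, score(): {1:.3f}".format(manual_accuracy, builtin_accuracy))
```

Both values are fractions between 0 and 1, so multiply by 100 to report a percentage as in the step above.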

This model scores 86.4 percent accuracy, which is impressive for a default algorithm and just a few lines of code! Most scikit-learn default parameters are chosen deliberately to work well with a range of datasets. However, you should always aim to choose parameters based on knowledge of the application experiment. We will use strategies for doing this parameter search in later chapters.
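
To give a rough feel for what such a parameter search can look like, the sketch below compares a few values of n_neighbors using 5-fold cross-validation. It is only one possible approach under stated assumptions, not necessarily the strategy the later chapters use: the data is a synthetic stand-in from make_classification, and the candidate values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): not the dataset used in this chapter
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=14)

# Candidate values are arbitrary choices for illustration
candidate_neighbors = [1, 3, 5, 7, 9]
mean_scores = {}
for n_neighbors in candidate_neighbors:
    candidate = KNeighborsClassifier(n_neighbors=n_neighbors)
    # 5-fold cross-validation: each fold yields an accuracy between 0 and 1
    fold_scores = cross_val_score(candidate, X_demo, y_demo, cv=5, scoring="accuracy")
    mean_scores[n_neighbors] = np.mean(fold_scores)

for n_neighbors, score in mean_scores.items():
    print("n_neighbors={0}: {1:.1f}%".format(n_neighbors, score * 100))
```

Cross-validation is used here instead of the single testing set so that the parameter choice is not tuned to the same samples that are later used to report the final accuracy.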
