
Splitting the data into training and test sets

We learned in the previous chapter that it is essential to keep training and test data separate. We can easily split the data using one of scikit-learn's many helper functions, train_test_split from the model_selection module:

In [11]: X_train, X_test, y_train, y_test = model_selection.train_test_split(
... data, target, test_size=0.1, random_state=42
... )

Here we want to split the data into 90 percent training data and 10 percent test data, which we specify with test_size=0.1. Passing random_state=42 seeds the random shuffle, so the same split is produced every time the cell is run. Inspecting the returned arrays, we see that we end up with exactly 90 training data points and 10 test data points:

In [12]: X_train.shape, y_train.shape
Out[12]: ((90, 4), (90,))
In [13]: X_test.shape, y_test.shape
Out[13]: ((10, 4), (10,))
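To make the mechanics a little more concrete, the following is a minimal pure-Python sketch of what a shuffled train/test split does. The function simple_train_test_split is hypothetical and is not part of scikit-learn. It assumes the sizing rule scikit-learn applies to a float test_size (the number of test samples is the ceiling of test_size times the number of samples, and the rest go to training). Because it uses Python's own random module, it will not pick the same samples as scikit-learn for the same random_state. The toy data below are made up purely for illustration.

```python
import math
import random


def simple_train_test_split(X, y, test_size=0.1, random_state=None):
    """Hypothetical sketch of a shuffled train/test split (not scikit-learn's code).

    Assumes the sizing rule n_test = ceil(test_size * n_samples),
    with all remaining samples assigned to the training set.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test

    # Shuffle the sample indices with a seeded generator so the split is reproducible
    rng = random.Random(random_state)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # The first n_train shuffled indices form the training set, the rest the test set
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Made-up toy data: 100 samples with 4 features each, plus a binary label
data = [[i, i + 1, i + 2, i + 3] for i in range(100)]
target = [i % 2 for i in range(100)]

X_train, X_test, y_train, y_test = simple_train_test_split(
    data, target, test_size=0.1, random_state=42
)
print(len(X_train), len(X_test))  # 90 10
```

Note that the features and labels are indexed with the same shuffled indices, so each sample keeps its label after the split; shuffling X and y independently would silently scramble the pairing.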