Splitting the data into training and test sets

We learned in the previous chapter that it is essential to keep training and test data separate. We can easily split the data using one of scikit-learn's many helper functions:

In [11]: X_train, X_test, y_train, y_test = model_selection.train_test_split(
... data, target, test_size=0.1, random_state=42
... )

Here we want to split the data into 90 percent training data and 10 percent test data, which we specify with test_size=0.1. By inspecting the shapes of the returned arrays, we see that we ended up with exactly 90 training data points and 10 test data points:

In [12]: X_train.shape, y_train.shape
Out[12]: ((90, 4), (90,))
In [13]: X_test.shape, y_test.shape
Out[13]: ((10, 4), (10,))
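
To make the 90/10 arithmetic concrete, here is a minimal sketch of what a shuffled split does conceptually. It is not scikit-learn's implementation: the helper name shuffled_split, the made-up stand-in data, and the choice to round the test count up are all assumptions made for illustration.

```python
import math
import random


def shuffled_split(rows, labels, test_size=0.1, seed=42):
    """Hypothetical helper: shuffle the sample indices, then hold out a fraction."""
    n_samples = len(rows)
    # Number of held-out samples; rounding up is an assumption of this sketch
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    # A fixed seed makes the split repeatable, similar in spirit to random_state
    random.Random(seed).shuffle(indices)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(rows, train_idx), pick(rows, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Made-up stand-in data: 100 samples with 4 features each, and binary labels
rows = [[float(i), 0.0, 0.0, 0.0] for i in range(100)]
labels = [i % 2 for i in range(100)]

X_train, X_test, y_train, y_test = shuffled_split(rows, labels)
print(len(X_train), len(X_test))  # prints "90 10"
```

Because rows and labels are indexed with the same shuffled indices, each sample stays paired with its label, and no sample ends up in both sets.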