
Title: Machine Learning for Cybersecurity Cookbook
Author: Emmanuel Tsukerman
Updated: 2021-06-24 12:28:55

How to do it...

The following steps demonstrate how to take a dataset, consisting of features X and labels y, and split it into training, validation, and testing subsets:

Start by importing the train_test_split function and the pandas library, then read your features into X and your labels into y:

from sklearn.model_selection import train_test_split
import pandas as pd

df = pd.read_csv("north_korea_missile_test_database.csv")
y = df["Missile Name"]
X = df.drop("Missile Name", axis=1)

Next, randomly split the dataset and its labels into a training set containing 80% of the original data and a testing set containing the remaining 20%. Fixing random_state makes the split reproducible:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=31
)
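To see what test_size and random_state do without needing the CSV file, here is a minimal sketch on a synthetic DataFrame. The column names (feature_a, feature_b, label) and the values are made up for illustration and are not taken from the missile test dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real CSV: 100 rows, 2 features, 1 label.
toy = pd.DataFrame({
    "feature_a": range(100),
    "feature_b": [i * 0.5 for i in range(100)],
    "label": [i % 2 for i in range(100)],
})
toy_y = toy["label"]
toy_X = toy.drop("label", axis=1)

# Same call shape as in the recipe: 20% of the rows go to the test set.
a_train, a_test, b_train, b_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=31
)

# Repeating the call with the same random_state selects the same rows again.
a_train2, a_test2, b_train2, b_test2 = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=31
)
same_split = a_test.index.equals(a_test2.index)

# Features and labels are shuffled together, so their indices stay aligned.
aligned = a_train.index.equals(b_train.index)

print(len(a_train), len(a_test), same_split, aligned)
```

Changing random_state to a different integer would generally select a different 80/20 partition of the same rows.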

Apply the train_test_split function once more, this time to the training set only, to obtain a validation set, X_val and y_val:

X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=31
)

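Because the second split takes 25% of the remaining 80%, the validation set is 0.25 × 0.8 = 20% of the original data. We end up with a training set that's 60% of the size of the original data, a validation set of 20%, and a testing set of 20%.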
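The 60/20/20 proportions can be checked directly. The sketch below repeats the two calls from the recipe on a synthetic NumPy array (an assumption made so the example runs without the CSV file) and reports each subset's share of the original data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 samples with 2 features each, and binary labels.
X_all = np.arange(200).reshape(100, 2)
y_all = np.arange(100) % 2

# First split: hold out 20% of all samples as the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=31
)

# Second split: 25% of the remaining 80% becomes the validation set.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_tr, y_tr, test_size=0.25, random_state=31
)

# Share of the original dataset that ends up in each subset.
sizes = {
    "train": len(X_tr) / len(X_all),
    "val": len(X_va) / len(X_all),
    "test": len(X_te) / len(X_all),
}
print(sizes)
```

More generally, to get a validation share v and a test share t of the original data, the second call needs test_size = v / (1 - t); with v = t = 0.2 this gives the 0.25 used above.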

The following screenshot shows the output:
