- Hands-On Neural Networks
- Leonardo De Marchi, Laura Mitchell
Evaluating the model
To evaluate an algorithm, it's necessary to judge its performance on data that was not used to train the model. For this reason, it's common to split the data into a training set and a test set. The training set is used to train the model, which means that it's used to find the parameters of our algorithm. For example, training a decision tree will determine the values and variables that create the splits in the branches of the tree. The test set must remain totally hidden from training: all operations such as feature engineering or feature scaling must be fitted on the training set only and then applied to the test set, as in the following example.
Usually, the training set will be 70-80% of the dataset, while the test set will be the rest:
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn import datasets
# import some data
iris = datasets.load_iris()

# hold out 30% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# fit the scaler on the training set only, then apply it to both sets
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_test_transformed = scaler.transform(X_test)

# train on the transformed training data and predict on the test data
clf = LinearRegression().fit(X_train_transformed, y_train)
predictions = clf.predict(X_test_transformed)
print('Predictions: ', predictions)
The most common way to evaluate a supervised learning algorithm offline is cross-validation. This technique consists of dividing the dataset into training and test parts multiple times, each time using one part for training and the other for testing.
This allows us not only to check for overfitting but also to evaluate the variance of our loss.
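As a minimal sketch of this idea with scikit-learn (the KFold settings and the LinearRegression model here are just illustrative choices, not taken from the original text):
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression
from sklearn import datasets
iris = datasets.load_iris()
# 5-fold cross-validation: the data is split five times,
# with each fold serving once as the test set
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), iris.data, iris.target, cv=cv)
# the spread of the per-fold scores gives an idea of the variance of our loss
print('Scores per fold: ', scores)
print('Mean: ', scores.mean(), ' Std: ', scores.std())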
For problems where it's not possible to randomly divide the data, such as in a time series, scikit-learn provides other splitting methods, such as the TimeSeriesSplit class.
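As a small illustrative sketch (the toy arrays here are hypothetical stand-ins for a real time series), TimeSeriesSplit produces folds in which the test indices always come after the training indices:
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
X = np.arange(20).reshape(10, 2)  # toy time-ordered features
y = np.arange(10)
tscv = TimeSeriesSplit(n_splits=3)
for train_index, test_index in tscv.split(X):
    # each test fold lies strictly after its training fold in time
    print('train:', train_index, 'test:', test_index)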
In Keras, it's possible to specify a simple train/validation split directly during fit:
hist = model.fit(x, y, validation_split=0.2)
If the data does not fit in memory, it's also possible to use train_on_batch and test_on_batch.
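As a hedged sketch of this batch-level API (the small model and the batch generator below are hypothetical, purely to illustrate the calls), train_on_batch and test_on_batch let us feed one batch at a time:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# hypothetical small model, just to show the batch-level API
model = Sequential([Dense(10, activation='relu', input_shape=(4,)),
                    Dense(1)])
model.compile(optimizer='sgd', loss='mse')
def batch_generator(batch_size=32):
    # stand-in for code that reads batches from disk
    while True:
        yield np.random.rand(batch_size, 4), np.random.rand(batch_size, 1)
gen = batch_generator()
for step in range(100):
    x_batch, y_batch = next(gen)
    loss = model.train_on_batch(x_batch, y_batch)
x_test_batch, y_test_batch = next(gen)
print('Loss on one held-out batch:', model.test_on_batch(x_test_batch, y_test_batch))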
For image data, Keras can also use the folder structure to create the train and validation sets and to assign the labels. To accomplish this, we use the flow_from_directory function, which loads the data with the labels and the train/validation split as specified. We will need the following directory structure:
data/
    train/
        category1/
            001.jpg
            002.jpg
            ...
        category2/
            003.jpg
            004.jpg
            ...
    validation/
        category1/
            0011.jpg
            0022.jpg
            ...
        category2/
            0033.jpg
            0044.jpg
            ...
Use the following function:
flow_from_directory(directory, target_size=(96, 96), color_mode='rgb', classes=None, class_mode='categorical', batch_size=128, shuffle=True, seed=11, save_to_dir=None, save_prefix='output', save_format='jpg', follow_links=False, subset=None, interpolation='nearest')
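As a minimal sketch of how this is wired together (the directory paths, rescaling, and the commented fit call are assumptions for illustration; flow_from_directory is called on an ImageDataGenerator, once per folder):
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)
# labels are inferred from the sub-folder names (category1, category2, ...)
train_generator = train_datagen.flow_from_directory(
    'data/train', target_size=(96, 96), batch_size=128,
    class_mode='categorical')
validation_generator = validation_datagen.flow_from_directory(
    'data/validation', target_size=(96, 96), batch_size=128,
    class_mode='categorical')
# the generators can then be passed to the model, for example:
# model.fit_generator(train_generator, validation_data=validation_generator, epochs=10)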