
Evaluating supervised learning algorithms

When performing predictive modeling, otherwise known as supervised learning, performance is directly tied to the model’s ability to exploit structure in the data and use that structure to make appropriate predictions. In general, we can further break down supervised learning into two more specific types: classification (predicting qualitative responses) and regression (predicting quantitative responses).
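The qualitative/quantitative distinction shows up directly in the response variable. As a small illustration (the toy label arrays below are hypothetical), scikit-learn's `type_of_target` helper reports discrete labels as a classification-style target and non-integer real values as `'continuous'`, the regression case:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Qualitative response: discrete class labels (hypothetical toy data)
y_class = np.array(["spam", "ham", "ham", "spam", "ham"])
# Quantitative response: real-valued measurements (hypothetical toy data)
y_reg = np.array([31.5, 29.25, 32.75, 27.1, 30.0])

class_kind = type_of_target(y_class)  # two distinct labels -> 'binary' (classification)
reg_kind = type_of_target(y_reg)      # non-integer floats -> 'continuous' (regression)
print(class_kind, reg_kind)
```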

When evaluating classification problems, we will directly calculate the accuracy of a logistic regression model using five-fold cross-validation:

# Example code for evaluating a classification problem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
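X = some_data_in_tabular_format  # placeholder: feature matrix, one row per observation
y = response_variable            # placeholder: class labels
lr = LogisticRegression()
# five-fold cross-validated accuracy, one score per held-out fold
scores = cross_val_score(lr, X, y, cv=5, scoring='accuracy')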
scores
>> [.765, .67, .8, .62, .99]
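The listing above uses placeholders for the data. A runnable sketch of the same evaluation, assuming scikit-learn's bundled Iris dataset as a stand-in for the tabular data, might look like this; `max_iter` is raised only so the default solver converges on this dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumption: Iris stands in for "some_data_in_tabular_format" / "response_variable"
X, y = load_iris(return_X_y=True)

# Raise max_iter so the default lbfgs solver converges on this data
lr = LogisticRegression(max_iter=1000)

# Five-fold cross-validated accuracy; one score per held-out fold
scores = cross_val_score(lr, X, y, cv=5, scoring="accuracy")
print(scores, scores.mean())
```

Averaging the five fold scores gives a single summary number to compare before and after a feature engineering step.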

Similarly, when evaluating a regression problem, we will use the mean squared error (MSE) of a linear regression using five-fold cross-validation:

# Example code for evaluating a regression problem
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
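X = some_data_in_tabular_format  # placeholder: feature matrix, one row per observation
y = response_variable            # placeholder: numeric target
lr = LinearRegression()
# scikit-learn exposes MSE as 'neg_mean_squared_error' (greater is better),
# so negate the result to get positive per-fold MSE values
scores = -cross_val_score(lr, X, y, cv=5, scoring='neg_mean_squared_error')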
scores
>> [31.543, 29.5433, 32.543, 32.43, 27.5432]
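A runnable version of the regression evaluation, assuming scikit-learn's bundled diabetes dataset as stand-in data, is sketched below. Because scikit-learn scorers follow a "greater is better" convention, MSE is requested as `'neg_mean_squared_error'` and negated afterwards:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Assumption: the diabetes dataset stands in for the placeholder data
X, y = load_diabetes(return_X_y=True)
lr = LinearRegression()

# Scorer returns negated MSE per fold; flip the sign to get plain MSE
neg_mse = cross_val_score(lr, X, y, cv=5, scoring="neg_mean_squared_error")
mse_scores = -neg_mse
print(mse_scores, mse_scores.mean())
```

Unlike accuracy, lower MSE is better, so an improvement from feature engineering shows up as a drop in the mean of these scores.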

We will use these two linear models instead of newer, more advanced models because they are fast and have low variance. This way, we can be more confident that any increase in performance is directly related to the feature engineering procedure and not to the model’s ability to pick up on obscure and hidden patterns.
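To see why holding the model fixed isolates the effect of feature engineering, here is a minimal sketch on hypothetical synthetic data whose response depends on the square of a single feature. The same `LinearRegression` is scored twice; the only difference is that the second run adds a squared feature via `PolynomialFeatures`, so any drop in MSE comes from the engineered feature rather than from a more flexible model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data: y depends quadratically on the single feature
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, size=200)

def cv_mse(model, X, y):
    """Mean five-fold cross-validated MSE (sign flipped from sklearn's scorer)."""
    return -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()

# Same linear model in both runs; only the input features change
baseline_mse = cv_mse(LinearRegression(), X, y)
engineered_mse = cv_mse(
    make_pipeline(PolynomialFeatures(degree=2), LinearRegression()), X, y
)
print(f"baseline MSE: {baseline_mse:.3f}, with squared feature: {engineered_mse:.3f}")
```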
