官术网_书友最值得收藏!

Supervised learning

Oftentimes, we hear about feature engineering in the specific context of supervised learning, otherwise known as predictive analytics. Supervised learning algorithms specifically deal with the task of predicting a value, usually one of the attributes of the data, using the other attributes of the data. Take, for example, the dataset representing the network intrusion:

This is the same dataset as before, but let's dissect it further in the context of predictive analytics.

Notice that we have four attributes of this dataset: DateTime, Protocol, Urgent, and Malicious. Suppose now that the malicious attribute contains values that represent whether or not the observation was a malicious intrusion attempt. So in our very small dataset of four network connections, the first, second, and fourth connection were malicious attempts to intrude a network.

Suppose further that given this dataset, our task is to be able to take in three of the attributes (datetime, protocol, and urgent) and be able to accurately predict the value of malicious. In laymen’s terms, we want a system that can map the values of datetime, protocol, and urgent to the values in malicious. This is exactly how a supervised learning problem is set up:

Network_features = pd.DataFrame({'datetime': ['6/2/2018', '6/2/2018', '6/2/2018', '6/3/2018'], 'protocol': ['tcp', 'http', 'http', 'http'], 'urgent': [False, True, True, False]})
Network_response = pd.Series([True, True, False, True])
Network_features
>>
datetime protocol urgent 0 6/2/2018 tcp False 1 6/2/2018 http True 2 6/2/2018 http True 3 6/3/2018 http False
Network_response
>>
0 True 1 True 2 False 3 True dtype: bool

When we are working with supervised learning, we generally call the attribute (usually only one of them, but that is not necessary) of the dataset that we are attempting to predict the response of. The remaining attributes of the dataset are then called the features.

Supervised learning can also be considered the class of algorithms attempting to exploit the structure in data. By this, we mean that the machine learning algorithms try to extract patterns in usually very nice and neat data. As discussed earlier, we should not always expect data to come in tidy; this is where feature engineering comes in.

But if we are not predicting something, what good is machine learning you may ask? I’m glad you did. Before machine learning can exploit the structure of data, sometimes we have to alter or even create structure. That’s where unsupervised learning becomes a valuable tool.

主站蜘蛛池模板: 白玉县| 五台县| 北碚区| 乾安县| 土默特左旗| 蓬安县| 洮南市| 潞城市| 北流市| 谢通门县| 阿拉尔市| 盐亭县| 阿荣旗| 宝兴县| 嵊泗县| 英吉沙县| 东海县| 永州市| 云南省| 三原县| 潞西市| 镇原县| 库尔勒市| 阳高县| 台中市| 东乡族自治县| 新田县| 无锡市| 阳信县| 台东市| 辽阳市| 邵武市| 潜江市| 西吉县| 河北省| 正安县| 徐水县| 马鞍山市| 兴文县| 得荣县| 柳州市|