- Feature Engineering Made Easy
- Sinan Ozdemir Divya Susarla
- 406 words
- 2021-06-25 22:45:50
Supervised learning
Oftentimes, we hear about feature engineering in the specific context of supervised learning, otherwise known as predictive analytics. Supervised learning algorithms specifically deal with the task of predicting a value, usually one of the attributes of the data, using the other attributes of the data. Take, for example, the dataset representing the network intrusion:

This is the same dataset as before, but let's dissect it further in the context of predictive analytics.
Notice that this dataset has four attributes: DateTime, Protocol, Urgent, and Malicious. Suppose now that the Malicious attribute records whether or not each observation was a malicious intrusion attempt. In our very small dataset of four network connections, the first, second, and fourth connections were malicious attempts to intrude on a network.
Suppose further that, given this dataset, our task is to take in three of the attributes (datetime, protocol, and urgent) and accurately predict the value of malicious. In layman's terms, we want a system that can map the values of datetime, protocol, and urgent to the values in malicious. This is exactly how a supervised learning problem is set up:
import pandas as pd

# Features: the three attributes used to make the prediction
Network_features = pd.DataFrame({
    'datetime': ['6/2/2018', '6/2/2018', '6/2/2018', '6/3/2018'],
    'protocol': ['tcp', 'http', 'http', 'http'],
    'urgent': [False, True, True, False],
})
# Response: whether each connection was a malicious attempt
Network_response = pd.Series([True, True, False, True])

Network_features
>>
   datetime protocol  urgent
0  6/2/2018      tcp   False
1  6/2/2018     http    True
2  6/2/2018     http    True
3  6/3/2018     http   False

Network_response
>>
0     True
1     True
2    False
3     True
dtype: bool
When we are working with supervised learning, we generally call the attribute of the dataset that we are attempting to predict the response (usually there is only one, but that is not required). The remaining attributes of the dataset are then called the features.
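In practice, the features and the response often arrive together in a single table. The following is a minimal sketch of separating them with pandas, assuming the four-attribute network dataset described above; the variable names `network`, `response_column`, `features`, and `response` are illustrative and do not come from the original text.

```python
import pandas as pd

# The full four-attribute dataset described in the text
network = pd.DataFrame({
    'datetime': ['6/2/2018', '6/2/2018', '6/2/2018', '6/3/2018'],
    'protocol': ['tcp', 'http', 'http', 'http'],
    'urgent': [False, True, True, False],
    'malicious': [True, True, False, True],
})

# Hypothetical name for the attribute we want to predict
response_column = 'malicious'

# Everything except the response becomes the features
features = network.drop(columns=[response_column])

# The response is the single column we are trying to predict
response = network[response_column]

print(features)
print(response)
```

Keeping the response out of the features table matters: if the value being predicted were left among the inputs, any model would simply read the answer off directly.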
Supervised learning can also be considered the class of algorithms that attempt to exploit the structure in data. By this, we mean that the machine learning algorithms try to extract patterns from data that is usually already clean and well organized. As discussed earlier, we should not always expect data to arrive tidy; this is where feature engineering comes in.
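To make this concrete, here is a rough sketch of what a first pass of feature engineering on the network features might look like. The specific transformations (day of week from the date string, an integer flag for urgent, indicator columns for protocol) are assumptions chosen for illustration, not steps prescribed by the text, and the names `network_features` and `engineered` are hypothetical.

```python
import pandas as pd

network_features = pd.DataFrame({
    'datetime': ['6/2/2018', '6/2/2018', '6/2/2018', '6/3/2018'],
    'protocol': ['tcp', 'http', 'http', 'http'],
    'urgent': [False, True, True, False],
})

# Parse the date strings and keep the day of the week (Monday=0 ... Sunday=6)
parsed_dates = pd.to_datetime(network_features['datetime'], format='%m/%d/%Y')

engineered = pd.DataFrame({
    'day_of_week': parsed_dates.dt.dayofweek,
    # Represent the boolean flag as 0/1
    'urgent': network_features['urgent'].astype(int),
})

# Turn the categorical protocol column into one indicator column per value
protocol_dummies = pd.get_dummies(
    network_features['protocol'], prefix='protocol', dtype=int
)
engineered = pd.concat([engineered, protocol_dummies], axis=1)

print(engineered)
```

Every column in `engineered` is numeric, which is the form most supervised learning algorithms expect as input.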
But if we are not predicting something, what good is machine learning, you may ask? I'm glad you did. Before machine learning can exploit the structure of data, sometimes we have to alter or even create structure. That's where unsupervised learning becomes a valuable tool.