- Deep Learning By Example
- Ahmed Menshawy
- 2021-06-24 18:52:43
Feature engineering
Feature engineering is one of the key contributors to a model's performance. A simple model with the right features can perform better than a complicated one with poor features. You can think of feature engineering as the most important step in determining whether your predictive model succeeds or fails. It is also much easier when you understand the data.
Feature engineering is used extensively by machine learning practitioners to answer one question: how do you get the most out of your data samples for predictive modeling? This is the problem that the process and practice of feature engineering solves, and success in data science starts with knowing how to represent your data well.
A predictive model is a formula or rule that transforms a list of features or input variables (x1, x2, ..., xn) into an output/target of interest (y). So, what is feature engineering? It's the process of creating new input variables or features (z1, z2, ..., zn) from existing input variables (x1, x2, ..., xn). We don't just create any new features; the newly created features should be relevant to the model's output and contribute to predicting it. Creating such features is much easier with knowledge of the domain (such as marketing, medicine, and so on). Even when machine learning practitioners lack this knowledge themselves, consulting domain experts during the process will make the outcome of feature engineering much better.
An example where domain knowledge can be helpful is modeling the likelihood of rain, given a set of input variables/features (temperature, wind speed, and percentage of cloud cover). For this specific example, we can construct a new binary feature called overcast, whose value equals 0 (no) whenever the percentage of cloud cover is less than 20%, and equals 1 (yes) otherwise. In this example, domain knowledge was essential to specify the threshold or cut-off percentage. The more thoughtful and useful the inputs, the more reliable and predictive your model will be.
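As a minimal sketch of the rain example, the snippet below derives an overcast feature from the cloud cover percentage. It assumes the conventional encoding in which 1 means yes (overcast) and 0 means no, applied at the 20% cut-off from the text. The field names (temperature_c, wind_speed_kmh, cloud_cover_pct), the helper add_overcast, and the sample observations are hypothetical and exist only for illustration.

```python
# Cut-off taken from the example in the text; in practice a domain
# expert would choose or validate this value.
OVERCAST_THRESHOLD = 20.0


def add_overcast(record, threshold=OVERCAST_THRESHOLD):
    """Return a copy of record with a binary 'overcast' feature added.

    Assumed encoding: 0 (no) when cloud cover is below the threshold,
    1 (yes) otherwise.
    """
    enriched = dict(record)  # copy so the original observation is untouched
    enriched["overcast"] = 0 if record["cloud_cover_pct"] < threshold else 1
    return enriched


# Made-up observations, for illustration only.
observations = [
    {"temperature_c": 24.0, "wind_speed_kmh": 10.0, "cloud_cover_pct": 5.0},
    {"temperature_c": 18.5, "wind_speed_kmh": 22.0, "cloud_cover_pct": 20.0},
    {"temperature_c": 15.0, "wind_speed_kmh": 30.0, "cloud_cover_pct": 85.0},
]

features = [add_overcast(obs) for obs in observations]
overcast_flags = [row["overcast"] for row in features]
print(overcast_flags)  # [0, 1, 1]
```

The new column sits alongside the original inputs rather than replacing them, so a model can still use the raw cloud cover percentage if that turns out to be more informative than the thresholded version.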