- Machine Learning for OpenCV
- Michael Beyeler
Representing Data and Engineering Features
In the last chapter, we built our very first supervised learning models and applied them to some classic datasets, such as the Iris and the Boston datasets. However, in the real world, data rarely comes in a neat <n_samples x n_features> feature matrix that is part of a pre-packaged database. Instead, it is our own responsibility to find a way to represent the data in a meaningful way. The process of finding the best way to represent our data is known as feature engineering, and it is one of the main tasks of data scientists and machine learning practitioners trying to solve real-world problems.
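The `<n_samples x n_features>` convention can be seen in one of the datasets from the last chapter. As a minimal sketch using scikit-learn's built-in copy of the Iris dataset, each row of the matrix is one sample (one flower) and each column is one feature (one measurement):

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # feature matrix: one row per sample, one column per feature
print(X.shape)  # (150, 4): 150 samples, 4 features each
```

The rest of this chapter is about what to do when your data does *not* already arrive in this tidy form.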
I know you would rather jump right to the end and build the deepest neural network mankind has ever seen. But, trust me, this stuff is important! Representing our data in the right way can have a much greater influence on the performance of our supervised model than the exact parameters we choose. And we get to invent our own features, too.
In this chapter, we will therefore go over some common feature engineering tasks. Specifically, we want to answer the following questions:
- What are some common preprocessing techniques that everyone uses but nobody talks about?
- How do we represent categorical variables, such as the names of products, of colors, or of fruits?
- How would we even go about representing text?
- What is the best way to encode images, and what do SIFT and SURF stand for?
Let's start from the top.