- Hands-On Unsupervised Learning with Python
- Giuseppe Bonaccorso
- 2021-07-02 12:32:00
Why do we need machine learning?
Data is everywhere. At this very moment, thousands of systems are collecting records that make up the history of specific services, together with logs, user interactions, and many other context-dependent elements. Only a decade ago, most companies couldn't manage even 1% of their data efficiently. For this reason, databases were periodically pruned and only important data was retained on permanent storage servers.
Nowadays, by contrast, almost every company can exploit scalable cloud infrastructures to cope with the increasing volume of incoming data. Tools such as Apache Hadoop or Apache Spark allow both data scientists and engineers to implement complex pipelines involving extremely large volumes of data. At this point, all the barriers have been torn down and a democratized process is in place. However, what is the actual value of these large datasets? From a business viewpoint, information is valuable only when it helps make the right decisions by reducing uncertainty and providing better contextual insight. This means that, without the right tools and knowledge, a large collection of data is only a cost that the company must limit in order to increase its margins.
Machine learning is a large branch of computer science (in particular, of artificial intelligence) that aims to implement descriptive and predictive models of reality by exploiting existing datasets. As this book is dedicated to practical unsupervised solutions, we are going to focus only on algorithms that describe the context by looking for hidden causes and relationships. However, even if only from a theoretical viewpoint, it's helpful to show the main differences between machine learning problems. Only a complete awareness of the goals (not limited to mere technical aspects) can lead to a rational answer to the initial question: "Why do we need machine learning?"
We can start by saying that human beings have extraordinary cognitive abilities, which have inspired many systems, but their analytical skills fall short when the number of elements increases significantly. For example, if you're a teacher meeting your class for the first time, you'll be able to compute a rough estimate of the percentage of female students after taking a glance at the entire group. Usually, the estimate is accurate and close to the actual count, and estimates made independently by two or more individuals tend to agree. However, if we repeat the experiment with the entire population of a school gathered in a courtyard, the distinction of gender will not be evident. This is because all students are clearly visible in the classroom, whereas in the courtyard telling the sexes apart is limited by certain factors (for example, taller people can hide shorter ones). Setting the analogy aside, we can say that a large amount of data usually carries a lot of information, but human inspection alone cannot process it reliably. In order to extract and categorize the information, it's necessary to take an automated approach.
Before moving to the next section, let's discuss the concepts of descriptive, diagnostic, predictive, and prescriptive analyses, originally defined by Gartner. In this case, however, we want to focus on a system (for example, a generic context) that we are analyzing in order to gain progressively more control over its behavior.
The complete process is represented in the following diagram:
