Chapter 1. Expanding Your Data Mining Toolbox

When faced with sensory information, human beings naturally want to find patterns to explain, differentiate, categorize, and predict. This process of looking for patterns all around us is a fundamental human activity, and the human brain is quite good at it. With this skill, our ancient ancestors became better at hunting, gathering, cooking, and organizing. It is no wonder that pattern recognition and pattern prediction were some of the first tasks humans set out to computerize, and this desire continues in earnest today. Depending on the goals of a given project, finding patterns in data using computers nowadays involves database systems, artificial intelligence, statistics, information retrieval, computer vision, and any number of other subfields of computer science, information systems, mathematics, or business. No matter what we call this activity (knowledge discovery in databases, data mining, or data science), its primary mission is always to find interesting patterns.

Despite this humble-sounding mission, data mining has existed for long enough, and has built up enough variation in how it is implemented, that it has become a large and complicated field to master. Think of a cooking school, where every beginner chef is first taught how to boil water and how to use a knife before moving on to more advanced skills, such as making puff pastry or deboning a raw chicken. In data mining, we also have common techniques that even the newest data miners will learn: how to build a classifier and how to find clusters in data. The title of this book, however, is Mastering Data Mining with Python, and so, as a mastering-level book, its aim is to teach you some of the techniques you may not have seen in earlier data mining projects.

In this first chapter, we will cover the following topics:

  • What is data mining? We will situate data mining among a growing set of related concepts, and we will learn a bit about the history of how this discipline has grown and changed.
  • How do we do data mining? Here, we compare several processes or methodologies commonly used in data mining projects.
  • What are the techniques used in data mining? In this section, we will summarize each of the data analysis techniques that are typically included in a definition of data mining, and we will highlight the more exotic or underappreciated techniques that we will be covering in this mastering-level book.
  • How do we set up a data mining work environment? Finally, we will walk through setting up a Python-based development environment that we will use to complete the projects in the rest of this book.