Preface

The internet contains a wealth of data. This data is provided both through structured APIs and through content delivered directly by websites. While the data in APIs is highly structured, information found in web pages is often unstructured and requires collection, extraction, and processing to be of value. Collecting data is just the start of the journey, as that data must also be stored, mined, and then exposed to others in a value-added form.

With this book, you will learn many of the core tasks needed in collecting various forms of information from websites. We will cover how to collect it, how to perform several common data operations (including storage in local and remote databases), how to perform common media-based tasks such as converting images and videos to thumbnails, how to clean unstructured data with NLTK, how to use several data mining and visualization tools, and finally the core skills for building a microservices-based scraper and API that can, and will, be run on the cloud.

Through a recipe-based approach, we will learn independent techniques to solve specific tasks involved in not only scraping but also data manipulation and management, data mining, visualization, microservices, containers, and cloud operations. These recipes will build skills in a progressive and holistic manner, not only teaching how to perform the fundamentals of scraping but also taking you from the results of scraping to a service offered to others through the cloud. We will be building an actual web-scraper-as-a-service using common tools in the Python, container, and cloud ecosystems.
