- Python Web Scraping Cookbook
- Michael Heydt
- 130 words
- 2021-06-30 18:43:59
Introduction
The key aspects of effective scraping are understanding how content and data are stored on web servers, identifying the data you want to retrieve, and understanding how the tools support this extraction. In this chapter, we will discuss website structures and the DOM, and introduce techniques to parse and query websites with lxml, XPath, and CSS. We will also look at how to work with websites written in other languages and with different encodings, such as Unicode.
Ultimately, understanding how to find and extract data within an HTML document comes down to understanding the structure of the HTML page, its representation in the DOM, the process of querying the DOM for specific elements, and how to specify which elements you want to retrieve based upon how the data is represented.
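As a rough illustration of querying a document tree for specific elements, the following minimal sketch parses a small page and selects nodes by tag and attribute. It makes several assumptions: the markup in `HTML` is a hypothetical, well-formed snippet invented for this example, and the code uses the standard library's `xml.etree.ElementTree` (which supports only a limited subset of XPath) rather than lxml, so that it runs without extra packages. The same `find` and `findall` calls are expected to work on elements parsed with `lxml.etree`, which also offers full XPath support.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
HTML = """<html>
  <body>
    <h1 id="title">Planets</h1>
    <ul>
      <li class="planet">Mercury</li>
      <li class="planet">Venus</li>
      <li class="moon">Luna</li>
    </ul>
  </body>
</html>"""

# Parse the markup into a tree of elements (a simplified view of the DOM).
root = ET.fromstring(HTML)

# Select a single element by tag name and attribute value.
title = root.find(".//h1[@id='title']").text

# Select every <li> whose class attribute is exactly "planet".
planets = [li.text for li in root.findall(".//li[@class='planet']")]

print(title)
print(planets)
```

The path expressions encode how the data is represented in the page: the heading is identified by its `id`, while the list items are identified by their `class`. Real-world HTML is often not well formed, which is one reason a more forgiving parser is usually preferred over a strict XML parser for scraping.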