Introduction

Effective scraping depends on three things: understanding how content and data are stored on web servers, identifying the data you want to retrieve, and understanding how your tools support that extraction. In this chapter, we will discuss website structures and the DOM, and introduce techniques to parse and query websites with lxml, XPath, and CSS selectors. We will also look at how to work with websites written in other languages and with different encodings, such as Unicode.

Ultimately, understanding how to find and extract data within an HTML document comes down to understanding the structure of the HTML page, its representation in the DOM, the process of querying the DOM for specific elements, and how to specify which elements you want to retrieve based upon how the data is represented.
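To make the idea of querying a parsed document concrete, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so that it runs without extra packages. ElementTree supports only a subset of XPath and requires well-formed markup, whereas lxml offers full XPath 1.0 and a tolerant HTML parser. The page fragment, the `planets` id, and the class names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <div id="planets">
      <table>
        <tr class="planet"><td class="name">Mercury</td><td class="mass">0.330</td></tr>
        <tr class="planet"><td class="name">Venus</td><td class="mass">4.87</td></tr>
      </table>
    </div>
  </body>
</html>
"""

# Parse the markup into a tree of elements (a simplified view of the DOM).
root = ET.fromstring(PAGE)

# Select every row with class "planet" anywhere below the div with id "planets".
rows = root.findall(".//div[@id='planets']//tr[@class='planet']")

# For each row, pick out the cells by class and convert the mass to a number.
planets = [
    (row.find("td[@class='name']").text, float(row.find("td[@class='mass']").text))
    for row in rows
]

print(planets)
```

The query mirrors the steps described above: know the structure of the page, locate the container that holds the data, then name the specific elements to retrieve.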