官术网_书友最值得收藏!

How it works

XPath is a element of the XSLT (eXtensible Stylesheet Language Transformation) standard and provides the ability to select nodes in an XML document. HTML is a variant of XML, and hence XPath can work on on HTML document (although HTML can be improperly formed and mess up XPath parsing in those cases).

XPath itself is designed to model the structure of XML nodes, attributes, and properties. The syntax provides means of finding items in the XML that match the expression. This can include matching or logical comparison of any of the nodes, attributes, values, or text in the XML document.

XPath expressions can be combined to form very complex paths within the document. It is also possible to navigate the document based upon relative positions, which helps greatly in finding data based upon relative positions instead of absolute positions within the DOM.

Understanding XPath is essential for knowing how to parse HTML and perform web scraping. And as we will see, it underlies, and provides an implementation for, many of the higher level libraries such as lxml.

主站蜘蛛池模板: 斗六市| 金溪县| 巫山县| 石城县| 乌鲁木齐市| 南宁市| 昭觉县| 行唐县| 凉城县| 新邵县| 高陵县| 灌阳县| 襄樊市| 新宁县| 茂名市| 华亭县| 杂多县| 阿鲁科尔沁旗| 贵州省| 伊宁县| 霍城县| 巫山县| 潞城市| 宁武县| 于都县| 彩票| 康马县| 阳东县| 广昌县| 彭水| 定边县| 五指山市| 望城县| 根河市| 峨山| 大渡口区| 乐昌市| 唐海县| 南宁市| 专栏| 鹤山市|