- Python Web Scraping Cookbook
- Michael Heydt
- 2021-06-30 18:44:02
How to do it...
Now let's start working with XPath and CSS selectors. The following selects all <tr> elements that have the class planet and pairs each one with the value of its name attribute:
In [2]: [(v, v.xpath("@name")) for v in tree.cssselect('tr.planet')]
Out[2]:
[(<Element tr at 0x10d3a2278>, ['Mercury']),
(<Element tr at 0x10c16ed18>, ['Venus']),
(<Element tr at 0x10e445688>, ['Earth']),
(<Element tr at 0x10e477228>, ['Mars']),
(<Element tr at 0x10e477408>, ['Jupiter']),
(<Element tr at 0x10e477458>, ['Saturn']),
(<Element tr at 0x10e4774a8>, ['Uranus']),
(<Element tr at 0x10e4774f8>, ['Neptune']),
(<Element tr at 0x10e477548>, ['Pluto'])]
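For readers without lxml and cssselect installed, the same selection can be sketched with Python's standard-library `xml.etree.ElementTree`. This is a swapped-in technique, not the recipe's lxml code. The markup below is a hypothetical, simplified stand-in for the planets page (it must be well-formed XML for this parser). ElementTree's limited XPath has no CSS support, so `[@class='planet']` is the closest equivalent of `tr.planet`, although it only matches when the class attribute is exactly `planet`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the planets page used in this recipe.
SAMPLE_HTML = """<html><body><table>
  <tr id="planetHeader"><th>Image</th><th>Name</th></tr>
  <tr class="planet" id="planet1" name="Mercury"><td>img</td><td> Mercury </td></tr>
  <tr class="planet" id="planet2" name="Venus"><td>img</td><td> Venus </td></tr>
  <tr class="planet" id="planet3" name="Earth"><td>img</td><td> Earth </td></tr>
</table></body></html>"""

tree = ET.fromstring(SAMPLE_HTML)

# Exact attribute match; unlike CSS .planet it does not match "planet big".
rows = tree.findall(".//tr[@class='planet']")

# Pair each row with its name attribute. Element.get returns a string,
# whereas lxml's v.xpath("@name") returns a list of strings.
pairs = [(row, row.get("name")) for row in rows]
for row, name in pairs:
    print(row.tag, name)
```

The header row has no class attribute, so it is skipped and only the three planet rows are returned.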
Data for Earth can be found in several ways. The following selects the row by its id and then extracts the text of its second <td> cell:
In [3]: tr = tree.cssselect("tr#planet3")
...: tr[0], tr[0].xpath("./td[2]/text()")[0].strip()
...:
Out[3]: (<Element tr at 0x10e445688>, 'Earth')
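As a rough stdlib equivalent of `tr#planet3`, the sketch below selects on the id attribute with `xml.etree.ElementTree` (a substitute for the recipe's lxml calls) and then reads the second <td>. The table markup is hypothetical. As in XPath, `td[2]` is 1-based and counts only <td> siblings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified planets table (not the book's actual page).
SAMPLE_HTML = """<html><body><table>
  <tr class="planet" id="planet2" name="Venus"><td>img</td><td> Venus </td></tr>
  <tr class="planet" id="planet3" name="Earth"><td>img</td><td>
      Earth
  </td></tr>
</table></body></html>"""

tree = ET.fromstring(SAMPLE_HTML)

# CSS tr#planet3 corresponds to matching the id attribute.
row = tree.find(".//tr[@id='planet3']")

# td[2] is the second <td> child; strip() removes the surrounding whitespace.
name = row.find("td[2]").text.strip()
print(row.get("id"), name)  # planet3 Earth
```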
The following selects a row by matching an attribute against a specific value:
In [4]: tr = tree.cssselect("tr[name='Pluto']")
...: tr[0], tr[0].xpath("td[2]/text()")[0].strip()
...:
Out[4]: (<Element tr at 0x10e477548>, 'Pluto')
Note that, unlike XPath, CSS attribute selectors do not use the @ symbol to specify an attribute.
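To make the syntax difference concrete, here is a minimal sketch with a hypothetical helper, `css_attr_to_xpath`, that rewrites a simple CSS selector of the form `tag[attr='value']` into an XPath-style path by adding the `@`. It is illustrative only (it handles just this one pattern, not general CSS) and runs the result against hypothetical markup with the standard library's `xml.etree.ElementTree` rather than lxml.

```python
import re
import xml.etree.ElementTree as ET


def css_attr_to_xpath(selector: str) -> str:
    """Hypothetical helper: convert tag[attr='value'] (CSS) into
    .//tag[@attr='value'] (XPath). The only syntactic change in the
    predicate is the @ that XPath requires before an attribute name."""
    match = re.fullmatch(r"(\w+)\[(\w+)='([^']*)'\]", selector)
    if match is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    tag, attr, value = match.groups()
    return f".//{tag}[@{attr}='{value}']"


# Hypothetical, simplified planets table.
SAMPLE_HTML = """<html><body><table>
  <tr class="planet" id="planet8" name="Neptune"><td>img</td><td> Neptune </td></tr>
  <tr class="planet" id="planet9" name="Pluto"><td>img</td><td> Pluto </td></tr>
</table></body></html>"""

xpath = css_attr_to_xpath("tr[name='Pluto']")
print(xpath)  # .//tr[@name='Pluto']

tree = ET.fromstring(SAMPLE_HTML)
row = tree.find(xpath)
print(row.find("td[2]").text.strip())  # Pluto
```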