- Python Web Scraping Cookbook
- Michael Heydt
- 318 words
- 2021-06-30 18:44:05
Getting ready
We will use the planets data page and convert its data into CSV and JSON files. Let's start by loading the planets data from the page into a list of Python dictionary objects. The following code (found in 03/get_planet_data.py) provides a function that performs this task, which will be reused throughout the chapter:
import requests
from bs4 import BeautifulSoup


def get_planet_data():
    # Fetch the planets page from the local test server
    html = requests.get("http://localhost:8080/planets.html").text
    soup = BeautifulSoup(html, "lxml")
    # Each planet is a table row with class "planet"
    planet_trs = soup.html.body.div.table.find_all("tr", {"class": "planet"})

    def to_dict(tr):
        # Map the cells of one row to named fields (cell 0 is skipped)
        tds = tr.find_all("td")
        planet_data = dict()
        planet_data['Name'] = tds[1].text.strip()
        planet_data['Mass'] = tds[2].text.strip()
        planet_data['Radius'] = tds[3].text.strip()
        planet_data['Description'] = tds[4].text.strip()
        planet_data['MoreInfo'] = tds[5].find_all("a")[0]["href"].strip()
        return planet_data

    planets = [to_dict(tr) for tr in planet_trs]
    return planets


if __name__ == "__main__":
    print(get_planet_data())
Running the script gives the following output (briefly truncated):
03 $ python get_planet_data.py
[{'Name': 'Mercury', 'Mass': '0.330', 'Radius': '4879', 'Description': 'Named Mercurius by the Romans because it appears to move so swiftly.', 'MoreInfo': 'https://en.wikipedia.org/wiki/Mercury_(planet)'}, {'Name': 'Venus', 'Mass': '4.87', 'Radius': '12104', 'Description': 'Roman name for the goddess of love. This planet was considered to be the brightest and most beautiful planet or star in the\r\n heavens. Other civilizations have named it for their god or goddess of love/war.', 'MoreInfo': 'https://en.wikipedia.org/wiki/Venus'}, {'Name': 'Earth', 'Mass': '5.97', 'Radius': '12756', 'Description': "The name Earth comes from the Indo-European base 'er,'which produced the Germanic noun 'ertho,' and ultimately German 'erde,'\r\n Dutch 'aarde,' Scandinavian 'jord,' and English 'earth.' Related forms include Greek 'eraze,' meaning\r\n 'on the ground,' and Welsh 'erw,' meaning 'a piece of land.'", 'MoreInfo': 'https://en.wikipedia.org/wiki/Earth'}, {'Name': 'Mars', 'Mass': '0.642', 'Radius': '6792', 'Description': 'Named by the Romans for their god of war because of its red, bloodlike color. Other civilizations also named this planet\r\n from this attribute; for example, the Egyptians named it "Her Desher," meaning "the red one."', 'MoreInfo':
...
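Because the function returns a list of plain dictionaries, the result maps naturally onto the standard library's csv.DictWriter and json.dumps. The following is only a minimal sketch of that idea, not the chapter's own recipe code: it uses a single sample record copied by hand from the output above instead of a live call to get_planet_data(), and the helper names planets_to_csv and planets_to_json are hypothetical.

```python
import csv
import io
import json

# Sample record copied from the output above (assumption: a real run
# would use the list returned by get_planet_data() instead).
planets = [{
    'Name': 'Mercury',
    'Mass': '0.330',
    'Radius': '4879',
    'Description': 'Named Mercurius by the Romans because it appears to move so swiftly.',
    'MoreInfo': 'https://en.wikipedia.org/wiki/Mercury_(planet)',
}]


def planets_to_csv(rows):
    """Hypothetical helper: render a list of dicts as CSV text."""
    buffer = io.StringIO()
    # Use the keys of the first record as the column headers
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def planets_to_json(rows):
    """Hypothetical helper: render a list of dicts as JSON text."""
    return json.dumps(rows, indent=2)


csv_text = planets_to_csv(planets)
json_text = planets_to_json(planets)
print(csv_text)
print(json_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; replacing io.StringIO() with an opened file (using newline='' as the csv documentation recommends) would produce an actual CSV file.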
The csv and json modules are part of the Python standard library and need no installation. pandas is a third-party package and may need to be installed, which you can do with the following command:
pip install pandas
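Before installing anything, it can help to check which of these modules are already importable in the current environment. The sketch below does this with importlib.util.find_spec from the standard library. The helper name missing_modules is hypothetical, and the check only covers top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Hypothetical helper: return the module names that cannot be found."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# csv and json ship with Python; pandas appears here only if it is not installed
print(missing_modules(["csv", "json", "pandas"]))
```

Any name the function reports would then be a candidate for pip install.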