
  • Python Web Scraping Cookbook
  • Michael Heydt
  • 235 words
  • 2021-06-30 18:44:05

Working with CSV and JSON data

Extracting data from HTML pages is done using the techniques in the previous chapter, primarily XPath through various tools, and also Beautiful Soup. While we will focus primarily on HTML, HTML is a close relative of XML (eXtensible Markup Language). XML was once the most popular format for expressing data on the web, but other formats have become popular and have even exceeded XML in popularity.

Two common formats that you will see are JSON (JavaScript Object Notation) and CSV (Comma Separated Values). CSV is easy to create and is a common format for many spreadsheet applications, so many websites provide data in that format, or you will need to convert scraped data to that format for further storage or collaboration. JSON has become the preferred format because of its ease of use within programming languages such as JavaScript (and Python), and many databases now support it as a native data format.

In this recipe let's examine converting scraped data to CSV and JSON, as well as writing the data to files and also reading those data files from remote servers. The tools we will examine are the Python CSV and JSON libraries. We will also examine using pandas for these techniques.
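
As a rough illustration of the conversions described above, the following minimal sketch uses only the standard library csv and json modules. The records list and the helper names (to_csv, from_csv, to_json, from_json) are hypothetical and are not taken from the book's code; the sketch serializes to in-memory text rather than files, and it assumes every record has the same keys.

```python
import csv
import io
import json

# Hypothetical scraped records; the field names and values are illustrative only.
records = [
    {"name": "Mercury", "mass": "0.330", "radius": "4879"},
    {"name": "Venus", "mass": "4.87", "radius": "12104"},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    # Assumes all rows share the keys of the first row.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts (all values are strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text, newline=""))]


def to_json(rows):
    """Serialize a list of dicts to a JSON array."""
    return json.dumps(rows, indent=2)


def from_json(text):
    """Parse a JSON array back into a list of dicts."""
    return json.loads(text)


csv_text = to_csv(records)
json_text = to_json(records)
print(csv_text)
print(json_text)
```

Note that CSV carries no type information, so values read back from CSV are always strings, whereas JSON preserves numbers, booleans, and nesting. To write to a file instead, the same writer can be given a file opened with open(path, "w", newline="").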


Also implicit in these examples is the conversion of XML data to CSV and JSON, so we won't have a dedicated section for those examples.