- Mastering Scrapy Web Crawlers (精通Scrapy网络爬虫)
- Liu Shuo (刘硕)
- 2020-11-28 14:59:40
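4.1 Item and Field
Scrapy provides the following two classes, which you can use to define custom data classes (for example, book information) that encapsulate scraped data:
● Item base class
The base class for custom data classes (such as BookItem).
● Field class
Describes which fields a custom data class contains (such as name and price).
To define a custom data class, inherit from Item and declare a class attribute holding a Field object for each field (similar to defining a Model in Django). Take BookItem, which holds book information, as an example. It has two fields, the book's title (name) and the book's price (price), and is defined as follows: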
>>> from scrapy import Item, Field
>>> class BookItem(Item):
...     name = Field()
...     price = Field()
Item supports the dictionary interface, so a BookItem is used much like a Python dict. A BookItem object can be created in either of the following ways:
>>> book1 = BookItem(name='Needful Things', price=45.0)
>>> book1
{'name': 'Needful Things', 'price': 45.0}
>>> book2 = BookItem()
>>> book2
{}
>>> book2['name'] = 'Life of Pi'
>>> book2['price'] = 32.5
>>> book2
{'name': 'Life of Pi', 'price': 32.5}
When a field is assigned, BookItem checks the field name internally. Assigning to a field that has not been declared raises an exception, which guards against errors caused by careless typos:
>>> book = BookItem()
>>> book['name'] = 'Memoirs of a Geisha'
>>> book['prize'] = 43.0  # Careless: price misspelled as prize.
Traceback (most recent call last):
...
KeyError: 'BookItem does not support field: prize'
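The field-name check described above can be imitated with a small standard-library sketch. This is not Scrapy's actual implementation: MiniField and MiniItem are hypothetical stand-ins introduced only for illustration, and the sketch assumes that the declared fields are collected once, when the subclass is defined, and that every assignment is validated against them.

```python
from collections.abc import MutableMapping


class MiniField(dict):
    """Hypothetical stand-in for scrapy.Field: a plain dict of field metadata."""


class MiniItem(MutableMapping):
    """Hypothetical, simplified stand-in for scrapy.Item (not Scrapy's real code)."""

    fields = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Collect every MiniField class attribute, including inherited ones.
        cls.fields = {
            name: value
            for klass in reversed(cls.__mro__)
            for name, value in vars(klass).items()
            if isinstance(value, MiniField)
        }

    def __init__(self, **values):
        self._values = {}
        for key, value in values.items():
            self[key] = value  # same check as item[key] = value

    def __setitem__(self, key, value):
        # Reject any field name that was not declared on the class.
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


class BookItem(MiniItem):
    name = MiniField()
    price = MiniField()


if __name__ == "__main__":
    book = BookItem(name="Memoirs of a Geisha")
    try:
        book["prize"] = 43.0  # misspelled field name
    except KeyError as exc:
        print("rejected:", exc.args[0])
```

Because the check lives in `__setitem__`, values passed to the constructor and values assigned later go through the same validation, so a misspelled field name fails at the point of assignment instead of silently producing an incomplete record.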
Reading fields from a BookItem object works the same way as reading from a dict, for example:
>>> book = BookItem(name='Needful Things', price=45.0)
>>> book['name']
'Needful Things'
>>> book.get('price', 60.0)
45.0
>>> list(book.items())
[('name', 'Needful Things'), ('price', 45.0)]
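Next, we rewrite the code of the example project from Chapter 1, using Item and Field to define a BookItem class that encapsulates the scraped book information. The items.py file in the project directory is where users implement their custom data classes. Implement BookItem in items.py as follows: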
from scrapy import Item, Field


class BookItem(Item):
    name = Field()
    price = Field()
Modify the earlier BooksSpider so that it uses BookItem instead of a Python dict:
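import scrapy

from ..items import BookItem


class BooksSpider(scrapy.Spider):
    ...

    def parse(self, response):
        for sel in response.css('article.product_pod'):
            book = BookItem()
            # The title is read from the title attribute of the link inside <h3>.
            book['name'] = sel.xpath('./h3/a/@title').extract_first()
            # The price is the text of the <p class="price_color"> element.
            book['price'] = sel.css('p.price_color::text').extract_first()
            yield book
        ...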