- Python Web Scraping (Second Edition)
- Katharine Jarmul, Richard Lawson
- 2021-07-09 19:42:45
Scraping versus crawling
Depending on the information you are after and the site content and structure, you may need to either build a web scraper or a website crawler. What is the difference?
A web scraper is usually built to target a particular website or set of sites and to gather specific information from them. Because it is built to access these specific pages, it will need to be modified if the site changes or if the information moves to a different location on the site. For example, you might build a web scraper to check the daily specials at your favorite local restaurant; to do so, you would scrape the part of the restaurant's site where that information is regularly updated.
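To make the restaurant example concrete, here is a minimal sketch of a targeted scraper using only the standard library. The markup in `SAMPLE_HTML`, the `daily-specials` id, and the names `SpecialsParser` and `scrape_specials` are all assumptions made for illustration; a real restaurant page would use different markup, and the parser would have to be adapted to it.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded restaurant page.
SAMPLE_HTML = """
<html><body>
  <h1>Our Menu</h1>
  <ul id="daily-specials">
    <li>Tomato soup</li>
    <li>Grilled trout</li>
  </ul>
  <ul id="drinks"><li>Coffee</li></ul>
</body></html>
"""


class SpecialsParser(HTMLParser):
    """Collect the text of <li> items inside <ul id="daily-specials">."""

    def __init__(self):
        super().__init__()
        self.in_specials = False  # inside the targeted <ul>
        self.in_item = False      # inside an <li> of the targeted <ul>
        self.specials = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul" and dict(attrs).get("id") == "daily-specials":
            self.in_specials = True
        elif tag == "li" and self.in_specials:
            self.in_item = True
            self.specials.append("")

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_specials = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.specials[-1] += data


def scrape_specials(html):
    """Return the daily specials found in the given HTML string."""
    parser = SpecialsParser()
    parser.feed(html)
    return [item.strip() for item in parser.specials]


print(scrape_specials(SAMPLE_HTML))
```

Note how tightly the code is tied to one page layout: if the site renamed the `daily-specials` list or moved the specials elsewhere, the scraper would return nothing until it was updated.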
In contrast, a web crawler is usually built in a generic way, targeting either websites from a series of top-level domains or the entire web. Crawlers can be built to gather more specific information, but they are usually used to crawl the web, picking up small and generic bits of information from many different sites or pages and following links to other pages.
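By way of contrast, the following is a rough sketch of a generic crawler that follows links and records one small piece of information (the page title) from every page it reaches. To keep it runnable without network access, the pages live in a hypothetical in-memory dictionary, `FAKE_WEB`, and the `fetch` parameter is an assumed hook that a real crawler would replace with an actual download function. The names `LinkAndTitleParser` and `crawl` are likewise illustrative.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collect every href on a page along with the page title."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl from start_url; return a {url: title} mapping.

    fetch is any callable that takes a URL and returns HTML, or None
    when the page cannot be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    titles = {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be retrieved
        parser = LinkAndTitleParser()
        parser.feed(html)
        titles[url] = parser.title.strip()
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return titles


# Hypothetical pages standing in for real downloads.
FAKE_WEB = {
    "http://example.com/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.com/a": '<title>Page A</title><a href="/">home</a>',
    "http://example.com/b": '<title>Page B</title><a href="/missing">x</a>',
}

titles = crawl("http://example.com/", FAKE_WEB.get)
print(titles)
```

Unlike the scraper, nothing here depends on the layout of any one site: the same code works on any page that has links and a title, which is what makes it generic.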
In addition to crawlers and scrapers, we will also cover web spiders in Chapter 8, Scrapy. Spiders can be used for crawling a specific set of sites or for broader crawls across many sites or even the Internet.
Generally, we will use specific terms to reflect our use cases. As you develop your web scraping, you may notice distinctions in the technologies, libraries, and packages you may want to use. In these cases, knowing the differences between these terms will help you select an appropriate package or technology based on the terminology it uses (for example, is it only for scraping, or is it also for spiders?).