Scraping versus crawling

Depending on the information you are after and on the site's content and structure, you may need to build either a web scraper or a web crawler. What is the difference?

A web scraper is usually built to target a particular website or set of sites and to gather specific information from them. Because it is written against those specific pages, it will need to be modified if the site changes or if the information moves to a different location on the site. For example, you might want to build a web scraper to check the daily specials at your favorite local restaurant, and to do so you would scrape the part of their site where they regularly update that information.
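To make the restaurant example concrete, here is a minimal sketch of a scraper using only the Python standard library. The page markup, the `daily-specials` id, and the names `SAMPLE_PAGE`, `SpecialsParser`, and `scrape_specials` are all assumptions invented for illustration; a real scraper would download the page and be written against that site's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical page markup (an assumption for this sketch). A real scraper
# would download the page first, for example with urllib.request.urlopen.
SAMPLE_PAGE = """
<html><body>
  <h1>Example Restaurant</h1>
  <ul id="daily-specials">
    <li>Mushroom risotto</li>
    <li>Grilled sea bass</li>
  </ul>
  <ul id="regular-menu">
    <li>Margherita pizza</li>
  </ul>
</body></html>
"""


class SpecialsParser(HTMLParser):
    """Collects the text of <li> items inside <ul id="daily-specials">."""

    def __init__(self):
        super().__init__()
        self.in_specials = False
        self.in_item = False
        self.specials = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul" and dict(attrs).get("id") == "daily-specials":
            self.in_specials = True
        elif tag == "li" and self.in_specials:
            # Start a new, empty entry for this list item.
            self.in_item = True
            self.specials.append("")

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_specials = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.specials[-1] += data


def scrape_specials(html):
    """Return the daily specials found in the given HTML string."""
    parser = SpecialsParser()
    parser.feed(html)
    return [item.strip() for item in parser.specials]


print(scrape_specials(SAMPLE_PAGE))
```

Note how tightly the code is coupled to one page: if the site renamed the `daily-specials` list or moved the specials into a table, this parser would return nothing until it was updated.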

In contrast, a web crawler is usually built in a generic way, targeting either websites from a series of top-level domains or the entire web. Crawlers can be built to gather more specific information, but they are usually used to crawl the web, picking up small, generic bits of information from many different sites or pages and following links to other pages.
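By way of contrast, the following sketch shows the generic shape of a crawler: it knows nothing about any one site, records a small piece of information (the page title) from each page, and follows every link it finds. To keep it runnable without network access, it crawls a hypothetical in-memory "web" called `FAKE_WEB`; that dictionary, the `example` URLs, and the names `LinkAndTitleParser` and `crawl` are assumptions made for illustration, and a real crawler would also need politeness delays, robots.txt handling, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (an assumption) mapping URLs to HTML.
FAKE_WEB = {
    "http://a.example/": (
        '<title>Site A</title>'
        '<a href="/about">About</a><a href="http://b.example/">B</a>'
    ),
    "http://a.example/about": '<title>About A</title><a href="/">Home</a>',
    "http://b.example/": "<title>Site B</title>",
}


class LinkAndTitleParser(HTMLParser):
    """Records the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(seed_url, fetch, max_pages=10):
    """Breadth-first crawl from seed_url; returns {url: page title}.

    fetch is any callable that takes a URL and returns HTML or None.
    """
    seen = {seed_url}
    queue = deque([seed_url])
    titles = {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved; skip it
        parser = LinkAndTitleParser()
        parser.feed(html)
        titles[url] = parser.title.strip()
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return titles


titles = crawl("http://a.example/", FAKE_WEB.get)
for url, title in titles.items():
    print(url, "->", title)
```

Passing `fetch` in as a parameter is a design choice for this sketch: swapping `FAKE_WEB.get` for a function that downloads pages would turn it into a live crawler without changing the traversal logic.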

In addition to crawlers and scrapers, we will also cover web spiders in Chapter 8, Scrapy. Spiders can be used for crawling a specific set of sites or for broader crawls across many sites or even the Internet.

Generally, we will use specific terms to reflect our use cases. As you develop your web scraping skills, you may notice distinctions among the technologies, libraries, and packages available to you. In these cases, knowing how these terms differ will help you select an appropriate package or technology based on the terminology it uses (for example, is it only for scraping, or is it also for spiders?).
