- Python Web Scraping (Second Edition)
- Katharine Jarmul, Richard Lawson
- 2021-07-09 19:42:44
Crawling your first website
To scrape a website, we first need to download the web pages that contain the data of interest, a process known as crawling. There are several ways to crawl a website, and the appropriate choice depends on the structure of the target website. This chapter explores how to download web pages safely, and then introduces the following three common approaches to crawling a website:
- Crawling a sitemap
- Iterating over each page using database IDs
- Following web page links
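To make the three approaches concrete, here is a minimal sketch of each, together with a simple retrying downloader. It is an illustration under stated assumptions, not the implementation developed in the chapter: the function names (`download`, `sitemap_urls`, `id_urls`, `page_links`, `link_crawler`), the `example.com` URLs, the UTF-8 decoding, and the regex-based parsing are all hypothetical choices made for brevity.

```python
import itertools
import re
import urllib.request
from urllib.error import URLError
from urllib.parse import urljoin


def download(url, num_retries=2):
    """Download a page and return its HTML, or None on failure.

    Assumption: pages are UTF-8 encoded. Only 5xx server errors are
    retried, since those are the ones likely to be temporary.
    """
    try:
        html = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
    except URLError as e:
        html = None
        code = getattr(e, 'code', None)  # only HTTPError carries a status code
        if num_retries > 0 and code is not None and 500 <= code < 600:
            return download(url, num_retries - 1)
    return html


def sitemap_urls(sitemap_xml):
    """Approach 1: return the URLs listed in the <loc> tags of a sitemap."""
    return re.findall(r'<loc>\s*(.*?)\s*</loc>', sitemap_xml)


def id_urls(url_template, start=1, max_id=None):
    """Approach 2: yield URLs built from consecutive database IDs."""
    ids = itertools.count(start) if max_id is None else range(start, max_id + 1)
    for page_id in ids:
        yield url_template.format(page_id)


def page_links(html, base_url):
    """Return the absolute URLs of the links found in a page."""
    hrefs = re.findall(r'<a[^>]+href=["\'](.*?)["\']', html, re.IGNORECASE)
    return [urljoin(base_url, href) for href in hrefs]


def link_crawler(start_url, fetch=download):
    """Approach 3: follow links breadth-first, visiting each URL once.

    `fetch` is any callable that maps a URL to HTML (or None on failure).
    This sketch does not restrict the crawl to a single domain.
    """
    queue = [start_url]
    seen = {start_url}
    visited = []
    while queue:
        url = queue.pop(0)
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in page_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Because `link_crawler` accepts the fetching function as a parameter, it can be tried without network access by passing a dictionary lookup such as `fake_site.get` in place of `download`. Regular expressions keep the sketch short; a real crawler would more likely rely on a proper XML or HTML parser.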
We have so far used the terms scraping and crawling interchangeably, but let's take a moment to define the similarities and differences between the two approaches.