Go Web Scraping Quick Start Guide
Web scraping is the process of extracting information from the web using various tools that perform scraping and crawling. Go is emerging as the language of choice for scraping, with a variety of libraries to support it. This book quickly explains how to scrape data from various websites using Go libraries such as Colly and Goquery. The book starts with an introduction to the use cases for building a web scraper and the main features of the Go programming language, along with setting up a Go environment. It then moves on to HTTP requests and responses and how Go handles them. You will also learn a number of basic rules of web scraping etiquette. You will be taught how to navigate through a website using a breadth-first and then a depth-first search, as well as how to find and follow links. You will learn how to track history in order to avoid loops and how to protect your web scraper using proxies. Finally, the book covers the Go concurrency model and how to run scrapers in parallel, along with large-scale distributed web scraping.
Contents (137 chapters)
- Cover page
- Title Page
- Copyright and Credits
- Go Web Scraping Quick Start Guide
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Conventions used
- Get in touch
- Reviews
- Introducing Web Scraping and Go
- What is web scraping?
- Why do you need a web scraper?
- Search engines
- Price comparison
- Building datasets
- What is Go?
- Why is Go a good fit for web scraping?
- Go is fast
- Go is safe
- Go is simple
- How to set up a Go development environment
- Go language and tools
- Git
- Editor
- Summary
- The Request/Response Cycle
- What do HTTP requests look like?
- HTTP request methods
- HTTP headers
- Query parameters
- Request body
- What do HTTP responses look like?
- Status line
- Response headers
- Response body
- What are HTTP status codes?
- 100–199 range
- 200–299 range
- 300–399 range
- 400–499 range
- 500–599 range
- What do HTTP requests/responses look like in Go?
- A simple request example
- Summary
- Web Scraping Etiquette
- What is a robots.txt file?
- What is a User-Agent string?
- Example
- How to throttle your scraper
- How to use caching
- Cache-Control
- Expires
- ETag
- Caching content in Go
- Summary
- Parsing HTML
- What is the HTML format?
- Syntax
- Structure
- Searching using the strings package
- Example – Counting links
- Example – Doctype check
- Searching using the regexp package
- Example – Finding links
- Example – Finding prices
- Searching using XPath queries
- Example – Daily deals
- Example – Collecting products
- Searching using Cascading Style Sheets selectors
- Example – Daily deals
- Example – Collecting products
- Summary
- Web Scraping Navigation
- Following links
- Example – Daily deals
- Submitting forms
- Example – Submitting searches
- Example – POST method
- Avoiding loops
- Breadth-first versus depth-first crawling
- Depth-first
- Breadth-first
- Navigating with JavaScript
- Example – Book reviews
- Summary
- Protecting Your Web Scraper
- Virtual private servers
- Proxies
- Public and shared proxies
- Dedicated proxies
- Price
- Location
- Type
- Anonymity
- Proxies in Go
- Virtual private networks
- Boundaries
- Whitelists
- Blacklists
- Summary
- Scraping with Concurrency
- What is concurrency?
- Concurrency pitfalls
- Race conditions
- Deadlocks
- The Go concurrency model
- Goroutines
- Channels
- sync package helpers
- Conditions
- Atomic counters
- Summary
- Scraping at 100x
- Components of a web scraping system
- Queue
- Cache
- Storage
- Logs
- Scraping HTML pages with colly
- Scraping JavaScript pages with chrome-protocol
- Example – Amazon Daily Deals
- Distributed scraping with dataflowkit
- The Fetch service
- The Parse service
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-07-02 13:58:34