- Python Web Scraping Cookbook
- Michael Heydt
- 2021-06-30 18:44:06
Storing data using AWS S3
There are many cases where we just want to save the content that we scrape into a local copy for archival purposes, backup, or later bulk analysis. We might also want to save media from those sites for later use. I've built scrapers for advertisement compliance companies, where we would track and download advertisement-based media on websites to ensure proper usage, and also store it for later analysis, compliance, and transcoding.
The storage required for these types of systems can be immense, but with the advent of cloud storage services such as AWS S3 (Simple Storage Service), this becomes much easier and more cost-effective than managing a large SAN (Storage Area Network) in your own IT department. Plus, S3 can automatically move data from hot to cold storage, and then to long-term archival storage such as Amazon S3 Glacier, through lifecycle rules, which can save you even more money.
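The automatic hot-to-cold tiering described above is driven by S3 lifecycle rules. As a minimal sketch, assuming the boto3 library (not shown in this section), the Python below builds a lifecycle configuration that moves objects to Standard-IA after 30 days and to Glacier after 90 days. The rule ID, the day counts, and the helper names `build_lifecycle_config` and `apply_lifecycle` are hypothetical choices for illustration, not values from the book.

```python
from typing import Any, Optional


def build_lifecycle_config(prefix: str = "", ia_days: int = 30,
                           glacier_days: int = 90) -> dict:
    """Build a hypothetical S3 lifecycle configuration that tiers objects over time."""
    # S3 requires objects to stay in Standard for at least 30 days
    # before they can transition to Standard-IA.
    if ia_days < 30:
        raise ValueError("ia_days must be at least 30 for a STANDARD_IA transition")
    if glacier_days <= ia_days:
        raise ValueError("glacier_days must be later than ia_days")
    return {
        "Rules": [
            {
                "ID": "archive-scraped-content",  # hypothetical rule name
                "Filter": {"Prefix": prefix},     # "" applies the rule to every object
                "Status": "Enabled",
                "Transitions": [
                    # "Cold" tier: Standard-Infrequent Access.
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    # Long-term tier: Glacier (Flexible Retrieval).
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict,
                    s3_client: Optional[Any] = None) -> None:
    """Attach the lifecycle configuration to an existing bucket."""
    if s3_client is None:
        # Imported lazily; requires boto3 and configured AWS credentials.
        import boto3
        s3_client = boto3.client("s3")
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Passing the client in as an optional argument keeps the configuration logic testable without AWS credentials; in real use you would call something like `apply_lifecycle("my-scraper-archive", build_lifecycle_config())`, where the bucket name is a placeholder.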
We won't get into all of those details, but will simply look at storing our planets.html file in an S3 bucket. Once you can do this, you can save any content you want to your heart's desire.
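Uploading a file such as planets.html comes down to a single S3 `put_object` call. The sketch below assumes the boto3 library and an existing bucket; the helper name `save_to_s3` and the bucket name in the usage comment are hypothetical, and this is not necessarily how the book's own recipe is written.

```python
from typing import Any, Optional


def save_to_s3(content: bytes, bucket: str, key: str,
               content_type: str = "text/html",
               s3_client: Optional[Any] = None) -> None:
    """Store scraped content in an existing S3 bucket under the given key."""
    if s3_client is None:
        # Imported lazily; requires boto3 and configured AWS credentials.
        import boto3
        s3_client = boto3.client("s3")
    # ContentType lets browsers render the object correctly if it is served later.
    s3_client.put_object(Bucket=bucket, Key=key, Body=content,
                         ContentType=content_type)


# Example usage (hypothetical bucket name; needs real AWS credentials):
#   with open("planets.html", "rb") as f:
#       save_to_s3(f.read(), bucket="my-scraper-archive", key="planets.html")
```

The same function works for scraped media such as images or video by passing their bytes and an appropriate `content_type`.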