Book: Python Web Scraping Cookbook
Author: Michael Heydt
Updated: 2021-06-30 18:44:06

Storing data using AWS S3

There are many cases where we simply want to save the content we scrape as a local copy for archiving, backup, or later bulk analysis. We might also want to save media from those sites for later use. I've built scrapers for advertisement compliance companies that tracked and downloaded advertisement-based media from websites to verify proper usage, and to store it for later analysis, compliance, and transcoding.

The storage required for these types of systems can be immense, but cloud storage services such as AWS S3 (Simple Storage Service) make this much easier and more cost-effective than managing a large SAN (Storage Area Network) in your own IT department. S3 can also automatically move data from hot to cold storage, and then to long-term archival storage such as Glacier, which can reduce costs further.

We won't get into all of those details; we will simply look at storing our planets.html file in an S3 bucket. Once you can do this, you can save any content you want to your heart's desire.
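
As a rough preview of what storing the file involves, here is a minimal sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed and AWS credentials are already configured. The function name upload_html_to_s3 and the optional client parameter are hypothetical, introduced only for illustration; this is not necessarily the approach the recipe itself takes.

```python
from pathlib import Path


def upload_html_to_s3(path, bucket, key=None, client=None):
    """Upload a local HTML file to an S3 bucket and return the object key.

    `bucket` must name a bucket you already own. `client` is an optional,
    hypothetical hook so a stub can stand in for the real S3 client.
    """
    path = Path(path)
    # Default the object key to the file name, e.g. "planets.html"
    key = key or path.name

    if client is None:
        # Imported lazily so the function can be exercised without boto3
        # when a stub client is supplied. Assumes credentials are configured.
        import boto3
        client = boto3.client("s3")

    # put_object sends the whole file body in a single request
    client.put_object(
        Bucket=bucket,
        Key=key,
        Body=path.read_bytes(),
        ContentType="text/html",
    )
    return key
```

Calling upload_html_to_s3("planets.html", "my-scraping-archive") would store the file under the key planets.html; the bucket name here is a placeholder and would need to be replaced with one of your own.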
