Introduction

A common practice in scraping is to download, store, and further process media content (files that are neither web pages nor data files). This media can include images, audio, and video. To store the content correctly, whether locally or in a service like S3, we need to know its media type, and the file extension in the URL cannot be trusted for this. We will learn how to download media and correctly determine its type from the information the web server returns.
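The idea of trusting the server rather than the URL can be sketched with the standard library alone. This is a minimal sketch, assuming the server sends a meaningful `Content-Type` header. It strips any parameters from that header and lets `mimetypes.guess_extension` choose a file extension. The function names `extension_for_content_type` and `download_media` and the `.bin` fallback are hypothetical choices for illustration, not the book's own code.

```python
import mimetypes
import urllib.request
from pathlib import Path
from typing import Optional


def extension_for_content_type(content_type: Optional[str]) -> str:
    """Map a Content-Type value such as 'image/png; charset=binary' to an extension."""
    if not content_type:
        return ".bin"  # hypothetical fallback when the server sends no type
    # Drop parameters like "; charset=..." and normalise the case of the MIME type.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    # guess_extension returns None for unknown types, so fall back to .bin as well.
    return mimetypes.guess_extension(mime_type) or ".bin"


def download_media(url: str, dest_stem: Path) -> Path:
    """Download url and save it under dest_stem with an extension from the server's header."""
    with urllib.request.urlopen(url) as response:
        ext = extension_for_content_type(response.headers.get("Content-Type"))
        data = response.read()
    # The extension comes from the response, never from the URL path.
    dest = dest_stem.with_suffix(ext)
    dest.write_bytes(data)
    return dest
```

For example, `extension_for_content_type("IMAGE/PNG; charset=binary")` yields `".png"`, so a URL such as `.../image?id=42` with no extension still ends up saved as a PNG file.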

Another common task is generating thumbnails of images, videos, or even a page of a website. We will examine several techniques for generating thumbnails and taking screenshots of website pages. These thumbnails are often used on a new website as links to the scraped media that is now stored locally.

Finally, we often need to transcode media, such as converting non-MP4 videos to MP4 or changing the bit rate or resolution of a video. Another scenario is extracting only the audio from a video file. We won't look at video transcoding, but we will rip MP3 audio out of an MP4 file using ffmpeg. From there, transcoding video with ffmpeg is a small step.
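Ripping the audio track out of a video amounts to telling ffmpeg to drop the video stream and encode the remaining audio as MP3. This is a minimal sketch that drives the ffmpeg command-line tool from Python. It assumes `ffmpeg` is on the `PATH` and was built with the LAME encoder (`libmp3lame`). The helper names and the choice of `-q:a 2` are illustrative assumptions, not the book's exact command.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def mp3_extract_command(video_path: Path, mp3_path: Path) -> List[str]:
    """Build an ffmpeg argument list that drops video and encodes the audio as MP3."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", str(video_path),   # input file, e.g. an MP4 video
        "-vn",                   # disable video: keep only the audio stream
        "-c:a", "libmp3lame",    # encode audio with the LAME MP3 encoder (assumed available)
        "-q:a", "2",             # LAME VBR quality: 0 is best, 9 is smallest (illustrative choice)
        str(mp3_path),
    ]


def rip_mp3(video_path: Path, mp3_path: Path) -> None:
    """Run ffmpeg to extract the audio; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg was not found on PATH")
    subprocess.run(mp3_extract_command(video_path, mp3_path), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces. Calling `rip_mp3(Path("clip.mp4"), Path("clip.mp3"))` would write the audio track of `clip.mp4` to `clip.mp3`.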
