
Scraping and crawling

Scraping (or web scraping) is a technique for extracting information from websites. When a site offers no API, the only data we can retrieve is what appears in the HTML of its pages. To do that, we need a scraper: a program that extracts the information we need and structures it in a predefined format. The next step is to build a crawler, a tool that follows the links on a website and extracts information from all of its subpages. Before settling on a scraping strategy, we have to check the website's terms and conditions, as some websites do not allow scraping.
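The split between a scraper (one page in, one structured record out) and a crawler (follow the links, scrape every page reached) can be sketched with the standard library alone. This is a minimal illustration, not a production design: the SITE dictionary, its example.test URLs, and the names PageScraper, scrape, and crawl are all hypothetical, and the pages are read from memory instead of being downloaded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" (URL -> HTML). A real crawler would
# download each page over HTTP instead of reading it from this dict.
SITE = {
    "http://example.test/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<title>Page A</title><a href="/b">B</a>',
    "http://example.test/b": '<title>Page B</title><a href="/">Home</a>',
}


class PageScraper(HTMLParser):
    """Collects the page title and every link target from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(url, html):
    """Scraper: turn one page into a record with a predefined shape."""
    parser = PageScraper()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        # Resolve relative links against the URL of the current page.
        "links": [urljoin(url, href) for href in parser.links],
    }


def crawl(start_url, pages):
    """Crawler: breadth-first walk over links, scraping each page once."""
    seen = {start_url}
    queue = deque([start_url])
    records = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:  # link points outside the known pages
            continue
        record = scrape(url, html)
        records.append(record)
        for link in record["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return records


records = crawl("http://example.test/", SITE)
for record in records:
    print(record["url"], "->", record["title"])
```

The seen set is what keeps the crawler from looping forever when pages link back to each other, as the hypothetical pages here do.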

Python offers very useful tools for building scrapers and crawlers, such as Beautiful Soup and Scrapy. Both can be installed with pip:

pip3 install bs4 scrapy
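Once the packages are installed, a first scraper with Beautiful Soup might look like the sketch below. The HTML snippet, its class names (book, price), and the extract_books helper are invented for illustration. A real page would need selectors matched to its own markup.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; in practice this would be downloaded HTML.
HTML = """
<html><body>
  <h1>Books</h1>
  <ul>
    <li class="book"><a href="/books/1">First title</a> <span class="price">10.50</span></li>
    <li class="book"><a href="/books/2">Second title</a> <span class="price">7.00</span></li>
  </ul>
</body></html>
"""


def extract_books(html):
    """Return one dict per <li class="book"> element found in the page."""
    # "html.parser" is Python's built-in parser, so no extra dependency is needed.
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for item in soup.find_all("li", class_="book"):
        link = item.find("a")
        price = item.find("span", class_="price")
        books.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "price": float(price.get_text(strip=True)),
        })
    return books


books = extract_books(HTML)
for book in books:
    print(book)
```

Returning a list of dictionaries with fixed keys is one way to get the predefined format mentioned earlier. The same records could then be written to CSV or JSON.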