
Getting and reading data

The first step is to retrieve a dataset and open it with a program capable of manipulating the data. The simplest way to retrieve a dataset is to find a data file. Python and R can both be used to open, read, modify, and save data stored in static files. In Chapter 3, Reading, Exploring, and Modifying Data - Part I, I will introduce the JSON data format and show how to use Python to read, write, and modify JSON data. In Chapter 4, Reading, Exploring, and Modifying Data - Part II, I will walk through how to use Python to work with data files in the CSV and XML data formats. In Chapter 6, Cleaning Numerical Data - An Introduction to R and RStudio, I will introduce R and RStudio, and show how to use R to read and manipulate data.
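As a rough preview of the read, modify, and save cycle described above, here is a minimal sketch that uses only Python's standard `json` and `csv` modules. The records, field names, and file names (`people.json`, `people.csv`) are hypothetical, and the later chapters may approach these formats differently; this only illustrates the general pattern.

```python
import csv
import json
import os
import tempfile

# Hypothetical sample records; any list of dictionaries would work.
records = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 29},
]

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "people.json")
    csv_path = os.path.join(tmp, "people.csv")

    # Save the records to a JSON file, then read them back.
    with open(json_path, "w") as f:
        json.dump(records, f)
    with open(json_path) as f:
        loaded = json.load(f)

    # Modify the data in memory: add a derived field to every record.
    for row in loaded:
        row["age_next_year"] = row["age"] + 1

    # Save the modified records in CSV format.
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "age", "age_next_year"])
        writer.writeheader()
        writer.writerows(loaded)

    # Read the CSV back; the csv module returns every value as a string.
    with open(csv_path, newline="") as f:
        from_csv = list(csv.DictReader(f))

print(from_csv[0])
```

Note that the round trip through CSV turns the numbers into strings (`'34'`, `'35'`), whereas JSON preserves them as numbers. Differences like this are one reason the choice of file format matters when wrangling data.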

Larger data sources are often made available through web-based application programming interfaces (APIs). APIs allow you to retrieve specific bits of data from a larger collection of data. Web APIs can be great resources for data that is otherwise hard to get. In Chapter 8, Getting Data from the Web, I will discuss APIs in detail and walk through the use of Python to extract data from APIs.
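Many web APIs let you select the specific bits of data you want through query parameters in the request URL. The sketch below assumes an API of that style returning JSON. The endpoint `https://api.example.com/v1/records` and its `city` and `limit` parameters are hypothetical placeholders, and the helper names `build_query_url` and `fetch_json` are illustrative, not part of any library. Real APIs differ in their parameters and often require authentication.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, **params):
    """Attach query parameters to an API endpoint URL (hypothetical helper)."""
    return base_url + "?" + urlencode(params)


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameters, for illustration only.
url = build_query_url("https://api.example.com/v1/records", city="Boston", limit=5)
print(url)
# data = fetch_json(url)  # run this only against a real API endpoint
```

Keeping URL construction separate from the network request makes it easy to check exactly which data you are asking for before sending anything.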

Another possible source of data is a database. I won't go into detail on the use of databases in this book, though in Chapter 9, Working with Large Datasets, I will show how to interact with a particular database using Python.

Databases are collections of data organized so that the data can be retrieved quickly. They can be particularly useful when we need to work incrementally on very large datasets, and, of course, they can also be a source of data in their own right.
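To make the idea of incremental work concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is chosen only because it ships with Python; it is not necessarily the database used in Chapter 9, and the `readings` table and its values are hypothetical. Instead of loading every row into memory, the query result is consumed a few rows at a time with `fetchmany`.

```python
import sqlite3

# An in-memory SQLite database stands in for a real data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings (value) VALUES (?)",
    [(float(i),) for i in range(10)],
)
conn.commit()

# Process the table incrementally, a fixed number of rows at a time,
# rather than pulling the whole result set into memory at once.
cursor = conn.execute("SELECT id, value FROM readings ORDER BY id")
total = 0.0
batch_sizes = []
while True:
    batch = cursor.fetchmany(4)  # at most 4 rows per batch
    if not batch:
        break
    batch_sizes.append(len(batch))
    total += sum(value for _, value in batch)

conn.close()
print(batch_sizes, total)  # [4, 4, 2] 45.0
```

With ten rows and a batch size of four, the loop sees batches of 4, 4, and 2 rows. The same loop would work unchanged on a table far too large to hold in memory, which is the kind of situation where a database earns its keep.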