Reading the data – variations and examples

Before we delve deeper into the realm of data, let us familiarize ourselves with a few terms that will appear frequently from now on.

Data frames

A data frame is one of the most commonly used data structures in Python's pandas library. It closely resembles a table in a spreadsheet or a SQL table. Structurally, it can be thought of as a dictionary of Series objects that share a common index. Like a spreadsheet, a data frame has index labels (which identify rows) and column labels (which identify columns). It is the most commonly used pandas object: a 2D structure whose columns can hold the same or different types. Most standard operations that can be applied to a spreadsheet or a SQL table, such as aggregation, filtering, and pivoting, can be applied to data frames using pandas methods.
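As a minimal sketch of the "dictionary of Series" view described above, the snippet below builds a small data frame from two Series and applies a filter and an aggregation. The column names and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical example data: two Series that share the default 0..2 index
names = pd.Series(["Ann", "Bob", "Cara"])
ages = pd.Series([28, 35, 42])

# A data frame built as a dictionary of Series: keys become column labels
df = pd.DataFrame({"name": names, "age": ages})

row_labels = df.index.tolist()       # index labels identify rows
column_labels = df.columns.tolist()  # column labels identify columns

# Filtering: keep only the rows where age is greater than 30
older = df[df["age"] > 30]

# Aggregation: mean of a single column
mean_age = df["age"].mean()

print(df)
print(older)
print(mean_age)
```

Here each column keeps its own type (text for name, integers for age), which is what is meant by a 2D structure with columns of different types.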

The following screenshot is an illustrative picture of a data frame. We will learn more about working with them as we progress in the chapter:

Fig. 2.1 A data frame

Delimiters

A delimiter is a special character that separates the columns of a dataset from one another. The most common delimiter, which can reasonably be called the default, is the comma (,). A .csv file is so called because it contains comma-separated values. However, a dataset can use any special character as its delimiter, and one needs to know how to handle such variations in order to do an exhaustive exploratory analysis and build a robust predictive model. Later in this chapter, we will learn how to do that.
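To make the idea concrete, here is a small sketch that reads the same data twice with pandas, once comma-delimited and once pipe-delimited. The sample text is invented for illustration and is read from in-memory strings rather than files, and this is only one possible way to handle a non-default delimiter.

```python
import io

import pandas as pd

# Hypothetical sample data: the same two rows with different delimiters
comma_text = "name,age\nAnn,28\nBob,35\n"
pipe_text = "name|age\nAnn|28\nBob|35\n"

# read_csv assumes a comma delimiter by default
df_comma = pd.read_csv(io.StringIO(comma_text))

# For any other delimiter, pass it explicitly through the sep parameter
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")

print(df_comma)
print(df_pipe)
```

Both calls should yield the same two-column data frame. If the pipe-delimited text were read without sep="|", each line would be treated as a single column.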
