
The pandas Series

The pandas Series is the base data structure of pandas. A Series is similar to a NumPy array, but it differs by having an index, which allows much richer lookup of items than a zero-based array position alone.

The following creates a Series from a Python list:
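A minimal sketch of such a Series is shown here. The values 10 to 13 and the variable name `s` are illustrative assumptions, not taken from the original listing.

```python
import pandas as pd

# Build a Series from a plain Python list (illustrative values).
# No index is given, so pandas assigns the integer labels 0, 1, 2, 3.
s = pd.Series([10, 11, 12, 13])
print(s)
```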

The output consists of two columns of information. The first is the index and the second is the data in the Series. Each row of the output represents the index label (in the first column) and then the value associated with that label.

Because this Series was created without specifying an index (something we will do next), pandas automatically creates an integer index with labels starting at 0 and increasing by one for each data item.

The values of a Series object can then be accessed by using the [] operator, passing the label for the value you require. The following gets the value for the label 1:
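As a small sketch, assuming a Series `s` holding the illustrative values 10 to 13 with the default integer index:

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13])  # illustrative values, default index 0..3

# With the default integer index, [] looks up the value by its label.
value = s[1]
print(value)  # 11
```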

This looks very much like normal array access in many programming languages. But as we will see, the index does not have to start at 0 or increment by one, and its labels can be of many data types other than integers. This ability to associate flexible indexes with data is one of the great superpowers of pandas.

Multiple items can be retrieved by specifying their labels in a Python list. The following retrieves the values at labels 1 and 3:
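A sketch of this lookup, again assuming the illustrative Series `s` with values 10 to 13:

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13])  # illustrative values, default index 0..3

# Passing a list of labels returns a new Series containing just those items.
subset = s[[1, 3]]
print(subset)
```

The result is itself a Series that keeps the original labels 1 and 3.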

A Series object can be created with a user-defined index by using the index parameter and specifying the index labels. The following creates a Series with the same values but with an index consisting of string values:
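For instance, assuming the same illustrative values and the labels 'a' through 'd' (both chosen here for demonstration):

```python
import pandas as pd

# The index parameter supplies one label per value, in order.
s = pd.Series([10, 11, 12, 13], index=['a', 'b', 'c', 'd'])
print(s)
```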

Data in the Series object can now be accessed by those alphanumeric index labels. The following retrieves the values at index labels 'a' and 'd':
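A sketch of the label lookup, assuming the illustrative string-indexed Series `s`:

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13], index=['a', 'b', 'c', 'd'])  # illustrative data

# Select the items labeled 'a' and 'd'.
subset = s[['a', 'd']]
print(subset)
```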

It is still possible to refer to the elements of this Series object by their numerical 0-based position:
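One way to sketch this is with the `.iloc` accessor, which always selects by position. This is a substitution: the original listing may have used plain `[]` with integers, but recent pandas versions deprecate that positional fallback on a non-integer index. The data is the same illustrative Series as before.

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13], index=['a', 'b', 'c', 'd'])  # illustrative data

# .iloc selects by 0-based position, regardless of the index labels.
subset = s.iloc[[1, 2]]
print(subset)
```

Positions 1 and 2 correspond to the labels 'b' and 'c', which are preserved in the result.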

We can examine the index of a Series using the .index property:
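A short sketch, assuming the illustrative string-indexed Series `s`. The exact text printed for the index, including how its dtype is displayed, can vary between pandas versions.

```python
import pandas as pd

s = pd.Series([10, 11, 12, 13], index=['a', 'b', 'c', 'd'])  # illustrative data

# .index returns the pandas Index object that holds the labels.
idx = s.index
print(idx)
```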

The index is itself actually a pandas object, and this output shows us the values of the index and the data type used for the index. In this case, note that the type of the data in the index (referred to as the dtype) is object and not string. We will examine how to change this later in the book.

A common usage of a Series in pandas is to represent a time series that associates date/time index labels with values. The following demonstrates this by creating a date range using the pd.date_range() pandas function:
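As a sketch, assuming an arbitrary six-day range in April 2016 (the dates are chosen purely for illustration):

```python
import pandas as pd

# Start and end dates are inclusive; the default frequency is one calendar day.
dates = pd.date_range('2016-04-01', '2016-04-06')
print(dates)
```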

This has created a special index in pandas called DatetimeIndex, which is a specialized type of pandas index that is optimized to index data with dates and times.

Now let's create a Series using this index. The data values represent high temperatures on specific days:
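A sketch of such a Series follows. The temperature values and the name `temps1` are made up for illustration, and the date range is the same assumed six-day span.

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')  # assumed date range

# One illustrative high temperature per day, labeled by date.
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)
print(temps1)
```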

This type of Series with a DatetimeIndex is referred to as a time series.

We can look up a temperature on a specific date by using the date as a string:
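For example, using the illustrative `temps1` series (values and dates are assumptions):

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')           # assumed date range
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)   # illustrative values

# pandas parses the string into a timestamp and matches it against the index.
temp = temps1['2016-04-04']
print(temp)  # 90
```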

Two Series objects can be applied to each other with an arithmetic operation. The following code creates a second Series and calculates the difference in temperature between the two:
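A sketch of the subtraction, with made-up values for both series (`temps2` and `temp_diffs` are names assumed here):

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')           # assumed date range
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)   # illustrative values
temps2 = pd.Series([70, 75, 69, 83, 79, 77], index=dates)   # illustrative values

# Subtraction pairs up values that share the same index label (here, the date).
temp_diffs = temps1 - temps2
print(temp_diffs)
```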

An arithmetic operation (+, -, /, *, ...) on two Series objects returns another Series object, with the values matched up by index label.

Because the index labels are not integers, we can also look up values by their 0-based position:
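A sketch using `.iloc`, which is the unambiguous way to select by position in current pandas (the original listing may have used plain `[]`). The data is the same illustrative pair of temperature series.

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')           # assumed date range
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)   # illustrative values
temps2 = pd.Series([70, 75, 69, 83, 79, 77], index=dates)   # illustrative values
temp_diffs = temps1 - temps2

# Position 2 is the third entry, i.e. the difference for 2016-04-03.
third = temp_diffs.iloc[2]
print(third)  # 16
```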

Finally, pandas provides many descriptive statistical methods. As an example, the following returns the mean of the temperature differences:
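A sketch with the same illustrative temperature data (all values are assumptions made for demonstration):

```python
import pandas as pd

dates = pd.date_range('2016-04-01', '2016-04-06')           # assumed date range
temps1 = pd.Series([80, 82, 85, 90, 83, 87], index=dates)   # illustrative values
temps2 = pd.Series([70, 75, 69, 83, 79, 77], index=dates)   # illustrative values
temp_diffs = temps1 - temps2

# mean() averages the six differences: (10 + 7 + 16 + 7 + 4 + 10) / 6
average = temp_diffs.mean()
print(average)  # 9.0
```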
