- Python:Data Analytics and Visualization
- Phuong Vo.T.H Martin Czygan Ashish Kumar Kirthi Raman
- 2021-07-09 18:51:39
Working with missing data
In this section, we will discuss missing, NaN, or null values in pandas data structures. Missing data turns up very often in practice. One operation that creates missing data is reindexing:
>>> import numpy as np
>>> import pandas as pd
>>> df8 = pd.DataFrame(np.arange(12).reshape(4,3), columns=['a', 'b', 'c'])
>>> df8
   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
>>> df9 = df8.reindex(columns = ['a', 'b', 'c', 'd'])
>>> df9
   a   b   c   d
0  0   1   2 NaN
1  3   4   5 NaN
2  6   7   8 NaN
3  9  10  11 NaN
>>> df10 = df8.reindex([3, 2, 'a', 0])
>>> df10    # the all-NaN row upcasts the integer columns to float
     a     b     c
3  9.0  10.0  11.0
2  6.0   7.0   8.0
a  NaN   NaN   NaN
0  0.0   1.0   2.0
To work with missing values, we can use the isnull() or notnull() methods, which detect missing values in a Series object as well as in a DataFrame object:
>>> df10.isnull()
       a      b      c
3  False  False  False
2  False  False  False
a   True   True   True
0  False  False  False
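Because isnull() and notnull() return Boolean objects, they combine naturally with aggregation and Boolean indexing. The following is a minimal sketch, not part of the original example; it rebuilds df10 as above, and the names missing_per_column and complete_rows are illustrative assumptions:

```python
import numpy as np
import pandas as pd

df8 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['a', 'b', 'c'])
df10 = df8.reindex([3, 2, 'a', 0])

# True counts as 1 when summed, so this counts missing cells per column
missing_per_column = df10.isnull().sum()

# Keep only the rows in which every entry is present
complete_rows = df10[df10.notnull().all(axis=1)]

print(missing_per_column)
print(complete_rows)
```

Here, each column reports one missing cell, and complete_rows keeps the rows labeled 3, 2, and 0.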
On a Series, we can drop all null values, together with their index labels, by using the dropna function:
>>> s4 = pd.Series({'001': 'Nam', '002': 'Mary', '003': 'Peter'},
...                index=['002', '001', '024', '065'])
>>> s4
002    Mary
001     Nam
024     NaN
065     NaN
dtype: object
>>> s4.dropna()    # drop all null values from the Series object
002    Mary
001     Nam
dtype: object
With a DataFrame object, it is a little more complex than with a Series. We can specify whether to drop rows or columns, and whether all entries must be null or a single null value is enough. By default, the function drops any row containing a missing value:
>>> df9.dropna()    # all rows will be dropped
Empty DataFrame
Columns: [a, b, c, d]
Index: []
>>> df9.dropna(axis=1)
   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
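The choice between "all entries must be null" and "a single null value is enough" is controlled by the how parameter of dropna, and thresh sets a minimum number of non-null entries instead. The following sketch, which is not from the original text, rebuilds df9 and compares the options; the result names are illustrative:

```python
import numpy as np
import pandas as pd

df8 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['a', 'b', 'c'])
df9 = df8.reindex(columns=['a', 'b', 'c', 'd'])

# Default behavior: drop a row if any entry is missing
rows_any = df9.dropna(how='any')

# Drop a row only if every entry in it is missing
rows_all = df9.dropna(how='all')

# Drop only the columns that are entirely missing
cols_all = df9.dropna(axis=1, how='all')

# Keep rows that have at least 3 non-missing entries
rows_thresh = df9.dropna(thresh=3)

print(rows_any.shape, rows_all.shape, cols_all.shape, rows_thresh.shape)
```

Since every row of df9 has three valid entries and one NaN, only how='any' removes rows; how='all' and thresh=3 keep all four.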
Another way to handle missing values is to use the parameters of the functions that we introduced in the previous section. In our experience, it helps to assign a fixed value for missing cases when we create data objects, because this keeps the objects cleaner in later processing steps. For example, consider the following:
>>> df11 = df8.reindex([3, 2, 'a', 0], fill_value = 0)
>>> df11
   a   b   c
3  9  10  11
2  6   7   8
a  0   0   0
0  0   1   2
We can also use the fillna function to replace missing values with a custom value:
>>> df9.fillna(-1)    # column d holds floats, so the fill value appears as -1.0
   a   b   c    d
0  0   1   2 -1.0
1  3   4   5 -1.0
2  6   7   8 -1.0
3  9  10  11 -1.0
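A single scalar is not the only option. As a sketch that goes slightly beyond the original text, fillna also accepts a dictionary that maps column names to fill values, and ffill() propagates the last valid value forward. The names per_column and forward are illustrative:

```python
import numpy as np
import pandas as pd

df8 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['a', 'b', 'c'])
df10 = df8.reindex([3, 2, 'a', 0])

# Different fill values per column; column 'c' is not listed and stays NaN
per_column = df10.fillna({'a': 0, 'b': -1})

# Forward fill: each missing cell takes the value from the row above it
forward = df10.ffill()

print(per_column)
print(forward)
```

In forward, the row labeled 'a' takes its values from the row labeled 2, which precedes it in the reindexed order.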