- Python: Data Analytics and Visualization
- Phuong Vo.T.H, Martin Czygan, Ashish Kumar, Kirthi Raman
- 2021-07-09 18:51:44
Interacting with data in binary format
We can read and write binary serializations of Python objects with the pickle module, which is part of the standard library. Object serialization is useful when you work with objects that take a long time to create, such as some machine learning models. Pickling such an object once lets later runs load it from disk instead of rebuilding it, which makes subsequent access faster. It also allows you to distribute Python objects in a standardized way.
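As a minimal sketch of the round trip described above, the following uses only the standard library. The `model` dictionary is a hypothetical stand-in for an object that is expensive to build, and the file name is chosen for illustration:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for an object that takes a long time to create,
# such as a trained model.
model = {"weights": [0.1, 0.2, 0.3], "bias": 1.5, "labels": ("male", "female")}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Serialize the object to a binary file.
with open(path, "wb") as fh:
    pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)

# Later, load the object back instead of rebuilding it.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

# The restored object is an equal copy, not the same object in memory.
print(restored == model, restored is model)
```

Note that the file must be opened in binary mode (`"wb"` and `"rb"`), since pickle produces bytes rather than text.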
Pandas includes support for pickling out of the box. The relevant functions are pd.read_pickle() and the to_pickle() method of Pandas objects, which read and write data from and to files easily. to_pickle() writes data to disk in the pickle format, which is a convenient short-term storage format:
>>> df_ex3.to_pickle('example_data/ex_06-03.out')
>>> pd.read_pickle('example_data/ex_06-03.out')
        1  2       3    4
0
Nam     7  1    male  hcm
Mai    11  1  female  hcm
Lan    25  3  female   hn
Hung   42  3    male   tn
Nghia  26  3    male   dn
Vinh   39  3    male   vl
Hong   28  4  female   dn
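For readers who do not have df_ex3 at hand, here is a self-contained sketch of the same round trip. The frame is rebuilt from the first three rows of the output above, and the temporary file path is an assumption made for illustration:

```python
import os
import tempfile

import pandas as pd

# Small frame mirroring the first rows of the output above:
# names as the index, integer column labels 1 to 4.
df = pd.DataFrame(
    [[7, 1, "male", "hcm"], [11, 1, "female", "hcm"], [25, 3, "female", "hn"]],
    index=["Nam", "Mai", "Lan"],
    columns=[1, 2, 3, 4],
)

path = os.path.join(tempfile.mkdtemp(), "ex.pkl")

df.to_pickle(path)               # DataFrame method: write to disk
restored = pd.read_pickle(path)  # top-level function: read from disk

# Values, index, and column labels survive the round trip.
print(restored.equals(df))
```

Unlike a CSV round trip, no parsing options are needed on the way back in, because the pickle file stores the object itself rather than a text rendering of it.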
HDF5
HDF5 is not a database, but a data model and file format. It is suited for write-once, read-many datasets. An HDF5 file includes two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. There are several interfaces for working with the HDF5 format in Python, such as h5py, which uses familiar NumPy and Python constructs, such as dictionaries and NumPy array syntax. h5py gives us a high-level interface to the HDF5 API, which helps us get started. However, in this book, we will introduce another library for this kind of format called PyTables, which works well with Pandas objects:
>>> store = pd.HDFStore('hdf5_store.h5')
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: hdf5_store.h5
Empty
We created an empty HDF5 file named hdf5_store.h5. Now, we can write data to the file just like adding key-value pairs to a dict:
>>> store['ex3'] = df_ex3
>>> store['name'] = df_ex2[0]
>>> store['hometown'] = df_ex3[4]
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: hdf5_store.h5
/ex3                  frame        (shape->[7,4])
/hometown             series       (shape->[1])
/name                 series       (shape->[1])
Objects stored in the HDF5 file can be retrieved by specifying the object keys:
>>> store['name']
0      Nam
1      Mai
2      Lan
3     Hung
4    Nghia
5     Vinh
6     Hong
Name: 0, dtype: object
Once we have finished interacting with the HDF5 file, we close it to release the file handle:
>>> store.close()
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: hdf5_store.h5
File is CLOSED
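Closing the store by hand is easy to forget. As an alternative sketch, HDFStore can also be used as a context manager, which closes the file when the block exits. The frame below is a small illustrative stand-in for df_ex3, the temporary path is an assumption, and the PyTables package must be installed for HDFStore to work:

```python
import os
import tempfile

import pandas as pd  # HDFStore also requires the PyTables package ("tables")

# Illustrative stand-in for df_ex3, with the same integer column labels.
df = pd.DataFrame(
    [[7, 1, "male", "hcm"], [11, 1, "female", "hcm"], [25, 3, "female", "hn"]],
    index=["Nam", "Mai", "Lan"],
    columns=[1, 2, 3, 4],
)

path = os.path.join(tempfile.mkdtemp(), "hdf5_store.h5")

# The with-block closes the file even if an exception is raised inside it.
with pd.HDFStore(path) as store:
    store["ex3"] = df            # store the whole frame
    store["hometown"] = df[4]    # store a single column as a series
    keys = sorted(store.keys())  # keys are reported with a leading slash
    hometown = store["hometown"]

print(keys)
print(store.is_open)
```

Objects read inside the block, such as `hometown`, remain usable after the file is closed because they have already been loaded into memory.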
There are other supported functions that are useful for working with the HDF5 format. If you need to work with huge quantities of data, you should explore two libraries in more detail: pytables and h5py.
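As a starting point for exploring h5py, the sketch below shows the two kinds of HDF5 objects described earlier: a group used as a folder-like container and a dataset holding array data. The group name `people`, the dataset name `age`, and the file path are hypothetical, and h5py and NumPy must be installed:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "sketch.h5")

# Write: create a group (folder-like) holding one dataset (array-like).
with h5py.File(path, "w") as f:
    grp = f.create_group("people")
    grp.create_dataset("age", data=np.array([7, 11, 25]))

# Read: groups behave like dictionaries, datasets like NumPy arrays.
with h5py.File(path, "r") as f:
    names = list(f["people"].keys())
    ages = f["people/age"][:]  # slicing reads the data into a NumPy array

print(names, ages)
```

The slash-separated path `people/age` addresses a dataset inside a group in the same way a file system path addresses a file inside a folder.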