Summary

We have now covered the basics of interacting with data in several commonly used storage mechanisms, from simple ones such as text files, through more structured ones such as HDF5, to more sophisticated data storage systems such as MongoDB and Redis. The most suitable type of storage depends on your use case. The choice of storage technology plays an important role in the overall design of a data processing system. Sometimes we need to combine several database systems to store our data, depending on factors such as the complexity of the data, the performance of the system, or the computation requirements.

Practice exercises

  • Take a data set of your choice and design storage options for it. Consider text files, HDF5, a document database, and a data structure store as possible persistence options. Also evaluate how difficult (by some metric, for example, the number of lines of code) it would be to update or delete a specific item. Which storage type is the easiest to set up? Which storage type supports the most flexible queries?
  • In Chapter 3, Data Analysis with Pandas, we saw that it is possible to create hierarchical indices with Pandas. As an example, assume that you have data on each city with more than 1 million inhabitants and a two-level index, so that you can address individual cities as well as whole countries. How would you represent this hierarchical relationship with the various storage options presented in this chapter: text files, HDF5, MongoDB, and Redis? What do you believe would be most convenient to work with in the long run?
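
As a starting point for the second exercise, the following is a minimal sketch in plain Python, not a reference solution. It shows three ways a two-level (country, city) index might be laid out: nested mappings (a document-style layout), composite string keys (a key-value layout), and repeated columns (a text-file layout). The country names, city names, population values, the `:` separator, and all function names are hypothetical placeholders chosen for illustration; no database client is used.

```python
import json

# Hypothetical placeholder records: (country, city) -> population in millions.
# The names and numbers are made up for illustration only.
populations = {
    ("Country A", "City 1"): 2.5,
    ("Country A", "City 2"): 1.2,
    ("Country B", "City 3"): 4.0,
}


def to_nested(records):
    """Document-style layout: one nested mapping per country."""
    nested = {}
    for (country, city), value in records.items():
        nested.setdefault(country, {})[city] = value
    return nested


def to_flat_keys(records, sep=":"):
    """Key-value layout: both index levels are joined into one string key."""
    return {
        f"{country}{sep}{city}": value
        for (country, city), value in records.items()
    }


def to_csv_lines(records):
    """Text-file layout: one row per city, with the country repeated."""
    lines = ["country,city,population_millions"]
    for (country, city), value in sorted(records.items()):
        lines.append(f"{country},{city},{value}")
    return lines


nested = to_nested(populations)
flat = to_flat_keys(populations)
csv_lines = to_csv_lines(populations)

print(json.dumps(nested, indent=2))
print(flat)
print("\n".join(csv_lines))
```

In the nested layout, selecting a whole country is a single lookup, whereas the flat-key layout needs a scan over key prefixes or a separate index of cities per country. Weighing trade-offs of this kind for each storage option is the point of the exercise.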