
Data manipulation

Data is distributed all over the planet, stored in different formats, and of widely varying quality. As a result, there is a need for tools and processes that pull data together into a form that can be used for decision making. A tool that prepares data for analysis must support many different tasks. The features needed from such a tool include:

  • Programmability for reuse and sharing
  • Access to data from external sources
  • Storing data locally
  • Indexing data for efficient retrieval
  • Alignment of data in different sets based upon attributes
  • Combining data in different sets
  • Transformation of data into other representations
  • Cleaning cruft from data
  • Effective handling of bad data
  • Grouping data into common baskets
  • Aggregation of data of like characteristics
  • Application of functions to calculate meaning or perform transformations
  • Query and slicing to explore pieces of the whole
  • Restructuring into other forms
  • Modeling distinct categories of data such as categorical, continuous, discrete, and time series
  • Resampling data to different frequencies

Many data manipulation tools exist. They differ in how well they support the items on this list, in how they are deployed, and in how their users work with them. These tools include relational databases (SQL Server, Oracle), spreadsheets (Excel), large-scale data processing engines (such as Spark), and more general-purpose tools such as R and pandas.
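To make a few of the listed features concrete, here is a minimal sketch of how they might look in pandas. The data, column names, and store/region labels are all hypothetical and invented for illustration. The sketch touches only handling of bad data, grouping and aggregation, combining data sets, alignment, and resampling, and it is one possible way to express them rather than a recommended workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales data; every name and value here is made up.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.DataFrame(
    {
        "store": ["A", "B", "A", "B", "A", "B"],
        "units": [10.0, np.nan, 7.0, 3.0, 5.0, 8.0],  # one missing reading
    },
    index=dates,
)

# Handling bad data: treat the missing reading as zero units
# (an assumption; dropping or interpolating may suit other data better).
clean = sales.fillna({"units": 0.0})

# Grouping and aggregation: total units sold per store.
totals = clean.groupby("store")["units"].sum()

# Combining data sets: attach a (hypothetical) region to each store.
regions = pd.DataFrame({"store": ["A", "B"], "region": ["north", "south"]})
merged = clean.reset_index().merge(regions, on="store", how="left")

# Alignment: arithmetic matches values by index label, not by position.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["c", "b", "a"])
aligned = s1 + s2

# Resampling: roll the daily values up into two-day buckets.
two_day = clean["units"].resample("2D").sum()

print(totals)
print(aligned)
print(two_day)
```

Each step is a single expression on a labeled object. Because the operations rely on index labels rather than row positions, the alignment example adds the values for "a", "b", and "c" correctly even though the two series list them in opposite order.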
