
Developing a better approach to understanding data

Whether you are a data developer, systems analyst, programmer, data scientist, or another business or technology professional, you need to develop a comprehensive relationship with the data you are working with, or with the data you are designing an application or database schema for.

Some rely on the data specifications provided as part of the overall project plan or requirements. Others (usually those with more experience) supplement their understanding by running some generic queries on the data. Either way, this is seldom enough.

In fact, in industry case studies, unclear, misunderstood, or incomplete requirements or specifications consistently rank in the top five as reasons for project failure or added risk.

Profiling data is a process, characteristic of data science, aimed at establishing data intimacy (that is, a clearer and more concise grasp of the data and its internal relationships). Profiling data also establishes context. Several general contextual categories can be used to augment the value and understanding of data for any purpose or project.

These categories include the following:

  • Definitions and explanations: These help you gain additional information or attributes about data points within your data
  • Comparisons: These help add a comparable value to a data point within your data
  • Contrasts: These help add an opposite to a data point to see whether it offers a different perspective
  • Tendencies: These are typical mathematical calculations, summaries, or aggregations (such as the mean, median, or mode) describing the center of a dataset (or group within the data)
  • Dispersion: This includes mathematical calculations (or summaries) such as range, variance, and standard deviation, describing the spread of a dataset (or group within the data)
Think of data profiling as the process you may have used for examining data in a data file and collecting statistics and information about that data. Those statistics most likely drove the logic implemented in a program or how you related data in tables of a database.
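
As a rough illustration of the tendency and dispersion categories, the following minimal sketch collects a few summary statistics for a single numeric column using Python's standard statistics module. The profile_column function and the ages sample are hypothetical names and values invented for this example. They are not part of any particular dataset or tool, and a real profiling exercise would typically cover many columns and data types.

```python
import statistics


def profile_column(values):
    """Collect basic tendency and dispersion statistics for a numeric column."""
    return {
        "count": len(values),
        # Tendencies: where the center of the data lies
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Dispersion: how widely the values are spread
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance
        "stdev": statistics.stdev(values),        # sample standard deviation
    }


# Hypothetical sample values, used only for illustration
ages = [23, 35, 35, 41, 52, 29]

profile = profile_column(ages)
for name, value in profile.items():
    print(f"{name}: {value}")
```

Statistics such as these are the kind of information that might then drive validation logic in a program or decisions about how tables in a database relate to each other.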