Data review

When you have successfully loaded your data into Watson Analytics, you should review it and assess its quality.

The IBM Watson Analytics documentation describes data quality as:

Data quality assesses the degree to which a data set is suitable for analysis. A shorthand representation of this assessment is the data quality score. The score is measured on a scale of 0-100, with 100 representing the highest possible data quality.

Further:

The data quality score for a data set is computed by averaging the data quality score for every column in the data set. Several factors affect the data quality score for an individual field or column.
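The averaging step described above can be sketched in a few lines of Python. This is an illustration only: the function name dataset_quality_score and the per-column scores are hypothetical, and the way Watson Analytics derives each column's score is not shown here.

```python
from statistics import mean

def dataset_quality_score(column_scores):
    """Average per-column scores (each on a 0-100 scale) into one data set score."""
    if not column_scores:
        raise ValueError("at least one column score is required")
    for name, score in column_scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"score for {name!r} is outside the 0-100 range")
    return mean(column_scores.values())

# Hypothetical per-column scores, made up for illustration.
example_scores = {"age": 90, "income": 70, "region": 80}
print(dataset_quality_score(example_scores))
```

Because the data set score is a plain average, one poor column lowers the overall score by only a fraction of its own shortfall, so it is worth reviewing the individual column scores as well.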

The factors that can affect the data quality score include:

  • Missing values: Records for which no data are entered.
  • Constant values: Fields that have the same value recorded for every record.
  • Imbalance: Occurs in a categorical field when records are not equally distributed across categories.
  • Influential categories: Those categories that are significantly different from other categories.
  • Outliers: Extreme values.
  • Skewness: Measures how symmetrically the values of a continuous field are distributed. Skewed fields have lower data quality scores.
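To get a feel for these factors before uploading a file, you could profile a column yourself. The following is a minimal sketch using only the Python standard library. All function names are hypothetical, the 1.5 × IQR outlier rule and the moment-based skewness formula are common conventions assumed here, and none of this is claimed to match the checks Watson Analytics runs internally. Influential categories are left out because the description above does not define them precisely enough to code.

```python
from collections import Counter
from statistics import mean, pstdev, quantiles

def missing_fraction(values):
    """Share of records whose value is None or an empty string."""
    return sum(v is None or v == "" for v in values) / len(values)

def is_constant(values):
    """True when every non-missing record holds the same value."""
    present = [v for v in values if v is not None and v != ""]
    return len(set(present)) <= 1

def category_shares(values):
    """Share of records in each category; very unequal shares indicate imbalance."""
    counts = Counter(values)
    return {category: n / len(values) for category, n in counts.items()}

def iqr_outliers(values, k=1.5):
    """Values more than k * IQR outside the quartiles (assumed rule, not Watson's)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def skewness(values):
    """Population moment coefficient of skewness; 0 means a symmetric distribution."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0  # a constant field has no asymmetry to measure
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

# Made-up sample column for illustration.
incomes = [1, 2, 3, 4, 5, 100]
print(iqr_outliers(incomes))
print(skewness(incomes) > 0)
```

A positive skewness value means the distribution has a long tail to the right, and a negative value means a long tail to the left.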