- IBM Watson Projects
- James Miller
- 2021-07-16 17:31:19
Data review
When you have successfully loaded your data into Watson Analytics, you should review it and assess its quality.
The IBM Watson Analytics documentation describes data quality as:
Data quality assesses the degree to which a data set is suitable for analysis. A shorthand representation of this assessment is the data quality score. The score is measured on a scale of 0-100, with 100 representing the highest possible data quality.
Further:
The data quality score for a data set is computed by averaging the data quality score for every column in the data set. Several factors affect the data quality score for an individual field or column.
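To make the averaging step concrete, here is a minimal sketch of it. The column names and per-column scores are invented for illustration, and the function name `dataset_quality_score` is hypothetical; Watson Analytics computes the per-column scores internally and does not expose a function like this.

```python
from statistics import mean


def dataset_quality_score(column_scores):
    """Average per-column quality scores (each 0-100) into one data set score.

    `column_scores` maps a column name to its score. Both the names and the
    scores are illustrative inputs, not values produced by Watson Analytics.
    """
    if not column_scores:
        raise ValueError("at least one column score is required")
    for name, score in column_scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"score for {name!r} must be between 0 and 100")
    return mean(column_scores.values())


# Hypothetical per-column scores for a three-column data set.
column_scores = {"age": 90, "income": 70, "region": 80}
overall = dataset_quality_score(column_scores)
print(f"Data set quality score: {overall}")
```

Because the data set score is a plain average, one weak column lowers it in proportion to the number of columns: the more columns a data set has, the less any single column moves the overall score.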
The factors that can affect the data quality score include:
- Missing values: Records for which no data are entered.
- Constant values: Fields that have the same value recorded for every record.
- Imbalance: Occurs in a categorical field when records are not equally distributed across categories.
- Influential categories: Those categories that are significantly different from other categories.
- Outliers: Extreme values.
- Skewness: Skewness measures how symmetrically the values of a continuous field are distributed. Skewed fields have lower data quality scores.
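Several of these factors can be checked by hand before or after loading the data. The sketch below uses only the Python standard library and common textbook measures (share of missing values, share of the largest category, the 1.5 x IQR rule for outliers, and the moment coefficient of skewness). These measures and all function names are assumptions made for illustration; the documentation quoted above does not state the formulas Watson Analytics actually uses. Influential categories are left out because judging them requires a target field to compare against. Each function assumes a non-empty list in which `None` marks a missing value.

```python
from collections import Counter
from statistics import mean, pstdev, quantiles


def _present(values):
    """Return the values that are not missing (None marks a missing value)."""
    return [v for v in values if v is not None]


def missing_fraction(values):
    """Share of records for which no value was entered."""
    return sum(v is None for v in values) / len(values)


def is_constant(values):
    """True when every non-missing record holds the same value."""
    return len(set(_present(values))) <= 1


def top_category_share(values):
    """Share of non-missing records in the most common category.

    A share close to 1.0 suggests an imbalanced categorical field.
    """
    present = _present(values)
    return Counter(present).most_common(1)[0][1] / len(present)


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    present = _present(values)
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


def skewness(values):
    """Moment coefficient of skewness; 0 indicates a symmetric distribution."""
    present = _present(values)
    center, spread = mean(present), pstdev(present)
    if spread == 0:
        return 0.0
    return mean([((v - center) / spread) ** 3 for v in present])


# Invented sample columns, for illustration only.
income = [30, 32, 31, 29, 33, 30, 31, 250, None, 32]
region = ["east"] * 8 + ["west"] * 2
country = ["US"] * 5

print("missing fraction:", missing_fraction(income))
print("constant field:", is_constant(country))
print("top category share:", top_category_share(region))
print("outliers:", iqr_outliers(income))
print("skewness:", round(skewness(income), 2))
```

In this sample, the single extreme income value is flagged as an outlier and also pulls the skewness well above zero, which shows how one bad record can affect more than one of the factors listed above.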