Math and statistics

Statistics and other math skills are essential in several phases of a data science project. Even at the beginning of data exploration, you'll be dividing the features of your observations into categories:

  • Categorical
  • Numeric:
    • Discrete 
    • Continuous 

Categorical values describe an attribute of the item. Imagine you have a dataset about cars: car brand would be a typical categorical value, and color would be another.

On the other hand, we have numerical values, which can be split into two categories: discrete and continuous. Discrete values are counts, such as how many people purchased a product. Continuous values can take any of an infinite number of possible values within a range and are represented by real numbers. In a nutshell, discrete variables are like points plotted on a chart, and a continuous variable can be plotted as a line.
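As a rough illustration of this first split, the sketch below labels each column of a small cars dataset. The helper `classify_feature` and the sample data are hypothetical, and the rule it applies (integers are discrete, other numbers are continuous, everything else is categorical) is a simplifying assumption rather than a general test; an integer code such as a ZIP code would be mislabeled by it.

```python
def classify_feature(values):
    """Heuristically label a column as categorical, discrete, or continuous.

    Assumption for this sketch: the Python type of the values reflects
    the kind of feature (int = count, float = measurement).
    """
    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, int) and not isinstance(v, bool)

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_int(v) for v in values):
        return "discrete"
    if all(is_number(v) for v in values):
        return "continuous"
    return "categorical"


# Hypothetical sample data about cars
cars = {
    "brand": ["Skoda", "Ford", "Audi"],
    "color": ["red", "green", "red"],
    "units_sold": [12, 7, 30],
    "engine_litres": [1.4, 2.0, 1.8],
}

feature_types = {name: classify_feature(col) for name, col in cars.items()}
for name, kind in feature_types.items():
    print(f"{name}: {kind}")
```

In practice you would confirm each label against what the column means, not only how it is stored.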

Data can also be classified by its level of measurement. From this point of view, we can split data into two primary categories:

  • Qualitative:
    • Nominal
    • Ordinal
  • Quantitative:
    • Interval
    • Ratio

Nominal variables can't be ordered and only describe an attribute. An example would be the color of a product; this describes how the product looks, but you can't impose an ordering on colors and say that red is bigger than green. Ordinal variables also describe a feature with a categorical value, but they come with an ordering; for example, education: elementary, high school, university degree, and so on.
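The difference shows up in what you can sensibly compute. In the minimal sketch below, the ordinal education values are sorted with an explicit rank, while the nominal color values are only counted. The names `EDUCATION_LEVELS` and `EDUCATION_RANK` and the sample data are made up for illustration.

```python
from collections import Counter

# Ordinal: the categories have a meaningful order, which we state explicitly.
EDUCATION_LEVELS = ["elementary", "high school", "university degree"]
EDUCATION_RANK = {level: rank for rank, level in enumerate(EDUCATION_LEVELS)}

people = ["university degree", "elementary", "high school"]
sorted_people = sorted(people, key=lambda level: EDUCATION_RANK[level])
highest = max(people, key=lambda level: EDUCATION_RANK[level])

# Nominal: no order exists, so counting and equality checks are the
# meaningful operations. (Sorting the strings would work technically,
# but alphabetical order says nothing about the colors themselves.)
colors = ["red", "green", "red"]
color_counts = Counter(colors)
most_common_color = color_counts.most_common(1)[0][0]

print(sorted_people)
print(highest)
print(most_common_color)
```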

With quantitative values, it's a different story. The major difference between the two types is that a ratio variable has a true zero, whereas an interval variable does not. Imagine the attribute is a length. If the length is 0, you know there's no length. But this does not apply to temperature measured in °C or °F, since 0 °C or 0 °F does not mark the beginning of the temperature scale (absolute zero, the true beginning of the scale, is -273.15 °C or -459.67 °F). Measured in kelvin, temperature would actually be a ratio variable, since the scale really begins at 0 K. So, as you can see, the same quantity can be an interval or a ratio value; it depends on the context!
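A short worked example makes the point about true zero concrete. The sketch below compares two temperatures in both units; the helper `celsius_to_kelvin` and the sample values are chosen only for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


a_c, b_c = 10.0, 20.0                      # interval scale (no true zero)
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)  # ratio scale

# Differences are meaningful on both scales and agree with each other.
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios are only meaningful on the ratio scale.
naive_ratio = b_c / a_c   # 2.0, but 20 °C is not "twice as hot" as 10 °C
true_ratio = b_k / a_k    # roughly 1.035

print(f"difference: {diff_c:.2f} °C, {diff_k:.2f} K")
print(f"ratio in °C: {naive_ratio:.3f}, ratio in K: {true_ratio:.3f}")
```

The difference comes out the same in both units, but only the kelvin ratio describes a physical relationship between the two temperatures.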
