
Categorical data

Earlier, we explained how variables in your data can be either independent or dependent. Another way to classify a variable is as a categorical variable. A categorical variable can take on one of a limited, and typically fixed, number of possible values, thus assigning each individual to a particular category.

Often, the meaning of collected data is unclear. Categorizing the data is one way a data scientist can give it meaning.

For example, suppose a numeric variable is collected and the values found are 4, 10, and 12. On their own, these numbers say little, but their meaning becomes clear once the values are categorized. Let's suppose that, based on an analysis of how the data was collected, we determine that the data describes university students and that each value counts the players of a particular sport:

  • 4 tennis players
  • 10 soccer players
  • 12 football players

Now, because we grouped the data into categories, the meaning becomes clear.
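
The grouping above can be sketched in a few lines of Python. This is only an illustration, assuming the raw data arrives as one sport label per student. The `observations` list is hypothetical and is built here simply to match the counts in the example.

```python
from collections import Counter

# Hypothetical raw observations: one sport label per university student.
# The list is constructed to match the counts used in the text.
observations = ["tennis"] * 4 + ["soccer"] * 10 + ["football"] * 12

# Counting the labels assigns each individual to a category and
# gives the size of each category.
counts = Counter(observations)

for sport, n in counts.items():
    print(f"{n} {sport} players")
```

The bare numbers 4, 10, and 12 become interpretable only once each one is attached to its category label.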

Other examples of categorized data include individual pet preferences (grouped by type of pet) and vehicle ownership (grouped by the style of car owned).

So, categorical data, as the name suggests, is data grouped into one or more categories. Some data scientists refer to these categories as sub-populations of the data.

Categorical data can also be data collected as a yes-or-no answer. For example, hospital admission data may indicate whether each patient smokes or does not smoke.
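
A yes-or-no variable is a categorical variable with exactly two allowed values. The following minimal sketch assumes admission records are stored as dictionaries. The `admissions` records, the `smoker` field name, and the `smoker_counts` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical admission records; the field names are assumptions.
admissions = [
    {"patient_id": 1, "smoker": "yes"},
    {"patient_id": 2, "smoker": "no"},
    {"patient_id": 3, "smoker": "no"},
    {"patient_id": 4, "smoker": "yes"},
    {"patient_id": 5, "smoker": "no"},
]

# A categorical variable has a limited, fixed set of possible values.
ALLOWED_VALUES = {"yes", "no"}


def smoker_counts(records):
    """Count patients per category, rejecting values outside the fixed set."""
    counts = Counter()
    for record in records:
        value = record["smoker"]
        if value not in ALLOWED_VALUES:
            raise ValueError(f"unexpected category: {value!r}")
        counts[value] += 1
    return counts


result = smoker_counts(admissions)
print(result["yes"], "smokers;", result["no"], "non-smokers")
```

Checking each value against the fixed set is what distinguishes a categorical variable from free-form text: any answer outside the known categories is treated as an error rather than as a new group.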