
Mathematical operations allowed

We gain a few new abilities at the ordinal level compared to the nominal level. We may still do basic counts as we did at the nominal level, but we can also introduce comparisons and orderings into the mix. For this reason, we may utilize new graphs at this level. Bar and pie charts remain available, as at the nominal level, but because we now have ordering and comparisons, we can also calculate medians and percentiles. With medians and percentiles, stem-and-leaf plots, as well as box plots, become possible.

Some examples of data at the ordinal level include:

  • Responses on a Likert scale (rating something on a scale from one to ten, for example)

  • Grade levels on an exam (F, D, C, B, A)
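To make the counts, comparisons, and medians mentioned above concrete, here is a minimal sketch using the exam grades as an ordered pandas categorical. The grades themselves are made up for illustration, and taking the middle element of the sorted values is just one simple way to get a median for ordinal data:

```python
import pandas as pd

# Hypothetical exam grades, made up for illustration
grades = pd.Series(['B', 'A', 'C', 'F', 'B', 'D', 'A', 'B', 'C'])

# Declare the order explicitly: F < D < C < B < A
grade_order = ['F', 'D', 'C', 'B', 'A']
grade_dtype = pd.CategoricalDtype(categories=grade_order, ordered=True)
ordered_grades = grades.astype(grade_dtype)

# Counts still work, exactly as at the nominal level
grade_counts = ordered_grades.value_counts().sort_index()

# Comparisons are now meaningful: how many students earned a B or better?
at_least_b = int((ordered_grades >= 'B').sum())

# Median: the middle element once the grades are sorted by rank
sorted_grades = ordered_grades.sort_values().reset_index(drop=True)
median_grade = sorted_grades.iloc[len(sorted_grades) // 2]

print(grade_counts)
print(at_least_b, median_grade)
```

Without the `ordered=True` flag, pandas would treat the grades as nominal, and the `>=` comparison would raise an error.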

For a real-world example of data at the ordinal scale, let's bring in a new dataset. This dataset holds key insights into how much people enjoy the San Francisco International Airport or SFO. This dataset is also publicly available on SF's open database (https://data.sfgov.org/Transportation/2013-SFO-Customer-Survey/mjr8-p6m5):

# load in the data set
import pandas as pd
customer = pd.read_csv('../data/2013_SFO_Customer_survey.csv')

This CSV has many, many columns:

customer.shape

(3535, 95)

95 columns, to be exact. For more information on the columns available for this dataset, check out the data dictionary on the website (https://data.sfgov.org/api/views/mjr8-p6m5/files/FHnAUtMCD0C8CyLD3jqZ1-Xd1aap8L086KLWQ9SKZ_8?download=true&filename=AIR_DataDictionary_2013-SFO-Customer-Survey.pdf).

For now, let's focus on a single column, Q7A_ART. As described by the publicly available data dictionary, Q7A_ART is about artwork and exhibitions. The possible choices are 0, 1, 2, 3, 4, 5, and 6, and each number has a meaning:

  • 1: Unacceptable
  • 2: Below Average
  • 3: Average
  • 4: Good
  • 5: Outstanding
  • 6: Have Never Used or Visited
  • 0: Blank

We can represent it as follows:

art_ratings = customer['Q7A_ART']
art_ratings.describe()


count    3535.000000
mean        4.300707
std         1.341445
min         0.000000
25%         3.000000
50%         4.000000
75%         5.000000
max         6.000000
Name: Q7A_ART, dtype: float64

pandas treats the column as numerical because it is full of numbers. However, even though the cell values are numbers, each number represents a category, so this data belongs on the qualitative side and, more specifically, at the ordinal level. If we remove the 0 and 6 categories, we are left with five ordinal categories that resemble the star ratings given to restaurants:

# only consider ratings 1-5
art_ratings = art_ratings[(art_ratings >=1) & (art_ratings <=5)]
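To see what this filter does without loading the full survey, here is a small sketch on a handful of made-up responses. The `responses` values are hypothetical; the labels in `rating_labels` are the ones listed above from the data dictionary:

```python
import pandas as pd

# Hypothetical responses, made up for illustration (not the real survey data)
responses = pd.Series([4, 0, 5, 6, 3, 4, 1, 6, 2, 4])

# Labels for the five rating codes, as listed in the data dictionary
rating_labels = {
    1: 'Unacceptable',
    2: 'Below Average',
    3: 'Average',
    4: 'Good',
    5: 'Outstanding',
}

# Keep only codes 1-5; 0 (blank) and 6 (never used or visited) are not ratings
ratings = responses[(responses >= 1) & (responses <= 5)]

# Translate each remaining code into its label
labelled = ratings.map(rating_labels)

print(len(responses), len(ratings))
print(labelled.tolist())
```

Codes 0 and 6 do not sit anywhere on the scale from Unacceptable to Outstanding, which is why they are dropped before treating the column as ordinal.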

We will then cast the values as strings:

# cast the values as strings
art_ratings = art_ratings.astype(str)

art_ratings.describe()

count     2656
unique       5
top          4
freq      1066
Name: Q7A_ART, dtype: object

Now that we have our ordinal data in the right format, let's look at some visualizations:

# Can use pie charts, just like in nominal level
art_ratings.value_counts().plot(kind='pie')

The following is the result of the preceding code:

We can also visualize this as a bar chart as follows:

# Can use bar charts, just like in nominal level
art_ratings.value_counts().plot(kind='bar')

The following is the output of the preceding code:
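One detail worth noting for ordinal data: `value_counts()` sorts its result by frequency, so the bars appear from most to least common rather than in rating order. A minimal sketch, using made-up string ratings rather than the survey column, shows how `sort_index()` restores the natural order:

```python
import pandas as pd

# Hypothetical ratings already cast to strings, made up for illustration
ratings = pd.Series(['4', '5', '3', '4', '1', '2', '4', '5', '3', '4'])

# value_counts() sorts by frequency, most common category first
by_frequency = ratings.value_counts()

# sort_index() puts the categories back in rating order: 1, 2, 3, 4, 5
by_rating = ratings.value_counts().sort_index()

print(by_frequency.index[0])
print(by_rating)
```

Calling `by_rating.plot(kind='bar')` would then draw the bars from 1 to 5, which usually reads more naturally for an ordered scale.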

However, now we can also introduce box plots since we are at the ordinal level:

# Box plots are available at the ordinal level
# plot the ratings themselves (cast back to numbers), not their counts
art_ratings.astype(float).plot(kind='box')

The following is the output of the preceding code:

A box plot like this would not be possible for the Grade column in the salary data: that column is nominal, so its values have no order and finding a median would not be possible.
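A box plot is built from the median and the quartiles, so it can help to compute those values directly. The following is a rough sketch on made-up ratings, not the survey data. The choice of `interpolation='lower'` is an assumption made here so that every quantile is an actual rating rather than a value between two categories:

```python
import pandas as pd

# Hypothetical ratings on the 1-5 scale, made up for illustration
ratings = pd.Series([4, 5, 3, 4, 1, 2, 4, 5, 3, 4, 5])

# interpolation='lower' makes every quantile an observed rating
# instead of a value in between two categories
summary = ratings.quantile([0, 0.25, 0.5, 0.75, 1], interpolation='lower')

print(summary)
```

The five values returned are the minimum, first quartile, median, third quartile, and maximum, all of which rely on the data having an order.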
