- Mastering Python for Data Science
- Samir Madhavan
- 2021-07-16 20:14:19
A z-score
A z-score, in simple terms, expresses how many standard deviations a value lies above or below the mean of its distribution. The following formula calculates the z-score:

$$z = \frac{X - \mu}{\sigma}$$

Here, X is the value in the distribution, μ is the mean of the distribution, and σ is the standard deviation of the distribution.
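As a minimal sketch of the formula, the z-score can be computed directly with NumPy. The `values` array below is made up purely for illustration and is not part of the classroom example.

```python
import numpy as np

# Hypothetical sample values, chosen only to illustrate the formula.
values = np.array([40.0, 50.0, 60.0, 70.0, 80.0])

mu = values.mean()      # mean of the distribution
sigma = values.std()    # population standard deviation (ddof=0)

# z = (X - mu) / sigma, applied element-wise
z_scores = (values - mu) / sigma
print(z_scores)
```

After standardization, the z-scores have a mean of 0 and a standard deviation of 1, which is what makes them comparable across distributions.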
Let's try to understand this concept from the perspective of a school classroom.
A classroom has 60 students, and they have just received their mathematics examination scores. We simulate the scores of these 60 students with a normal distribution using the following commands:
>>> import numpy as np
>>> classscore = np.random.normal(50, 10, 60).round()
>>> classscore
[ 56. 52. 60. 65. 39. 49. 41. 51. 48. 52. 47. 41. 60. 54. 41. 46. 37. 50. 50. 55. 47. 53. 38. 42. 42. 57. 40. 45. 35. 39. 67. 56. 35. 45. 47. 52. 48. 53. 53. 50. 61. 60. 57. 53. 56. 68. 43. 35. 45. 42. 33. 43. 49. 54. 45. 54. 48. 55. 56. 30.]
The NumPy package has a random module with a normal function, where 50 is the mean of the distribution, 10 is the standard deviation, and 60 is the number of values to generate. Because the values are random, your scores will differ from the ones shown here. You can plot a histogram of the simulated scores with the following commands:
>>> import matplotlib.pyplot as plt
>>> plt.hist(classscore, 30, density=True)  # 30 bins; density=True replaces the removed normed=True
>>> plt.show()

The score of each student can be converted to a z-score using the following function:
>>> from scipy import stats
>>> stats.zscore(classscore)
[ 0.86008868 0.38555699 1.33462036 1.92778497 -1.15667098 0.02965823 -0.91940514 0.26692407 -0.08897469 0.38555699 -0.20760761 -0.91940514 1.33462036 0.62282284 -0.91940514 -0.32624053 -1.39393683 0.14829115 0.14829115 0.74145576 -0.20760761 0.50418992 -1.2753039 -0.80077222 -0.80077222 0.9787216 -1.03803806 -0.44487345 -1.63120267 -1.15667098 2.16505081 0.86008868 -1.63120267 -0.44487345 -0.20760761 0.38555699 -0.08897469 0.50418992 0.50418992 0.14829115 1.45325329 1.33462036 0.9787216 0.50418992 0.86008868 2.28368373 -0.6821393 -1.63120267 -0.44487345 -0.80077222 -1.86846851 -0.6821393 0.02965823 0.62282284 -0.44487345 0.62282284 -0.08897469 0.74145576 0.86008868 -2.22436727]
So, a student with a score of 60 out of 100 has a z-score of 1.334. To make more sense of the z-score, we'll use the standard normal table.
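The z-score for a mark of 60 can be checked by hand. The sketch below copies the 60 simulated scores listed earlier into an array (a fresh random draw would give different numbers) and applies the formula directly; the variable names `mu`, `sigma`, and `z_60` are illustrative.

```python
import numpy as np

# Scores copied from the simulated classroom output listed earlier.
classscore = np.array([
    56, 52, 60, 65, 39, 49, 41, 51, 48, 52,
    47, 41, 60, 54, 41, 46, 37, 50, 50, 55,
    47, 53, 38, 42, 42, 57, 40, 45, 35, 39,
    67, 56, 35, 45, 47, 52, 48, 53, 53, 50,
    61, 60, 57, 53, 56, 68, 43, 35, 45, 42,
    33, 43, 49, 54, 45, 54, 48, 55, 56, 30,
], dtype=float)

mu = classscore.mean()      # 48.75 for these scores
sigma = classscore.std()    # population standard deviation (ddof=0)

# z-score of a student who scored 60
z_60 = (60 - mu) / sigma
print(round(z_60, 4))       # → 1.3346
```

Note that the sample mean (48.75) is not exactly 50, because 60 draws from a normal distribution only approximate its parameters.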
This table helps in determining the probability of a score.
We would like to know what the probability of getting a score above 60 would be.

The standard normal table can help us determine the probability of a score occurring, but we do not have to perform the cumbersome task of looking the value up in the table. This task is made simple by the cdf function, which is the cumulative distribution function:
>>> prob = 1 - stats.norm.cdf(1.334)
>>> prob
0.091101928265359899
The cdf function gives the probability of getting a value up to the z-score of 1.334, and subtracting it from one gives the probability of getting a z-score above it. In other words, 0.09 is the probability of getting marks above 60.
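As a small sketch of the same calculation, SciPy also exposes the upper-tail probability directly through `stats.norm.sf` (the survival function), which should agree with one minus the cdf. The variable names here are illustrative.

```python
from scipy import stats

z = 1.334  # z-score of a mark of 60 in the classroom example

p_below = stats.norm.cdf(z)    # P(Z <= z)
p_above = 1 - p_below          # P(Z > z), as computed in the text
p_above_sf = stats.norm.sf(z)  # survival function: the upper tail directly

print(round(p_above, 4), round(p_above_sf, 4))  # → 0.0911 0.0911
```

For z-scores far out in the tail, `sf` tends to be the numerically safer choice, since subtracting a value very close to one loses precision.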
Let's ask another question: "What marks does a student need to be in the top 20% of the class?"
Here, we have to work backwards to determine the mark above which all students are in the top 20% of the class:

To get the z-score that marks the cutoff for the top 20%, we can use the ppf function (the percent point function, which is the inverse of cdf) in SciPy:
>>> stats.norm.ppf(0.80)
0.8416212335729143
The z-score that separates the top 20% from the rest is therefore about 0.84. We convert it back to marks as follows:
>>> round((0.84 * classscore.std()) + classscore.mean(), 2)
55.83
We multiply the z-score by the standard deviation and then add the mean of the distribution. This converts the z-score back to a value in the distribution. The result of 55.83 marks means that students who score more than this are in the top 20% of the distribution.
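The conversion in both directions can be sketched with two small helpers. The functions `marks_to_z` and `z_to_marks` are hypothetical names introduced here, and `mu` and `sigma` are rounded values computed from the scores listed earlier, so treat them as approximate.

```python
from scipy import stats

# Approximate mean and standard deviation of the listed scores (assumed, rounded).
mu, sigma = 48.75, 8.4294

def marks_to_z(marks, mu, sigma):
    """Standardize a raw mark into a z-score."""
    return (marks - mu) / sigma

def z_to_marks(z, mu, sigma):
    """Convert a z-score back into a raw mark."""
    return z * sigma + mu

z_cut = stats.norm.ppf(0.80)           # z-score below which 80% of values fall
cutoff = z_to_marks(z_cut, mu, sigma)  # mark that separates the top 20%

print(round(z_cut, 4), round(cutoff, 2))  # → 0.8416 55.84
```

Using the unrounded z-score gives a cutoff of about 55.84 rather than 55.83; the small difference comes only from rounding 0.8416 to 0.84.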
The z-score is an essential and widely used concept in statistics. It standardizes a distribution so that values from different distributions can be compared and inferences can be derived from them.
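To illustrate the comparison point, here is a minimal sketch with two hypothetical exams marked on different scales. All of the numbers and the `z_score` helper are invented for the example.

```python
def z_score(value, mean, std):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / std

# Hypothetical results for one student in two exams with different spreads.
math_z = z_score(70, mean=60, std=10)       # 10 marks above a mean of 60
physics_z = z_score(80, mean=75, std=12.5)  # 5 marks above a mean of 75

print(math_z, physics_z)  # → 1.0 0.4
```

Although the raw physics mark is higher, the mathematics result is further above its class average in standard deviation terms, which is the kind of comparison standardization makes possible.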