Book title: Machine Learning With Go
Author: Daniel Whitenack

Distributions

A distribution is a representation of how often values appear within a dataset. Let's say, for instance, that one thing you are tracking as a data scientist is the daily sales of a certain product or service, and you have a long list (which you could represent as a vector or part of a matrix) of these daily sales numbers. These sales numbers are part of our dataset, and they include one day with sales of $121, another day with sales of $207, and so on.

One of the sales numbers we have accumulated will be the lowest and one will be the highest, with the rest falling somewhere in between (at least if we assume no exact duplicates). The following image represents these low, high, and in-between values of sales along a line:

This is, thus, a distribution of sales, or at least one representation of the distribution of sales. Note that this distribution has areas where there are more numbers and areas where the numbers are a little sparse. Additionally, note that there seems to be a tendency for numbers to be near the center of the distribution.
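
To make this concrete, here is a minimal sketch in Go that finds the lowest and highest values in a list of daily sales and counts how many values fall into each of a few equal-width ranges. Only the $121 and $207 figures come from the text above; every other sales number, the helper names `minMax` and `binCounts`, and the choice of four bins are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// minMax returns the smallest and largest values in a non-empty slice.
func minMax(values []float64) (lo, hi float64) {
	lo, hi = values[0], values[0]
	for _, v := range values[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return lo, hi
}

// binCounts splits the range [lo, hi] of the data into n equal-width bins
// and counts how many values fall into each bin. The highest value is
// placed in the last bin.
func binCounts(values []float64, n int) []int {
	counts := make([]int, n)
	lo, hi := minMax(values)
	width := (hi - lo) / float64(n)
	for _, v := range values {
		i := 0 // if all values are equal, everything lands in the first bin
		if width > 0 {
			i = int((v - lo) / width)
			if i >= n {
				i = n - 1
			}
		}
		counts[i]++
	}
	return counts
}

func main() {
	// Hypothetical daily sales in dollars. Only 121 and 207 appear in the
	// text; the remaining values are made up for this example.
	sales := []float64{121, 207, 143, 158, 162, 165, 171, 174, 180, 196}

	lo, hi := minMax(sales)
	fmt.Printf("lowest: $%.0f, highest: $%.0f\n", lo, hi)

	// Print a crude text histogram: one asterisk per value in each bin.
	const bins = 4
	width := (hi - lo) / bins
	for i, c := range binCounts(sales, bins) {
		start := lo + float64(i)*width
		fmt.Printf("$%6.1f to $%6.1f | %s\n", start, start+width, strings.Repeat("*", c))
	}
}
```

Rows with more asterisks correspond to the denser areas of the distribution, and rows with few or none correspond to the sparse areas. The number of bins is a free choice here; a different count would give a coarser or finer view of the same data.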
