Distributions

A distribution is a representation of how often values appear within a dataset. Let's say, for instance, that as a data scientist you are tracking the daily sales of a certain product or service, and you have a long list of these daily sales numbers (which you could represent as a vector or as part of a matrix). These sales numbers are part of our dataset, and they include one day with sales of $121, another day with sales of $207, and so on.

One of the sales numbers we have accumulated will be the lowest, and one will be the highest. The rest of the sales numbers fall somewhere in between (at least if we assume no exact duplicates). The following image represents these low, high, and in-between values of sales along a line:

This is, thus, a distribution of sales, or at least one representation of the distribution of sales. Note that this distribution has areas where there are more numbers and areas where the numbers are a little sparse. Additionally, note that there seems to be a tendency for numbers to be near the center of the distribution.
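
To make the idea concrete, here is a minimal sketch in Python that finds the lowest and highest values and then counts how many values fall into each range. The sales figures (apart from $121 and $207, which are mentioned above) and the $20 bin width are made-up assumptions for illustration only, not data from a real product.

```python
from collections import Counter

# Hypothetical daily sales in dollars. Only 121 and 207 come from the text;
# the remaining values are invented for illustration.
daily_sales = [121, 207, 164, 188, 173, 159, 181, 142, 176, 169, 195, 167]

# The two ends of the distribution.
lowest = min(daily_sales)
highest = max(daily_sales)

# Group the values into fixed-width bins (the $20 width is an assumption).
bin_width = 20
bin_counts = Counter((value - lowest) // bin_width for value in daily_sales)

# Print a simple text histogram, one row per bin from lowest to highest.
print(f"lowest: ${lowest}, highest: ${highest}")
for index in range((highest - lowest) // bin_width + 1):
    start = lowest + index * bin_width
    count = bin_counts.get(index, 0)
    print(f"${start}-${start + bin_width - 1}: {'*' * count}")
```

With these made-up numbers, the middle bin collects the most values and the bins at either end collect the fewest, which mirrors the tendency toward the center described above. A different bin width would give a coarser or finer view of the same distribution.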