
The probability space and general theory

When probability is discussed, it is usually in terms of how likely a certain event is to happen. Is it going to rain? Will the price of apples go up or down? In the context of machine learning, probabilities tell us the likelihood of events such as a comment being classified as positive versus negative, or a fraudulent transaction occurring on a credit card. We measure probability by defining what we refer to as the probability space. A probability space is a formal description of which events can occur and how likely each of them is. Probability spaces are defined by three characteristics:

  1. The sample space, which tells us the possible outcomes of a situation
  2. A defined set of events, such as two fraudulent credit card transactions
  3. The measure of probability of each of these events

While probability spaces are a subject worthy of studying in their own right, for our own understanding, we'll stick to this basic definition. 
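To make the three components concrete, here is a minimal sketch for a single roll of a fair six-sided die. The die, the names `sample_space` and `probability`, and the assumption that every outcome is equally likely are illustrative choices and not part of the definition above.

```python
from fractions import Fraction

# 1. Sample space: every possible outcome of rolling one six-sided die.
sample_space = frozenset(range(1, 7))


def probability(event):
    """3. Probability measure, assuming all outcomes are equally likely."""
    event = frozenset(event)
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


# 2. Events: subsets of the sample space that we want to assign a probability to.
even = {2, 4, 6}
at_least_five = {5, 6}

print(probability(even))           # 1/2
print(probability(at_least_five))  # 1/3
print(probability(sample_space))   # 1
```

Using `Fraction` keeps the probabilities exact, which makes it easy to see that the whole sample space has probability 1.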

In probability theory, the idea of independence is essential. Two random variables are independent when knowing the value of one tells us nothing about the value of the other. This is an important assumption in deep learning, as non-independent features can often intertwine and affect the predictive power of our models.
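For events, independence is usually stated as P(A and B) = P(A) × P(B). The sketch below checks that condition for events on a fair six-sided die; the die and the helper names `probability` and `are_independent` are assumptions made for illustration only.

```python
from fractions import Fraction

# Outcomes of one roll of a fair six-sided die (assumed equally likely).
sample_space = frozenset(range(1, 7))


def probability(event):
    """Share of equally likely outcomes that fall inside the event."""
    return Fraction(len(frozenset(event) & sample_space), len(sample_space))


def are_independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    a, b = frozenset(a), frozenset(b)
    return probability(a & b) == probability(a) * probability(b)


even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_least_four = {4, 5, 6}

print(are_independent(even, at_most_four))   # True:  1/3 == 1/2 * 2/3
print(are_independent(even, at_least_four))  # False: 1/3 != 1/2 * 1/2
```

Learning that the roll is at most four leaves the chance of an even number at one half, whereas learning that it is at least four raises it to two thirds, which is why only the first pair is independent.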

In statistical terms, a collection of data about an event is a sample, which is drawn from a theoretical superset of data called a population that represents every member of the group or event being studied. For instance, if we were to poll people on the street about whether they believe in Political View A or Political View B, we would be generating a random sample from the population, which would be the entire population of the city, state, or country where we are polling.

Now let's say we wanted to use this sample to predict the likelihood of a person having one of the two political views, but we mostly polled people who were at an event supporting Political View A. In this case, we may have a biased sample. When sampling, it is important to take a random sample to decrease bias; otherwise, any statistical analysis or modeling that we do with the sample will be biased as well.
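The effect of a biased sample can be illustrated with a small simulation. Everything here is hypothetical: the population size, the 40% share holding View A, and the choice to model polling at a View A event by making its supporters five times as likely to be picked.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 people, 40% of whom hold View A.
population = ["A"] * 4_000 + ["B"] * 6_000


def share_of_a(sample):
    """Fraction of people in the sample who hold View A."""
    return sum(1 for view in sample if view == "A") / len(sample)


# Random sample: every person is equally likely to be polled.
random_sample = rng.sample(population, 1_000)

# Biased sample: View A supporters are five times as likely to be polled
# (an assumed stand-in for polling at a View A event; drawn with replacement).
weights = [5 if view == "A" else 1 for view in population]
biased_sample = rng.choices(population, weights=weights, k=1_000)

print(f"true share of A:        {share_of_a(population):.3f}")
print(f"random-sample estimate: {share_of_a(random_sample):.3f}")
print(f"biased-sample estimate: {share_of_a(biased_sample):.3f}")
```

Under these assumptions the random sample should land close to the true share, while the biased sample should substantially overstate support for View A, and any model trained on it would inherit that distortion.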
