
Regression and Classification Problems

We encounter classification and regression problems throughout our daily lives: the chance of rain reported on https://weather.com, emails being filtered into the inbox or the spam folder, personal and home loans being accepted or rejected, choosing the next holiday destination, weighing the options for buying a new house, making investment decisions for short- and long-term gains, picking the next book to buy from Amazon. The list goes on and on. The world around us is increasingly run by algorithms that help us with our choices (which is not always a good thing).

As discussed in Chapter 2, Exploratory Analysis of Data, we will use the Situation–Complication–Question (SCQ) framework from the Minto Pyramid Principle to define our problem statement. The following table shows the SCQ approach applied to Beijing's PM2.5 problem:

Figure 3.3: Applying SCQ on Beijing's PM2.5 problem.

Given the SCQ construct described in the previous table, we can either run a simple correlation analysis to establish which factors affect PM2.5 levels, or frame a predictive problem that estimates PM2.5 levels from all the factors. (Prediction here means finding an approximate function that maps input variables to an output.) For clarity of terminology, we will refer to the factors as input variables and to PM2.5 as the dependent variable (often called the output variable). The dependent variable can be either categorical or continuous.

For example, in the problem of classifying emails as SPAM or NOT SPAM, the dependent variable is categorical, whereas a PM2.5 concentration is a continuous quantity. The following table highlights some critical differences between regression and classification problems:

Figure 3.4: Difference between regression and classification problems.
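
The distinction between a continuous and a categorical dependent variable can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wind-speed and PM2.5 numbers are made-up toy values (not the Beijing dataset), the names `fit_line`, `predict_pm25`, and `predict_label` are hypothetical, and the cutoff of 100 for a "HIGH" label is arbitrary. Turning a regression output into a label by thresholding is a simplification; a real classifier would normally be trained on categorical labels directly.

```python
from statistics import mean

# Hypothetical toy data (NOT real Beijing measurements): one input variable
# (wind speed) and a continuous dependent variable (PM2.5 level).
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]
pm25 = [150.0, 120.0, 90.0, 60.0, 30.0]


def fit_line(x, y):
    """Least-squares fit for a single input; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# The fitted line is the "approximate function" from input to output.
slope, intercept = fit_line(wind_speed, pm25)


def predict_pm25(ws):
    """Regression view: the predicted output is a continuous number."""
    return slope * ws + intercept


def predict_label(ws, threshold=100.0):
    """Classification view: the predicted output is a category.

    The threshold is an arbitrary assumption chosen for illustration.
    """
    return "HIGH" if predict_pm25(ws) > threshold else "LOW"


print(predict_pm25(2.5))   # a continuous estimate
print(predict_label(2.5))  # a categorical estimate
print(predict_label(4.0))
```

The same input variable feeds both functions; only the type of the dependent variable changes, which is what separates a regression problem from a classification problem.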
