官术网_书友最值得收藏!

Using classification models to predict class labels

With these tools in hand, we can now take on our first real classification example.

Consider the small town of Randomville, where people are crazy about their two sports teams, the Randomville Reds and the Randomville Blues. The Reds had been around for a long time, and people loved them. But then some out-of-town millionaire came along and bought the Reds' top scorer and started a new team, the Blues. To the discontent of most Reds fans, the top scorer would go on to win the championship title with the Blues. Years later he would return to the Reds, despite some backlash from fans who could never forgive him for his earlier career choices. But anyway, you can see why fans of the Reds don't necessarily get along with fans of the Blues. In fact, these two fan bases are so divided that they never even live next to each other. I've even heard stories where the Red fans deliberately moved away once Blues fans moved in next door. True story!

Anyway, we are new in town and are trying to sell some Blues merchandise to people by going from door to door. However, every now and then we come across a bleeding-heart Reds fan who yells at us for selling Blues stuff and chases us off their lawn. Not nice! It would be much less stressful, and a better use of our time, to avoid these houses altogether and just visit the Blues fans instead.

Confident that we can learn to predict where the Reds fans live, we start keeping track of our encounters. If we come by a Reds fan's house, we draw a red triangle on we handy town map; otherwise, we draw a blue square. After a while, we get a pretty good idea of where everyone lives:

Randomville's town map

However, now we approach the house that is marked as a green circle in the preceding map. Should we knock on their door? We try to find some clue as to what team they prefer (perhaps a team flag hanging from the back porch), but we can't see any. How can we know if it is safe to knock on their door?

What this silly example illustrates is exactly the kind of problem a supervised learning algorithm can solve. We have a bunch of observations (houses, their locations, and their color) that make up our training data. We can use this data to learn from experience, so that when we face the task of predicting the color of a new house, we can make a well-informed estimate.

As we mentioned earlier, fans of the Reds are really passionate about their team, so they would never move next to a Blues fan. Couldn't we use this information and look at all the neighboring houses, in order to find out what kind of fan lives in the new house?

This is exactly what the k-NN algorithm would do.

主站蜘蛛池模板: 遂溪县| 龙井市| 保靖县| 奎屯市| 康平县| 永泰县| 彩票| 肃宁县| 临桂县| 襄汾县| 梧州市| 澜沧| 衡阳市| 乌兰察布市| 浮山县| 郸城县| 太谷县| 仁怀市| 邢台市| 微山县| 澜沧| 工布江达县| 依兰县| 临清市| 巴楚县| 邵武市| 荔浦县| 营山县| 鹤庆县| 台湾省| 扬中市| 泰兴市| 喀喇沁旗| 招远市| 霍山县| 许昌市| 阳曲县| 莱州市| 高邮市| 专栏| 甘德县|