
What is classification?

Classification is one of the most common applications of data mining, both in practice and in research. As before, we have a set of samples that represent the objects or things we are interested in classifying. We also have a new array, the class values, which assigns a category to each sample. Some examples are as follows:

  • Determining the species of a plant by looking at its measurements. The class value here would be: Which species is this?
  • Determining if an image contains a dog. The class would be: Is there a dog in this image?
  • Determining if a patient has cancer, based on the results of a specific test. The class would be: Does this patient have cancer?

While many of the preceding examples are binary (yes/no) questions, they do not have to be, as the plant species classification in this section shows.
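As a minimal sketch of how samples and class values line up, consider the plant example with more than two classes. The measurements and species names below are hypothetical values chosen for illustration, not data from a real dataset:

```python
# Each row is one sample: hypothetical (petal length, petal width) in cm.
samples = [
    [1.4, 0.2],
    [4.7, 1.4],
    [6.0, 2.5],
    [1.3, 0.2],
]

# One class value per sample, in the same order as the rows above.
classes = ["setosa", "versicolor", "virginica", "setosa"]

# Every sample must have exactly one class value.
assert len(samples) == len(classes)

# Three distinct classes, so this is not a binary (yes/no) problem.
distinct_classes = sorted(set(classes))
is_binary = len(distinct_classes) == 2
print(distinct_classes, is_binary)
```

The only structural requirement shown here is that the class array has one entry per sample; the number of distinct classes can be two or more.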

The goal of classification applications is to train a model on a set of samples with known classes and then apply that model to new, unseen samples with unknown classes. For example, I might train a spam classifier on my past emails, which I have labeled as spam or not spam. I can then use that classifier to determine whether my next email is spam, without needing to classify it myself.
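The train-then-apply workflow can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not a recommended spam filter: the `train` and `predict` helpers, the word-count scoring rule, and the example emails are all hypothetical and chosen only to make the two phases visible.

```python
from collections import Counter


def train(emails, labels):
    """Count how often each word appears in spam and in non-spam emails."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in zip(emails, labels):
        counts[label].update(text.lower().split())
    return counts


def predict(model, text):
    """Label an email by the class in which its words were seen more often."""
    words = text.lower().split()
    spam_score = sum(model["spam"][w] for w in words)
    ham_score = sum(model["not spam"][w] for w in words)
    # Ties (including emails with no known words) default to "not spam".
    return "spam" if spam_score > ham_score else "not spam"


# Training phase: samples with known classes (hypothetical emails).
past_emails = [
    "win a free prize now",
    "meeting agenda for monday",
    "free money claim your prize",
    "lunch on monday?",
]
past_labels = ["spam", "not spam", "spam", "not spam"]
model = train(past_emails, past_labels)

# Application phase: a new, unseen sample with an unknown class.
prediction = predict(model, "claim your free prize")
print(prediction)
```

The important point is the separation of the two steps: `train` sees only labeled past emails, and `predict` is then called on an email the model has never seen.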
