Learning about the seeds dataset

We now look at another agricultural dataset, which is still small but already too large to plot exhaustively on a page, as we did with the Iris dataset. This dataset consists of measurements of wheat seeds. It has the following seven features:

  • Area A
  • Perimeter P
  • Compactness C = 4πA/P²
  • Length of kernel
  • Width of kernel
  • Asymmetry coefficient
  • Length of kernel groove
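
The compactness feature is derived from two of the others, so it can be recomputed from the area and perimeter. The following is a minimal sketch of that formula; the function name `compactness` is our own choice for illustration and is not part of the dataset or of any library. A circle gives the maximum value of 1, while less round shapes give smaller values:

```python
import math


def compactness(area, perimeter):
    """Return the compactness C = 4*pi*A / P**2 of a shape."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * math.pi * area / perimeter ** 2


# Sanity check on a circle of radius 1: A = pi, P = 2*pi, so C should be 1
circle = compactness(math.pi, 2 * math.pi)

# A unit square (A = 1, P = 4) is less compact: C = pi/4
square = compactness(1.0, 4.0)
```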

There are three classes, corresponding to three wheat varieties: Canadian, Kama, and Rosa. As before, the goal is to classify the variety based on these morphological measurements. Unlike the Iris dataset, which was collected in the 1930s, this is a very recent dataset, and its features were automatically computed from digital images.

This is how image pattern recognition can be implemented: you take images in digital form, compute a few relevant features from them, and pass those features to a generic classification system. In Chapter 12, Computer Vision, we will work through the computer vision side of this problem and compute features in images. For the moment, we will work with the features that are given to us.
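
To make the idea of a generic classification system concrete, here is a minimal sketch of one possible classifier: it labels a new feature vector with the label of its closest training example. It knows nothing about seeds or images and only sees lists of numbers. The function name `nearest_neighbor` and the toy values are assumptions made up for illustration; they are not real measurements from the seeds dataset, and this is not necessarily the method used later in the chapter:

```python
import math


def nearest_neighbor(train_features, train_labels, query):
    """Return the label of the training example closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in zip(train_features, train_labels):
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Made-up (area, perimeter) pairs, for illustration only
toy_features = [(1.0, 4.0), (1.2, 4.4), (3.0, 7.0), (3.2, 7.3)]
toy_labels = ["small", "small", "large", "large"]

predicted = nearest_neighbor(toy_features, toy_labels, (2.9, 6.8))
```

Because the classifier only depends on the feature vectors, the same function could be applied to any features computed from images.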

UCI Machine Learning Dataset Repository:
The University of California at Irvine (UCI) maintains an online repository of machine learning datasets (at the time of writing, it lists 233 datasets). Both the Iris and the seeds datasets used in this chapter were taken from there. The repository is available online at http://archive.ics.uci.edu/ml/.