- Building Computer Vision Projects with OpenCV 4 and C++
- David Millán Escrivá, Prateek Joshi, Vinícius G. Mendonça, Roy Shilkrot
- 2021-07-02 12:28:43
Computer vision and the machine learning workflow
Computer vision applications that use machine learning share a common basic structure, which is divided into the following steps:
- Pre-process
- Segmentation
- Feature extraction
- Classification result
- Post-process
Most of these steps are common to almost all computer vision applications, although some of them may be omitted. In the following diagram, you can see the different steps that are involved:

Almost all computer vision applications start with a Pre-process step applied to the input image, which consists of removing lighting effects and noise, filtering, blurring, and so on. Once all the required pre-processing has been applied to the input image, the second step is Segmentation. In this step, we extract the regions of interest in the image and isolate each one as a unique object of interest. For example, in a face detection system, we have to separate the faces from the rest of the scene. After detecting the objects inside the image, we continue to the next step, feature extraction. Here, we extract the features of each object; the features are normally stored as a vector of object characteristics. A characteristic describes our objects and can be the area of an object, its contour, its texture pattern, its pixels, and so on.
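The segmentation and feature extraction steps described above can be sketched in a few lines of plain C++. This is a minimal illustration under stated assumptions, not the implementation used in the book: the `Image` alias, the `Features` struct, and the functions `binarize` and `extractFeatures` are hypothetical names, the pre-processing step is skipped, and a fixed threshold followed by a 4-connected flood fill stands in for a real segmentation method. In an OpenCV application, `cv::Mat` together with functions such as `cv::threshold` and `cv::connectedComponentsWithStats` would normally play these roles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical types for illustration; a real application would use cv::Mat.
using Image = std::vector<std::vector<int>>;  // grayscale values, 0..255

struct Features {
    int area;            // number of pixels that belong to the object
    double aspectRatio;  // bounding-box width divided by bounding-box height
};

// Simplified segmentation: pixels brighter than `thresh` become foreground (1).
Image binarize(const Image& img, int thresh) {
    Image out(img.size());
    for (std::size_t y = 0; y < img.size(); ++y) {
        out[y].resize(img[y].size());
        for (std::size_t x = 0; x < img[y].size(); ++x)
            out[y][x] = img[y][x] > thresh ? 1 : 0;
    }
    return out;
}

// Isolates each 4-connected foreground region and computes its features.
// The mask is taken by value because visited pixels are cleared in place.
std::vector<Features> extractFeatures(Image mask) {
    std::vector<Features> result;
    const int h = static_cast<int>(mask.size());
    const int w = h > 0 ? static_cast<int>(mask[0].size()) : 0;
    for (const auto& row : mask)
        assert(static_cast<int>(row.size()) == w);  // image must be rectangular

    const int dy[4] = {1, -1, 0, 0};
    const int dx[4] = {0, 0, 1, -1};
    for (int y0 = 0; y0 < h; ++y0) {
        for (int x0 = 0; x0 < w; ++x0) {
            if (mask[y0][x0] != 1) continue;
            int area = 0, minX = x0, maxX = x0, minY = y0, maxY = y0;
            std::queue<std::pair<int, int>> pending;
            pending.push(std::make_pair(y0, x0));
            mask[y0][x0] = 0;  // mark as visited
            while (!pending.empty()) {
                const std::pair<int, int> p = pending.front();
                pending.pop();
                ++area;
                minY = std::min(minY, p.first);
                maxY = std::max(maxY, p.first);
                minX = std::min(minX, p.second);
                maxX = std::max(maxX, p.second);
                for (int k = 0; k < 4; ++k) {
                    const int ny = p.first + dy[k];
                    const int nx = p.second + dx[k];
                    if (ny >= 0 && ny < h && nx >= 0 && nx < w && mask[ny][nx] == 1) {
                        mask[ny][nx] = 0;
                        pending.push(std::make_pair(ny, nx));
                    }
                }
            }
            const double ratio = static_cast<double>(maxX - minX + 1) /
                                 static_cast<double>(maxY - minY + 1);
            result.push_back(Features{area, ratio});
        }
    }
    return result;
}
```

Calling `extractFeatures(binarize(img, 128))` returns one `Features` entry per isolated object, in the order in which the objects are first met when scanning the image row by row. Area and aspect ratio are used here only because the text names them as typical characteristics.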
Now we have the descriptor, also known as a feature vector or feature set, of each object. Descriptors are the features that describe an object, and we use them either to train a model or to predict with one. To train, we have to create a large dataset of features extracted from thousands of pre-processed images. We then pass the extracted features (image/object characteristics), such as area, size, and aspect ratio, to the Train model function we choose. In the following diagram, we can see how a dataset is fed into a Machine Learning Algorithm to train and generate a Model:

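The training step can be illustrated with a deliberately simple stand-in algorithm. The sketch below is not the support vector machine discussed in this chapter: it trains a nearest-centroid model, which only stores the mean feature vector of each label. The names `FeatureVector`, `Sample`, `Model`, and `train` are hypothetical and exist only for this example; the point is the shape of the data flow, where a labeled dataset of feature vectors goes in and a set of learned parameters comes out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical types for illustration only.
using FeatureVector = std::vector<double>;  // for example {area, aspect ratio}

struct Sample {
    FeatureVector features;  // descriptor extracted from one object
    int label;               // known class of that object
};

// The learned parameters: one mean feature vector (centroid) per label.
using Model = std::map<int, FeatureVector>;

// Trains a nearest-centroid model (a stand-in for a real learning algorithm).
Model train(const std::vector<Sample>& dataset) {
    assert(!dataset.empty());
    const std::size_t dim = dataset.front().features.size();

    Model centroids;
    std::map<int, std::size_t> counts;
    for (const Sample& s : dataset) {
        assert(s.features.size() == dim);  // all descriptors share one length
        FeatureVector& sum = centroids[s.label];
        sum.resize(dim, 0.0);              // no-op after the first sample of a label
        for (std::size_t i = 0; i < dim; ++i) sum[i] += s.features[i];
        ++counts[s.label];
    }
    // Turn the per-label sums into per-label means.
    for (auto& entry : centroids)
        for (double& value : entry.second)
            value /= static_cast<double>(counts[entry.first]);
    return centroids;
}
```

With a real algorithm the learned parameters are different (for a support vector machine, the support vectors and their coefficients), but the interface has the same shape: labeled feature vectors in, a model out.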
When we Train with a dataset, the Model learns all the parameters it needs to make a prediction when a new feature vector with an unknown label is given as input to our algorithm. In the following diagram, we can see how an unknown feature vector is passed to Predict using the generated Model, which returns the Classification result or a regression value:

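To make the idea of learned parameters concrete, consider a linear classifier, of which a linear support vector machine is one example. Its parameters are a weight vector and a bias, and prediction reduces to evaluating a weighted sum for the unknown feature vector. The sketch below assumes the parameters have already been learned; `LinearModel`, `decisionValue`, and `predict` are hypothetical names for illustration and are not part of OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the parameters learned during training.
struct LinearModel {
    std::vector<double> weights;  // one weight per feature
    double bias;
};

// Raw score w . x + b; this continuous value is what a regression would return.
double decisionValue(const LinearModel& model, const std::vector<double>& x) {
    assert(x.size() == model.weights.size());
    double score = model.bias;
    for (std::size_t i = 0; i < x.size(); ++i)
        score += model.weights[i] * x[i];
    return score;
}

// Classification result: +1 if the score is non-negative, otherwise -1.
int predict(const LinearModel& model, const std::vector<double>& x) {
    return decisionValue(model, x) >= 0.0 ? 1 : -1;
}
```

The same function pair shows the two kinds of output mentioned above: `predict` returns a discrete class label, while `decisionValue` returns a continuous value.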
After predicting the result, post-processing of the output data is sometimes required, for example, merging multiple classifications to decrease the prediction error, or merging multiple labels. A sample case is optical character recognition (OCR), where a Classification result is produced for each predicted character, and we construct a word by combining the results of character recognition. This means that we can create a post-processing method to correct errors in the detected words.
With this short introduction to machine learning for computer vision, we are going to implement our own application that uses machine learning to classify objects in a slide tape. We are going to use support vector machines (SVMs) as our classification method and explain how to use them. Other machine learning algorithms are used in a very similar way. The OpenCV documentation has detailed information about all of the machine learning algorithms at the following link: https://docs.opencv.org/master/dd/ded/group__ml.html.
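Returning to the optical character recognition case, the post-processing step can be sketched as follows. This is one possible approach under stated assumptions, not a method prescribed by the book: the per-character predictions are joined into a word, and the word is replaced by the closest dictionary entry of the same length if they differ in at most `maxErrors` positions. The functions `buildWord`, `hammingDistance`, and `correctWord`, as well as the dictionary itself, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Combines the per-character classification results into one word.
std::string buildWord(const std::vector<char>& predictedChars) {
    return std::string(predictedChars.begin(), predictedChars.end());
}

// Number of positions at which two equal-length words differ.
std::size_t hammingDistance(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++distance;
    return distance;
}

// Returns the closest same-length dictionary word if it differs from `word`
// in at most `maxErrors` positions; otherwise returns `word` unchanged.
std::string correctWord(const std::string& word,
                        const std::vector<std::string>& dictionary,
                        std::size_t maxErrors) {
    std::string best = word;
    std::size_t bestDistance = maxErrors + 1;
    for (const std::string& candidate : dictionary) {
        if (candidate.size() != word.size()) continue;
        const std::size_t d = hammingDistance(word, candidate);
        if (d < bestDistance) {
            bestDistance = d;
            best = candidate;
        }
    }
    return best;
}
```

For instance, if the classifier mistakes one character and produces `vi5ion`, a dictionary containing `vision` lets `correctWord` recover the intended word with `maxErrors` set to 1, while a word with no close match is left as it is.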