Exploring Structure from Motion Using OpenCV
In this chapter, we will discuss the notion of Structure from Motion (SfM), or, put better, extracting geometric structure from images taken with a moving camera, using OpenCV's API to help us. First, let us constrain the otherwise very broad approach to SfM to a single camera, usually called a monocular approach, and to a discrete and sparse set of frames rather than a continuous video stream. These two constraints greatly simplify the system we will sketch out in the coming pages, and help us understand the fundamentals of any SfM method. To implement our method, we will follow in the footsteps of Hartley and Zisserman (hereafter referred to as H&Z for brevity), as documented in Chapters 9 through 12 of their seminal book Multiple View Geometry in Computer Vision.
In this chapter, we will cover the following:
- Structure from Motion concepts
- Estimating the camera motion from a pair of images
- Reconstructing the scene
- Reconstructing from many views
- Refining the reconstruction
Throughout the chapter, we assume the use of a camera that was calibrated beforehand. Calibration is a ubiquitous operation in computer vision, fully supported in OpenCV through command-line tools, and was discussed in previous chapters. We therefore assume the existence of the camera's intrinsic parameters, embodied in the K matrix and the distortion coefficients vector, which are the outputs of the calibration process.
To make things clear in terms of language, from this point on we will refer to a camera as a single view of the scene, rather than the optics and hardware taking the image. A camera has a 3D position in space (translation) and a 3D direction of view (orientation). Together, these are described as the six degrees of freedom (6 DOF) camera pose, sometimes referred to as the extrinsic parameters. Between two cameras, therefore, there is a 3D translation element (movement through space) and a 3D rotation of the direction of view.
We will also use the terms scene, world, real, and 3D point interchangeably, all meaning a point that exists in our real world. The same goes for image or 2D points, which are the image coordinates at which some real 3D point was projected onto the camera sensor at that location and time.
In the chapter's code sections, you will notice references to Multiple View Geometry in Computer Vision, for example // HZ 9.12. This refers to equation number 12 of Chapter 9 of the book. Also, the text includes only code excerpts; the complete runnable code is included in the material accompanying the book.
The following flow diagram describes the process of the SfM pipeline we will implement. We begin by triangulating an initial reconstructed point cloud of the scene, using 2D features matched across the image set and the computed poses of two cameras. We then add more views to the reconstruction by matching further points against the growing point cloud, computing their camera poses, and triangulating their matched points. In between, we also perform bundle adjustment to minimize the error in the reconstruction. All of these steps are detailed in the next sections of this chapter, with relevant code excerpts, pointers to useful OpenCV functions, and the mathematical reasoning behind them.
