
CNNs

The most common use cases of CNNs involve image processing, but they are not restricted to images – other types of input, such as audio or video, work as well. A typical use case is image classification – the network is fed images so that it can classify the data. For example, it outputs lion if you give it a lion picture, tiger if you give it a tiger picture, and so on. The reason this kind of network is used for image classification is that it requires relatively little preprocessing compared to other algorithms in the same space – the network learns the filters that, in traditional algorithms, were hand-engineered.

Being a multilayered neural network, a CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers can be convolutional, pooling, fully connected, and normalization layers. Convolutional layers apply a convolution operation (https://en.wikipedia.org/wiki/Convolution) to an input before passing the result to the next layer. This operation emulates how the response of an individual physical neuron to a visual stimulus is generated. Each convolutional neuron processes only the data for its receptive field (the particular region of the sensory space of an individual sensory neuron in which a change in the environment will modify the firing of that neuron). Pooling layers are responsible for combining the outputs of clusters of neurons in a layer into a single neuron in the next layer. There are different implementations of pooling – max pooling, which uses the maximum value from each cluster in the prior layer; average pooling, which uses the average value of each cluster of neurons in the prior layer; and so on. Fully connected layers, instead, as you will realize from their name, connect every neuron in one layer to every neuron in the next layer.
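The convolution and pooling operations described above can be sketched in a few lines of NumPy. The helper names (conv2d, pool2d) and the toy 5 x 5 input are illustrative assumptions, not a library API:

```python
import numpy as np

def conv2d(x, kernel):
    """Slide the kernel over x; each output value is a weighted sum of the
    receptive field the kernel currently covers (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def pool2d(x, size=2, mode="max"):
    """Combine each non-overlapping size x size cluster into a single value:
    the maximum (max pooling) or the mean (average pooling)."""
    h, w = x.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, (h // size) * size, size):
        for j in range(0, (w // size) * size, size):
            cluster = x[i:i + size, j:j + size]
            out[i // size, j // size] = cluster.max() if mode == "max" else cluster.mean()
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5 x 5 input
kernel = np.ones((2, 2)) / 4.0                     # a simple averaging filter
features = conv2d(image, kernel)                   # shape (4, 4)
pooled = pool2d(features, mode="max")              # shape (2, 2)
print(features.shape, pooled.shape)                # (4, 4) (2, 2)
```

Note how each stage shrinks the spatial dimensions: the 5 x 5 input becomes a 4 x 4 feature map after convolution and a 2 x 2 map after pooling.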

CNNs don't parse all of the input data at once; they usually start with a sort of input scanner. For example, consider an image of 200 x 200 pixels as input. In this case, the model doesn't have a layer with 40,000 nodes, but a scanning input layer of 20 x 20, which is fed the first 20 x 20 pixels of the original image (usually starting in the upper-left corner). Once we have passed that input (and possibly used it for training), we feed it the next 20 x 20 pixels (this will be explained in more detail in Chapter 5, Convolutional Neural Networks; the process is similar to the movement of a scanner, one pixel to the right). Please note that the image isn't dissected into 20 x 20 blocks – the scanner moves over it. This input data is then fed through one or more convolutional layers. Each node of those layers only has to work with its close neighboring cells – not all of the nodes are connected to each other. The deeper a network becomes, the more its convolutional layers shrink, typically by a factor that divides the input size (if we started with a layer of 20, then, most probably, the next one would be a layer of 10 and the following a layer of 5). Powers of two are commonly used as these factors.
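The scanning behavior described above can be sketched as a sliding-window generator. The function name (scan) and the stride parameter are illustrative assumptions; the 200 x 200 image and 20 x 20 window match the example in the text:

```python
import numpy as np

def scan(img, window=20, stride=1):
    """Yield window x window patches of img, sliding the window one stride
    at a time (left to right, top to bottom) – like a scanner, the image
    is swept over, not dissected into disjoint blocks."""
    h, w = img.shape
    for i in range(0, h - window + 1, stride):
        for j in range(0, w - window + 1, stride):
            yield img[i:i + window, j:j + window]

image = np.zeros((200, 200))            # stand-in for a 200 x 200 pixel image
patches = scan(image, window=20, stride=1)
first = next(patches)                   # the upper-left 20 x 20 block
print(first.shape)                      # (20, 20)
```

With a 20 x 20 window moving one pixel at a time, the scanner visits (200 - 20 + 1) squared, that is, 32,761 overlapping positions – far more than the 100 disjoint blocks a naive 20 x 20 tiling would produce, which is why neighboring patches share most of their pixels.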

The following diagram (by Aphex34—own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45679374) shows the typical architecture of a CNN:

Figure 2.8