
Network architecture 

We will now experiment with the pre-trained ResNet50, InceptionV3, and VGG16 networks, and find out which one gives the best results. Each pre-trained model's weights were obtained by training on ImageNet. I have provided references to the original papers for the ResNet, InceptionV3, and VGG16 architectures. Readers are advised to go over these papers to get an in-depth understanding of these architectures and the subtle differences between them.

The VGG paper is Very Deep Convolutional Networks for Large-Scale Image Recognition, by Simonyan and Zisserman.

The ResNet paper is Deep Residual Learning for Image Recognition, by Kaiming He et al.

The InceptionV3 paper is Rethinking the Inception Architecture for Computer Vision, by Szegedy et al.

To explain in brief, VGG16 is a 16-layer CNN that uses 3 x 3 convolutional filters and 2 x 2 max pooling throughout. The activation functions used across the network are all ReLUs. The VGG architecture, developed by Simonyan and Zisserman, was the runner-up in the ILSVRC 2014 competition. The VGG16 network gained a lot of popularity due to its simplicity, and it remains one of the most popular networks for extracting features from images.
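To make the VGG building block concrete, here is a minimal NumPy sketch of a single 3 x 3 convolution followed by a ReLU and 2 x 2 max pooling. This is for illustration only, not the actual VGG16 implementation: a single channel, a single filter, and 'valid' convolution are assumed to keep the code short.

```python
import numpy as np

def conv2d_3x3(x, w):
    """Valid 3x3 convolution of a single-channel image x with kernel w."""
    h, wd = x.shape
    out = np.zeros((h - 2, wd - 2))
    for i in range(h - 2):
        for j in range(wd - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
    return out

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling, halving each spatial dimension."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.random.rand(8, 8)               # toy 8x8 input "image"
k = np.random.rand(3, 3)               # one 3x3 filter
feat = np.maximum(conv2d_3x3(x, k), 0)  # ReLU activation, as used throughout VGG16
pooled = max_pool_2x2(feat)
print(feat.shape, pooled.shape)         # (6, 6) (3, 3)
```

Stacking many such conv-ReLU layers, with pooling in between, is essentially how VGG16 builds up its feature hierarchy.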

ResNet50 is a deep CNN that implements the idea of residual blocks, quite different from the design of the VGG16 network. After a series of convolution-activation operations, the input of the block is added back to its output through a skip connection. The ResNet architecture was developed by Kaiming He et al., and although its deepest variant, ResNet-152, has 152 layers, it is less complex than the VGG network. This architecture won the ILSVRC 2015 competition by achieving a top-5 error rate of 3.57%, which is better than human-level performance on this competition dataset. The top-5 error rate is the fraction of test images for which the true class is not among the five classes predicted with the highest probability. In principle, a ResNet block tries to learn a residual mapping F(x) and adds the input x back to form the output, as opposed to learning the direct input-to-output mapping, as you can see in the following residual block diagram:

Figure 2.8: Residual block of ResNet models
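The residual computation in Figure 2.8 can be sketched in a few lines of NumPy. This is a toy illustration, not the actual ResNet50 code: F(x) is assumed to be two fully connected weight layers rather than convolutions, which keeps the identity skip connection easy to see.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def residual_block(x, w1, w2):
    """Residual block: output = ReLU(F(x) + x).

    F(x) is the learned residual mapping (here, two weight layers with a
    ReLU in between); the identity skip connection adds the block's input
    back to F(x) before the final activation."""
    fx = relu(x @ w1) @ w2   # F(x): weight layer -> ReLU -> weight layer
    return relu(fx + x)      # identity shortcut, then final ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
w1 = rng.standard_normal((4, 4))
w2 = rng.standard_normal((4, 4))
y = residual_block(x, w1, w2)

# If the weights are all zero, F(x) = 0 and the block reduces to ReLU(x):
# the skip connection makes the identity mapping trivial to represent,
# which is what allows very deep ResNets to train well.
assert np.allclose(residual_block(x, np.zeros((4, 4)), np.zeros((4, 4))), relu(x))
```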

InceptionV3 is the state-of-the-art CNN from Google. Instead of using fixed-size convolutional filters at each layer, the InceptionV3 architecture applies filters of different sizes in parallel to extract features at different levels of granularity. The convolution block of an InceptionV3 layer is illustrated in the following diagram:

Figure 2.9: InceptionV3 convolution block
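The multi-branch idea behind the Inception block can be sketched in NumPy: filters of different sizes run in parallel on the same input, and their outputs are concatenated along the channel axis. This is a simplified illustration, not the actual InceptionV3 block — the branch structure, channel counts, and the 'same'-padded convolution helper below are assumptions chosen for brevity.

```python
import numpy as np

def conv2d_same(x, w):
    """'same'-padded 2D convolution: x is (H, W, Cin), w is (k, k, Cin, Cout)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W = x.shape[:2]
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.einsum('abc,abcd->d', xp[i:i + k, j:j + k], w)
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 16))                        # toy feature map

# Three parallel branches with different filter sizes (granularities):
b1 = conv2d_same(x, rng.standard_normal((1, 1, 16, 8)))    # 1x1 branch
b3 = conv2d_same(x, rng.standard_normal((3, 3, 16, 8)))    # 3x3 branch
b5 = conv2d_same(x, rng.standard_normal((5, 5, 16, 8)))    # 5x5 branch

# Concatenate branch outputs along the channel axis:
out = np.concatenate([b1, b3, b5], axis=-1)
print(out.shape)   # (8, 8, 24)
```

Because every branch uses 'same' padding, the spatial dimensions match and only the channel counts add up, which is what makes the concatenation possible.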

Inception V1 (GoogLeNet) was the winner of the ILSVRC 2014 competition. Its top-5 error rate of 6.67% was very close to human-level performance.
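The top-5 error rate quoted for these competitions can be computed with a few lines of NumPy. This is a generic sketch of the metric, with a small hypothetical example rather than real ILSVRC predictions.

```python
import numpy as np

def top5_error(probs, targets):
    """Fraction of samples whose true class is NOT among the five
    classes predicted with the highest probability.

    probs: (n_samples, n_classes) array of predicted probabilities.
    targets: (n_samples,) array of true class indices."""
    top5 = np.argsort(probs, axis=1)[:, -5:]           # indices of the 5 largest probs
    hits = np.any(top5 == targets[:, None], axis=1)    # true class in the top 5?
    return 1.0 - hits.mean()

# Toy example: 3 samples over 10 classes (hypothetical values).
rng = np.random.default_rng(42)
probs = rng.random((3, 10))
probs /= probs.sum(axis=1, keepdims=True)
targets = probs.argmax(axis=1)         # pick each row's argmax as the true class
print(top5_error(probs, targets))      # 0.0: the argmax is always in the top 5
```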
