
The neural network model

A neural network model is similar to the preceding logistic regression model. The only difference is the addition of hidden layers between the input and output layers. Let's consider a single hidden layer neural network for classification to understand the process as shown in the following diagram:

Here, Layer 0 is the input layer, Layer 1 is the hidden layer, and Layer 2 is the output layer. This is also known as a two-layered neural network, owing to the fact that when we count the number of layers in a neural network, we don't count the input layer as the first layer. Thus, the input layer is regarded as Layer 0, and the successive layers are denoted Layer 1, Layer 2, and so on.

Now, a basic question comes to mind: why are the layers between the input and output layers termed hidden layers?

This is because the values of the nodes in the hidden layers are not present in the training set. As we have seen, two calculations happen at every node (see the sketch after this list). These are:

  • Aggregation of the input signals from previous layers

  • Subjecting the aggregated signal to an activation to create deeper inner representations, which in turn are the values of the corresponding hidden nodes
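To make these two calculations concrete, here is a minimal NumPy sketch of a single node; the input values and the choice of sigmoid are illustrative assumptions, not taken from the diagram:

```python
import numpy as np

def node_output(x, w, b):
    """One node's value: aggregate the input signals, then activate."""
    z = np.dot(w, x) + b                # 1. aggregation of input signals
    return 1.0 / (1.0 + np.exp(-z))     # 2. sigmoid activation (illustrative choice)

# Three input signals feeding one hidden node (made-up values)
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, -0.2])
print(node_output(x, w, 0.3))           # the value of this hidden node
```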

Referring to the preceding diagram, we have three input features, $x_1$, $x_2$, and $x_3$. The node showing the value 1 is regarded as the bias unit. Each layer, except the output, generally has a bias unit. Bias units can be regarded as an intercept term and play an important role in shifting the activation function left or right. Remember, the number of hidden layers and the number of nodes in them are hyperparameters that we define at the start. Here, we have defined the number of hidden layers to be one and the number of hidden nodes to be three, namely $h_1$, $h_2$, and $h_3$. Thus, we can say we have three input units, three hidden units, and three output units ($\hat{y}_1$, $\hat{y}_2$, and $\hat{y}_3$, since we have to predict one out of three classes). This gives us the shapes of the weights and biases associated with the layers. For example, Layer 0 has 3 units and Layer 1 has 3 units. The shape of the weight matrix $W_i$ and bias vector $b_i$ associated with Layer $i$ is given by:

$W_i$: (number of units in Layer $i+1$) $\times$ (number of units in Layer $i$)

$b_i$: (number of units in Layer $i+1$) $\times$ 1

Therefore, the shapes of $W_0$, $b_0$, $W_1$, and $b_1$ (checked in the sketch after this list) are:

  • $W_0$ will be $3 \times 3$ and $b_0$ will be $3 \times 1$

  • $W_1$ will be $3 \times 3$ and $b_1$ will be $3 \times 1$
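As a quick sanity check, the following sketch initializes parameters with exactly these shapes (the random initialization scheme here is an assumption for illustration):

```python
import numpy as np

n0, n1, n2 = 3, 3, 3                    # units in Layer 0, Layer 1, Layer 2

W0 = np.random.randn(n1, n0) * 0.01     # (3, 3): connects Layer 0 to Layer 1
b0 = np.zeros((n1, 1))                  # (3, 1)
W1 = np.random.randn(n2, n1) * 0.01     # (3, 3): connects Layer 1 to Layer 2
b1 = np.zeros((n2, 1))                  # (3, 1)

for name, p in [("W0", W0), ("b0", b0), ("W1", W1), ("b1", b1)]:
    print(name, p.shape)
```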

Now, let's understand the following notation:

  • $(W_i)_{da}$: here, it refers to the value of the weight connecting node $a$ in Layer $i$ to node $d$ in Layer $i+1$

  • $(b_i)_d$: here, it refers to the value of the bias connecting the bias unit in Layer $i$ to node $d$ in Layer $i+1$

Therefore, the values of the nodes in the hidden layer can be calculated in the following way:

$h_d = f\left(\sum_{a=1}^{3} (W_0)_{da}\, x_a + (b_0)_d\right) \quad \text{for } d = 1, 2, 3$

or, in vector form, $h = f(W_0 x + b_0)$.

Here, the function $f$ refers to the activation function. Remember logistic regression, where we used sigmoid and softmax as the activation functions for binary and multi-class logistic regression, respectively.

Similarly, we can calculate the output units, as follows:

$z = W_1 h + b_1, \qquad \hat{y}_d = \frac{e^{z_d}}{\sum_{c=1}^{3} e^{z_c}} \quad \text{for } d = 1, 2, 3$

that is, $\hat{y} = \text{softmax}(W_1 h + b_1)$.
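Putting the two layers together, here is a minimal sketch of the full forward propagation, assuming a sigmoid hidden activation and a softmax output (the parameter values are random placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=0, keepdims=True)

def forward(x, W0, b0, W1, b1):
    h = sigmoid(W0 @ x + b0)         # hidden layer: aggregate, then activate
    y_hat = softmax(W1 @ h + b1)     # output layer: class probabilities
    return h, y_hat

rng = np.random.default_rng(0)
W0, b0 = rng.normal(size=(3, 3)), np.zeros((3, 1))
W1, b1 = rng.normal(size=(3, 3)), np.zeros((3, 1))
x = rng.normal(size=(3, 1))          # one example with three features
_, y_hat = forward(x, W0, b0, W1, b1)
print(y_hat.ravel(), y_hat.sum())    # three class probabilities summing to 1
```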

This brings us to the end of the forward propagation process. Our next task is to train the neural network (that is, to learn the weight and bias parameters) through backpropagation.

Let the actual output classes be $y_1$, $y_2$, and $y_3$.

Recalling the cost function section in logistic regression, we used cross entropy to formulate our cost function. Thus, the cost function is defined by:

$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{c=1}^{C} y_c^{(i)} \log \hat{y}_c^{(i)}$

where $C = 3$ (the number of classes) and $m$ is the number of examples.

Since this is a classification problem, for each example the output will have only one class set to 1 and the rest set to zero. For example, for an example $i$ whose true class is the second one, it would be:

$y^{(i)} = (0, 1, 0)$

Thus, the cost function reduces to:

$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \log \hat{y}^{(i)}_{c_i}$

where $c_i$ is the true class of example $i$, since only the true class contributes a nonzero term.
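In code, the cross-entropy cost over a batch of one-hot labels can be sketched as follows (the small epsilon guards against log(0) and is an implementation detail, not from the text):

```python
import numpy as np

def cross_entropy(Y, Y_hat, eps=1e-12):
    """Y and Y_hat have shape (C, m): one-hot labels and predicted probabilities."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(Y_hat + eps)) / m

# Two examples, three classes: true classes are 2 and 1 (one-hot columns)
Y = np.array([[0, 1],
              [1, 0],
              [0, 0]])
Y_hat = np.array([[0.2, 0.7],
                  [0.7, 0.2],
                  [0.1, 0.1]])
print(cross_entropy(Y, Y_hat))   # about 0.357; lower means better predictions
```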

Now, our goal is to minimize the cost function with respect to $W_0$, $b_0$, $W_1$, and $b_1$. In order to train our given neural network, we first randomly initialize $W$ and $b$. Then we try to optimize through gradient descent, where we update $W$ and $b$ at the learning rate $\alpha$ in the following manner:

$W_i := W_i - \alpha \, \frac{\partial J}{\partial W_i}, \qquad b_i := b_i - \alpha \, \frac{\partial J}{\partial b_i}$

After setting up this structure, we have to perform these optimization steps (of updating $W$ and $b$) repeatedly for numerous iterations to train our neural network.
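The following is a minimal end-to-end training sketch on made-up data. The gradient formulas for softmax with cross entropy and a sigmoid hidden layer are stated here without derivation, since backpropagation has only just been introduced; the data, shapes, and learning rate are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 30))                   # 30 examples, 3 features each
Y = np.eye(3)[:, rng.integers(0, 3, 30)]       # random one-hot labels, shape (3, 30)

W0, b0 = rng.normal(size=(3, 3)) * 0.5, np.zeros((3, 1))
W1, b1 = rng.normal(size=(3, 3)) * 0.5, np.zeros((3, 1))
alpha, m = 0.5, X.shape[1]                     # learning rate and example count

for step in range(200):
    # Forward propagation
    H = sigmoid(W0 @ X + b0)
    Y_hat = softmax(W1 @ H + b1)
    # Backpropagation (softmax + cross entropy gives dZ2 = Y_hat - Y)
    dZ2 = (Y_hat - Y) / m
    dW1, db1 = dZ2 @ H.T, dZ2.sum(axis=1, keepdims=True)
    dZ1 = (W1.T @ dZ2) * H * (1 - H)           # sigmoid derivative: h * (1 - h)
    dW0, db0 = dZ1 @ X.T, dZ1.sum(axis=1, keepdims=True)
    # Gradient descent updates at learning rate alpha
    W1 -= alpha * dW1; b1 -= alpha * db1
    W0 -= alpha * dW0; b0 -= alpha * db0

print(-np.sum(Y * np.log(Y_hat + 1e-12)) / m)  # cost after training (should drop)
```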

This brings us to the end of the basics of neural networks, which form the basic building blocks of any neural network, shallow or deep. Our next frontier will be to understand some of the famous deep neural network architectures, such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Apart from that, we will also have a look at benchmark deep neural network architectures such as AlexNet, VGG-net, and Inception.
