
Activation functions

To allow a neural network to learn complex decision boundaries, we apply a non-linear activation function to some of its layers. Commonly used functions include sigmoid, tanh, ReLU, softmax, and variants of these. More technically, each neuron receives as its input signal the weighted sum of the activation values of the neurons connected to it, where the weights are the synaptic weights. One of the most widely used functions for this purpose is the so-called sigmoid function. It is a special case of the logistic function, which is defined by the following formula:
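sigmoid(x) = 1 / (1 + e^(-x))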

The domain of this function is the set of all real numbers, and its range is (0, 1). This means that any value obtained as the output of a neuron (computed from its activation state) will always lie between zero and one. The sigmoid function, as represented in the following diagram, can therefore be read as the saturation rate of a neuron, from not being active (= 0) to complete saturation, which occurs at a predetermined maximum value (= 1).

On the other hand, the hyperbolic tangent, or tanh, is another widely used activation function. Tanh squashes a real-valued number into the range (-1, 1). Mathematically, the tanh activation function can be expressed as follows:
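tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))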

The preceding equation can be represented in the following figure:

Sigmoid versus tanh activation function
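To make the comparison concrete, here is a minimal sketch in Python with NumPy (the library choice is an assumption; this chapter does not prescribe one) that evaluates both functions on a few sample values and shows their output ranges:

import numpy as np

def sigmoid(x):
    # Logistic sigmoid: maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(x))   # approaches 0 for large negative inputs, 1 for large positive ones
print(np.tanh(x))   # zero-centered values in (-1, 1)

Note that tanh is simply a rescaled, zero-centered version of the sigmoid, which is why the two curves in the preceding figure have the same S-shape.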

In general, in the last layer of a feedforward neural network (FFNN), the softmax function is applied to produce the final class scores. This is a common case, especially when solving a classification problem. In probability theory, the output of the softmax function represents a probability distribution over K different possible outcomes. Accordingly, the softmax function is used in various multiclass classification methods so that the network's output is distributed across the classes (that is, it forms a probability distribution over the classes), with each value lying between 0 and 1 and all values summing to 1.
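As a quick illustration, the following Python/NumPy sketch (again, an assumption rather than this book's own code) turns a vector of raw output scores into a probability distribution with softmax:

import numpy as np

def softmax(z):
    # Shift by the maximum for numerical stability; the result is mathematically unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw scores (logits) from the last layer
probs = softmax(scores)
print(probs)        # roughly [0.659, 0.242, 0.099]: each value lies in (0, 1)
print(probs.sum())  # 1.0: a valid probability distribution over the K = 3 classes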

For a regression problem, we do not need a non-linear activation function at the output layer, since the network should generate continuous, unbounded values rather than probabilities. In practice, people often use the IDENTITY activation function for regression problems, which simply passes the value through unchanged. We'll see this in later chapters.

To conclude, choosing proper activation functions and initializing the network weights well are two factors that make a network perform at its best and help to obtain good training results. We'll discuss this further in upcoming chapters, where we will see which activation function to use in which situation.
