
2. Method

In the past few years, deep neural networks (DNNs) have become popular due to their success in many fields, and they have been applied to speech inversion [22,23]. Wu et al. [23] compared a general linear model (GLM), a Gaussian mixture model (GMM), an artificial neural network (ANN), and a DNN with sigmoid hidden units for estimating articulatory positions from synchronized speech features on MNGU0, an English articulatory-acoustic corpus. Their results demonstrate that the DNN performs best.

A traditional DNN is obtained by stacking a series of trained Restricted Boltzmann Machines (RBMs) layer by layer, where the hidden layer of the preceding RBM serves as the visible layer of the following RBM. At the top of the stacked RBMs, a regression layer with linear units is added. This method is simple and effective in many applications. In a DNN, the input of each neuron can be formulated as:

$$I^{(n)}_i = \sum_j w^{(n)}_{ij}\, o^{(n-1)}_j + b^{(n)}_i, \qquad o^{(n)}_i = f\big(I^{(n)}_i\big),$$

where $I^{(n)}_i$ is the input of the $i$th unit in the $n$th layer, $o^{(n-1)}_j$ is the output of the $j$th neuron in the $(n-1)$th layer, $w^{(n)}_{ij}$ is the weight connecting the $i$th unit in the $n$th layer with the $j$th unit in the $(n-1)$th layer, and $b^{(n)}_i$ is the bias of the $i$th unit in the $n$th layer. $f(\cdot)$ is the activation function of the corresponding neuron, usually a sigmoid function.
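As a concrete sketch, the per-neuron sum above can be computed as a vectorized matrix product. The short NumPy example below is our own illustration (function and variable names are not from the paper) and assumes mini-batches stored as rows:

import numpy as np

def sigmoid(x):
    """Sigmoid activation f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(o_prev, W, b):
    """Forward pass of one fully connected layer.

    o_prev : (m, units_prev)  outputs o^(n-1) of the previous layer for m examples
    W      : (units_prev, units)  weights w^(n)_ij
    b      : (units,)  biases b^(n)_i
    """
    I = o_prev @ W + b    # I^(n)_i = sum_j w^(n)_ij * o^(n-1)_j + b^(n)_i
    o = sigmoid(I)        # o^(n)_i = f(I^(n)_i)
    return I, o

# Example: a mini-batch of 2 examples through a 4-to-3 layer with random weights.
rng = np.random.default_rng(0)
I, o = layer_forward(rng.normal(size=(2, 4)), rng.normal(size=(4, 3)), np.zeros(3))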

However, training a traditional DNN with sigmoid hidden units is slow, and its performance tends to be affected by gradient vanishing/explosion problems. Moreover, the distribution of $I^{(n)}_i$ in the hidden layers changes during training as the parameters of the previous layers change [26], which slows down training by requiring lower learning rates and careful parameter initialization, and makes it hard to train models with saturating nonlinearities.

Figure 1. Structure of the batch-normalized feedforward neural network

In this study, the batch normalization technique is used to normalize each training mini-batch (yellow blocks in Figure 1), and the ReLU activation function is used for the neurons of the hidden layers. The batch normalization process can be formulated as:

$$\hat{x}^{(n)}_i = \frac{x^{(n)}_i - \mu^{(n)}_i}{\sqrt{\sigma^{2\,(n)}_i + \epsilon}}, \qquad y^{(n)}_i = \gamma^{(n)}_i\, \hat{x}^{(n)}_i + \beta^{(n)}_i,$$

where $\mu^{(n)}_i$ and $\sigma^{2\,(n)}_i$ are the mean and variance of $x^{(n)}_i$ over the mini-batch ($\epsilon$ is a small constant for numerical stability), and $\gamma^{(n)}_i$ and $\beta^{(n)}_i$ are scaling and shifting parameters applied to the normalized value so as to preserve the representation capability of the layer.
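To make the mini-batch normalization concrete, the sketch below implements the two steps above in NumPy; it is a minimal illustration of our own (the names batch_norm_forward, cache, and the eps value are assumptions, not from the paper):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch per unit, then scale and shift.

    x     : (m, units)  inputs x^(n)_i for the m examples in the mini-batch
    gamma : (units,)    scaling parameters gamma^(n)_i
    beta  : (units,)    shifting parameters beta^(n)_i
    eps   : small constant for numerical stability
    """
    mu = x.mean(axis=0)                      # mini-batch mean mu^(n)_i
    var = x.var(axis=0)                      # mini-batch variance sigma^2(n)_i
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized input
    y = gamma * x_hat + beta                 # keep the layer's representation capability
    cache = (x, x_hat, mu, var)              # saved for the backward pass
    return y, cache

def relu(x):
    """ReLU activation used for the hidden-layer neurons."""
    return np.maximum(0.0, x)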

These parameters are optimized with the momentum gradient method:

$$\Delta\theta_{t} = d\,\Delta\theta_{t-1} - \eta\,\frac{\partial L}{\partial \theta}, \qquad \theta_{t+1} = \theta_{t} + \Delta\theta_{t},$$

where $\theta$ stands for a trainable parameter, $L$ is the loss over the training set, $d$ is the momentum, and $\eta$ is the learning rate. The partial derivatives of each layer can be calculated using the backpropagation algorithm (shown in Equations 12-22),
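A minimal sketch of one such update is shown below; the classical momentum form and the default values of d and eta are our assumptions, since the text only names the hyperparameters:

def momentum_step(theta, velocity, grad, d=0.9, eta=0.01):
    """One momentum gradient update for a trainable parameter.

    theta    : current value of the parameter (e.g. gamma, beta, W or b)
    velocity : accumulated update from previous steps (initially zero)
    grad     : dL/dtheta over the mini-batch
    d        : momentum coefficient
    eta      : learning rate
    """
    velocity = d * velocity - eta * grad   # blend previous update with new gradient
    theta = theta + velocity               # apply the update
    return theta, velocity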

where $L_l$ is the loss over the $l$th example, $m$ is the number of training examples, $x^{(n-1)}_l$ is the input to the batch normalization blocks of the $(n-1)$th layer corresponding to the $l$th input example, $I^{(n)}$ is the input of the $n$th layer, $I^{(n)}_{i,l}$ is the input to the $i$th neuron of the $n$th layer corresponding to the $l$th input example, $\hat{x}^{(n)}_{i,l}$ is the batch-normalized input to the $i$th batch normalization block of the $n$th layer corresponding to the $l$th input example, $o^{(n)}$ is the output of the $n$th layer, $W^{(n)}$ is the weight matrix connecting neurons in the $(n-1)$th layer and the $n$th layer, and $b^{(n)}$ is the bias of the $n$th layer.
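As an illustration of what these partial derivatives look like for the batch normalization blocks, the sketch below follows the standard batch-normalization backward pass in NumPy; it consumes the cache produced by the forward sketch above and, like it, uses names of our own choosing rather than the paper's:

import numpy as np

def batch_norm_backward(dy, cache, gamma, eps=1e-5):
    """Backpropagate through one batch normalization block.

    dy    : (m, units)  gradient of the loss w.r.t. the block's output y
    cache : (x, x_hat, mu, var) saved by the forward pass
    gamma : (units,)    scaling parameters
    Returns dL/dx, dL/dgamma and dL/dbeta.
    """
    x, x_hat, mu, var = cache
    m = x.shape[0]
    std_inv = 1.0 / np.sqrt(var + eps)

    dgamma = np.sum(dy * x_hat, axis=0)   # dL/dgamma^(n)_i
    dbeta = np.sum(dy, axis=0)            # dL/dbeta^(n)_i

    dx_hat = dy * gamma
    dvar = np.sum(dx_hat * (x - mu), axis=0) * -0.5 * std_inv**3
    dmu = -np.sum(dx_hat, axis=0) * std_inv + dvar * np.mean(-2.0 * (x - mu), axis=0)
    dx = dx_hat * std_inv + dvar * 2.0 * (x - mu) / m + dmu / m
    return dx, dgamma, dbeta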
