- Advanced Machine Learning with R
- Cory Lesmeister, Dr. Sunil Kumar Chinnamgari
Deep learning – a not-so-deep overview
So, what is this deep learning that is grabbing our attention and headlines? Let's turn to Wikipedia again to form a working definition: Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple nonlinear transformations. That sounds as if a lawyer wrote it. The defining characteristic of deep learning is that it is based on ANNs in which machine learning techniques, primarily unsupervised learning, create new features from the input variables. We will dig into some unsupervised learning techniques in the next couple of chapters, but you can think of it as finding structure in data where no response variable is available.
A simple way to think of it is the periodic table of elements, which is a classic case of finding a structure where no response is specified. Pull up this table online and you will see that it is organized based on atomic structure, with metals on one side and non-metals on the other. It was created based on latent classification/structure. This identification of latent structure/hierarchy is what separates deep learning from your run-of-the-mill ANN. Deep learning sort of addresses the question of whether there is an algorithm that better represents the outcome than just the raw inputs. In other words, can our model learn to classify pictures other than with just the raw pixels as the only input? This can be of great help in a situation where you have a small set of labeled responses but a vast amount of unlabeled input data. You could train your deep learning model using unsupervised learning and then apply this in a supervised fashion to the labeled data, iterating back and forth.
Identification of these latent structures is not trivial mathematically, but one example is the concept of regularization that we looked at in Chapter 4, Advanced Feature Selection in Linear Models. In deep learning, you can penalize weights with regularization methods such as L1 (penalize non-zero weights), L2 (penalize large weights), and dropout (randomly ignore certain inputs and zero their weight out). In standard ANNs, none of these regularization methods take place.
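To make these penalties concrete, here is a minimal Python/NumPy sketch (the weight values and regularization strength are invented for illustration; this is not code from the book's R exercises) that computes an L1 penalty, an L2 penalty, and a dropout mask for a small weight vector:

```python
import numpy as np

# Illustrative weights for a single layer (values are made up)
w = np.array([0.0, -1.5, 0.8, 2.2])

lam = 0.1  # regularization strength, a tuning hyperparameter

l1_penalty = lam * np.sum(np.abs(w))   # L1: penalizes non-zero weights
l2_penalty = lam * np.sum(w ** 2)      # L2: penalizes large weights most

# Dropout: randomly ignore certain inputs by zeroing their weights
rng = np.random.default_rng(42)
keep_prob = 0.5                        # probability a weight is kept
mask = rng.random(w.shape) < keep_prob
w_dropped = w * mask
```

Note how L1 treats all non-zero weights alike, which drives some weights exactly to zero, while L2 grows with the square of a weight and so shrinks the largest weights hardest.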
Another way is to reduce the dimensionality of the data. One such method is the autoencoder. This is a neural network where the inputs are transformed into a set of reduced dimension weights. In the following diagram, notice that Feature A is not connected to one of the hidden nodes:
[Diagram: an autoencoder network in which input Feature A is not connected to one of the hidden nodes]
This can be applied recursively and learning can take place over many hidden layers. What you have seen happening, in this case, is that the network is developing features of features as they are stacked on each other. Deep learning will learn the weights between two layers in sequence first and then use backpropagation to fine-tune these weights. Other feature extraction methods include restricted Boltzmann machines and sparse coding models.
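As a rough sketch of the autoencoder idea, the following Python/NumPy toy example (every dimension and hyperparameter here is invented for illustration) compresses six input features into two hidden units and then fine-tunes both layers with backpropagation to reduce the reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 observations of 6 features, compressed to 2 hidden units
X = rng.random((100, 6))
n_in, n_hidden = 6, 2
W_enc = rng.normal(0, 0.1, (n_in, n_hidden))   # encoder weights
W_dec = rng.normal(0, 0.1, (n_hidden, n_in))   # decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(500):
    H = sigmoid(X @ W_enc)          # reduced-dimension representation
    X_hat = H @ W_dec               # reconstruction of the inputs
    err = X_hat - X                 # reconstruction error
    # Backpropagate the squared-error loss through both layers
    grad_dec = H.T @ err / len(X)
    grad_hidden = (err @ W_dec.T) * H * (1 - H)
    grad_enc = X.T @ grad_hidden / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = np.mean((sigmoid(X @ W_enc) @ W_dec - X) ** 2)
```

The two hidden activations in `H` are the new, learned features; stacking another autoencoder on top of `H` is exactly the "features of features" idea described above.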
A good starting point for further reading on these methods is http://deeplearning.net/.
Deep learning has performed well on many classification problems, including winning a Kaggle contest or two. It still suffers from the problems of ANNs, especially the black box problem. Try explaining to the uninformed what is happening inside a neural network, regardless of which in-vogue methods you use. However, it is well suited to problems where how the model arrived at an answer matters less than what the answer is. After all, do we really care why an autonomous car avoided running into a pedestrian, or do we care about the fact that it did not? Additionally, the Python community has a bit of a head start on the R community in deep learning usage and packages. As we will see in the practical exercise, the gap is closing.
While deep learning is an exciting undertaking, be aware that to achieve the full benefit of its capabilities, you will need a high degree of computational power along with taking the time to train the best model by fine-tuning the hyperparameters. Here is a list of some things that you will need to consider:
- An activation function
- Size and number of the hidden layers
- Dimensionality reduction, that is, restricted Boltzmann machine versus autoencoder
- The number of epochs
- The gradient descent learning rate
- The loss function
- Regularization
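To see why the gradient descent learning rate and the number of epochs deserve care, here is a minimal Python sketch (the quadratic loss and the specific values are invented for illustration) comparing a small, a moderate, and an overly large learning rate over a fixed number of epochs:

```python
def train(lr, epochs):
    """Run gradient descent on the toy loss L(w) = w**2."""
    w = 5.0                     # starting weight, chosen arbitrarily
    for _ in range(epochs):
        grad = 2 * w            # gradient of L(w) = w**2
        w -= lr * grad          # gradient descent update
    return w

w_small = train(lr=0.01, epochs=50)   # converges, but slowly
w_good = train(lr=0.1, epochs=50)     # converges quickly toward 0
w_big = train(lr=1.1, epochs=50)      # overshoots the minimum and diverges
```

Too small a learning rate wastes epochs; too large a rate overshoots the minimum on every step and blows up. In a real deep learning model, the same trade-off plays out across millions of weights at once.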