Building an ANN model for prediction using Keras and TensorFlow
Now that we have our libraries installed, let's create a folder called aibook and within that create another folder called chapter2. Move all the code for this chapter into the chapter2 folder. Make sure that the conda environment is still active (the prompt will start with the environment name):
Once within the chapter2 folder, type jupyter notebook. This will open an interactive Python editor in the browser.
Use the New dropdown in the top-right corner to create a new Python 3 notebook:
We are now ready to build our first ANN using Keras and TensorFlow, to predict real estate prices:
Import all the libraries that we need for this exercise. Use the first cell to import all the libraries and run it. Here are the four main libraries we will use:
pandas: We use this to read the data and store it in a dataframe
sklearn: We use this to standardize data and for k-fold cross-validation
keras: We use this to build our sequential neural network
numpy: We use numpy for all math and array operations
Let's import these libraries:
import numpy
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
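The steps that follow assume the data has already been read into a dataframe. A minimal sketch of that step, assuming the dataset is available locally as a CSV file (the file name housing.csv is a placeholder, not from the original text):

# read the real estate data into a pandas dataframe
# 'housing.csv' is a placeholder path; point it at your copy of the dataset
dataframe = pd.read_csv('housing.csv')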
To view the feature variables, the target variables, and a few rows of the data, enter the following:
dataframe.head()
The output will be the first few rows of the dataframe, which is shown in the following screenshot:
The dataset has eight columns; the details of each column are as follows:
BIZPROP: Proportion of non-retail business acres per town
ROOMS: Average number of rooms per dwelling
AGE: Proportion of owner-occupied units built before 1940
HIGHWAYS: Index of accessibility to radial highways
TAX: Full-value property tax rate per $10,000
PTRATIO: Pupil-to-teacher ratio by town
LSTAT: Percentage of lower status of the population
VALUE: Median value of owner-occupied homes in thousand dollars (target variable)
In our use case, we need to predict the VALUE column, so we split the dataframe into a feature matrix (the first seven columns) and a target vector (the VALUE column):

# convert the dataframe to a numpy array, then split off features and target
dataset = dataframe.values
features = dataset[:, 0:7]
target = dataset[:, 7]
Also, to make sure we can reproduce the results, let's set a seed for the random number generator. This random function is used during cross-validation to randomly sample the data:
# fix random seed for reproducibility
seed = 9
numpy.random.seed(seed)
Now we are ready to build our ANN:
Create a sequential neural network that has a simple, shallow architecture.
Define a function called simple_shallow_seq_net() that specifies the architecture of the neural network:
def simple_shallow_seq_net():
    # create a sequential ANN
    model = Sequential()
    model.add(Dense(7, input_dim=7, kernel_initializer='normal', activation='sigmoid'))
    model.add(Dense(1, kernel_initializer='normal'))
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss='mean_squared_error', optimizer=sgd)
    return model
The function does the following:
model = Sequential()
A sequential model is instantiated; a sequential model is an ANN model built using a linear stack of layers:

model.add(Dense(7, input_dim=7, kernel_initializer='normal', activation='sigmoid'))

Here, we add a dense (fully connected) layer with seven neurons to the sequential network. This layer accepts an input with seven features (since there are seven input features for predicting the house price), as indicated by the input_dim parameter. The weights of all the neurons in this layer are initialized using a random normal distribution, as indicated by the kernel_initializer parameter. Similarly, all the neurons of this layer use the sigmoid activation function, as indicated by the activation parameter:
model.add(Dense(1, kernel_initializer='normal'))
Add the output layer: a single neuron whose weights are initialized using a random normal distribution:
sgd = optimizers.SGD(lr=0.01)
Set the network to learn using Stochastic Gradient Descent (SGD), available from the keras optimizers module. We also indicate that the network will use a learning rate (lr) of 0.01 at every step of learning:

model.compile(loss='mean_squared_error', optimizer=sgd)

Indicate that the network needs to use the mean squared error (MSE) cost function to measure the magnitude of the error of the model, and use the SGD optimizer to learn from the measured loss:
return model
Finally, the function returns a model with the defined specifications.
The next step is to set a random seed for reproducibility; this random function is used to split the data into training and validation sets. The method used is k-fold cross-validation, where the data is randomly divided into 10 subsets; in each round, nine subsets are used for training and the held-out subset for validation:
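A minimal sketch of this step, using sklearn's KFold (shuffle is enabled here so that the seed actually takes effect; that detail is an assumption):

# seed the generator again so the fold assignment is reproducible
numpy.random.seed(seed)
# 10-fold cross-validation split
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)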
Now, we need to fit this model to predict a numerical value (the house price, in this case), so we use KerasRegressor. KerasRegressor is a wrapper that exposes a Keras model as a scikit-learn regression estimator:
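A sketch of how this might look; the epochs and batch_size values are illustrative assumptions, not values fixed by the text:

# wrap the model-building function as a scikit-learn style estimator
estimator = KerasRegressor(build_fn=simple_shallow_seq_net, epochs=100, batch_size=50, verbose=0)
# cross-evaluate with the 10-fold split defined above; KerasRegressor's
# score is the negated loss, so these values are negative MSEs
results = cross_val_score(estimator, features, target, cv=kfold)
print("simple_shallow_seq_net: %.2f (%.2f) MSE" % (results.mean(), results.std()))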
Great, we have built our first neural net to predict real estate prices. Our next effort is to improve it. The first thing to try, before fiddling with the network parameters, is standardizing the data and checking whether that improves performance (lowers the MSE):
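One way to set this up, sketched with sklearn's Pipeline and StandardScaler (the step names and hyperparameters are assumptions):

# standardize the data, then feed it to the network inside one pipeline
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=simple_shallow_seq_net, epochs=100, batch_size=50, verbose=0)))
pipeline = Pipeline(estimators)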
In the preceding code, we created a pipeline to standardize the data and then use it during every learning cycle of the network. In the following code block, we train and cross-evaluate the neural network:
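Again a sketch, reusing the seeded 10-fold split from earlier:

# train and cross-evaluate the standardized network
results = cross_val_score(pipeline, features, target, cv=kfold)
print("standardized simple_shallow_seq_net: %.2f (%.2f) MSE" % (results.mean(), results.std()))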
Let's now fiddle with our network to see whether we can get better results. We can start by creating a deeper network. We will increase the number of hidden or fully-connected layers and use both the sigmoid and tanh activation functions in alternate layers:
def deep_seq_net():
    # create a deep sequential model
    model = Sequential()
    model.add(Dense(7, input_dim=7, kernel_initializer='normal', activation='sigmoid'))
    model.add(Dense(7, activation='tanh'))
    model.add(Dense(7, activation='sigmoid'))
    model.add(Dense(7, activation='tanh'))
    model.add(Dense(1, kernel_initializer='normal'))
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss='mean_squared_error', optimizer=sgd)
    return model
The next block of code is used to standardize the variables in the training data and then fit the deep neural net model to the training data. Create the pipeline and fit the model using standardized data:
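A sketch of that block, following the same pipeline pattern as before (step names and hyperparameters are assumptions):

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=deep_seq_net, epochs=100, batch_size=50, verbose=0)))
pipeline = Pipeline(estimators)
results = cross_val_score(pipeline, features, target, cv=kfold)
print("deep_seq_net: %.2f (%.2f) MSE" % (results.mean(), results.std()))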
So, we get better results when we increase the depth (layers) of the network. Now, let's see what happens when we widen the network, that is, increase the number of neurons (nodes) in each layer. To tackle the problem, let's define a deep and wide network, increasing the number of neurons in each layer to 21. Also, this time around, we will use the relu and sigmoid activation functions for the hidden layers:
def deep_and_wide_net():
    # create a deep and wide sequential model
    model = Sequential()
    model.add(Dense(21, input_dim=7, kernel_initializer='normal', activation='relu'))
    model.add(Dense(21, activation='relu'))
    model.add(Dense(21, activation='relu'))
    model.add(Dense(21, activation='sigmoid'))
    model.add(Dense(1, kernel_initializer='normal'))
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss='mean_squared_error', optimizer=sgd)
    return model
The next block of code is used to standardize the variables in the training data and then fit the deep and wide neural net model to the training data:
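As before, a sketch under the same assumptions:

estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=deep_and_wide_net, epochs=100, batch_size=50, verbose=0)))
pipeline = Pipeline(estimators)
results = cross_val_score(pipeline, features, target, cv=kfold)
print("deep_and_wide_net: %.2f (%.2f) MSE" % (results.mean(), results.std()))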
This time, the MSE is again better than that of the previous networks we created. This is a good example of how a deeper network with more neurons abstracts the problem better:
deep_and_wide_net:(34.43) MSE
Finally, save the network for later use. The saved network model will be used in the next section and served within a REST API:
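A minimal sketch of saving the trained model; the 'mlp' step name and the file name are assumptions carried over from the sketches above:

# fit the pipeline on the full data, then save the underlying Keras model
pipeline.fit(features, target)
pipeline.named_steps['mlp'].model.save('deep_and_wide_net.h5')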
The data we used here is for demonstrating the technique, so try out prediction on different use cases by applying the preceding technique to other datasets (https://data.world/datasets/prediction).
We will learn more about optimizers and regularizers, which are other parameters you can use to tune the network, in Chapter 4, Building a Machine Vision Mobile App to Classify Flower Species. The complete code for our ANN model creation is available as a Python notebook named sequence_networks_for_prediction.ipynb.