Lecture 2C - R and Keras

Keras sits on top of TensorFlow (and others) and provides a high-level abstraction of the underlying TensorFlow programming model.

There are a ton of really nice walk-throughs at https://keras.rstudio.com/.

The basic elements we will need are:

  • data
  • model structure
  • learning process
  • predictions

We are going to walk through three analyses:

  1. simple linear regression
  2. multiple linear regression
  3. image recognition

Data – this is common across project types

We need 2-3 buckets of data:

  1. training data
    • used in training
  2. validation data
    • used during training to assess training progress
    • this is often an X-fold cross-validation type of process
  3. testing data
    • used after the final model is created; gives an out-of-sample (OOS) estimate of model accuracy (see the splitting sketch after this list)
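A minimal splitting sketch; the data frame name df and the 70/15/15 proportions are just assumptions for illustration. (With keras, the validation piece can also be handled by validation_split in fit(), shown later.)

set.seed(42)
n   <- nrow(df)                                              # df is a hypothetical data frame
idx <- sample(seq_len(n))                                    # shuffle row indices
train_data      <- df[idx[1:floor(0.70 * n)], ]
validation_data <- df[idx[(floor(0.70 * n) + 1):floor(0.85 * n)], ]
test_data       <- df[idx[(floor(0.85 * n) + 1):n], ]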

Data – normalization

Generally, we will want to center and scale our data (a short sketch follows the list below).

  1. important in optimization methods to reduce dominance of a feature/variable
  2. important to “highlight” extreme values
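A minimal sketch, assuming train_data and test_data are numeric matrices: compute the centering and scaling statistics on the training data only, and reuse them on the test data so no test information leaks into training.

train_means <- apply(train_data, 2, mean)
train_sds   <- apply(train_data, 2, sd)
train_data  <- scale(train_data, center = train_means, scale = train_sds)
test_data   <- scale(test_data,  center = train_means, scale = train_sds)   # reuse training statistics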

Don’t forget the importance of trying to visualize your data.


Keras – build model

Keras uses the concept of layers. Here we have three layers. The dense layers are fully connected, meaning every node in one layer connects to every node in the next.

# define a sequential (layer-by-layer) model
model <- keras_model_sequential()
model %>%
  layer_flatten(input_shape = c(28, 28)) %>%          # 28x28 image -> 784-length vector
  layer_dense(units = 128, activation = 'relu') %>%   # hidden layer with 128 nodes
  layer_dense(units = 10, activation = 'softmax')     # output layer, one node per class
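As a quick sanity check on the structure, summary() prints each layer's output shape and parameter count.

summary(model)   # layer-by-layer output shapes and parameter counts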

Activation functions

https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
https://raw.githubusercontent.com/rstudio/cheatsheets/master/keras.pdf

Activation functions (a small hand-rolled sketch of two of them follows this list):

  1. normalize output across neurons within a layer
  2. enable non-linearity
  3. need to be computationally efficient
  4. allow for backpropagation
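For intuition, a minimal hand-rolled sketch of two common activations, ReLU and softmax (in practice you simply pass activation = 'relu' or 'softmax' to a layer, as above):

relu <- function(x) pmax(0, x)               # keep positives, zero out negatives
softmax <- function(x) exp(x) / sum(exp(x))  # map a vector to probabilities that sum to 1

relu(c(-2, 0, 3))      # 0 0 3
softmax(c(1, 2, 3))    # ~0.09 0.24 0.67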

Convolutions, pooling, padding, dropout

Convolutions are filters in the photography sense. They essentially take your input image and create a new image. Oftentimes in image analysis, you will have several convolution layers, each with many filters, so one image may become 16, 32, or 64 new filtered images.

These convolved images can be down-sampled via pooling (e.g., max pooling), and the layers can be regularized with dropout, which randomly zeroes a fraction of nodes during training. The image size can also be changed, e.g., padding adds pixels to the edge of the image.

And just about any other creative thing you can imagine.
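A rough sketch of how these pieces appear in keras; the filter count, pool size, and dropout rate here are illustrative assumptions, not values from the lecture.

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = 'relu',
                padding = 'same', input_shape = c(28, 28, 1)) %>%   # one image becomes 32 filtered images
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%                     # down-sample 28x28 to 14x14
  layer_dropout(rate = 0.25) %>%                                    # randomly zero 25% of nodes during training
  layer_flatten() %>%
  layer_dense(units = 10, activation = 'softmax')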

Loss function and optimizer

To train, we need a measure of “goodness of fit”, or in the negative sense, loss. In linear regression, we tend to think about MSE (mean squared error). For binary or categorical outcomes, we need a different measure, akin to logistic regression: cross-entropy.

\[\begin{equation} H_p(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log\big(p(y_i)\big) + (1-y_i) \log\big(1-p(y_i)\big) \right] \end{equation}\]

For more than two classes, we arrive at a similar, but more complex equation. For a decent write-up, see https://gombru.github.io/2018/05/23/cross_entropy_loss/.
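To make the binary equation concrete, here it is computed directly in R on a few toy values (the labels and probabilities are made up for illustration):

y <- c(1, 0, 1, 1)              # true labels y_i
p <- c(0.90, 0.20, 0.60, 0.95)  # predicted probabilities p(y_i)
-(1 / length(y)) * sum(y * log(p) + (1 - y) * log(1 - p))   # ~0.22
# the keras equivalent is loss = 'binary_crossentropy' in compile()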

# regression setup: MSE loss, RMSprop optimizer, MAE reported as a metric
model %>% compile(loss = "mse",
                  optimizer = optimizer_rmsprop(),
                  metrics = list("mean_absolute_error"))

# classification setup: integer class labels, Adam optimizer, accuracy reported as a metric
model %>% compile(
  optimizer = 'adam',
  loss = 'sparse_categorical_crossentropy',
  metrics = c('accuracy')
)

Keras – run model

history <- model %>% fit(train_data, train_labels,
                         epochs = epochs,                        # passes through the training data
                         validation_split = 0.2,                 # hold out 20% of training data for validation
                         verbose = 0,
                         callbacks = list(print_dot_callback))   # e.g., print a dot per epoch
test_predictions <- model %>% predict(test_data)                 # predictions on held-out test data
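If test labels are available, keras can also report the test-set loss and metrics directly; a minimal sketch assuming a test_labels vector exists:

score <- model %>% evaluate(test_data, test_labels, verbose = 0)   # OOS loss and metrics
score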