Lecture 2C - R and Keras
Keras sits on top of TensorFlow (and other back ends) and provides a high-level abstraction of the underlying TensorFlow programming model.
There are a ton of really nice walk-throughs at https://keras.rstudio.com/
The basic elements we will need are:
- data
- model structure
- learning process
- predictions
We are going to walk through three analyses:
- simple linear regression
- multiple linear regression
- image recognition
Data – this is common between project types
We need 2-3 buckets of data (a minimal splitting sketch follows this list):
- training data
- used in training
- validation data
- used during training to assess training progress
- this is often an X-fold cross-validation type of process
- testing data
- used after the final model is created; gives an out-of-sample (OOS) estimate of model accuracy
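As a minimal sketch of building the buckets (assuming a hypothetical data frame df and an illustrative 60/20/20 split, neither of which comes from the lecture):

# Illustrative 60/20/20 split of a hypothetical data frame `df`
set.seed(1)
n   <- nrow(df)
idx <- sample(seq_len(n))                                  # shuffle row indices
train_data      <- df[idx[1:floor(0.6 * n)], ]
validation_data <- df[idx[(floor(0.6 * n) + 1):floor(0.8 * n)], ]
test_data       <- df[idx[(floor(0.8 * n) + 1):n], ]

Note that when fit() is called later with validation_split, Keras carves the validation set out of the training data for you, so the explicit validation bucket is optional.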
Data – normalization
Generally, we will want to center and scale our data.
- important in optimization methods so that no single feature/variable dominates
- important to “highlight” extreme values
Don’t forget the importance of trying to visualize your data.
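A minimal sketch of centering and scaling in base R (assuming numeric matrices train_data and test_data; the training-set means and standard deviations are reused on the test set so both end up on the same scale):

# Center and scale using training-set statistics only
col_means  <- apply(train_data, 2, mean)
col_sds    <- apply(train_data, 2, sd)
train_data <- scale(train_data, center = col_means, scale = col_sds)
test_data  <- scale(test_data,  center = col_means, scale = col_sds)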
Keras – build model
Keras uses the concept of layers. Here we have three layers: a flattening layer followed by two fully connected (dense) layers, where every node in one layer is connected to every node in the next.
library(keras)

model <- keras_model_sequential()
model %>%
  layer_flatten(input_shape = c(28, 28)) %>%            # 28x28 image -> 784-long vector
  layer_dense(units = 128, activation = 'relu') %>%     # hidden layer
  layer_dense(units = 10, activation = 'softmax')       # 10-class output
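Once the layers are defined, summary() prints the architecture and parameter counts, which is a quick sanity check before training:

summary(model)   # layer-by-layer output shapes and parameter counts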
Activation functions
https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/
https://raw.githubusercontent.com/rstudio/cheatsheets/master/keras.pdf
Activation functions (see the sketch after this list):
1. normalize across neurons within a layer
2. enable non-linearity
3. need to be computationally efficient
4. allow for backpropagation
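As a small base-R sketch of two of the activations used above, written from their standard definitions (not taken from the keras source):

relu <- function(x) pmax(0, x)          # clips negatives to zero, giving non-linearity
softmax <- function(x) {                # normalizes across the neurons in a layer
  e <- exp(x - max(x))                  # subtract max for numerical stability
  e / sum(e)
}
relu(c(-2, 0, 3))       # 0 0 3
softmax(c(1, 2, 3))     # probabilities that sum to 1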
Convolutions, pooling, padding, dropout
Convolutions are filters in the photography sense. They essentially take your input image and create a new image. Often in image analysis, you will have several convolution layers with many filters, so one image may become 16, 32, or 64 new filtered images.
These convolved images can be subject to down-sampling (pooling, max pooling) or dropout. The image size can also be changed, e.g., padding to add pixels to the edge of the image.
And just about any other creative thing you can imagine.
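A minimal sketch of what such a stack can look like with the R keras interface (filter counts, pool size, and dropout rate are illustrative choices, not values from the lecture):

cnn <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), padding = "same",
                activation = "relu", input_shape = c(28, 28, 1)) %>%  # 1 image -> 32 filtered images
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%                       # down-sample by a factor of 2
  layer_dropout(rate = 0.25) %>%                                      # randomly zero activations
  layer_flatten() %>%
  layer_dense(units = 10, activation = "softmax")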
Loss function and optimizer
To train, we need a measure of “goodness of fit” or, in the negative sense, loss. In linear regression, we tend to think about MSE. For binary or categorical outcomes, we need a different measure, akin to what logistic regression uses.
\[\begin{equation} H_p(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p(y_i)) + (1-y_i) \log(1-p(y_i)) \right] \end{equation}\]
For more than two classes, we arrive at a similar, but more complex, equation. For a decent write-up, see https://gombru.github.io/2018/05/23/cross_entropy_loss/.
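A worked sketch of the two-class equation above in base R, with hypothetical labels and predicted probabilities just to show the arithmetic:

binary_crossentropy <- function(y, p) {
  -mean(y * log(p) + (1 - y) * log(1 - p))
}
y <- c(1, 0, 1, 1)            # true labels
p <- c(0.9, 0.2, 0.7, 0.6)    # predicted probabilities
binary_crossentropy(y, p)     # lower is better; perfect predictions give 0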
# Regression: MSE loss, RMSprop optimizer, track mean absolute error
model %>% compile(loss = "mse",
                  optimizer = optimizer_rmsprop(),
                  metrics = list("mean_absolute_error"))

# Classification: cross-entropy loss, Adam optimizer, track accuracy
model %>% compile(
  optimizer = 'adam',
  loss = 'sparse_categorical_crossentropy',
  metrics = c('accuracy')
)
Keras – run model
# Fit the model, holding out 20% of the training data for validation
history <- model %>% fit(train_data, train_labels,
                         epochs = epochs,                        # epochs set earlier
                         validation_split = 0.2,
                         verbose = 0,
                         callbacks = list(print_dot_callback))   # progress callback defined elsewhere
test_predictions <- model %>% predict(test_data)
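Beyond point predictions, evaluate() reports the out-of-sample loss and the metrics requested in compile(), assuming test_labels exists alongside test_data:

score <- model %>% evaluate(test_data, test_labels, verbose = 0)
score   # OOS loss plus accuracy or MAE, depending on the compile() call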