Transfer Learning: The Highest Leverage Deep Learning Skill You Can Learn

Oct. 26, 2021

Training a deep learning model can take days, weeks, or even months.

Transfer learning can help solve this problem. It’s a machine learning method where trained models are reused as starting points for new tasks. This speeds up training and often improves performance on related tasks.

It is one of the most popular methods in Deep Learning because it saves time and money by reusing pre-trained models from other tasks that have a similar structure to your own task. In this post, you’ll learn how transfer learning works and how you can use it to speed up your deep learning training process!

What is transfer learning?

Transfer learning is a machine learning technique in which a model trained on a specific task is reused as part of the training process for another, different task.

Here is a simple analogy to help you understand how transfer learning works: imagine that one person has learned everything there is to know about dogs. In contrast, another person has learned everything about cats. If both people are asked, “What’s an animal with four legs, a tail, and barks?” The person who knows all about dogs would answer “dog” while the individual who knows everything about cats would say “cat.”

Since both people already know half of what they need to know to solve the problem at hand, each one only has to fill in their missing information before answering correctly. This is how transfer learning works in machine learning: the features a model has already learned on one task are reused, so the model only needs to learn what is missing to solve a new, related task.

How to use Transfer Learning?

Now that you know how transfer learning works, you probably wonder how to make it work for your own machine learning models. Transfer learning uses pre-trained deep learning models as the starting point for new problems, and there are two ways to do this: feature extraction and fine-tuning.

Feature Extraction: If you want to reuse what a pre-trained model has learned but don’t want to re-train that larger model on your data set, feature extraction is the way to do it. You keep the pre-trained model’s weights fixed, use it to turn your inputs into learned features, and train only a new, much smaller model on those features. Used in conjunction with fine-tuning, this process can give you outstanding results in a short amount of time.

Fine-Tuning: If you want to adapt an existing model more closely to your dataset, this approach could be a good fit for you. Instead of keeping the pre-trained weights fixed, you continue training some or all of them on your data, usually with a small learning rate. You still benefit from the work already done by the pre-trained model without the hassle of training it from scratch. Fine-tuning takes more compute than feature extraction alone, but it often gives better results when you have enough data.
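Here is a minimal sketch of feature extraction, assuming TensorFlow 2.x with Keras. The helper names (build_feature_extractor, extract_features, build_small_classifier), the 32x32 input size, and the 64-unit hidden layer are illustrative assumptions, not something prescribed by Keras. The pre-trained network is only used for forward passes; the small classifier is the only part that gets trained.

```python
import tensorflow as tf


def build_feature_extractor(weights="imagenet", input_shape=(32, 32, 3)):
    """Load VGG16 without its classification head and freeze it."""
    base = tf.keras.applications.VGG16(
        include_top=False,        # drop the original ImageNet classifier
        weights=weights,          # "imagenet" downloads pre-trained weights
        input_shape=input_shape,  # assumed size; VGG16 needs at least 32x32
        pooling="avg",            # average the last feature map into one vector per image
    )
    base.trainable = False
    return base


def extract_features(base, images):
    """Run images through the frozen base once and return fixed feature vectors."""
    return base.predict(images, verbose=0)


def build_small_classifier(feature_dim, num_classes):
    """A small trainable model that learns only from the extracted features."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(feature_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes),  # outputs logits
    ])
```

In practice you would call extract_features once on your preprocessed images (for VGG16, tf.keras.applications.vgg16.preprocess_input), then fit the small classifier on the resulting vectors. Because the expensive base model runs only once per image, each training epoch is cheap.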

Steps to transfer learning from a pre-trained model to a new one.

Transfer learning typically follows these five steps. Here is how to do each one if you’re using Keras to build your deep nets.

Extract layers from a pre-trained model.

The early and middle layers of a pre-trained model contain information that is useful for the task in general settings. Typically they are pre-trained on huge datasets. The final classification layers are specific to the original task; in Keras, you can easily leave them out by specifying include_top=False when loading a pre-trained model.

from tensorflow.keras.applications import VGG16

model = VGG16(
  include_top=False,  # drop the original classification layers
  # Other model parameters
)

Freeze the layers.

This ensures the information from the previous model is not destroyed in future training. In Keras, you can use the trainable attribute to switch a layer between frozen and unfrozen modes.

# trainable property is available on model instances
model.trainable = False

# Also on individual layers
layer.trainable = False

Add trainable layers on top of the frozen model.

These new layers will learn to adapt the pre-trained model to your specific problem. Below is one way to add a new layer to a Keras model.

# input1 is the tensor the new layer is applied to, e.g. the frozen model's output
x = tf.keras.layers.Dense(8, activation='relu')(input1)
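The first three steps can be put together roughly as follows. This is a sketch assuming TensorFlow 2.x; the function name build_transfer_model, the 160x160 input size, the pooling layer, and the single-logit output for binary classification are assumptions made for illustration.

```python
import tensorflow as tf


def build_transfer_model(weights="imagenet", input_shape=(160, 160, 3)):
    """Return (base_model, model): a frozen VGG16 base with a new trainable head."""
    # Step 1: load the pre-trained base without its classification layers.
    base_model = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    # Step 2: freeze the base so its weights are not updated during training.
    base_model.trainable = False

    # Step 3: stack new trainable layers on top of the frozen base.
    inputs = tf.keras.Input(shape=input_shape)
    x = base_model(inputs, training=False)  # run the base in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(8, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)  # one logit for binary classification
    model = tf.keras.Model(inputs, outputs)
    return base_model, model
```

Returning the base model separately makes it easy to unfreeze it later. Only the two Dense layers contribute trainable weights at this stage.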

Train the new layers on your dataset.

Specify a small learning rate when using a new dataset on a pre-trained model. A small rate ensures the model doesn’t drastically deviate from the original model. Slowly, the new model will adapt to the new problem.

import tensorflow as tf

# Specify a small value for learning rate.
# This ensures the model doesn't drastically deviate from the original one.
base_learning_rate = 0.0001

model.compile(
  optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
  # from_logits=True expects a final layer without a sigmoid activation
  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
  metrics=['accuracy']
)

# train_dataset, validation_dataset, and initial_epochs come from your own setup
training_history = model.fit(
  train_dataset,
  epochs=initial_epochs,
  validation_data=validation_dataset
)

Fine-tuning (optional): unfreeze the entire model and re-train it on your new dataset.

Set model.trainable = True to unfreeze the entire model for learning, then compile the model again so the change takes effect. Now train with a small learning rate as shown in the previous steps.
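The fine-tuning step might look like the sketch below, assuming TensorFlow 2.x. The helper name unfreeze_and_recompile and the default learning rate of 1e-5 are assumptions; the loss and metric simply mirror the compile call from the previous step.

```python
import tensorflow as tf


def unfreeze_and_recompile(model, base_model, learning_rate=1e-5):
    """Unfreeze the pre-trained base and recompile with a small learning rate."""
    # Setting trainable to True (not False) makes the base weights trainable again.
    base_model.trainable = True
    # Recompile so the change to `trainable` is picked up by training.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model
```

After this call, you would run model.fit again for a few more epochs. The learning rate here is deliberately smaller than the one used for the new layers, so the pre-trained weights change only slightly.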

Positive and Negative Transfer

Transfer can be positive or negative. The knowledge a model acquired on one task is carried over to the new one, but not all of it carries over beneficially, and this difference is the source of negative and positive transfer.

Positive and Negative transfer learning of deep learning models.

Negative transfer occurs when a model trained on one task, for example, handwritten digit classification using a deep neural network, performs poorly on another task, such as classifying natural photographs, because the knowledge gained from the prior training interferes with learning the new task. This interference shows up as degraded performance on the new task, sometimes worse than training from scratch.

Positive transfer occurs when some learned knowledge helps improve the results on a new task. The model learns the new task faster by leveraging its previous experience with related tasks. This is beneficial since you do not have to spend as much time training on the second, novel task.
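One simple way to tell the two apart is to compare the transfer-learned model against a baseline trained from scratch on the same data. The sketch below assumes you already have a validation score for each (higher is better, such as accuracy); the function name classify_transfer and the tolerance parameter are made up for illustration.

```python
def classify_transfer(transfer_score, baseline_score, tolerance=0.0):
    """Label the transfer as positive, negative, or neutral.

    Both scores are validation metrics where higher is better (e.g. accuracy).
    Differences within `tolerance` are treated as noise.
    """
    gain = transfer_score - baseline_score
    if gain > tolerance:
        return "positive"
    if gain < -tolerance:
        return "negative"
    return "neutral"
```

For example, a transfer model scoring 0.92 against a from-scratch baseline of 0.85 would count as positive transfer, while 0.70 against 0.85 would be negative.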

Transfer learning in popular deep learning models

The different types of models that are used as part of transfer learning are as follows:

Convolutional Neural Networks: These models are used when the input data is a set of images. The input to a convolutional neural network is an array of numbers representing pixels: 2-D for a grayscale image, with an extra channel dimension for color. The output of each layer can be thought of as a set of feature maps, each responding to a specific pattern in the image. Earlier layers tend to pick up simple patterns such as edges, while later layers respond to more complex shapes, which is why the earlier layers transfer well to new image tasks.

Example: See VGG16 and VGG19. Here’s a detailed blog post by Jason Brownlee

Recurrent Neural Networks: These models are used when the input data has a temporal nature, such as text or time-series data. In these cases, it is common for information from previous points to influence the information learned at a future point in time. This makes recurrent neural networks a perfect fit for this type of data since they are based on loops that allow the network to have memory and learn from previous information.

Example: TimeNet, an RNN trained to extract features from time-series data

Generative Adversarial Networks: These models use transfer learning by taking what has been learned about generating one set of images and applying it to another. For example, GANs can be trained to generate new samples of handwritten digits similar to those previously seen. However, this is just one example of how this model can be used in transfer learning.

Example: DCGAN-TensorFlow, Deep Convolutional Generative Adversarial Networks

Auto-encoders: An auto-encoder is a particular type of model that is used to compress and decompress information. It has two parts: an encoder that maps the original data into a compressed representation, and a decoder that attempts to recreate the original input from this compact representation. These models can be very efficient at finding a compressed representation of the input data, and the trained encoder can then be reused as part of transfer learning.

Example: Here’s a YouTube video by Sreenivas Bhattiprolu.
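To make the auto-encoder case concrete, here is a small sketch assuming TensorFlow 2.x. The function names, the 784-dimensional input, the 32-dimensional code, and the single-layer encoder and decoder are all assumptions chosen to keep the example short.

```python
import tensorflow as tf


def build_autoencoder(input_dim=784, code_dim=32):
    """Return (encoder, autoencoder); both models share the same encoder layer."""
    inputs = tf.keras.Input(shape=(input_dim,))
    # Encoder: compress the input into a smaller code.
    code = tf.keras.layers.Dense(code_dim, activation="relu")(inputs)
    # Decoder: try to rebuild the original input from the code.
    reconstruction = tf.keras.layers.Dense(input_dim, activation="sigmoid")(code)
    encoder = tf.keras.Model(inputs, code)
    autoencoder = tf.keras.Model(inputs, reconstruction)
    return encoder, autoencoder


def build_classifier_on_encoder(encoder, num_classes=10):
    """Freeze a trained encoder and add a new classification head on top."""
    encoder.trainable = False
    inputs = tf.keras.Input(shape=encoder.input_shape[1:])
    x = encoder(inputs, training=False)
    outputs = tf.keras.layers.Dense(num_classes)(x)  # outputs logits
    return tf.keras.Model(inputs, outputs)
```

You would first train the autoencoder to reproduce its own input (the same array serves as both input and target), then pass the encoder to build_classifier_on_encoder and train only the new head on your labeled data.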

In Summary,

Transfer learning is an amazing way to speed up deep learning training. It helps solve complex problems with pre-existing knowledge. At the core, transfer learning is using a deep learning model trained for one problem as a starting point to solve another.

This article covers the basics and benefits of using transfer learning. Data scientists can train new models in much less time without eating up a huge chunk of their budget if they use transfer learning. Also, when there are fewer training samples for your new problem, transfer learning is a lifesaver.

In most cases, you can solve organizational problems with simple models. If you’re interested in learning about them do check out these articles.

