Last updated on Oct 25 2021
Ashutosh Wakiroo


Neural style transfer is an optimization technique that takes two images, a content image and a style reference image, and blends them so that the output image looks like the content image but is rendered in the style of the style reference image.
To achieve style transfer, it is necessary to separate the style of an image from its content. After that, it is possible to transfer the style elements of one image to the content elements of another image. This is mainly done using features extracted from a standard convolutional neural network.
These features are then manipulated to extract either content information or style information. The process involves three images: a style image, a content image, and finally, a target image.
The ultimate goal is to combine the style of the style image with the content of the content image to create the target image.


This process begins by selecting some layers within our model from which to extract features. Choosing a few layers gives us a good idea of how our image is processed throughout the neural network. We extract the model features of our style image and content image. After that, we extract the features of our target image and compare them with the style image features and the content image features.

Working of Style Transferring

Neural style transfer is an optimization technique that takes two images, a content image and a style reference image, and blends them so that the output image looks like the content image but is “painted” in the style of the style reference image.

Import and configure the modules

Open Google Colab:

from __future__ import absolute_import, division, print_function, unicode_literals

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass
import tensorflow as tf
TensorFlow 2.x selected.

import IPython.display as display
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12,12)
mpl.rcParams['axes.grid'] = False
import numpy as np
import time
import functools

# The second argument of get_file is the download URL; it is blank in this listing,
# so supply the URL of your own content and style images before running.
content_path = tf.keras.utils.get_file('nature.jpg','')
style_path = tf.keras.utils.get_file('cloud.jpg','')
Downloading data from
1122304/1117520 [==============================] - 1s 1us/step
Downloading data from
49152/43511 [=================================] - 0s 0us/step

Define a function to load an image and limit its largest dimension to 512 pixels.

def load_img(path_to_img):
    max_dim = 512
    # Read the raw file bytes, then decode them into a 3-channel float image.
    img = tf.io.read_file(path_to_img)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    # Scale so that the longer side becomes max_dim, keeping the aspect ratio.
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    long_dim = max(shape)
    scale = max_dim / long_dim
    new_shape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, new_shape)
    # Add a leading batch dimension.
    img = img[tf.newaxis, :]
    return img
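
The resizing arithmetic inside load_img can be checked by hand. The following is a minimal pure-Python sketch of that arithmetic only; the helper name scaled_shape and the example sizes are assumptions made for illustration, and TensorFlow performs the same calculation in float32, so results may differ by a pixel for some sizes.

```python
def scaled_shape(height, width, max_dim=512):
    """Return (height, width) after scaling the longer side to max_dim."""
    scale = max_dim / max(height, width)
    # int() truncates toward zero, like casting a positive float to int32.
    return int(height * scale), int(width * scale)

print(scaled_shape(768, 1024))   # (384, 512)
print(scaled_shape(1024, 768))   # (512, 384)
```

The aspect ratio is preserved because both sides are multiplied by the same scale factor.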

Creating a function to show the image

def imshow(image, title=None):
    # Drop the batch dimension before plotting.
    if len(image.shape) > 3:
        image = tf.squeeze(image, axis=0)

    plt.imshow(image)
    if title:
        plt.title(title)

content_image = load_img(content_path)
style_image = load_img(style_path)
plt.subplot(1, 2, 1)
imshow(content_image, 'Content Image')
plt.subplot(1, 2, 2)
imshow(style_image, 'Style Image')


# Run the full VGG19 classifier on the content image as a sanity check.
x = tf.keras.applications.vgg19.preprocess_input(content_image*255)
x = tf.image.resize(x, (224, 224))
vgg = tf.keras.applications.VGG19(include_top=True, weights='imagenet')
prediction_probabilities = vgg(x)
prediction_probabilities.shape
Downloading data from models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels.h5

574717952/574710816 [==============================] - 8s 0us/step
TensorShape([1, 1000])

predicted_top_5 = tf.keras.applications.vgg19.decode_predictions(prediction_probabilities.numpy())[0]
[(class_name, prob) for (number, class_name, prob) in predicted_top_5]
Downloading data from
40960/35363 [==================================] - 0s 0us/step
[('mobile_home', 0.7314594),
('picket_fence', 0.119986326),
('greenhouse', 0.026051044),
('thatch', 0.023595566),
('boathouse', 0.014751049)]

Define style and content representations

Use the intermediate layers of the model to obtain the content and style representations of the image. Starting from the input layer, the first few layer activations represent low-level features such as edges and textures.
For an input image, try to match the corresponding style and content target representations at these intermediate layers.
VGG19 was loaded and run on our image above to ensure it is used correctly. Now load VGG19 without the classification head and list the layer names:

vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
print()
for layer in vgg.layers:
    print(layer.name)
Downloading data from
80142336/80134624 [==============================] - 2s 0us/step


# Content layer
content_layers = ['block5_conv2']

# Style layers of interest
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)

Intermediate layers for style and content

At a high level, for a network to perform image classification, it must understand the image. This requires taking the raw image as input pixels and building an internal representation that converts those pixels into a complex understanding of the features present within the image.
This is also a reason why convolutional neural networks generalize well: they capture the invariances and defining features within classes (e.g., cats vs. dogs). Somewhere between where the raw image is fed into the model and where the classification label is output, the model serves as a complex feature extractor. By accessing intermediate layers of the model, we’re able to describe the style and content of input images.

Build the model

The networks in tf.keras.applications are designed so that we can easily extract the intermediate layer values using the Keras functional API.
To define a model using the functional API, specify the inputs and outputs:

model = Model(inputs, outputs)

The following function builds a VGG19 model that returns a list of intermediate layer outputs.

def vgg_layers(layer_names):
    """Create a VGG model that returns a list of intermediate output values."""
    # Load pretrained VGG19, trained on ImageNet data.
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False
    outputs = [vgg.get_layer(name).output for name in layer_names]
    model = tf.keras.Model([vgg.input], outputs)
    return model

style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)

# Look at the statistics of each layer's output
for name, output in zip(style_layers, style_outputs):
    print(name)
    print(" shape: ", output.numpy().shape)
    print(" min: ", output.numpy().min())
    print(" max: ", output.numpy().max())
    print(" mean: ", output.numpy().mean())
    print()
shape: (1, 427, 512, 64)
min: 0.0
max: 763.51953
mean: 25.987665

shape: (1, 213, 256, 128)
min: 0.0
max: 3484.3037
mean: 134.27835

shape: (1, 106, 128, 256)
min: 0.0
max: 7291.078
mean: 143.77878

shape: (1, 53, 64, 512)
min: 0.0
max: 13492.799
mean: 530.00244

shape: (1, 26, 32, 512)
min: 0.0
max: 2881.529
mean: 40.596397

Gram matrix:

Calculating style
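The content of an image is represented by the values of the intermediate feature maps. The style of an image, in turn, can be described by the means and correlations across the different feature maps.
Calculate a Gram matrix, which captures this information, by taking the outer product of the feature vector with itself at each location and averaging that outer product over all locations.
The Gram matrix can be calculated for a particular layer as:

$$G^l_{cd} = \frac{\sum_{ij} F^l_{ijc}(x)\, F^l_{ijd}(x)}{IJ}$$

Here $F^l_{ijc}(x)$ is the activation of channel $c$ at location $(i, j)$ in layer $l$ for input $x$, and $IJ$ is the number of locations.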

This is implemented concisely using the tf.linalg.einsum function:

def gram_matrix(input_tensor):
    # Sum the outer products of the channel vectors over all locations (i, j).
    result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
    input_shape = tf.shape(input_tensor)
    # Divide by the number of locations to get an average.
    num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
    return result/(num_locations)
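
To see what the einsum expression computes, the following NumPy sketch reproduces the same calculation on a small random array and compares it with an explicit loop over locations. The name gram_matrix_np and the toy shapes are assumptions made for illustration; this mirrors gram_matrix above rather than replacing it.

```python
import numpy as np

def gram_matrix_np(features):
    """features: array of shape (batch, height, width, channels)."""
    b, h, w, c = features.shape
    # Sum of outer products of the channel vector at every location.
    result = np.einsum('bijc,bijd->bcd', features, features)
    return result / (h * w)

rng = np.random.default_rng(0)
feats = rng.random((1, 4, 5, 3)).astype(np.float32)
gram = gram_matrix_np(feats)

# The same quantity computed the slow, explicit way.
explicit = np.zeros((3, 3), dtype=np.float64)
for i in range(4):
    for j in range(5):
        v = feats[0, i, j].astype(np.float64)
        explicit += np.outer(v, v)
explicit /= 4 * 5

print(gram.shape)                                  # (1, 3, 3)
print(np.allclose(gram[0], explicit, atol=1e-5))   # True
```

The result has one row and one column per channel, which is why the spatial layout of the image no longer appears in the style representation.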

Extracting the style and content of image

Build a model that returns the style and content tensors.

class StyleContentModel(tf.keras.models.Model):
    def __init__(self, style_layers, content_layers):
        super(StyleContentModel, self).__init__()
        self.vgg = vgg_layers(style_layers + content_layers)
        self.style_layers = style_layers
        self.content_layers = content_layers
        self.num_style_layers = len(style_layers)
        self.vgg.trainable = False

    def call(self, inputs):
        "Expects float input in [0,1]"
        inputs = inputs*255.0
        preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
        outputs = self.vgg(preprocessed_input)
        # The first num_style_layers outputs are style layers; the rest are content layers.
        style_outputs, content_outputs = (outputs[:self.num_style_layers],
                                          outputs[self.num_style_layers:])
        style_outputs = [gram_matrix(style_output)
                         for style_output in style_outputs]

        content_dict = {content_name:value
                        for content_name, value
                        in zip(self.content_layers, content_outputs)}
        style_dict = {style_name:value
                      for style_name, value
                      in zip(self.style_layers, style_outputs)}
        return {'content':content_dict, 'style':style_dict}

When called on an image, this model returns the Gram matrix (style) of the style_layers and the content of the content_layers:

extractor = StyleContentModel(style_layers, content_layers)
results = extractor(tf.constant(content_image))
style_results = results['style']

print('Styles:')
for name, output in sorted(results['style'].items()):
    print(" ", name)
    print(" shape: ", output.numpy().shape)
    print(" min: ", output.numpy().min())
    print(" max: ", output.numpy().max())
    print(" mean: ", output.numpy().mean())
    print()

print("Contents:")
for name, output in sorted(results['content'].items()):
    print(" ", name)
    print(" shape: ", output.numpy().shape)
    print(" min: ", output.numpy().min())
    print(" max: ", output.numpy().max())
    print(" mean: ", output.numpy().mean())
shape: (1, 64, 64)
min: 0.0055228453
max: 28014.557
mean: 263.79025

shape: (1, 128, 128)
min: 0.0
max: 61479.496
mean: 9100.949

shape: (1, 256, 256)
min: 0.0
max: 545623.44
mean: 7660.976

shape: (1, 512, 512)
min: 0.0
max: 4320502.0
mean: 134288.84

shape: (1, 512, 512)
min: 0.0
max: 110005.37
mean: 1487.0381

shape: (1, 26, 32, 512)
min: 0.0
max: 2410.8796
mean: 13.764149

Run gradient descent

With this style and content extractor, we can implement the style transfer algorithm. Do this by calculating the mean squared error of our image’s output relative to each target, then taking the weighted sum of these losses.
Set our style and content target values:

style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']

Define a tf.Variable to contain the image to optimize. Initialize it with the content image (the tf.Variable must be the same shape as the content image):

image = tf.Variable(content_image)

Since this is a float image, define a function to keep the pixel values between 0 and 1:

def clip_0_1(image):
    return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)

Create the optimizer. The paper recommends L-BFGS, but Adam is used here:

opt = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)

To optimize this, use a weighted combination of the two losses to get the total loss:

style_weight=1e-2
content_weight=1e4

def style_content_loss(outputs):
    style_outputs = outputs['style']
    content_outputs = outputs['content']
    # Mean squared error against the style targets, summed over style layers.
    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2)
                           for name in style_outputs.keys()])
    style_loss *= style_weight / num_style_layers

    # Mean squared error against the content targets, summed over content layers.
    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2)
                             for name in content_outputs.keys()])
    content_loss *= content_weight / num_content_layers
    loss = style_loss + content_loss
    return loss
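
The weighting in style_content_loss can be followed on toy numbers. The sketch below is a NumPy illustration only: the helper weighted_loss, the layer names 'a', 'b' and 'c', and the tiny arrays are all made up for the example, while the two weights are the values used above.

```python
import numpy as np

style_weight = 1e-2
content_weight = 1e4

def weighted_loss(outputs, targets, weight):
    """Mean squared error per layer, summed, then scaled by weight / n_layers."""
    per_layer = [np.mean((outputs[name] - targets[name]) ** 2) for name in outputs]
    return sum(per_layer) * weight / len(per_layer)

# Two hypothetical style layers and one hypothetical content layer.
style_out = {'a': np.array([[2.0, 0.0], [0.0, 2.0]]), 'b': np.array([[1.0]])}
style_tgt = {'a': np.array([[0.0, 0.0], [0.0, 0.0]]), 'b': np.array([[3.0]])}
content_out = {'c': np.array([1.0, 1.0])}
content_tgt = {'c': np.array([1.0, 2.0])}

style_loss = weighted_loss(style_out, style_tgt, style_weight)          # about 0.03
content_loss = weighted_loss(content_out, content_tgt, content_weight)  # about 5000
total = style_loss + content_loss
print(style_loss, content_loss, total)
```

Layer 'a' contributes a mean squared error of 2 and layer 'b' contributes 4, so the style term is (2 + 4) * 0.01 / 2. Dividing by the number of layers keeps each term comparable when the layer lists change length.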

Use tf.GradientTape to update the image.

@tf.function()
def train_step(image):
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs)

    # Differentiate the loss with respect to the image pixels, not the network weights.
    grad = tape.gradient(loss, image)
    opt.apply_gradients([(grad, image)])
    image.assign(clip_0_1(image))

Run a few steps to test:

train_step(image)
train_step(image)
train_step(image)
plt.imshow(image.read_value()[0])
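
The key idea in train_step is that the pixels themselves are the parameters being optimized. The following NumPy sketch shows that idea in isolation, under simplifying assumptions: it uses plain gradient descent instead of Adam, an analytic gradient instead of tf.GradientTape, and a random array named target as a stand-in for the features to match.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((4, 4, 3))   # stand-in for what the image should match
image = np.zeros_like(target)    # the "variable" being optimized
learning_rate = 0.1

def loss_fn(img):
    return np.sum((img - target) ** 2)

initial_loss = loss_fn(image)
for _ in range(100):
    # Analytic gradient of the summed squared error with respect to the pixels.
    grad = 2.0 * (image - target)
    image = image - learning_rate * grad
    # Keep pixel values valid, in the same spirit as clip_0_1.
    image = np.clip(image, 0.0, 1.0)
final_loss = loss_fn(image)

print(final_loss < initial_loss)   # True
```

In the real train_step the loss depends on the image through the VGG19 feature extractor, which is why automatic differentiation is needed there.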



Transforming the image

Perform a longer optimization in this step:

import time
start = time.time()

epochs = 10
steps_per_epoch = 100
step = 0
for n in range(epochs):
    for m in range(steps_per_epoch):
        step += 1
        train_step(image)
        print(".", end='')
    # After each epoch, replace the previous output with the current image.
    display.clear_output(wait=True)
    imshow(image.read_value())
    plt.title("Train step: {}".format(step))

end = time.time()
print("Total time: {:.1f}".format(end-start))



Total variation loss

def high_pass_x_y(image):
    # Differences between neighboring pixels along the width (x) and height (y).
    x_var = image[:,:,1:,:] - image[:,:,:-1,:]
    y_var = image[:,1:,:,:] - image[:,:-1,:,:]
    return x_var, y_var

x_deltas, y_deltas = high_pass_x_y(content_image)
plt.figure(figsize=(14,10))
plt.subplot(2,2,1)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Original")
plt.subplot(2,2,2)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Original")

x_deltas, y_deltas = high_pass_x_y(image)
plt.subplot(2,2,3)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Styled")
plt.subplot(2,2,4)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Styled")



This shows how the high-frequency components have increased.
This high-frequency component is essentially an edge detector. We can get similar output from the Sobel edge detector, for example:

plt.figure(figsize=(14,10))
sobel = tf.image.sobel_edges(content_image)
plt.subplot(1,2,1)
imshow(clip_0_1(sobel[...,0]/4+0.5), "Horizontal Sobel-edges")
plt.subplot(1,2,2)
imshow(clip_0_1(sobel[...,1]/4+0.5), "Vertical Sobel-edges")
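
Why neighboring-pixel differences behave like an edge detector can be seen on a tiny synthetic image. The NumPy sketch below uses a made-up 4 x 6 image with a dark left half and a bright right half, and applies the same slicing as high_pass_x_y.

```python
import numpy as np

# 1 x 4 x 6 x 1 image: dark left half, bright right half.
step = np.zeros((1, 4, 6, 1))
step[:, :, 3:, :] = 1.0

x_deltas = step[:, :, 1:, :] - step[:, :, :-1, :]   # differences along the width
y_deltas = step[:, 1:, :, :] - step[:, :-1, :, :]   # differences along the height

print(x_deltas[0, 0, :, 0])    # [0. 0. 1. 0. 0.]
print(np.abs(y_deltas).sum())  # 0.0
```

The only nonzero response sits exactly where the brightness jumps, and there is no response along the height because every row is identical.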



The regularization loss associated with this is the sum of the absolute values of these deltas:

def total_variation_loss(image):
    x_deltas, y_deltas = high_pass_x_y(image)
    return tf.reduce_sum(tf.abs(x_deltas)) + tf.reduce_sum(tf.abs(y_deltas))

total_variation_loss(image).numpy()

That demonstrates what it does. But there is no need to implement it ourselves, because TensorFlow includes a standard implementation:

tf.image.total_variation(image).numpy()
array([99172.59], dtype=float32)
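
To get a feel for the scale of this loss, the following NumPy sketch applies the same definition to two toy images: a flat gray image and a checkerboard. The function name total_variation_np and both images are assumptions made for illustration.

```python
import numpy as np

def total_variation_np(image):
    """image: array of shape (batch, height, width, channels)."""
    x_deltas = image[:, :, 1:, :] - image[:, :, :-1, :]
    y_deltas = image[:, 1:, :, :] - image[:, :-1, :, :]
    return np.sum(np.abs(x_deltas)) + np.sum(np.abs(y_deltas))

flat = np.full((1, 4, 4, 1), 0.5)
checker = np.indices((4, 4)).sum(axis=0) % 2          # 0/1 checkerboard
checker = checker[np.newaxis, :, :, np.newaxis].astype(float)

print(total_variation_np(flat))      # 0.0
print(total_variation_np(checker))   # 24.0
```

A 4 x 4 checkerboard has 12 horizontal and 12 vertical neighbor pairs, each differing by 1, so it scores 24, while the flat image scores 0. Penalizing this quantity therefore pushes the optimizer toward smoother images.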

Re-running the optimization function

Choose a weight for the total_variation_loss:

total_variation_weight=30

Now include it in the train_step function:

@tf.function()
def train_step(image):
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs)
        # Add the weighted total variation penalty to suppress high-frequency artifacts.
        loss += total_variation_weight*tf.image.total_variation(image)

    grad = tape.gradient(loss, image)
    opt.apply_gradients([(grad, image)])
    image.assign(clip_0_1(image))
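
The remaining steps call a helper named tensor_to_image, which is not defined anywhere in this walkthrough. The following is a minimal sketch of what such a helper might look like, assuming it should turn a float image in [0, 1] with an optional batch dimension into a PIL image. It assumes NumPy and Pillow are installed and that the input can be converted with np.asarray, which is an assumption for TensorFlow variables in eager mode.

```python
import numpy as np
import PIL.Image

def tensor_to_image(tensor):
    """Convert a float image in [0, 1] to a PIL image (hypothetical helper)."""
    array = np.clip(np.asarray(tensor, dtype=np.float32), 0.0, 1.0)
    array = (array * 255).astype(np.uint8)
    if array.ndim > 3:
        # Drop the leading batch dimension, which must have size 1.
        assert array.shape[0] == 1
        array = array[0]
    return PIL.Image.fromarray(array)

demo = np.zeros((1, 2, 3, 3), dtype=np.float32)
demo[0, 0, 0, 0] = 1.0            # make the top-left pixel pure red
picture = tensor_to_image(demo)
print(picture.size)               # (3, 2)
print(picture.getpixel((0, 0)))   # (255, 0, 0)
```

PIL reports size as (width, height), so a tensor of height 2 and width 3 gives (3, 2).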

Reinitialize the optimization variable:

image = tf.Variable(content_image)

And run the optimization:

import time
start = time.time()

epochs = 10
steps_per_epoch = 100

step = 0
for n in range(epochs):
    for m in range(steps_per_epoch):
        step += 1
        train_step(image)
        print(".", end='')
    # After each epoch, replace the previous output with the current image.
    display.clear_output(wait=True)
    display.display(tensor_to_image(image))
    print("Train step: {}".format(step))

end = time.time()
print("Total time: {:.1f}".format(end-start))



Finally, save the result:

file_name = 'styletransfer.png'
tensor_to_image(image).save(file_name)

try:
    from google.colab import files
except ImportError:
    pass
else:
    # In Colab, offer the saved file for download.
    files.download(file_name)

