Transfer Learning from InceptionV3 to Classify Images

Jenny Ching
May 4, 2020


task: image classification
model: inceptionv3
difficulty: low

In the last post we covered text classification; we can reuse much of the same method for image classification by leveraging the InceptionV3 model. Instead of training the model from scratch ourselves (which could take days running on multiple GPUs), we extract the features from the pretrained Inception model and fine-tune it on the same classes as the last post, so we can now predict both text and images into our taxonomy. Next time we see a new image, we can extract its alt-text, classify the image and the alt-text separately, and then train another model to weight the two results.
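As a rough illustration of that last step, here is a minimal sketch of weighting the two classifiers' outputs. The probability tensors and the 0.6/0.4 split are hypothetical placeholders, not values from this post:

import torch

# hypothetical softmax outputs from the image and text classifiers
# for one example over our 12 classes (random placeholders here)
image_probs = torch.rand(12).softmax(dim=-1)
text_probs = torch.rand(12).softmax(dim=-1)

# simple fixed-weight ensemble; the weight could instead be learned
# by a small model trained on held-out data
alpha = 0.6
combined = alpha * image_probs + (1 - alpha) * text_probs
predicted_class = combined.argmax().item()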

Processing the datasets

Similar to the text classification post, we use the taxonomy below for image classification.

class_labels = {
    0: 'Lifestyle&Activity',
    1: 'Food',
    2: 'Entertainment',
    3: 'Sports',
    4: 'Home',
    5: 'Automotive',
    7: 'Technology',
    8: 'Entertainment',
    9: 'Travel',
    10: 'Retail',
    11: 'Politics',
}
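The training code below assumes a parallel list of integer labels for the image URLs. If your raw annotations store category names instead, a snippet like the following converts them; the categories list here is a hypothetical example:

# invert the taxonomy to look up integer ids from category names;
# note duplicate names (e.g. 'Entertainment' at 2 and 8) collapse
# to whichever id appears last in class_labels
label_ids = {name: idx for idx, name in class_labels.items()}

categories = ['Food', 'Sports', 'Travel']  # hypothetical raw annotations
labels = [label_ids[c] for c in categories]  # -> [1, 3, 9]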

For each batch, we load the images and convert each one into an input tensor of size (3x299x299) by resizing, center-cropping, and converting it to RGB. As for loading the images, it's a good idea to save them in a database or message queue system and pull them into memory when training. You can also download each image to disk and open it with PIL.Image.open(filename):

import torch
import wget
import validators
from os import path
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# img_urls and labels are parallel lists: image URLs and integer class ids
input_batch = None
filtered_labels_list = []
for i, url in enumerate(img_urls):
    filename = "../images/" + url.split("/")[-1]
    # first validate the image url
    if not validators.url(url):
        continue
    if not path.exists(filename):
        wget.download(url, filename)
    input_image = Image.open(filename)
    # non-JPEG images (e.g. PNG with an alpha channel) must be converted to RGB
    if filename[-3:] != "jpg":
        input_image = input_image.convert("RGB")
    input_tensor = preprocess(input_image)
    if input_batch is None:
        input_batch = torch.unsqueeze(input_tensor, 0)  # (1x3x299x299)
    else:
        input_batch = torch.cat((input_batch, torch.unsqueeze(input_tensor, 0)), dim=0)  # (Nx3x299x299)
    filtered_labels_list.append(labels[i])

filtered_labels = torch.as_tensor(filtered_labels_list, dtype=torch.int64)
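If you'd rather not write files to disk at all, you can also pull each image straight into memory. This is a minimal sketch using requests and io.BytesIO; it is one way to do it, not part of the original pipeline:

import io
import requests
from PIL import Image

def load_image_from_url(url):
    # fetch the raw bytes and decode them directly, skipping the disk
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return Image.open(io.BytesIO(response.content)).convert("RGB")

# hypothetical URL; reuse the `preprocess` transforms defined above
input_image = load_image_from_url("https://example.com/some_image.jpg")
input_tensor = preprocess(input_image)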

Loading InceptionV3 and extracting image features for training

In the last post, we extracted features from Word2Vec for text embeddings. Similarly, here we extract features from InceptionV3 for image embeddings. First we load the pretrained inception_v3 model from the PyTorch hub. Then we pass the preprocessed image tensor into the model to get the output. The pretrained inception_v3 model has 1000 output classes in total, so we replace its final layers to map down to our 12 classes.
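If you want pure feature extraction rather than end-to-end fine-tuning, a common PyTorch pattern (a sketch under that assumption, not what the training code below does) is to freeze the pretrained weights so only the replaced classifier heads receive gradients:

import torch
import torch.nn as nn

model = torch.hub.load('pytorch/vision:v0.6.0', 'inception_v3', pretrained=True)

# freeze every pretrained parameter
for param in model.parameters():
    param.requires_grad = False

# the replacement heads are created after freezing, so they stay trainable
num_classes = 12
model.fc = nn.Linear(model.fc.in_features, num_classes)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)

# then pass only the trainable parameters to the optimizer, e.g.
# optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=0.1)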

We use cross entropy as the loss function and optimize the main output together with the auxiliary classifier, down-weighting the auxiliary loss by 0.4 (refer to https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958).

import torch
import torch.nn as nn
import torch.optim as optim

model = torch.hub.load('pytorch/vision:v0.6.0', 'inception_v3', pretrained=True)
num_classes = 12
batch_size = 32
learning_rate = 0.1
num_epochs = 10
output_path = "inception_v3.torch"

# replace the 1000-class main and auxiliary heads with 12-class layers
model.fc = nn.Linear(model.fc.in_features, num_classes)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model.to(device)

optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.8)
criterion = nn.CrossEntropyLoss()

# batch_iter yields (urls, labels) batches; load_images_batch_with_label
# runs the preprocessing above and returns (input tensor, label tensor)
model.train()
for epoch in range(num_epochs):
    for i, batch in enumerate(batch_iter(urls_train, labels_train, batch_size=batch_size, num_epochs=1)):
        input_, label = load_images_batch_with_label(batch[0], batch[1])
        input_, label = input_.to(device), label.to(device)
        optimizer.zero_grad()
        # in train mode inception_v3 returns main and auxiliary logits
        output, aux_output = model(input_)  # (batch, 12) each
        loss1 = criterion(output, label)
        loss2 = criterion(aux_output, label)
        loss = loss1 + loss2 * 0.4  # down-weight the auxiliary loss
        loss.backward()
        optimizer.step()
        print('[%d, %5d] loss: %.6f' % (epoch + 1, i + 1, loss.item()))

torch.save(model.state_dict(), output_path)  # save the fine-tuned weights

# testing accuracy
model.eval()
num_correct = 0
num_examples = 0
with torch.no_grad():
    for i, batch in enumerate(batch_iter(urls_test, labels_test, batch_size=batch_size, num_epochs=1)):
        input_, target = load_images_batch_with_label(batch[0], batch[1])
        input_, target = input_.to(device), target.to(device)
        output = model(input_)  # (batch, 12); eval mode skips the aux head
        y_pred = output.argmax(dim=-1)  # logits -> predicted class id
        correct = torch.eq(y_pred, target).view(-1)
        num_correct += torch.sum(correct).item()
        num_examples += correct.shape[0]
print("Accuracy: ", num_correct / num_examples)
