Using Convolutional Neural Networks to Classify Text in PyTorch

Jenny Ching
5 min readMay 4, 2020
Task: text classification
Model: Convolutional neural network (CNN)
Difficulty: Low

In this post, I want to introduce a simple yet handy architecture (paper). It is basic enough to serve as a first step in training your model if you just want to try out text classification on your labeled data and get a quick result.

Convolutional neural network for text classification

A convolutional neural network (CNN) is a typical kind of artificial neural network. In this kind of network, the output of each layer is used as the input to the next layer of neurons, and multiple layers of convolutions transform the results of each layer through nonlinearities up to the output layer. In general, the convolutional neural network model used in text analysis includes four parts: an embedding layer, a convolutional layer, a pooling layer, and a fully connected layer.

CNNs are used heavily in image classification, but they can also be used for text classification with the same idea. The only difference is that the input layer of a CNN model used in text analysis is made up of word vectors extracted from pre-trained embeddings such as Word2Vec.
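To make the four parts concrete, here is a minimal sketch of such a text CNN. This is only an illustration, not the model trained below (which gets its embeddings from spaCy rather than an nn.Embedding layer); the vocabulary size, sequence length, and filter count are made-up placeholder numbers.

import torch
from torch import nn

# illustrative placeholder sizes, not values from this post
vocab_size, seq_len, embed_dim, n_classes = 5000, 16, 300, 12

embedding = nn.Embedding(vocab_size, embed_dim)        # 1) embedding layer
conv = nn.Conv2d(1, 64, kernel_size=3, padding=1)      # 2) convolutional layer
pool = nn.AdaptiveMaxPool2d((1, 1))                    # 3) pooling layer
fc = nn.Linear(64, n_classes)                          # 4) fully connected layer

tokens = torch.randint(0, vocab_size, (32, seq_len))   # a fake batch of token ids
x = embedding(tokens)                                  # (32, 16, 300)
x = x.unsqueeze(1)                                     # (32, 1, 16, 300) - one input channel
x = torch.relu(conv(x))                                # (32, 64, 16, 300)
x = pool(x).flatten(start_dim=1)                       # (32, 64)
logits = fc(x)                                         # (32, 12)
print(logits.shape)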

Processing the datasets

In this text classification task, we want to classify the alt-text (usually a short sentence) of an image into categories like entertainment, politics, travel, etc.

class_labels = {
    0: 'Lifestyle&Activity',
    1: 'Food',
    2: 'Entertainment',
    3: 'Sports',
    4: 'Home',
    5: 'Automotive',
    7: 'Technology',
    8: 'Entertainment',
    9: 'Travel',
    10: 'Retail',
    11: 'Politics',
}

The data is read from a CSV file into memory and trained in batches (batch size = 32), each containing the data (the alt-text) and the corresponding labels (the classification). The alt-text is passed through spaCy to extract Word2Vec features for the entire short sentence. This feature vector takes the place of the embedding layer in our CNN model for training.

from docopt import docopt
import numpy as np
import pandas as pd
import spacy

nlp = spacy.load('en_core_web_md')

# load data
args = docopt(__doc__)
input_file = args['<input_file>']

# extract features on alt-text for training and split randomly into train/test datasets
all_data = pd.read_csv(open(input_file, 'rb'), encoding='utf-8', engine='c', header=None)
data_sample_size, test_sample_size = 1.0, 0.2
all_data = all_data.sample(frac=data_sample_size)
msk = np.random.rand(len(all_data)) < 1 - test_sample_size
train = all_data[msk]
test = all_data[~msk]

labellist, datalist = [], []
for df in [train, test]:
    labels = df[0].tolist()      # column 0 holds the class label
    labellist.append(labels)
    # word2vec embedding for the entire sentence (column 1 holds the alt-text)
    data = [[nlp(v).vector] for v in df[1].tolist()]
    datalist.append(data)

data, labels, test_data, test_labels = datalist[0], labellist[0], datalist[1], labellist[1]

Defining the model

Now that we've got the data and labels ready, it's time to set up our model for training. Here we're creating a simple CNN model; by subclassing PyTorch's nn.Module, the framework takes care of things like backpropagation for us.

from torch import nn
import torch
from torch import optim
import torch.nn.functional as F

CONV_STRIDE = 2          # stride used by both the conv and the pooling layer
LINEAR_DIMENSION = 4800  # 64 filters * 75 * 1 after the conv and pooling layers

class SimpleCNN(nn.Module):
    def __init__(self, nclasses: int, window_size: int = 16, embedding_dim: int = 16,
                 filter_multiplier: int = 64):
        super(SimpleCNN, self).__init__()
        self.simpleconv = nn.Conv2d(in_channels=1, out_channels=filter_multiplier,
                                    stride=CONV_STRIDE, kernel_size=3, padding=1)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=CONV_STRIDE, padding=1)
        self.linear = nn.Linear(LINEAR_DIMENSION, out_features=nclasses)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x, **kwargs):   # shape: (batch, 1, 300)
        # input x is already embedding vectors from spaCy
        x = torch.transpose(x, 1, 2)  # (batch, 300, 1)
        # set the channel dim to a new dimension that's just 1
        x = torch.unsqueeze(x, 1)     # (batch, 1, 300, 1)
        x = self.simpleconv(x)        # (batch, 64, 150, 1)
        x = self.maxpool(x)           # (batch, 64, 75, 1)
        x = F.relu(x)                 # non-linear function to activate the neurons
        x = x.flatten(start_dim=1)    # (batch, 4800)
        x = self.linear(x)
        x = F.relu(x)
        x = self.softmax(x)
        return x

To do text classification with a CNN model, the key part is to make sure you are feeding it the tensors it expects. CNN models for image classification usually take input with a channel dimension of three, literally the RGB channels. To make our tensor shape fit the CNN model, first we transpose the tensor so the embedding features sit in the second dimension, then we unsqueeze it to add the channel dimension the convolution layer requires (here just a single channel rather than three).
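A quick way to sanity-check these reshaping steps is to push a dummy batch through them and watch the shapes change; this is just a throwaway snippet, not part of the training script.

import torch

x = torch.randn(32, 1, 300)   # a fake batch: 32 sentences, one 300-dim spaCy vector each
x = torch.transpose(x, 1, 2)  # (32, 300, 1) - embedding features now sit in the second dimension
x = torch.unsqueeze(x, 1)     # (32, 1, 300, 1) - the single channel dimension Conv2d expects
print(x.shape)                # torch.Size([32, 1, 300, 1])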

After running through the simpleconv and maxpool layers, we get a tensor of shape (batch, 64, 75, 1). We run the result through a non-linear function (here ReLU) to activate the neurons and finally flatten the tensor so we can map it to the classes with a linear layer. The flattened tensor has size (batch, 64*75*1) = (batch, 4800) and is mapped to the 12 classes we've seen above.
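If you want to check where 4800 comes from, the standard output-size formula for a convolution or pooling layer along one dimension is floor((n + 2*padding - kernel) / stride) + 1. A quick back-of-the-envelope calculation:

def out_size(n, kernel=3, stride=2, padding=1):
    # output-size formula for a conv / pooling layer along one dimension
    return (n + 2 * padding - kernel) // stride + 1

h = out_size(300)          # conv layer:     300 -> 150
h = out_size(h)            # max-pool layer: 150 -> 75
w = out_size(out_size(1))  # the width dimension stays at 1
print(64 * h * w)          # 64 filters * 75 * 1 = 4800, the in_features of the linear layer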

Where does the number 300 come from? When text is embedded with spaCy, it returns a vector of 300 features for each given word or sentence. Since we are treating the whole sentence as our input data instead of going word by word, the model input shape is (batch size=32, 1, 300), where 1 represents the single embedding vector and 300 represents the Word2Vec embedding size. You can also tweak it to return vectors for each word, but be sure to pad the sentence (cut off at 16 words, and add padding at the end if there are fewer than 16) so your tensors have consistent dimensions, as in the sketch below.
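Here is a hedged sketch of that word-by-word variant; the 16-token window and zero-vector padding follow the description above but are my assumptions, not code from this post.

import numpy as np
import spacy

nlp = spacy.load('en_core_web_md')
MAX_WORDS = 16  # cut off at 16 words, pad with zero vectors if shorter

def sentence_to_word_matrix(text: str) -> np.ndarray:
    doc = nlp(text)
    vectors = [token.vector for token in doc][:MAX_WORDS]  # one 300-dim vector per word, truncated at 16
    while len(vectors) < MAX_WORDS:
        vectors.append(np.zeros(300, dtype=np.float32))    # en_core_web_md vectors are 300-dim
    return np.stack(vectors)                               # shape: (16, 300)

print(sentence_to_word_matrix("a dog catching a frisbee on the beach").shape)  # (16, 300)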

Training the model

Now that we have the data and the model, it's time to run it and examine the results. The optimizer we're using is SGD, and the loss function is the negative log likelihood loss, which is useful for training a classification problem with a number of classes.

num_classes = 12
batch_size = 32
learning_rate = 0.1
num_epochs = 10
output_path = "vdcnn.torch"

model = SimpleCNN(nclasses=num_classes)
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.8)
lossfn = nn.NLLLoss()

for epoch in range(num_epochs):
    running_loss = 0.
    # create a new batch generator every epoch; utils.batch_iter yields (inputs, labels) tensor batches
    for i, batch in enumerate(utils.batch_iter(data, labels, batch_size=batch_size, num_epochs=1)):
        input_ = batch[0]
        label = batch[1]
        optimizer.zero_grad()
        outputs = model(input_)
        loss = lossfn(outputs, label)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        print('[%d, %5d] loss: %.6f' %
              (epoch + 1, i + 1, running_loss / batch_size))
        running_loss = 0.0

# Check accuracy
model = model.eval()
num_correct = 0
num_examples = 0
for i, batch in enumerate(utils.batch_iter(test_data, test_labels, batch_size=batch_size, num_epochs=1)):
    y, target = batch
    output = model(y)
    y_pred = []
    # output is 32x12 -> convert the softmax scores to the predicted class number
    for o in output:
        y_pred.append(torch.max(o, -1).indices.item())
    print("batch ", i, ": ", y_pred, target)
    y_pred = torch.FloatTensor(y_pred)
    correct = torch.eq(torch.round(y_pred).type(target.type()), target).view(-1)
    num_correct += torch.sum(correct).item()
    num_examples += correct.shape[0]
print("Accuracy: ", num_correct / num_examples)

Tips: always try to overfit first

There are times when your model fails to converge. It might be that your model architecture is not set up correctly, the layers might not make sense, or the data itself is corrupted. The easiest way to test your model architecture is to train it on a small fake dataset and see if you can reach 100% accuracy.
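As a hedged sketch of that sanity check: the fake dataset below is just random noise with arbitrary labels, purely for exercising the SimpleCNN architecture defined earlier.

import torch
from torch import nn, optim

model = SimpleCNN(nclasses=12)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.8)
lossfn = nn.NLLLoss()

# a tiny fake dataset: 8 random "sentence vectors" with arbitrary labels
fake_data = torch.randn(8, 1, 300)
fake_labels = torch.randint(0, 12, (8,))

for epoch in range(200):  # keep training on the same 8 examples
    optimizer.zero_grad()
    loss = lossfn(model(fake_data), fake_labels)
    loss.backward()
    optimizer.step()

preds = model(fake_data).argmax(dim=1)
print((preds == fake_labels).float().mean())  # should get close to 1.0 if the architecture is wired up correctly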
