Text Classification using TensorFlow:
Nowadays, text classification is one of the most important parts of machine learning applications. Most of people's communication is in text format, such as emails, chats, tweets, and comments, and the task is generally to assign the right label to a given piece of text. This text may be in the form of words, phrases, sentences, or paragraphs.
In text classification, our aim is to take some text and assign a label to it; to do this we will train a neural network on a large amount of data indicating what each piece of text represents. You have probably heard about sentiment analysis, which is a classic example of text classification: "the movie was awesome" is a positive statement, "the food was ugly" is a negative statement, and "the sun rises in the east" is a neutral statement. This is one type of use case.
Other uses of text classification include chatbots and document-parsing applications.
In TensorFlow, text classification is done in several steps, including text preprocessing and the use of a bag-of-words representation. Natural language processing is used heavily in text classification, so there are a few concepts we need to know first:
=>> Tokenization
=>> Stemming
=>> Bag of words
I) Tokenization – Tokens are basically words. Tokenization takes a piece of text and cuts it into tokens, finding the individual words in the text. The output is the list of words (tokens) in the text.
For example:
For the sentence "Machine learning and NLP is just going great" we get the token list ["Machine", "learning", "and", "NLP", "is", "just", "going", "great"]. So, tokenization breaks the text into words.
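As a minimal illustration (my own sketch, not part of the article's pipeline), a simple whitespace tokenizer in Python could look like this. Real tokenizers, such as those in NLTK or spaCy, also handle punctuation and other details.
# A minimal sketch of tokenization: split a sentence into word tokens.
def tokenize(text):
    return text.split()

tokens = tokenize("Machine learning and NLP is just going great")
print(tokens)
# ['Machine', 'learning', 'and', 'NLP', 'is', 'just', 'going', 'great']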
II) Stemming – Stemming is the process of reducing a word to its root form. Many of the words used in a given text are inflected or derived forms of other words. To standardize our processing, we stem such words to get their root word. For example, stemming converts the words "walking", "walked", and "walker" to the root word "walk".
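The following is a toy suffix-stripping sketch of the idea (illustrative only; real stemmers such as the Porter stemmer in NLTK use much more careful rules):
# A toy stemmer sketch: strip a few common suffixes to approximate the root.
def stem(word):
    for suffix in ('ing', 'ed', 'er'):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

print([stem(w) for w in ['walking', 'walked', 'walker']])
# ['walk', 'walk', 'walk']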
III) Bag of Words – A bag of words is used in text processing to create a unique list of words. The bag of words is mainly used as a tool for feature generation.
For Example:
Star Wars is better than Star Trek.
Star Trek isn’t as good as Star Wars.
For the above two statements, the bag of words is: ["Star", "Wars", "better", "Trek", "good", "isn't", "is", "as"].
This list contains the unique words in the two sentences. The position of each word in the list is fixed, and we construct a binary array (with entries of either 1 or 0) as the feature vector for classification.
For example:
Take a new sentence, "Star Wars is good". In binary form it is represented as [1, 1, 0, 0, 1, 0, 1, 0]. A position in the array is set to 1 when the corresponding word from our bag of words appears in the sentence ("Star", "Wars", "good", and "is" here), and 0 otherwise.
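A minimal sketch (my own illustration) of building such a binary bag-of-words vector:
# Build a binary bag-of-words vector for a sentence.
vocabulary = ["Star", "Wars", "better", "Trek", "good", "isn't", "is", "as"]

def bow_vector(sentence, vocab):
    words = set(sentence.split())
    return [1 if word in words else 0 for word in vocab]

print(bow_vector("Star Wars is good", vocabulary))
# [1, 1, 0, 0, 1, 0, 1, 0]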
Data Preparation:
Before we can train a model that classifies a text into a specific category, we first need data to train on. You could create your own data file (for example a JSON file) holding whatever training examples you want, but here we will use the DBpedia dataset. It contains a number of features, is essentially a topic-classification dataset, and can be used in a number of different ways.
import numpy as np
import pandas
import tensorflow as tf
from sklearn import metrics
from tensorflow.contrib import learn

MAX_DOCUMENT_LENGTH = 10  # maximum words kept per document (example value)
EMBEDDING_SIZE = 50  # embedding dimension (example value)

# Prepare training and testing data
dbpedia = learn.datasets.load_dataset('dbpedia', size='')
x_train = pandas.DataFrame(dbpedia.train.data)[1]
y_train = pandas.Series(dbpedia.train.target)
x_test = pandas.DataFrame(dbpedia.test.data)[1]
y_test = pandas.Series(dbpedia.test.target)

# Process vocabulary
vocab_processor = learn.preprocessing.VocabularyProcessor(MAX_DOCUMENT_LENGTH)
x_train = np.array(list(vocab_processor.fit_transform(x_train)))
x_test = np.array(list(vocab_processor.transform(x_test)))
n_words = len(vocab_processor.vocabulary_)
print('Total words: %d' % n_words)
TensorFlow's learn.datasets module provides a few example datasets that you can access and load into memory; to load the full data you can pass an empty string for the size.
In this text classification we are going to convert the sentences into matrices. To do this we find all the words in the text and map each unique word to an integer ID; for example, "I am here" might be mapped to [12, 43, 55]. We also want every sentence to have the same length. This is controlled by MAX_DOCUMENT_LENGTH, which sets how long each sentence should be: if a sentence is longer it is truncated, and if it is shorter it is padded.
Now x_train and x_test contain only matrices of word IDs, and those matrices are passed to our learning algorithm.
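As a small sketch (toy sentences of my own, using the same tf.contrib.learn API as above), here is roughly what VocabularyProcessor does:
# VocabularyProcessor maps words to integer IDs and pads/truncates every
# document to max_document_length.
toy_processor = learn.preprocessing.VocabularyProcessor(max_document_length=5)
toy_ids = np.array(list(toy_processor.fit_transform(
    ['I am here', 'I am not here right now today'])))
print(toy_ids)
# Example output (the actual IDs depend on the fitted vocabulary):
# [[1 2 3 0 0]     <- shorter sentence padded with 0s
#  [1 2 4 3 5]]    <- longer sentence truncated to 5 IDs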
def bag_of_words_model(features, target):
  """A bag-of-words model. Note it disregards the word order in the text."""
  target = tf.one_hot(target, 15, 1, 0)
  features = tf.contrib.layers.bow_encoder(
      features, vocab_size=n_words, embed_dim=EMBEDDING_SIZE)
  logits = tf.contrib.layers.fully_connected(features, 15, activation_fn=None)
  loss = tf.contrib.losses.softmax_cross_entropy(logits, target)
  train_op = tf.contrib.layers.optimize_loss(
      loss, tf.contrib.framework.get_global_step(),
      optimizer='Adam', learning_rate=0.01)
  return (
      {'class': tf.argmax(logits, 1),
       'prob': tf.nn.softmax(logits)},
      loss, train_op)
Here we create a basic TensorFlow model function that takes features (a list of word IDs) and a target (one of 15 classes). We use the simple bow_encoder function, which combines creating an embedding matrix, looking up each ID in the input, and averaging the embeddings. We then add a fully connected layer on top, use it to compute the loss and the classification results (via tf.argmax(logits, 1)), and add a training regime (with a 0.01 learning rate). That's our model function.
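To make the bow_encoder idea concrete, here is a rough NumPy sketch (my own illustration, not the library code) of "look up each word's embedding and average them":
# Conceptually, a bag-of-words encoder looks up an embedding vector for each
# word ID and averages them into one document vector.
toy_embeddings = np.random.randn(n_words, EMBEDDING_SIZE)  # embedding matrix
word_ids = [12, 43, 55]                                     # one document
doc_vector = toy_embeddings[word_ids].mean(axis=0)          # averaged embedding
print(doc_vector.shape)  # (EMBEDDING_SIZE,)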
Now we invoke this with the training data we prepared, so we can see how the bag-of-words model works for this problem:
classifier = learn.Estimator(model_fn=bag_of_words_model)

# Train and predict
classifier.fit(x_train, y_train, steps=10000)
y_predicted = [p['class'] for p in
               classifier.predict(x_test, as_iterable=True)]
score = metrics.accuracy_score(y_test, y_predicted)
print('Accuracy: {0:f}'.format(score))
Now you can experiment with the number of training steps and the training regime (a different learning rate and the other parameters that optimize_loss accepts).
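For example, a sketch of such a tweak inside the model function (the parameter values here are arbitrary, chosen only for illustration):
# Example tweak (illustrative values only): a smaller learning rate and
# gradient clipping, passed through optimize_loss inside the model function.
train_op = tf.contrib.layers.optimize_loss(
    loss, tf.contrib.framework.get_global_step(),
    optimizer='Adam', learning_rate=0.001, clip_gradients=5.0)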
But as we all know, a bag of words does not really model how language works: the order of words matters (even though less than you might think in practice), and we want to handle that as well.
There are a few ways to do this: add bi-grams, use convolutions to learn n-grams over the text, or use a recurrent neural network to handle long-term dependencies in the text. Any of these methods can work better for a given problem, and examples of all of them (including character-level models) exist. Here we will go with a recurrent neural network:
def rnn_model(features, target):
  """RNN model to predict a class from a sequence of words."""
  # Convert indexes of words into embeddings.
  # This creates an embeddings matrix of [n_words, EMBEDDING_SIZE] and then
  # maps the word indexes of the sequence into
  # [batch_size, sequence_length, EMBEDDING_SIZE].
  word_vectors = tf.contrib.layers.embed_sequence(
      features, vocab_size=n_words, embed_dim=EMBEDDING_SIZE, scope='words')

  # Split into a list of embeddings per word, while removing the doc length
  # dim. word_list results in a list of tensors [batch_size, EMBEDDING_SIZE].
  word_list = tf.unstack(word_vectors, axis=1)

  # Create a Gated Recurrent Unit cell with a hidden size of EMBEDDING_SIZE.
  cell = tf.nn.rnn_cell.GRUCell(EMBEDDING_SIZE)

  # Create an unrolled Recurrent Neural Network of length MAX_DOCUMENT_LENGTH
  # and pass word_list as inputs for each unit.
  _, encoding = tf.nn.rnn(cell, word_list, dtype=tf.float32)

  # Given the encoding of the RNN, take the encoding of the last step
  # (i.e. the hidden state of the last step) and pass it as features to a
  # fully connected layer to output probabilities per class.
  target = tf.one_hot(target, 15, 1, 0)
  logits = tf.contrib.layers.fully_connected(encoding, 15, activation_fn=None)
  loss = tf.contrib.losses.softmax_cross_entropy(logits, target)

  # Create a training op.
  train_op = tf.contrib.layers.optimize_loss(
      loss, tf.contrib.framework.get_global_step(),
      optimizer='Adam', learning_rate=0.01, clip_gradients=1.0)

  return (
      {'class': tf.argmax(logits, 1),
       'prob': tf.nn.softmax(logits)},
      loss, train_op)

The comments above describe each step of the code. The same Estimator call, pointed at this model function, will also run this model so you can see the improvements. So now you have an idea of how to apply the basics of text classification.
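As a quick follow-up, running the RNN model reuses the same calls shown earlier for the bag-of-words model (the step count below is an arbitrary example value):
# Reuse the same Estimator workflow with the RNN model function.
classifier = learn.Estimator(model_fn=rnn_model)
classifier.fit(x_train, y_train, steps=1000)
y_predicted = [p['class'] for p in
               classifier.predict(x_test, as_iterable=True)]
print('Accuracy: {0:f}'.format(metrics.accuracy_score(y_test, y_predicted)))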
