Optical Character Recognition with One-Shot Learning, RNN, and TensorFlow

by Sophia Turol, March 9, 2017
Coded straightforwardly in Keras on top of TensorFlow, a one-shot attention mechanism enables token extraction, plucking the information of interest out of a data source.

Generating expense reports with machine learning

Optical character recognition (OCR) converts typed, handwritten, or printed symbols into machine-encoded text. However, OCR output contains errors that have to be eliminated, and only the valuable data has to be extracted from an ever-growing amount of text.

At the recent TensorFlow meetup, attendees learned how the one-shot attention mechanism for token extraction, implemented in Keras with TensorFlow as a back end, can help. The meetup also covered how to enable multilingual neural machine translation with TensorFlow.

Mike Stark, a data scientist at Concur, shared his experience of enabling an application to automatically generate expense reports from photos of receipts. Relying on optical character recognition, the solution converts images into reports and uses machine learning techniques to extract the important information from the OCR text. In contrast to regular expression matching, machine learning makes it possible to learn a large number of features automatically and to keep retraining as the number of receipts grows.

The valuable receipt data to be extracted includes:

  • transaction amount
  • transaction date
  • currency
  • vendor
  • location


At the classification stage, the entire text of a receipt is split into words, which become features for a classifier. Then, candidate strings are extracted from the receipt by pattern matching. The text surrounding the strings becomes the features of the regression algorithm that predicts the likelihood of each string being the result.
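
The candidate-extraction step can be illustrated with a minimal sketch. It assumes the field of interest is the transaction amount; the pattern `AMOUNT_PATTERN`, the helper `extract_candidates`, and the context window size are hypothetical and do not come from the talk. The regression model that would score each candidate is not shown.

```python
import re

# Hypothetical pattern for amount-like strings such as "12.50" or "1,234.00".
AMOUNT_PATTERN = re.compile(r"\d+(?:,\d{3})*\.\d{2}")


def extract_candidates(text, window=2):
    """Return (candidate, context_words) pairs for every amount-like word."""
    words = text.split()
    candidates = []
    for i, word in enumerate(words):
        match = AMOUNT_PATTERN.search(word)
        if match is None:
            continue
        # The words surrounding the candidate become its features.
        before = words[max(0, i - window):i]
        after = words[i + 1:i + 1 + window]
        candidates.append((match.group(), before + after))
    return candidates


receipt = "Coffee 3.50 Bagel 2.25 Total 5.75 Thank you"
for candidate, context in extract_candidates(receipt):
    print(candidate, context)
```

In this toy receipt, only the last candidate has "Total" in its context, which is the kind of signal a scoring model could pick up on.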


Training with the one-shot attention mechanism

Here, recurrent neural networks come into play. The text is fed into the neural network character by character and the network is triggered to generate either a classification or a sequence of characters.
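
Feeding text character by character usually means turning each character into a vector first. The sketch below shows one common way to do that, one-hot encoding. The vocabulary (printable ASCII plus a shared "unknown" slot) and the name `encode_text` are assumptions made for illustration; the talk does not specify the encoding used.

```python
import string

# Assumed vocabulary: printable ASCII; any other character shares one "unknown" slot.
VOCAB = string.printable
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}
UNKNOWN_INDEX = len(VOCAB)
INPUT_SIZE = len(VOCAB) + 1


def encode_text(text):
    """Turn a string into a list of one-hot vectors, one per character."""
    encoded = []
    for ch in text:
        vector = [0.0] * INPUT_SIZE
        vector[CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX)] = 1.0
        encoded.append(vector)
    return encoded


steps = encode_text("Total 5.75")
print(len(steps), len(steps[0]))  # number of time steps, features per step
```

Each inner vector is one time step of the network's input, so a receipt of N characters becomes a sequence of N steps.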


Since not all the information in a receipt is valuable (e.g., expense type or a phone number), one needs to enable token extraction. For that purpose, Mike applied the one-shot attention mechanism, which is easy to train and straightforward to code in Keras running on top of TensorFlow.

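from keras import backend as K
from keras.layers import Layer, InputSpec

class Concurrence(Layer):
    """One-shot attention: a softmax-weighted sum over the time axis."""

    def build(self, input_shape):
        self.input_spec = [InputSpec(shape=input_shape)]
        self.input_dim = input_shape[2]
        # The original call was cut off after the shape argument;
        # the initializer and name used here are assumptions.
        self.W = self.add_weight(shape=(self.input_dim, 1),
                                 initializer='glorot_uniform', name='W')
        super(Concurrence, self).build(input_shape)

    def call(self, x, mask=None):
        # Score every time step, normalize over time, then sum the weighted steps.
        attention = K.softmax(K.squeeze(K.dot(x, self.W), 2))
        return K.batch_dot(x, attention, (1, 1))

    def compute_output_shape(self, input_shape):
        # The time axis is summed out: (batch, time, dim) -> (batch, dim).
        return (input_shape[0], input_shape[2])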
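
To make the arithmetic of the layer concrete, here is a dependency-free sketch of the same computation for a single sequence. It is an illustration rather than the production code: `one_shot_attention` is a hypothetical helper, and the inputs and weights are made up.

```python
import math


def one_shot_attention(x, w):
    """x: T time steps of D floats each; w: D weights (stands in for self.W)."""
    # One scalar score per time step, as in K.dot(x, self.W).
    scores = [sum(xi * wi for xi, wi in zip(step, w)) for step in x]
    # Softmax over the time axis (shifted by the max for numerical stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    # Attention-weighted sum of the time steps, as in K.batch_dot(x, attention, (1, 1)).
    summary = [sum(a * step[d] for a, step in zip(attention, x))
               for d in range(len(w))]
    return attention, summary


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, 2 features
w = [2.0, -1.0]                            # made-up weights
attention, summary = one_shot_attention(x, w)
print(attention, summary)
```

The attention weights sum to one, so the output is a single fixed-size vector regardless of how long the input sequence is.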

Here is also sample code for a model built with Keras.

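from keras.models import Sequential
from keras.layers import Bidirectional, Dense, GRU, RepeatVector, TimeDistributed

# hidden_size, input_size, max_out_seq_len, and output_size are defined elsewhere.
model = Sequential()
model.add(Bidirectional(GRU(hidden_size, return_sequences=True), merge_mode='concat',
                        input_shape=(None, input_size)))
# Assumption: the attention layer defined above goes here, since RepeatVector needs a 2D input.
model.add(Concurrence())
model.add(RepeatVector(max_out_seq_len + 1))
model.add(GRU(hidden_size * 2, return_sequences=True))
model.add(TimeDistributed(Dense(output_size, activation="softmax")))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")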
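
With a softmax at each of the `max_out_seq_len + 1` output steps and a categorical cross-entropy loss, the training targets have to be one-hot sequences of that same length. The sketch below shows one way to prepare and read back such targets. It assumes the extra step holds an end-of-sequence marker, which the talk does not state; the alphabet and the helpers `encode_target` and `decode_prediction` are hypothetical.

```python
# Assumed output alphabet; index 0 is a hypothetical end-of-sequence/padding marker.
END = "\n"
ALPHABET = END + "0123456789.,-/"
OUT_CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
OUTPUT_SIZE = len(ALPHABET)


def encode_target(token, max_out_seq_len):
    """One-hot encode a token, padded with END to max_out_seq_len + 1 steps."""
    if len(token) > max_out_seq_len:
        raise ValueError("token longer than max_out_seq_len")
    padded = token + END * (max_out_seq_len + 1 - len(token))
    rows = []
    for ch in padded:
        row = [0.0] * OUTPUT_SIZE
        row[OUT_CHAR_TO_INDEX[ch]] = 1.0
        rows.append(row)
    return rows


def decode_prediction(rows):
    """Greedy decoding: take the most likely character per step, stop at END."""
    chars = []
    for row in rows:
        ch = ALPHABET[max(range(len(row)), key=row.__getitem__)]
        if ch == END:
            break
        chars.append(ch)
    return "".join(chars)


target = encode_target("5.75", max_out_seq_len=8)
print(len(target), decode_prediction(target))
```

The same `decode_prediction` helper could be applied to one sequence of the model's softmax output, since it has the same steps-by-characters layout as the targets.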


About the expert

Mike Stark was an academic astronomer for many years, concentrating on black holes and neutron stars observed via satellites. In 2015, he followed his growing interest in machine learning out of academia and into Concur. At Concur, as part of a data science group, Mike works on machine learning solutions to various problems created by and/or addressable with large volumes of data. He is particularly interested in the surprising power of recurrent neural networks. You can also check out Mike’s GitHub profile.