Lighting the way to deep machine learning

Open source Torchnet helps researchers and developers build rapid and reusable prototypes of learning systems in Torch.

Building clean prototypes of deep learning systems quickly just got easier with Torchnet, a new software toolkit that fosters rapid, collaborative development of deep learning experiments by the Torch community.

Introduced and open-sourced this week at the International Conference on Machine Learning (ICML) in New York, Torchnet provides a collection of boilerplate code, key abstractions, and reference implementations that can be snapped together or taken apart and then later reused, substantially speeding development. It encourages a modular programming approach, reducing the chance of bugs while making it easy to use asynchronous, parallel data loading and efficient multi-GPU computations.

The new toolkit builds on the success of Torch, a framework for building deep learning models that provides fast implementations of common algebraic operations on both CPU (via OpenMP/SSE) and GPU (via CUDA).

A framework for experimentation

Although Torch has become one of the main frameworks for research in deep machine learning, it doesn't provide abstractions and boilerplate code for machine-learning experiments. So researchers repeatedly code their experiments from scratch and march over the same ground — making the same mistakes and possibly drawing incorrect conclusions — which slows development overall. We created Torchnet to give researchers clear guidelines on how to set up their code, and boilerplate code that helps them develop more quickly.

The modular Torchnet design makes it easy to test a series of coding variants focused around the data set, the data loading process, and the model, as well as optimization and performance measures. This makes rapid experimentation possible. Running the same experiments on a different data set, for instance, is as simple as plugging in a different (bare-bones) data loader, and changing the evaluation criterion amounts to a one-line code change that plugs in a different performance meter. (More detailed information can be found in the Torchnet GitHub repository and in the Torchnet research paper.)

Torchnet’s overarching design is akin to Legos, in that the building blocks are built on a set of conventions that allow them to be snapped together easily. The interlocked chunks make a universal system in which engaged pieces fit together firmly yet can be replaced easily by other pieces. We've also developed clear guidelines on how to build new pieces.

The open source Torch already has a very active developer community that has created packages for optimization, manifold learning, metric learning, and neural networks, among other things. Torchnet builds on this, and it is designed to serve as a platform to which the research community can contribute, primarily via plugins that implement machine-learning experiments or tools.

Powered for GPUs

Although machine learning and artificial intelligence have been around for many years, most of their recent advances have been powered by publicly available research data sets and the availability of more powerful computers — specifically ones powered by GPUs.

Torchnet is substantially different from deep learning frameworks such as Caffe, Chainer, TensorFlow, and Theano in that it does not focus on performing efficient inference and gradient computations in deep networks. Instead, Torchnet provides a framework on top of a deep learning framework (in this case, torch/nn) that makes rapid experimentation easier.

Torchnet provides a collection of subpackages and implements five main types of abstractions:

Datasets: provide a size function that returns the number of samples in the data set, and a get(idx) function that returns the idx-th sample in the data set.
Dataset Iterators: in their simplest form, a for loop that runs from one to the data set size and calls the get() function with the loop value as input.
Engines: provide the boilerplate logic necessary for training and testing models.
Meters: used for performance measurements, such as the time needed to perform a training epoch or the value of the loss function averaged over all examples.
Logs: used for logging experiments.
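The data set and iterator conventions described above can be sketched in a few lines of plain Lua. This is only an illustration of the size()/get(idx) contract, not Torchnet's own code; the helper names makeDataset and iterate are hypothetical.

```lua
-- Toy data set following the size()/get(idx) convention
-- (plain Lua; hypothetical helper, not part of Torchnet).
local function makeDataset(samples)
  local dataset = {}
  function dataset:size() return #samples end
  function dataset:get(idx)
    assert(idx >= 1 and idx <= #samples, 'index out of range')
    return samples[idx]
  end
  return dataset
end

-- Toy iterator: runs from 1 to size() and returns get(i) at each step.
local function iterate(dataset)
  local i = 0
  return function()
    i = i + 1
    if i <= dataset:size() then return dataset:get(i) end
  end
end

local squares = makeDataset{
  {input = 1, target = 1},
  {input = 2, target = 4},
  {input = 3, target = 9},
}

-- Collect the targets in iteration order.
local seen = {}
for sample in iterate(squares) do
  seen[#seen + 1] = sample.target
end
```

Because the iterator touches the data set only through size() and get(), any object that honors those two functions can be swapped in without changing the loop.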

The most important subpackages provide implementations of boilerplate code that is relevant to machine-learning problems. These include computer vision, natural language processing, and speech processing.

Other subpackages may be smaller and focus on more specific problems or even specific data sets. For instance, small subpackages can wrap vision data sets such as ImageNet and COCO, speech data sets such as TIMIT and LibriSpeech, and text data sets such as the One Billion Word Benchmark and WMT-14.

Example

This section presents a simple, working example of how to train a logistic regressor on the MNIST data set using Torchnet. The code first includes necessary dependencies:


require 'nn'
local tnt   = require 'torchnet'
local mnist = require 'mnist'

Subsequently, we define a function that constructs an asynchronous data set iterator over the MNIST training or test set. The data set iterator receives as input a closure that constructs the Torchnet data set object. Here, the data set is a ListDataset that simply returns the relevant row from tensors that contain the images and the targets; in practice, you would replace this ListDataset with your own data set definition. The core data set is wrapped in a BatchDataset to construct mini-batches of size 128:


local function getIterator(mode)
  return tnt.ParallelDatasetIterator{
    nthread = 1,
    init    = function() require 'torchnet' end,
    closure = function()
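      local dataset = mnist[mode .. 'dataset']()
      -- flatten each 28x28 image into a 784-vector of doubles,
      -- the input shape and type that nn.Linear(784, 10) expects
      dataset.data = dataset.data:reshape(
        dataset.data:size(1),
        dataset.data:size(2) * dataset.data:size(3)
      ):double()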
      return tnt.BatchDataset{
         batchsize = 128,
         dataset = tnt.ListDataset{
           list = torch.range(
             1, dataset.data:size(1)
           ), 
           load = function(idx)
             return {
               input  = dataset.data[idx],
               target = torch.LongTensor{
                 dataset.label[idx]
               },
             } -- sample contains input and target
           end,
        }
      }
    end,
  }
end
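The idea of wrapping one data set inside another, as BatchDataset wraps ListDataset above, can be illustrated with a toy wrapper in plain Lua. This sketch is not tnt.BatchDataset: makeBatchDataset and the stand-in numbers data set are hypothetical, and keeping the smaller final batch is an assumption of this sketch.

```lua
-- Toy batching wrapper: exposes the same size()/get(idx) convention
-- as the data set it wraps (plain Lua; hypothetical helper).
local function makeBatchDataset(dataset, batchsize)
  local batched = {}
  function batched:size()
    -- number of batches, counting a smaller final batch
    return math.ceil(dataset:size() / batchsize)
  end
  function batched:get(idx)
    local first = (idx - 1) * batchsize + 1
    local last  = math.min(idx * batchsize, dataset:size())
    local batch = {}
    for i = first, last do
      batch[#batch + 1] = dataset:get(i)
    end
    return batch
  end
  return batched
end

-- Stand-in data set of 300 samples; sample i is simply the number i.
local numbers = {
  size = function() return 300 end,
  get  = function(self, idx) return idx end,
}

local batches = makeBatchDataset(numbers, 128)
local lastBatch = batches:get(batches:size())
```

With 300 samples and a batch size of 128, the wrapper reports three batches, and the last one holds the remaining 44 samples.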

Subsequently, we set up a simple linear model:


local net = nn.Sequential():add(nn.Linear(784,10))

Next, we initialize the Torchnet engine and implement hooks that reset, update, and print the average loss and the average classification error. The hook that updates the average loss and classification error is called after the forward() call on the training criterion:


local engine = tnt.SGDEngine()
local meter  = tnt.AverageValueMeter()
local clerr  = tnt.ClassErrorMeter{topk = {1}}
engine.hooks.onStartEpoch = function(state)
  meter:reset()
  clerr:reset()
end
engine.hooks.onForwardCriterion = function(state)
  meter:add(state.criterion.output)
  clerr:add(
    state.network.output, state.sample.target)
  print(string.format(
    'avg. loss: %2.4f; avg. error: %2.4f',
    meter:value(), clerr:value{k = 1}))
end
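The hooks above use only the reset(), add(), and value() calls of each meter, which is what makes meters interchangeable. The following plain-Lua sketch shows that shared interface with two toy meters; it is not Torchnet's meter implementation, and makeAverageMeter, makeMaxMeter, and evaluate are hypothetical names.

```lua
-- Toy meter that tracks the running average of the values it sees.
local function makeAverageMeter()
  local meter = {sum = 0, n = 0}
  function meter:reset() self.sum, self.n = 0, 0 end
  function meter:add(value)
    self.sum = self.sum + value
    self.n = self.n + 1
  end
  function meter:value() return self.sum / self.n end
  return meter
end

-- Toy meter that tracks the largest value it sees.
local function makeMaxMeter()
  local meter = {max = -math.huge}
  function meter:reset() self.max = -math.huge end
  function meter:add(value) self.max = math.max(self.max, value) end
  function meter:value() return self.max end
  return meter
end

-- The measurement loop depends only on reset/add/value.
local function evaluate(meter, losses)
  meter:reset()
  for _, loss in ipairs(losses) do meter:add(loss) end
  return meter:value()
end

local losses = {0.5, 1.5, 1.0}
local avg  = evaluate(makeAverageMeter(), losses)  -- average loss
local peak = evaluate(makeMaxMeter(), losses)      -- swap the meter: one-line change
```

Changing what is measured means changing which constructor is passed in; the loop itself stays the same.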

Next, we minimize the logistic loss using SGD:


local criterion = nn.CrossEntropyCriterion()
engine:train{
  network   = net,
  iterator  = getIterator('train'),
  criterion = criterion,
  lr        = 0.1,
  maxepoch  = 10,
}

After the model is trained, we measure the average loss and the classification error on the test set:


engine:test{
  network   = net,
  iterator  = getIterator('test'),
  criterion = criterion,
}

More advanced examples would likely implement additional hooks in the engine. For instance, if you want to measure the test error after each training epoch, this may be implemented in the engine.hooks.onEndEpoch hook. Making the same example run on a GPU requires a few simple additions to the code, in particular to copy both the model and the data to the GPU. Copying data samples to a buffer on the GPU can be performed by implementing a hook that is executed after the samples become available:


require 'cunn'
net       = net:cuda()
criterion = criterion:cuda()
local input  = torch.CudaTensor()
local target = torch.CudaTensor()
engine.hooks.onSample = function(state)
  input:resize( 
      state.sample.input:size()
  ):copy(state.sample.input)
  target:resize(
      state.sample.target:size()
  ):copy(state.sample.target)
  state.sample.input  = input
  state.sample.target = target
end
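To make the hook mechanism used throughout this example concrete, here is a toy engine in plain Lua that runs without Torch. It is a deliberately simplified sketch and not tnt.SGDEngine: makeEngine, the samples field, and the log table are hypothetical, and only the hook names onStartEpoch, onSample, and onEndEpoch are taken from the text above.

```lua
-- Toy engine that calls user-supplied hooks at fixed points of the
-- training loop (plain Lua; hypothetical, far simpler than Torchnet's engine).
local function makeEngine()
  local engine = {hooks = {}}

  -- Invoke a hook only if the user has defined it.
  local function call(name, state)
    local hook = engine.hooks[name]
    if hook then hook(state) end
  end

  function engine:train(config)
    local state = {epoch = 0, maxepoch = config.maxepoch}
    while state.epoch < state.maxepoch do
      call('onStartEpoch', state)
      for _, sample in ipairs(config.samples) do
        state.sample = sample
        call('onSample', state)
        -- a real engine would run forward/backward and update the model here
      end
      state.epoch = state.epoch + 1
      call('onEndEpoch', state)
    end
  end

  return engine
end

local toyEngine = makeEngine()
local log = {}
toyEngine.hooks.onEndEpoch = function(state)
  -- In a real experiment, a pass over the test set could be started here.
  log[#log + 1] = 'finished epoch ' .. state.epoch
end
toyEngine:train{samples = {1, 2, 3}, maxepoch = 2}
```

After two epochs the log table holds one entry per epoch, which is the point in the loop where a per-epoch test measurement would go.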

Collaborative intelligence

The goal of open-sourcing Torchnet is to empower the developer community, allowing it to rapidly build effective and reusable learning systems. Experimentation can flourish as prototypes are snapped together more quickly. Successful implementations can be easily reproduced, and bugs become less likely.

We hope that Torchnet channels the collaborative intelligence of the Torch community so we can all work together to create more effective deep learning experiments.