Method | Error (%)
---|---

Regularization of Neural Networks using DropConnect (ICML 2013) | 0.21%

We introduce DropConnect, a generalization of Dropout (Hinton et al., 2012), for regularizing large fully-connected layers within neural networks. When training with Dropout, a randomly selected subset of activations is set to zero within each layer. DropConnect instead sets a randomly selected subset of weights within the network to zero. Each unit thus receives input from a random subset of units in the previous layer. We derive a bound on the generalization performance of both Dropout and DropConnect. We then evaluate DropConnect on a range of datasets, comparing to Dropout, and show state-of-the-art results on several image recognition benchmarks by aggregating multiple DropConnect-trained models.
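The contrast between the two regularizers can be sketched in a few lines of NumPy. This is an illustrative toy forward pass (the shapes and the inverted-scaling convention are my assumptions), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(x, W, p=0.5):
    """Dropout: zero a random subset of the layer's output activations."""
    a = x @ W                        # pre-activations, shape (out_dim,)
    mask = rng.random(a.shape) > p   # keep each activation with prob 1 - p
    return a * mask / (1 - p)        # inverted scaling keeps the expectation

def dropconnect_layer(x, W, p=0.5):
    """DropConnect: zero a random subset of the weights themselves, so each
    unit sees a random subset of units in the previous layer."""
    mask = rng.random(W.shape) > p   # keep each weight with prob 1 - p
    return x @ (W * mask) / (1 - p)

x = rng.standard_normal(8)           # input from the previous layer
W = rng.standard_normal((8, 4))      # fully-connected weight matrix

print(dropout_layer(x, W).shape)     # (4,)
print(dropconnect_layer(x, W).shape) # (4,)
```

Note the masks live in different spaces: Dropout's mask has one entry per output unit, DropConnect's has one entry per weight.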

Multi-Column Deep Neural Networks for Image Classification (Feb 2012, CVPR 2012) | 0.23%

Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
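The multi-column prediction rule itself is just an average of the columns' outputs. A toy NumPy sketch, where three random 10-way score vectors stand in for real CNN columns trained on differently preprocessed inputs:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(z - z.max())
    return e / e.sum()

# Fake per-column class scores for one image, 10 MNIST classes.
# In the paper each "column" is a deep CNN trained on a different
# preprocessing of the input; random logits stand in for them here.
rng = np.random.default_rng(1)
column_logits = [rng.standard_normal(10) for _ in range(3)]

# Multi-column rule: average the columns' softmax outputs, then argmax.
avg = np.mean([softmax(z) for z in column_logits], axis=0)
prediction = int(np.argmax(avg))
print(prediction)
```

The averaged vector is still a valid probability distribution, so the ensemble plugs into any downstream decision rule unchanged.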

APAC: Augmented PAttern Classification with Neural Networks (May 2015) | 0.23%

Deep neural networks have been achieving excellent accuracy on many visual pattern classification problems. Many state-of-the-art methods employ a technique known as data augmentation at the training stage. This paper addresses the question of the decision rule for classifiers trained with augmented data. Our method, APAC (Augmented PAttern Classification), classifies using the optimal decision rule for augmented-data learning. Methods of data augmentation themselves are not our primary focus. We show clear evidence in several experiments that APAC gives far better generalization performance than the traditional way of class prediction. Our convolutional neural network model with APAC achieved state-of-the-art accuracy on the MNIST dataset among non-ensemble classifiers. Even our multilayer perceptron model beats some convolutional models with recently invented stochastic regularization techniques on the CIFAR-10 dataset.
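One common reading of an augmented-data decision rule is to sum the log class posteriors over augmented copies of the test pattern and take the argmax, rather than classifying the single original pattern. The sketch below assumes that reading; the random linear map stands in for a trained network and additive jitter stands in for real augmentation, both hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((64, 10))   # stand-in for a trained 10-class model

def class_log_probs(x):
    # Log-softmax of the stand-in model's logits for input x.
    z = x @ W
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def augment(x, k=8):
    # Hypothetical augmentation: small additive jitter. The paper's
    # focus is the decision rule, not the augmentation itself.
    return [x + 0.1 * rng.standard_normal(x.shape) for _ in range(k)]

x = rng.standard_normal(64)

# Conventional rule: classify the single original pattern.
naive = int(np.argmax(class_log_probs(x)))

# Augmented rule: sum the log class posteriors over augmented copies,
# then take the argmax.
total = np.sum([class_log_probs(a) for a in augment(x)], axis=0)
augmented_pred = int(np.argmax(total))
print(naive, augmented_pred)
```

Summing log posteriors is equivalent to taking the geometric mean of the posteriors, which penalizes classes that any augmented copy finds implausible.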

Batch-normalized Maxout Network in Network (Nov 2015) | 0.24%

This paper reports a novel deep architecture referred to as Maxout network In Network (MIN), which can enhance model discriminability and facilitate the process of information abstraction within the receptive field. The proposed network adopts the framework of the recently developed Network In Network structure, which slides a universal approximator, a multilayer perceptron (MLP) with rectifier units, over the input to extract features. Instead of a plain MLP, we employ a maxout MLP to learn a variety of piecewise linear activation functions and to mitigate the problem of vanishing gradients that can occur when using rectifier units. Moreover, batch normalization is applied to reduce the saturation of maxout units by pre-conditioning the model, and dropout is applied to prevent overfitting. Finally, average pooling is used in all pooling layers to regularize the maxout MLP, facilitating information abstraction in every receptive field while tolerating changes in object position. Because average pooling preserves all features in the local patch, the proposed MIN model can enforce the suppression of irrelevant information during training. Our experiments demonstrate state-of-the-art classification performance when the MIN model is applied to the MNIST, CIFAR-10, and CIFAR-100 datasets, and comparable performance on SVHN.
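The maxout unit at the heart of the MIN block takes an element-wise max over k affine pieces, which is what yields a learned piecewise-linear activation. A minimal NumPy sketch with toy shapes of my choosing:

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: element-wise max over k affine pieces, giving a
    learned piecewise-linear activation with no saturating region.
    W has shape (k, in_dim, out_dim); b has shape (k, out_dim)."""
    z = np.einsum('i,kij->kj', x, W) + b   # k candidate activations, (k, out_dim)
    return z.max(axis=0)                    # element-wise max over the k pieces

rng = np.random.default_rng(3)
x = rng.standard_normal(5)           # input vector
W = rng.standard_normal((3, 5, 4))   # k=3 affine pieces, 5 -> 4
b = rng.standard_normal((3, 4))

print(maxout(x, W, b).shape)         # (4,)
```

With k=2 pieces a maxout unit can already recover ReLU and absolute value as special cases; larger k buys more flexible activations at the cost of k weight matrices per layer.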

Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree (Sep 2015, AISTATS 2016) | 0.29%

We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures. We pursue a careful exploration of approaches that allow pooling to learn and to adapt to complex and variable patterns. The two primary directions lie in (1) learning a pooling function via (two strategies of) combining max and average pooling, and (2) learning a pooling function in the form of a tree-structured fusion of pooling filters that are themselves learned. In our experiments, every generalized pooling operation we explore improves performance when used in place of average or max pooling. We experimentally demonstrate that the proposed pooling operations provide a boost in invariance properties relative to conventional pooling and set the state of the art on several widely adopted benchmark datasets; they are also easy to implement, and can be applied within various deep neural network architectures. These benefits come with only a light increase in computational overhead during training and a very modest increase in the number of model parameters.
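The first of the two directions, mixed pooling, is a convex combination of max and average pooling over a patch. In the paper the mixing weight is learned; the sketch below just fixes it by hand to show the endpoints:

```python
import numpy as np

def mixed_pool(patch, a=0.5):
    """Mixed pooling: a convex combination of max and average pooling.
    `a` stands in for the learned mixing weight in [0, 1]."""
    return a * patch.max() + (1 - a) * patch.mean()

patch = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

print(mixed_pool(patch, a=0.0))  # 2.5  (pure average pooling)
print(mixed_pool(patch, a=1.0))  # 4.0  (pure max pooling)
```

Because the operation is differentiable in `a`, the mixing weight can be trained by backpropagation along with the rest of the network.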

Recurrent Convolutional Neural Network for Object Recognition (CVPR 2015) | 0.31%

On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units (Aug 2015) | 0.31%

Fractional Max-Pooling (Dec 2014) | 0.32%

Competitive Multi-scale Convolution (Nov 2015) | 0.33%

Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition (Mar 2010, Neural Computation 2010) | 0.35%

C-SVDDNet: An Effective Single-Layer Network for Unsupervised Feature Learning (Dec 2014) | 0.35%

Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network (Mar 2015) | 0.37%

All you need is a good init (Nov 2015, ICLR 2016) | 0.38%

Efficient Learning of Sparse Representations with an Energy-Based Model (NIPS 2006) | 0.39%

Convolutional Kernel Networks (Jun 2014) | 0.39%

Deeply-Supervised Nets (Sep 2014) | 0.39%

Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis (Document Analysis and Recognition 2003) | 0.40%

Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks (Feb 2015) | 0.40%

Multi-Loss Regularized Deep Neural Network (CSVT 2015) | 0.42%

Maxout Networks (Feb 2013, ICML 2013) | 0.45%

Training Very Deep Networks (Jul 2015, NIPS 2015) | 0.45%

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks (May 2015) | 0.45%

Deep Convolutional Neural Networks as Generic Feature Extractors (IJCNN 2015) | 0.46%

Network in Network (Dec 2013, ICLR 2014) | 0.47%

Trainable COSFIRE filters for keypoint detection and pattern recognition (PAMI 2013) | 0.52%

What is the Best Multi-Stage Architecture for Object Recognition? (ICCV 2009) | 0.53%

Deformation Models for Image Recognition (PAMI 2007) | 0.54%

A trainable feature extractor for handwritten digit recognition (Journal Pattern Recognition 2007) | 0.54%

Training Invariant Support Vector Machines (Machine Learning 2002) | 0.56%

Simple Method for High-Performance Digit Recognition Based on Sparse Coding (TNN 2008) | 0.59%

Unsupervised learning of invariant feature hierarchies with applications to object recognition (CVPR 2007) | 0.62%

PCANet: A Simple Deep Learning Baseline for Image Classification? (Apr 2014) | 0.62%

Shape matching and object recognition using shape contexts (PAMI 2002) | 0.63%

Beyond Spatial Pyramids: Receptive Field Learning for Pooled Image Features (CVPR 2012) | 0.64%

Handwritten Digit Recognition using Convolutional Neural Networks and Gabor Filters (ICCI 2003) | 0.68%

On Optimization Methods for Deep Learning (ICML 2011) | 0.69%

Deep Fried Convnets (Dec 2014, ICCV 2015) | 0.71%

Sparse Activity and Sparse Connectivity in Supervised Learning (JMLR 2013) | 0.75%

HyperNetworks (Sep 2016, arXiv 2016) | 0.76%

Explaining and Harnessing Adversarial Examples (Dec 2014, ICLR 2015) | 0.78%

Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations (ICML 2009) | 0.82%

Supervised Translation-Invariant Sparse Coding (CVPR 2010) | 0.84%

Large-Margin kNN Classification using a Deep Encoder Network (Jun 2009) | 0.94%

Deep Boltzmann Machines (AISTATS 2009) | 0.95%

BinaryConnect: Training Deep Neural Networks with binary weights during propagations (Nov 2015, NIPS 2015) | 1.01%

StrongNet: mostly unsupervised image recognition with strong neurons (technical report on ALGLIB website 2014) | 1.1%

CS81: Learning words with Deep Belief Networks (2008) | 1.12%

Reducing the dimensionality of data with neural networks (2006) | 1.2%

Convolutional Clustering for Unsupervised Learning (Nov 2015) | 1.40%

Deep learning via semi-supervised embedding (2008) | 1.5%

Deep Representation Learning with Target Coding (AAAI 2015) | 14.53%