|Method||Accuracy (10 fixed folds of 1000) (%)||Accuracy (1 fold of 5000) (%)|
|Invariant Information Clustering for Unsupervised Image Classification and Segmentation (Jul 2018, arXiv 2018)||88.8%|
We present a novel clustering objective that learns a neural network classifier from scratch, given only unlabelled data samples. The model discovers clusters that accurately match semantic classes, achieving state-of-the-art results in eight unsupervised clustering benchmarks spanning image classification and segmentation. These include STL10, an unsupervised variant of ImageNet, and CIFAR10, where we significantly beat the accuracy of our closest competitors by 8 and 9.5 absolute percentage points respectively. The method is not specialised to computer vision and operates on any paired dataset samples; in our experiments we use random transforms to obtain a pair from each image. The trained network directly outputs semantic labels, rather than high-dimensional representations that need external processing to be usable for semantic clustering. The objective is simply to maximise mutual information between the class assignments of each pair. It is easy to implement and rigorously grounded in information theory, meaning we effortlessly avoid degenerate solutions that other clustering methods are susceptible to. In addition to the fully unsupervised mode, we also test two semi-supervised settings. The first achieves 88.8% accuracy on STL10 classification, setting a new global state-of-the-art over all existing methods (whether supervised, semi-supervised or unsupervised). The second shows robustness to 90% reductions in label coverage, of relevance to applications that wish to make use of small amounts of labels. github.com/xu-ji/IIC
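The objective the abstract describes, maximising mutual information between the class assignments of each pair, can be sketched in a few lines. The following is a minimal numpy illustration (the function name, `eps` handling, and symmetrisation details are my own; in the paper this quantity is maximised over the network parameters that produce the softmax outputs):

```python
import numpy as np

def iic_mutual_information(p, p_prime, eps=1e-12):
    """Mutual information between cluster assignments of paired samples.

    p, p_prime: (n, C) softmax outputs of the network for the two views
    of each of n pairs. Returns a scalar to be *maximised*.
    """
    n, C = p.shape
    joint = p.T @ p_prime / n          # (C, C) empirical joint distribution
    joint = (joint + joint.T) / 2      # symmetrise over the two views
    joint = np.clip(joint, eps, None)  # avoid log(0)
    joint /= joint.sum()
    pi = joint.sum(axis=1, keepdims=True)  # marginal of the first view
    pj = joint.sum(axis=0, keepdims=True)  # marginal of the second view
    return float((joint * (np.log(joint) - np.log(pi) - np.log(pj))).sum())
```

Maximising this pushes assignments toward being deterministic per pair (high agreement between the two views) while keeping the marginals spread over all C classes, which is why the degenerate all-one-cluster solution scores zero.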
|Scaling the Scattering Transform: Deep Hybrid Networks (Mar 2017, arXiv 2017)||76.0% (±0.6%)||87.6%|
We use the scattering network as a generic and fixed initialization of the first layers of a supervised hybrid deep network. We show that early layers do not necessarily need to be learned, providing the best results to date with pre-defined representations while being competitive with Deep CNNs. Using a shallow cascade of 1 x 1 convolutions, which encodes scattering coefficients that correspond to spatial windows of very small sizes, makes it possible to obtain AlexNet accuracy on the ImageNet ILSVRC2012. We demonstrate that this local encoding explicitly learns invariance w.r.t. rotations. Combining scattering networks with a modern ResNet, we achieve a single-crop top-5 error of 11.4% on ImageNet ILSVRC2012, comparable to the ResNet-18 architecture, while utilizing only 10 layers. We also find that hybrid architectures can yield excellent performance in the small sample regime, exceeding their end-to-end counterparts, through their ability to incorporate geometrical priors. We demonstrate this on subsets of the CIFAR-10 dataset and on the STL-10 dataset.
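The learned part of this hybrid sits on top of fixed scattering coefficients, and the "1 x 1 convolution" it uses is simply a linear mix of channels applied independently at every spatial position. A small numpy sketch of that operation (the function name is my own; the scattering coefficients themselves would come from a fixed wavelet cascade not shown here):

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: a learned linear mix of input channels at each
    spatial site, with no spatial extent.

    x: (C_in, H, W) feature map, e.g. fixed scattering coefficients.
    w: (C_out, C_in) learned weights.
    Returns a (C_out, H, W) feature map.
    """
    C_in, H, W = x.shape
    return (w @ x.reshape(C_in, H * W)).reshape(w.shape[0], H, W)
```

Because it has no spatial support, a cascade of such layers can only recombine the pre-defined representation channel-wise, which is what makes the result a strong test of how much information the fixed early layers already carry.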
|Improved Regularization of Convolutional Neural Networks with Cutout (Aug 2017, arXiv 2017)||87.26% (±0.23%)|
Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often susceptible to overfitting and therefore require proper regularization in order to generalize well. In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance. We evaluate this method by applying it to current state-of-the-art architectures on the CIFAR-10, CIFAR-100, and SVHN datasets, yielding new state-of-the-art results with almost no additional computational cost. We also show improved performance in the low-data regime on the STL-10 dataset.
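The augmentation itself is a few lines: zero out one randomly placed square per training image. A minimal numpy version (function name and `rng` parameter are my own; as in common implementations, the square's centre may fall near the border, so the square is clipped to the image):

```python
import numpy as np

def cutout(image, size, rng=None):
    """Randomly mask out one size x size square region of the input.

    image: (H, W, C) array; returns a masked copy, leaving the input intact.
    Applied only at training time, alongside other augmentations.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    H, W = image.shape[:2]
    cy, cx = rng.integers(H), rng.integers(W)       # random square centre
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, H)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, W)
    out[y0:y1, x0:x1] = 0                           # zero the clipped square
    return out
```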
|Deep Unsupervised Learning Through Spatial Contrasting (Oct 2016, arXiv 2016)||81.3%|
Convolutional networks have marked their place over the last few years as the best performing model for various visual tasks. They are, however, most suited for supervised learning from large amounts of labeled data. Previous attempts have been made to use unlabeled data to improve model performance by applying unsupervised techniques. These attempts require different architectures and training methods. In this work we present a novel approach for unsupervised training of Convolutional networks that is based on contrasting between spatial regions within images. This criterion can be employed within conventional neural networks and trained using standard techniques such as SGD and back-propagation, thus complementing supervised methods.
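The idea of "contrasting between spatial regions" can be illustrated with a toy patch-level criterion: a patch feature should be closer to another patch from the same image than to patches from other images. The sketch below is illustrative only and not the paper's exact loss (names and the squared-distance softmax form are my own):

```python
import numpy as np

def spatial_contrasting_loss(anchor, positive, negatives):
    """Toy patch-contrastive criterion: the anchor patch feature should be
    closer to a patch from the same image (positive) than to patches drawn
    from other images (negatives).

    anchor, positive: (d,) feature vectors; negatives: (k, d).
    Smaller distances become larger logits in a softmax cross-entropy,
    so the loss is trainable with plain SGD and back-propagation.
    """
    dists = np.concatenate((
        [np.sum((anchor - positive) ** 2)],
        np.sum((anchor - negatives) ** 2, axis=1),
    ))
    logits = -dists                                  # closer -> higher logit
    logz = np.log(np.exp(logits - logits.max()).sum()) + logits.max()
    return float(logz - logits[0])                   # -log p(positive)
```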
|Stacked What-Where Auto-encoders (Jun 2015)||74.33%|
We present a novel architecture, the "stacked what-where auto-encoders" (SWWAE), which integrates discriminative and generative pathways and provides a unified approach to supervised, semi-supervised and unsupervised learning without relying on sampling during training. An instantiation of SWWAE uses a convolutional net (Convnet) (LeCun et al. (1998)) to encode the input, and employs a deconvolutional net (Deconvnet) (Zeiler et al. (2010)) to produce the reconstruction. The objective function includes reconstruction terms that induce the hidden states in the Deconvnet to be similar to those of the Convnet. Each pooling layer produces two sets of variables: the "what" which are fed to the next layer, and its complementary variable "where" that are fed to the corresponding layer in the generative decoder.
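The "what"/"where" split of the pooling layer is concrete enough to sketch: max pooling keeps the max values ("what", fed forward) and the argmax positions ("where", fed to the matching unpooling layer in the decoder), which place each value back where it came from. A minimal single-channel numpy version (function names are my own; the real layers operate on batched multi-channel tensors):

```python
import numpy as np

def pool_what_where(x, k):
    """k x k max pooling returning both SWWAE pooling outputs:
    'what' (max values) and 'where' (argmax index within each patch)."""
    H, W = x.shape
    patches = (x.reshape(H // k, k, W // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(H // k, W // k, k * k))
    return patches.max(axis=2), patches.argmax(axis=2)

def unpool(what, where, k):
    """Decoder-side unpooling: place each 'what' value back at its
    'where' position; all other positions stay zero."""
    Hp, Wp = what.shape
    patches = np.zeros((Hp, Wp, k * k))
    i, j = np.indices((Hp, Wp))
    patches[i, j, where] = what
    return (patches.reshape(Hp, Wp, k, k)
                   .transpose(0, 2, 1, 3)
                   .reshape(Hp * k, Wp * k))
```

Round-tripping through `unpool` and pooling again recovers the same "what", which is the property the reconstruction terms in the objective rely on.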
|Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks (Jun 2014, arXiv 2015)||74.2% (±0.4%)|
|Convolutional Clustering for Unsupervised Learning (Nov 2015)||74.10%|
|Deep Representation Learning with Target Coding (AAAI 2015)||73.15%|
|Discriminative Unsupervised Feature Learning with Convolutional Neural Networks (NIPS 2014)||72.8% (±0.4%)|
|An Analysis of Unsupervised Pre-training in Light of Recent Advances (Dec 2014, ICLR 2015)||70.20% (±0.7%)|
|Multi-Task Bayesian Optimization (NIPS 2013)||70.1% (±0.6%)|
|C-SVDDNet: An Effective Single-Layer Network for Unsupervised Feature Learning (Dec 2014)||68.23% (±0.5%)|
|Committees of deep feedforward networks trained with few data (Jun 2014)||68% (±0.55%)|
|Stable and Efficient Representation Learning with Nonnegativity Constraints (ICML 2014)||67.9% (±0.6%)|
|Unsupervised Feature Learning for RGB-D Based Object Recognition (ISER 2012)||64.5% (±1%)|
|Convolutional Kernel Networks (Jun 2014)||62.32%|
|Discriminative Learning of Sum-Product Networks (NIPS 2012)||62.3% (±1%)|
|No more meta-parameter tuning in unsupervised sparse feature learning (Feb 2014)||61.0% (±0.58%)|
|Deep Learning of Invariant Features via Simulated Fixations in Video (NIPS 2012)||61%|
|Selecting Receptive Fields in Deep Networks (NIPS 2011)||60.1% (±1%)|
|Learning Invariant Representations with Local Transformations (Jun 2012, ICML 2012)||58.7%|
|Pooling-Invariant Image Feature Learning (Jan 2013)||58.28%|