|Method||Accuracy (%)|
|Invariant Information Clustering for Unsupervised Image Classification and Segmentation (Jul 2018, arXiv 2018)||88.8%|
We present a novel clustering objective that learns a neural network classifier from scratch, given only unlabelled data samples. The model discovers clusters that accurately match semantic classes, achieving state-of-the-art results in eight unsupervised clustering benchmarks spanning image classification and segmentation. These include STL10, an unsupervised variant of ImageNet, and CIFAR10, where we significantly beat the accuracy of our closest competitors by 8 and 9.5 absolute percentage points respectively. The method is not specialised to computer vision and operates on any paired dataset samples; in our experiments we use random transforms to obtain a pair from each image. The trained network directly outputs semantic labels, rather than high-dimensional representations that need external processing to be usable for semantic clustering. The objective is simply to maximise mutual information between the class assignments of each pair. It is easy to implement and rigorously grounded in information theory, meaning we effortlessly avoid degenerate solutions that other clustering methods are susceptible to. In addition to the fully unsupervised mode, we also test two semi-supervised settings. The first achieves 88.8% accuracy on STL10 classification, setting a new global state-of-the-art over all existing methods (whether supervised, semi-supervised or unsupervised). The second shows robustness to 90% reductions in label coverage, of relevance to applications that wish to make use of small amounts of labels. github.com/xu-ji/IIC
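The pairing objective can be sketched numerically: given the softmax outputs for two views of each image, form the joint distribution over cluster-assignment pairs and compute its mutual information. A minimal NumPy sketch, assuming batched softmax outputs; the function name and `eps` smoothing are illustrative, not from the paper:

```python
import numpy as np

def iic_mutual_information(p_a, p_b, eps=1e-10):
    """Mutual information between the soft cluster assignments of two
    views of the same images (a sketch of the IIC-style objective).

    p_a, p_b: (n, k) arrays of per-view softmax outputs.
    """
    # Joint distribution over cluster pairs, averaged over the batch.
    joint = p_a.T @ p_b / p_a.shape[0]          # (k, k)
    joint = (joint + joint.T) / 2.0             # symmetrise (views are exchangeable)
    marg_a = joint.sum(axis=1, keepdims=True)   # (k, 1)
    marg_b = joint.sum(axis=0, keepdims=True)   # (1, k)
    # I(z_a; z_b) = sum_ij P_ij * (log P_ij - log P_i - log P_j)
    return float(np.sum(joint * (np.log(joint + eps)
                                 - np.log(marg_a + eps)
                                 - np.log(marg_b + eps))))
```

Maximising this value pushes paired assignments to agree while the marginal entropy term keeps the clusters balanced, which is why the degenerate all-one-cluster solution is avoided: identical balanced one-hot assignments reach the maximum log k, while independent uniform assignments score zero.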
|Improved Regularization of Convolutional Neural Networks with Cutout (Aug 2017, arXiv 2017)||87.26% (±0.23%)|
Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often susceptible to overfitting and therefore require proper regularization in order to generalize well. In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance. We evaluate this method by applying it to current state-of-the-art architectures on the CIFAR-10, CIFAR-100, and SVHN datasets, yielding new state-of-the-art results with almost no additional computational cost. We also show improved performance in the low-data regime on the STL-10 dataset.
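The masking step is simple enough to sketch directly. A minimal NumPy version, assuming zero-filled square masks whose centre is sampled uniformly and clipped at the image border; the function name and signature are illustrative, not from the paper's code:

```python
import numpy as np

def cutout(image, size, rng=None):
    """Zero out one randomly placed square patch (a minimal cutout sketch).

    image: (h, w, c) array; size: side length of the square mask.
    The centre may land near the border, so the mask is clipped there,
    which yields partial squares for some samples.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    cy, cx = rng.integers(h), rng.integers(w)
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
    out = image.copy()          # leave the original augmentation input intact
    out[y0:y1, x0:x1] = 0
    return out
```

Applied per sample during training, this composes freely with standard flips and crops, since it only touches pixel values after the geometric augmentations have run.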
|Stacked What-Where Auto-encoders (Jun 2015)||74.33%|
We present a novel architecture, the "stacked what-where auto-encoders" (SWWAE), which integrates discriminative and generative pathways and provides a unified approach to supervised, semi-supervised and unsupervised learning without relying on sampling during training. An instantiation of SWWAE uses a convolutional net (Convnet) (LeCun et al. (1998)) to encode the input, and employs a deconvolutional net (Deconvnet) (Zeiler et al. (2010)) to produce the reconstruction. The objective function includes reconstruction terms that induce the hidden states in the Deconvnet to be similar to those of the Convnet. Each pooling layer produces two sets of variables: the "what", which are fed to the next layer, and the complementary "where" variables, which are fed to the corresponding layer in the generative decoder.
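The "what"/"where" split at each pooling layer can be sketched with plain NumPy: max pooling forwards the max values (the "what") and hands the argmax positions (the "where") to the decoder, which unpools by placing each value back where it came from. A single-channel sketch under simplifying assumptions (square input, side divisible by the pool size; names are illustrative, and the real model wraps this in convolutional encoder/decoder stacks):

```python
import numpy as np

def pool_what_where(x, k=2):
    """k x k max pooling returning the "what" (max values, sent forward)
    and the "where" (argmax positions, sent to the decoder)."""
    h, w = x.shape
    # Rearrange into one row per pooling window: (h/k * w/k, k*k).
    blocks = x.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3).reshape(-1, k * k)
    where = blocks.argmax(axis=1)
    what = blocks[np.arange(len(blocks)), where].reshape(h // k, w // k)
    return what, where

def unpool(what, where, k=2):
    """Decoder-side unpooling: place each "what" back at its "where"."""
    hh, ww = what.shape
    blocks = np.zeros((hh * ww, k * k))
    blocks[np.arange(hh * ww), where] = what.ravel()
    return blocks.reshape(hh, ww, k, k).transpose(0, 2, 1, 3).reshape(hh * k, ww * k)
```

Unpooling with the true "where" reproduces the pre-pooling maxima in their original positions, which is what lets the decoder reconstruct spatial detail that the "what" pathway alone discards.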
|Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks (Jun 2014, arXiv 2015)||74.2% (±0.4%)|
Deep convolutional networks have proven to be very successful in learning task-specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks follows mostly the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquisition of large training sets is one of the key challenges when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled 'seed' image patch. In contrast to supervised network training, the resulting feature representation is not class-specific. It rather provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101, Caltech-256). While such generic features cannot compete with class-specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.
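The surrogate-class construction can be sketched concretely: each seed patch gets its own label, and random transformations of it form the members of that class. A minimal NumPy sketch using only flips and small translations (the paper uses a much richer family of transformations; the function name and parameters are illustrative):

```python
import numpy as np

def surrogate_dataset(seed_patches, n_transforms, rng=None):
    """Build Exemplar-CNN-style training pairs: one surrogate class per
    seed patch, populated by random transformations of that patch."""
    if rng is None:
        rng = np.random.default_rng()
    samples, labels = [], []
    for label, patch in enumerate(seed_patches):
        for _ in range(n_transforms):
            t = patch[:, ::-1] if rng.random() < 0.5 else patch  # horizontal flip
            t = np.roll(t, rng.integers(-2, 3), axis=0)          # small vertical shift
            t = np.roll(t, rng.integers(-2, 3), axis=1)          # small horizontal shift
            samples.append(t)
            labels.append(label)   # surrogate class id = index of the seed patch
    return np.stack(samples), np.array(labels)
```

A classifier trained on these pairs never sees a semantic label; it only learns to map all transformed versions of a seed patch to the same output, which is exactly the transformation-robustness the abstract describes.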
|Convolutional Clustering for Unsupervised Learning (Nov 2015)||74.10%|
The task of labeling data for training deep neural networks is daunting and tedious, requiring millions of labels to achieve the current state-of-the-art results. Such reliance on large amounts of labeled data can be relaxed by exploiting hierarchical features via unsupervised learning techniques. In this work, we propose to train a deep convolutional network based on an enhanced version of the k-means clustering algorithm, which reduces the number of correlated parameters in the form of similar filters, and thus increases test categorization accuracy. We call our algorithm convolutional k-means clustering. We further show that learning the connection between the layers of a deep convolutional neural network improves its ability to be trained on a smaller amount of labeled data. Our experiments show that the proposed algorithm outperforms other techniques that learn filters in an unsupervised manner. Specifically, we obtained a test accuracy of 74.1% on STL-10 and a test error of 0.5% on MNIST.
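The underlying filter-learning step can be sketched with plain k-means over flattened image patches: the centroids become the convolutional filters. This sketch deliberately omits the paper's enhancement (the step that prunes correlated, similar filters); the function name and parameters are illustrative:

```python
import numpy as np

def kmeans_filters(patches, k, iters=10, rng=None):
    """Learn k convolutional filters as k-means centroids of patches.

    patches: (n, d) array of flattened, preprocessed image patches.
    Returns a (k, d) array of centroids, each reshapeable to a filter.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Initialise centroids from k distinct random patches.
    centroids = patches[rng.choice(len(patches), k, replace=False)]
    for _ in range(iters):
        # Assign each patch to its nearest centroid (squared Euclidean).
        d2 = ((patches[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = d2.argmin(1)
        # Move each centroid to the mean of its assigned patches.
        for j in range(k):
            members = patches[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids
```

In a pipeline like the one described, each centroid is reshaped back to patch dimensions and used as a convolution kernel for the next layer; the paper's contribution is the extra step that keeps near-duplicate centroids from wasting model capacity.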
|Deep Representation Learning with Target Coding (AAAI 2015)||73.15%|
|Discriminative Unsupervised Feature Learning with Convolutional Neural Networks (NIPS 2014)||72.8% (±0.4%)|
|An Analysis of Unsupervised Pre-training in Light of Recent Advances (Dec 2014, ICLR 2015)||70.20% (±0.7%)|
|Multi-Task Bayesian Optimization (NIPS 2013)||70.1% (±0.6%)|
|C-SVDDNet: An Effective Single-Layer Network for Unsupervised Feature Learning (Dec 2014)||68.23% (±0.5%)|
|Committees of deep feedforward networks trained with few data (Jun 2014)||68% (±0.55%)|
|Stable and Efficient Representation Learning with Nonnegativity Constraints (ICML 2014)||67.9% (±0.6%)|
|Unsupervised Feature Learning for RGB-D Based Object Recognition (ISER 2012)||64.5% (±1%)|
|Convolutional Kernel Networks (Jun 2014)||62.32%|
|Discriminative Learning of Sum-Product Networks (NIPS 2012)||62.3% (±1%)|
|No more meta-parameter tuning in unsupervised sparse feature learning (Feb 2014)||61.0% (±0.58%)|
|Deep Learning of Invariant Features via Simulated Fixations in Video (NIPS 2012)||61%|
|Selecting Receptive Fields in Deep Networks (NIPS 2011)||60.1% (±1%)|
|Learning Invariant Representations with Local Transformations (Jun 2012, ICML 2012)||58.7%|
|Pooling-Invariant Image Feature Learning (Jan 2013)||58.28%|