Method | Accuracy (%) | |
---|---|---|

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Nov 2018, arXiv 2018) | 99.0% | |

GPipe is a scalable pipeline-parallelism library that enables training of giant deep neural networks. It partitions network layers across accelerators and pipelines execution to achieve high hardware utilization, and it leverages recomputation to minimize activation memory usage. For example, when partitioning over 8 accelerators, it is able to train networks that are 25x larger, demonstrating its scalability. It also guarantees that the computed gradients remain consistent regardless of the number of partitions. It achieves an almost linear speedup without any changes to the model parameters: when using 4x more accelerators, training the same model is up to 3.5x faster. We train a 557-million-parameter AmoebaNet model and achieve a new state-of-the-art 84.3% top-1 / 97.0% top-5 accuracy on the ImageNet 2012 dataset. Finally, we use this learned model to fine-tune on multiple popular image classification datasets and obtain competitive results, including pushing the CIFAR-10 accuracy to 99% and the CIFAR-100 accuracy to 91.3%. |
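To make the pipelining idea concrete, here is a minimal Python sketch of a fill-and-drain forward schedule. It assumes the mini-batch is split into `num_micro` equal micro-batches (M) and that each of the `num_stages` partitions (K) needs exactly one time step per micro-batch; the function names are hypothetical, and the backward pass, recomputation, and communication costs are ignored. It shows why extra accelerators only pay off with enough micro-batches: the idle "bubble" fraction of the schedule is (K - 1) / (M + K - 1).

```python
from typing import List, Optional

# Entry [t][k] holds the micro-batch index that stage k processes at time
# step t, or None when that stage is idle (the pipeline "bubble").
Schedule = List[List[Optional[int]]]


def forward_schedule(num_stages: int, num_micro: int) -> Schedule:
    """Fill-and-drain forward schedule: stage k runs micro-batch m at step m + k."""
    steps = num_micro + num_stages - 1
    grid: Schedule = [[None] * num_stages for _ in range(steps)]
    for m in range(num_micro):
        for k in range(num_stages):
            grid[m + k][k] = m
    return grid


def bubble_fraction(num_stages: int, num_micro: int) -> float:
    """Fraction of (time step, stage) slots left idle by the schedule."""
    grid = forward_schedule(num_stages, num_micro)
    idle = sum(cell is None for row in grid for cell in row)
    return idle / (len(grid) * num_stages)


if __name__ == "__main__":
    # Print the 4-stage, 4-micro-batch schedule; "-" marks an idle stage.
    for t, row in enumerate(forward_schedule(num_stages=4, num_micro=4)):
        print(t, ["-" if m is None else m for m in row])
    # Compare the simulated idle fraction with (K - 1) / (M + K - 1).
    for k, m in [(4, 4), (4, 16), (8, 32)]:
        print(k, m, round(bubble_fraction(k, m), 3), round((k - 1) / (m + k - 1), 3))
```

With M = 4K micro-batches the idle fraction is (K - 1) / (5K - 1), which stays below 20% for any number of stages.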

A Survey on Neural Architecture Search (May 2019, arXiv 2019) | 98.67% | |

The growing interest in both the automation of machine learning and deep learning has inevitably led to the development of automated methods for neural architecture optimization. The choice of the network architecture has proven to be critical, and many advances in deep learning stem directly from improvements to it. However, deep learning techniques are computationally intensive, and their application requires a high level of domain knowledge. Therefore, even partial automation of this process would help make deep learning more accessible to both researchers and practitioners. With this survey, we provide a formalism that unifies and categorizes the landscape of existing methods, along with a detailed analysis that compares and contrasts the different approaches. We achieve this via a discussion of common architecture search spaces and of architecture optimization algorithms based on principles of reinforcement learning and evolutionary algorithms, along with approaches that incorporate surrogate and one-shot models. Additionally, we address new research directions, which include constrained and multi-objective architecture search as well as automated data augmentation, optimizer, and activation function search. |
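As an illustration of the evolutionary family of search algorithms only, the Python sketch below runs a tournament-selection search in which the oldest population member is discarded each step (in the spirit of aging evolution) over a toy search space of layer widths. The search space, `proxy_fitness`, and all hyperparameters are hypothetical: a real search would train and validate each candidate network, and that cost is what motivates the surrogate and one-shot approaches mentioned above.

```python
import random
from typing import List, Tuple

# Hypothetical search space: an architecture is a tuple of layer widths.
WIDTHS = [16, 32, 64, 128]
NUM_LAYERS = 6
Arch = Tuple[int, ...]


def proxy_fitness(arch: Arch) -> float:
    """Hypothetical stand-in for validation accuracy.

    Rewards wider layers but penalizes the parameter count of the
    connections between adjacent layers.
    """
    capacity = sum(w.bit_length() for w in arch)
    params = sum(a * b for a, b in zip(arch, arch[1:]))
    return capacity - params / 10_000


def mutate(arch: Arch, rng: random.Random) -> Arch:
    """Resample the width of one randomly chosen layer."""
    i = rng.randrange(len(arch))
    new = list(arch)
    new[i] = rng.choice(WIDTHS)
    return tuple(new)


def evolve(generations: int, pop_size: int, tournament: int, seed: int = 0) -> Tuple[Arch, float]:
    rng = random.Random(seed)
    population: List[Arch] = [
        tuple(rng.choice(WIDTHS) for _ in range(NUM_LAYERS)) for _ in range(pop_size)
    ]
    best = max(population, key=proxy_fitness)
    for _ in range(generations):
        # Tournament selection: the fittest of a random sample becomes the parent.
        parent = max(rng.sample(population, tournament), key=proxy_fitness)
        child = mutate(parent, rng)
        # Discard the oldest member so the population size stays fixed.
        population.pop(0)
        population.append(child)
        if proxy_fitness(child) > proxy_fitness(best):
            best = child
    return best, proxy_fitness(best)


if __name__ == "__main__":
    print(evolve(generations=500, pop_size=20, tournament=5))
```

Replacing `proxy_fitness` with a learned predictor of accuracy would turn this into a simple surrogate-assisted search; that swap is left as an exercise.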

AutoAugment: Learning Augmentation Policies from Data (May 2018, arXiv 2018) | 98.52% | |

In this paper, we take a closer look at data augmentation for images and describe a simple procedure called AutoAugment to search for improved data augmentation policies. Our key insight is to create a search space of data augmentation policies and to evaluate the quality of a particular policy directly on the dataset of interest. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, together with the probability and magnitude with which that function is applied. We use a search algorithm to find the best policy, i.e., the one for which the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data). On ImageNet, we attain a top-1 accuracy of 83.54%. On CIFAR-10, we achieve an error rate of 1.48%, which is 0.65 percentage points better than the previous state of the art. Finally, policies learned from one dataset can be transferred to work well on other similar datasets. For example, the policy learned on ImageNet allows us to achieve state-of-the-art accuracy on the fine-grained visual classification dataset Stanford Cars, without fine-tuning weights pre-trained on additional data. Code to train Wide-ResNet, Shake-Shake and ShakeDrop models with AutoAugment policies can be found at https://github.com/tensorflow/models/tree/master/research/autoaugment |
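The policy structure described above (a policy is a list of sub-policies, each sub-policy is two operations with an application probability and a magnitude, and one sub-policy is drawn per image) can be sketched in plain Python. This is a toy illustration under stated assumptions: images are grayscale nested lists, only integer translations and a nearest-neighbour horizontal shear are implemented, the operation names are our own labels, and `EXAMPLE_POLICY` is made up rather than a policy found by the AutoAugment search. The search itself is not shown.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]        # grayscale image as rows of pixel values
Op = Tuple[str, float, float]  # (operation name, probability, magnitude)


def translate_x(img: Image, mag: float) -> Image:
    """Shift right by int(mag) pixels (left if negative), filling with 0."""
    dx, w = int(mag), len(img[0])
    return [[row[x - dx] if 0 <= x - dx < w else 0 for x in range(w)] for row in img]


def translate_y(img: Image, mag: float) -> Image:
    """Shift down by int(mag) pixels (up if negative), filling with 0."""
    dy, h, w = int(mag), len(img), len(img[0])
    return [img[y - dy][:] if 0 <= y - dy < h else [0] * w for y in range(h)]


def shear_x(img: Image, mag: float) -> Image:
    """Nearest-neighbour horizontal shear: row y is sampled at x + mag * y."""
    w = len(img[0])
    out = []
    for y, row in enumerate(img):
        src = [x + round(mag * y) for x in range(w)]
        out.append([row[s] if 0 <= s < w else 0 for s in src])
    return out


OPS: Dict[str, Callable[[Image, float], Image]] = {
    "TranslateX": translate_x,
    "TranslateY": translate_y,
    "ShearX": shear_x,
}

# Hypothetical policy for illustration only: two sub-policies of two operations each.
EXAMPLE_POLICY: List[List[Op]] = [
    [("TranslateX", 0.8, 2), ("ShearX", 0.3, 0.2)],
    [("TranslateY", 0.5, -1), ("TranslateX", 0.6, -2)],
]


def apply_policy(img: Image, policy: List[List[Op]], rng: random.Random) -> Image:
    sub_policy = rng.choice(policy)  # one sub-policy is drawn per image
    for name, prob, mag in sub_policy:
        if rng.random() < prob:  # each operation fires with its own probability
            img = OPS[name](img, mag)
    return img


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    print(apply_policy(image, EXAMPLE_POLICY, rng))
```

Calling `apply_policy` once per image per mini-batch means the same image can receive a different sub-policy, or none of its operations, each time it is seen.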

ShakeDrop Regularization (Feb 2018, ICLR 2018) | 97.69% | |

This paper proposes a powerful regularization method named ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake regularization, which decreases error rates by disturbing learning. While Shake-Shake can only be applied to ResNeXt, which has multiple branches, ShakeDrop can be applied not only to ResNeXt but also to ResNet, Wide ResNet, and PyramidNet in a memory-efficient way. An important and interesting feature of ShakeDrop is that it strongly disturbs learning by multiplying the output of a convolutional layer by a factor that can even be negative in the forward training pass. The effectiveness of ShakeDrop is confirmed by experiments on the CIFAR-10/100 and Tiny ImageNet datasets. |
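The negative factor is easiest to see in the per-block forward rule. A minimal sketch, assuming the formulation from the ShakeDrop paper: a residual block with branch output F(x) computes x + (b + alpha - b * alpha) * F(x) during training, with b drawn from Bernoulli(p) and alpha drawn uniformly from [-1, 1], and x + p * F(x) at inference, since the expected coefficient is p when alpha has mean zero. The plain-Python function below (its name is hypothetical) implements only that forward rule on lists of floats; the separate random factor the paper uses in the backward pass, the per-layer choice of p, and the convolutional branch are omitted.

```python
import random
from typing import List


def shakedrop_forward(
    x: List[float],
    branch: List[float],
    p_keep: float,
    training: bool,
    rng: random.Random,
) -> List[float]:
    """Combine a residual input x with its branch output F(x), ShakeDrop-style.

    Training: coefficient b + alpha - b * alpha with b ~ Bernoulli(p_keep) and
    alpha ~ U[-1, 1]. If b == 1 the coefficient is 1 (a plain residual block);
    if b == 0 it equals alpha, which can be negative.
    Inference: the expected coefficient, p_keep.
    """
    if training:
        b = 1.0 if rng.random() < p_keep else 0.0
        alpha = rng.uniform(-1.0, 1.0)
        coeff = b + alpha - b * alpha
    else:
        coeff = p_keep  # E[b + alpha - b * alpha] = p_keep because E[alpha] = 0
    return [xi + coeff * fi for xi, fi in zip(x, branch)]


if __name__ == "__main__":
    rng = random.Random(0)
    # With x = 0 and F(x) = 1 the output equals the sampled coefficient.
    coeffs = [shakedrop_forward([0.0], [1.0], 0.5, True, rng)[0] for _ in range(10_000)]
    print(f"negative draws: {sum(c < 0 for c in coeffs) / len(coeffs):.2f}")  # roughly 0.25
    print(f"mean coefficient: {sum(coeffs) / len(coeffs):.3f}")  # close to p_keep = 0.5
```

With p = 0.5, a negative coefficient occurs whenever b = 0 and alpha < 0, i.e. in about a quarter of the training passes through the block.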

Improved Regularization of Convolutional Neural Networks with Cutout (Aug 2017, arXiv 2017) | 97.44% | |

Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often susceptible to overfitting and therefore require proper regularization in order to generalize well. In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance. We evaluate this method by applying it to current state-of-the-art architectures on the CIFAR-10, CIFAR-100, and SVHN datasets, yielding new state-of-the-art results with almost no additional computational cost. We also show improved performance in the low-data regime on the STL-10 dataset. |
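Cutout is simple enough to state in a few lines. The plain-Python sketch below masks a `size` x `size` square whose centre is drawn uniformly over all pixels, so near the border the square is clipped and fewer pixels are masked. Representing images as nested lists, the default fill value of 0, and the function name are assumptions made for illustration.

```python
import random
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values


def cutout(img: Image, size: int, rng: random.Random, fill: float = 0.0) -> Image:
    """Return a copy of img with a size x size square set to `fill`.

    The square's centre is sampled uniformly over all pixels, so near the
    border the square is clipped and fewer than size * size pixels change.
    """
    h, w = len(img), len(img[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    y0, x0 = cy - size // 2, cx - size // 2
    out = [row[:] for row in img]  # leave the input image untouched
    for y in range(max(0, y0), min(h, y0 + size)):
        for x in range(max(0, x0), min(w, x0 + size)):
            out[y][x] = fill
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[1.0] * 8 for _ in range(8)]
    # "#" marks kept pixels, "." marks masked ones.
    for row in cutout(image, size=4, rng=rng):
        print(" ".join("#" if v else "." for v in row))
```

Because the function returns a new image, it can be chained after other augmentations (crops, flips) in a per-sample training transform.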

Random Erasing Data Augmentation (Aug 2017, arXiv 2017) | 96.92% | |

Drop-Activation: Implicit Parameter Reduction and Harmonic Regularization (Nov 2018, arXiv 2018) | 96.55% | |

Densely Connected Convolutional Networks (Aug 2016, arXiv 2016) | 96.54% | |

Fractional Max-Pooling (Dec 2014) | 96.53% | |

Residual Networks of Residual Networks: Multilevel Residual Networks (Aug 2016, arXiv 2017) | 96.23% | |

Wide Residual Networks (May 2016, arXiv 2017) | 96.20% | |

Residual Attention Network for Image Classification (Apr 2017, arXiv 2017) | 96.10% | |

Striving for Simplicity: The All Convolutional Net (Dec 2014, ICLR 2015) | 95.59% | |

All you need is a good init (Nov 2015, ICLR 2016) | 94.16% | |

Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree (Sep 2015, AISTATS 2016) | 93.95% | |

Spatially-sparse convolutional neural networks (Sep 2014) | 93.72% | |

Scalable Bayesian Optimization Using Deep Neural Networks (Feb 2015, ICML 2015) | 93.63% | |

Deep Residual Learning for Image Recognition (Dec 2015) | 93.57% | |

Fast and Accurate Deep Network Learning by Exponential Linear Units (Nov 2015) | 93.45% | |

Universum Prescription: Regularization using Unlabeled Data (Nov 2015) | 93.34% | |

Batch-normalized Maxout Network in Network (Nov 2015) | 93.25% | |

Competitive Multi-scale Convolution (Nov 2015) | 93.13% | |

Recurrent Convolutional Neural Network for Object Recognition (CVPR 2015) | 92.91% | |

HyperNetworks (Sep 2016, arXiv 2016) | 92.77% | |

Learning Activation Functions to Improve Deep Neural Networks (Dec 2014, ICLR 2015) | 92.49% | |

cifar.torch (unpublished 2015) | 92.45% | |

Training Very Deep Networks (Jul 2015, NIPS 2015) | 92.40% | |

Stacked What-Where Auto-encoders (Jun 2015) | 92.23% | |

Multi-Loss Regularized Deep Neural Network (CSVT 2015) | 91.88% | |

Deeply-Supervised Nets (Sep 2014) | 91.78% | |

BinaryConnect: Training Deep Neural Networks with binary weights during propagations (Nov 2015, NIPS 2015) | 91.73% | |

On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units (Aug 2015) | 91.48% | |

Spectral Representations for Convolutional Neural Networks (Jun 2015, NIPS 2015) | 91.40% | |

Network in Network (Dec 2013, ICLR 2014) | 91.2% | |

Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves (IJCAI 2015) | 91.19% | |

Deep Networks with Internal Selective Attention through Feedback Connections (Jul 2014, NIPS 2014) | 90.78% | |

Regularization of Neural Networks using DropConnect (ICML 2013) | 90.68% | |

Maxout Networks (Feb 2013, ICML 2013) | 90.65% | |

Improving Deep Neural Networks with Probabilistic Maxout Units (Dec 2013, ICLR 2014) | 90.61% | |

Practical Bayesian Optimization of Machine Learning Algorithms (Jun 2012, NIPS 2012) | 90.5% | |

APAC: Augmented PAttern Classification with Neural Networks (May 2015) | 89.67% | |

Deep Convolutional Neural Networks as Generic Feature Extractors (IJCNN 2015) | 89.14% | |

ImageNet Classification with Deep Convolutional Neural Networks (NIPS 2012) | 89% | |

Empirical Evaluation of Rectified Activations in Convolution Network (May 2015, ICML workshop 2015) | 88.80% | |

Multi-Column Deep Neural Networks for Image Classification (Feb 2012, CVPR 2012) | 88.79% | |

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks (May 2015) | 87.65% | |

An Analysis of Unsupervised Pre-training in Light of Recent Advances (Dec 2014, ICLR 2015) | 86.70% | |

Stochastic Pooling for Regularization of Deep Convolutional Neural Networks (Jan 2013) | 84.87% | |

Improving neural networks by preventing co-adaptation of feature detectors (Jul 2012) | 84.4% | |

Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks (Jun 2014, arXiv 2015) | 84.3% | |

Discriminative Learning of Sum-Product Networks (NIPS 2012) | 83.96% | |

Stable and Efficient Representation Learning with Nonnegativity Constraints (ICML 2014) | 82.9% | |

Learning Invariant Representations with Local Transformations (Jun 2012, ICML 2012) | 82.2% | |

Convolutional Kernel Networks (Jun 2014) | 82.18% | |

Discriminative Unsupervised Feature Learning with Convolutional Neural Networks (NIPS 2014) | 82% | |

Selecting Receptive Fields in Deep Networks (NIPS 2011) | 82.0% | |

Learning Smooth Pooling Regions for Visual Recognition (BMVC 2013) | 80.02% | |

Object Recognition with Hierarchical Kernel Descriptors (CVPR 2011) | 80% | |

Learning with Recursive Perceptual Representations (NIPS 2012) | 79.7% | |

An Analysis of Single-Layer Networks in Unsupervised Feature Learning (AISTATS 2011) | 79.6% | |

PCANet: A Simple Deep Learning Baseline for Image Classification? (Apr 2014) | 78.67% | |

Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network (Mar 2015) | 75.86% | |