Getting started: paper readings

Posted on Tue 19 December 2017 in tutorial by Kazuhiro Terao

This is a list I once compiled for myself, and I also share it with my students and others. You can easily find similar compilations of readings on people's GitHub pages: just google it :) But here is one of those for our group's reference.

Online ML/DL Introduction

Modern CNN papers

I list them in chronological order, hoping this lets you skip some of the earlier ones. I put “recommended” next to the papers I think are good/important to read.

  • 2012 Drop-out (recommended)

    • A big jump in training technique to avoid over-fitting and improve final accuracy, key technique for AlexNet
  • 2012 AlexNet (recommended)

    • Legendary debut of the CNN, implemented on GPUs by Hinton (prof., U. of Toronto), Alex Krizhevsky (now Google), and Ilya Sutskever (now OpenAI). Dramatic improvement in both accuracy and processing speed over the previous year's results on ILSVRC, the annual image recognition competition (the previous year's winner was based on Fisher vectors). First time a CNN was applied to a 224x224x3 image tensor.
  • 2014 VGG

    • First systematic approach to understanding the effect of network depth, using a homogeneous network architecture (all 3x3 kernel convolutions + 2x2 pooling layers)
  • 2014 GoogLeNet

    • First “Inception” network idea
  • 2015 Batch Normalization (recommended)

    • Another big jump in training techniques: it reduces the dependence on initial weights, making training dramatically easier (it also acts as a regularizer against over-fitting)
  • 2015 ResNet (recommended)

    • First “ResNet” idea, surpassed average human accuracy on the ILSVRC data set. The technique made it possible to train a 1000-layer-deep network, jaw-dropping for researchers who had been competing to make networks deeper and deeper since GoogLeNet and VGG. This development was a huge deal, and ResNet is the default choice for constructing a deep network architecture today.
  • 2015 Faster-RCNN (recommended)

    • First real-time object detection by a neural network (I think it was 20~30 Hz, and advanced versions exceed 60 Hz today). Elegant technique to piggy-back a detection network on top of any image recognition network. The Region Proposal Network (RPN), developed in this paper as the part of Faster-RCNN for object detection, has since been developed further and is used by the best semantic segmentation networks today.
  • 2015 DC-GAN (recommended)

    • First generative adversarial network (GAN) that looks as if the machine has learned a concept of real-world images. Super popular because the network convincingly generates images of bedrooms and bathrooms (toilets).
  • 2016 FCN (recommended)

    • First solid implementation of a CNN for semantic segmentation. More than 1500 citations! Still used today as a standard candle for accuracy. Not the best accuracy in the field, but it trains extremely fast and has a simple architecture.
  • 2016 R-FCN

    • Detection network that combines FCN with Faster-RCNN to improve detection (first find pixels, then draw the bounding box, makes sense!). First segmentation=>detection=>classification workflow. … note “Kaiming He” :)
  • 2016 Inception-V4, Inception-ResNet (recommended)

    • Hey, let’s improve image classification even more… here’s a huge network by Google: the latest Inception module, studied in combination with ResNet.
  • 2016 Wide-ResNet (recommended)

    • Empirical study to answer the question of What-is-the-“depth”-in-ResNet? The group found that ResNet actually performs better by making it “wider” rather than “deeper”.
  • 2016 Wider-or-Deeper ResNet? (recommended)

    • Analytical explanation of the observation made in the previous paper. Very well written. Demonstrated the importance of width for image classification and semantic segmentation.
  • 2016 Instance-sensitive Fully Convolutional Network (IS-FCN) … (recommended)

    • Extension of R-FCN with an improved architecture design that won the ILSVRC 2016 semantic segmentation competition … note “Kaiming He” :)
  • 2016 Aggregated Residual Transformations: ResNeXt (recommended)

    • Introduces a new dimension, named "cardinality" (the size of the set of transformations), claimed as yet another effective direction to improve the accuracy besides "width and depth".
  • 2017 Mask R-CNN (recommended)

    • Kaiming He’s latest work, which has already beaten IS-FCN. It is our current target to implement for instance-aware semantic segmentation for particle clustering.
  • 2017 Shattered Gradient Problem

    • If resnets are the answer, what is the question?
  • 2017 Squeeze-and-Excitation Networks (recommended)

    • First detailed study of enhancing the "channels" of the tensor, rather than the spatial dimensions (width/height) of images, to encode image features.
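
Three of the training ideas in the list above (Drop-out, Batch Normalization, and the ResNet shortcut) are small enough to sketch on plain lists of numbers. The snippet below is a toy illustration in pure Python, not code from any of the papers: the function names, the `keep_prob`, `eps`, `gamma`, and `beta` parameters, and the "inverted" scaling used in the dropout function are all assumptions made for this example.

```python
import math
import random


def dropout(values, keep_prob, rng):
    """Inverted dropout (an assumed variant): zero each value with
    probability 1 - keep_prob and scale the survivors by 1 / keep_prob,
    so the expected value of each entry is unchanged."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def batch_norm(batch, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize one feature over a mini-batch to zero mean and (nearly)
    unit variance, then apply a scale (gamma) and a shift (beta)."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


def residual(x, layer):
    """Residual shortcut: output = x + F(x), where F is the wrapped layer."""
    return [xi + fi for xi, fi in zip(x, layer(x))]


rng = random.Random(0)  # fixed seed so repeated runs give the same mask
x = [1.0, 2.0, 3.0, 4.0]

dropped = dropout(x, keep_prob=0.5, rng=rng)
normed = batch_norm(x)
# A layer that outputs zeros passes the input through the shortcut unchanged.
identity_out = residual(x, lambda v: [0.0 for _ in v])
```

Real frameworks apply these operations to multi-dimensional tensors and learn `gamma` and `beta` during training; the toy versions only show the arithmetic. The last line hints at why the shortcut helps very deep networks: a block only has to learn a correction to its input, and a block that learns nothing still passes its input through.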