Lecture 13. Convolutional Neural Networks#

These notes were completed with the assistance of ChatGPT.


Quick Question

What are the appropriate choices of activation function and loss function for a binary classification deep learning model?

Quick Question

In image detection, why can a CNN recognise a triangle even if it is not in the same spot?


Random notes by hand#

  • CNNs are motivated by two things: efficiency and translation invariance.

  • translation invariance: you can recognize an object as the same object even when its position in the image changes.

  • filters / kernels (\(w\)): used to extract features from an image (a vector or matrix of weights)

  • parameter sharing: the same filter weights are reused at every location, which is what makes the learned filters translation invariant

  • convolution operator: the operation that slides a filter across the image and measures the match at each location


Tip

• When an input patch and the kernel have the same pattern, the convolution produces a high activation response (toy example below).
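
A toy illustration of this matching, written as a dot product in plain NumPy (the values are arbitrary):

```python
import numpy as np

kernel = np.array([1.0, -1.0, 1.0])      # the pattern the filter looks for
matching = np.array([1.0, -1.0, 1.0])    # input patch with the same pattern
different = np.array([1.0, 1.0, 1.0])    # input patch with a different pattern

print(np.dot(kernel, matching))   # 3.0 -> high activation response
print(np.dot(kernel, different))  # 1.0 -> weak response
```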


Convolutional Neural Networks (ConvNets or CNNs) Overview#

Basic Principles:#

  1. Convolution Operator:

    • Originates from signal processing.

    • Defined by the integral/sum of the product of two functions, with one function being a flipped and shifted version of the other.

    • Measures how well the shape of one function matches the other as it slides along (the discrete forms used in CNNs are written out after this list).

    • In the context of CNNs, it is applied to discrete inputs like images or sequences.

  2. Convolution in 2D:

    • Applied extensively for image processing tasks.

    • Helps in producing a “Feature Map”, a 2D representation showing the presence of a specific pattern (defined by a kernel) at different locations in an input image.

    • Different kernels detect different patterns (edges, textures, etc.); a NumPy sketch follows this list.

  3. Convolution in 1D:

    • Primarily for sequential data such as time series or text.

    • Captures patterns or n-grams in sequences, like word combinations in text data.
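
For reference, the discrete forms of the convolution used in deep learning are, in 1D and 2D,

\[
(x * w)[n] = \sum_{k} x[k]\, w[n-k],
\qquad
(X * W)[i,j] = \sum_{a}\sum_{b} X[a,b]\, W[i-a,\, j-b],
\]

where \(w\) (or \(W\)) is the kernel. In practice most deep learning libraries compute cross-correlation (no kernel flip) but still call it convolution; for learned kernels the distinction is immaterial.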
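A minimal NumPy sketch of both cases (the edge-detector kernel and the toy inputs are arbitrary illustrations):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D convolution in 'valid' mode: flip the kernel (per the
    signal-processing definition), slide it over the image, and take
    the sum of elementwise products at each position."""
    kernel = np.flipud(np.fliplr(kernel))
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Vertical-edge detector: responds where intensity changes left-to-right.
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

# Toy image: dark left half, bright right half -> one vertical edge.
image = np.zeros((5, 5))
image[:, 2:] = 1.0

print(conv2d_valid(image, kernel))  # feature map: nonzero where the edge falls under the window

# 1D analogue for sequences (np.convolve also flips the kernel):
seq = np.array([0.0, 0.0, 1.0, 1.0, 0.0])
print(np.convolve(seq, [1.0, -1.0], mode="valid"))  # spikes at the up/down steps
```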

Architectural Components:#

  1. Convolution Layers:

    • Extract features from input data using the convolution operation.

    • Filter weights are learned during training.

  2. Downsampling via Max Pooling:

    • Reduces the spatial dimension, retaining dominant features.

    • Helps prevent overfitting and reduces computational demand.

    • The max operation is not differentiable everywhere, but backpropagation uses a sub-gradient: the gradient flows only through the input that attained the maximum.

  3. Fully Connected Layers:

    • Used towards the end of the network.

    • Combines the features learned and makes predictions or classifications.

  4. Residual Connections (specific to ResNet):

    • Help train very deep networks by adding a layer's input directly to its output (a skip connection); see the sketch after this list.

    • Alleviates the vanishing gradient problem in deep networks.
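
A minimal PyTorch sketch tying these four components together (the layer sizes and the 28×28 input are arbitrary choices for illustration, not a canonical architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # convolution layer
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, padding=1)  # keeps shape, so the residual add works
        self.pool = nn.MaxPool2d(2)                               # downsampling via max pooling
        self.fc = nn.Linear(16 * 14 * 14, num_classes)            # fully connected layer

    def forward(self, x):                # x: (batch, 1, 28, 28)
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h) + h)    # residual connection: input added to output
        h = self.pool(h)                 # (batch, 16, 14, 14)
        return self.fc(h.flatten(1))     # class scores

model = TinyCNN()
print(model(torch.randn(2, 1, 28, 28)).shape)  # torch.Size([2, 10])
```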

Applications:#

  1. Computer Vision:

    • LeNet-5: An early and influential CNN model.

    • ResNet: Introduced residual connections, allowing for the training of much deeper networks.

    • Common practice involves using transfer learning, i.e., pretraining on a large dataset and fine-tuning on a smaller, task-specific dataset (a sketch follows this list).

  2. Language Processing:

    • 1D convolutions applied to word sequences.

    • CNNs can effectively handle sentence classification and other text-based tasks.

    • Research by Yoon Kim in 2014 highlighted the efficacy of CNNs for sentence classification.
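
A sketch of the transfer learning recipe mentioned above, assuming torchvision is available (0.13+ for the `weights` argument): load a ResNet pretrained on ImageNet, freeze the backbone, and swap in a new classification head for the smaller task.

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained backbone so fine-tuning only updates the new head.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for a hypothetical 5-class task.
model.fc = nn.Linear(model.fc.in_features, 5)
```

During fine-tuning, only `model.fc`'s parameters receive gradient updates; with more data, the backbone can be unfrozen and trained at a lower learning rate.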

Advantages of CNNs:#

  1. Translation Invariance:

    • CNNs can detect patterns regardless of their position in the input.

    • Max-pooling further amplifies this characteristic.

  2. Parameter Sharing:

    • Reduces the number of parameters to learn, since the same filter is reused across the entire input (see the worked count at the end of these notes).

  3. Hierarchical Pattern Learning:

    • Stacking multiple convolution layers allows CNNs to learn complex patterns. Early layers might detect simple patterns like edges, while deeper layers can detect more abstract features.
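
A quick worked count of the saving from parameter sharing: mapping a \(32 \times 32\) grayscale image to a \(32 \times 32\) feature map with a fully connected layer takes \((32 \cdot 32)^2 = 1{,}048{,}576\) weights, whereas a single \(3 \times 3\) convolutional filter producing the same map takes only 9 weights (plus one bias), regardless of the image size.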