Neural Networks made ridiculously simple

AI/ML
beginner
Published: August 12, 2024

Previous attendees have said…

  • 8 previous attendees have left feedback
  • 100% would recommend this session to a colleague
  • 100% said that this session was pitched correctly

Three random comments from previous attendees
  • Really interesting and informative, showing me data processing techniques that have previously been a mysterious black box for me.
  • Engaging. Good balance of principles and examples.
  • I won’t pretend I ‘got’ everything but a fascinating and helpful window into this topic!
Session materials

Welcome

  • this session is 🌶: for beginners

What’s this session for?

  • neural nets are a core technology for AI/ML systems
  • they’ve been around for decades (and will probably be around for decades more)
  • they’re also particularly helpful for health & care folk as a way of understanding AI/ML tools in general

What this session won’t do

  • give a general introduction to AI/ML
  • explain how to build a neural net of your very own
  • discuss in any detail the (often formidable) maths of neural nets

Biology: the neurone

  • biological neurones:
    • receive input from one or more upstream neurones
    • process that input in some way
    • generate some output in response, and pass it downstream

Biology: activation

Figure: the action potential (image: Wikimedia Commons, https://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/Action_potential.svg/778px-Action_potential.svg.png)
  • neurones respond to stimuli
    • threshold-y
    • approximately digital output (on/off)
    • sometimes complex behaviour in how they combine their inputs

Biology: networks of neurones

  • neurones are usually found in networks
  • can produce complex and sophisticated behaviours in a robust way
    (image: Leon Benjamin on Flickr)
  • non-obvious relationships between structure and function

Machines: the node

Here’s a simple representation of a node, implemented in code, that we might find in a neural network:
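A minimal sketch of such a node in R (the weights and threshold here are illustrative assumptions, not values from the session):

# weight each input, sum, and fire if the total clears a threshold
node_sketch <- function(inputs, weights = c(0.5, 0.5), threshold = 0.7){
  sum(inputs * weights) > threshold
}

node_sketch(c(1, 1))
[1] TRUE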

Machines: activation functions

Here are some example input:output pairs for our node:

  • there are lots of possible activation functions
  • a simple one: NOT
    • our node outputs TRUE when we input FALSE, and vice versa
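For the NOT node, the complete set of input:output pairs is:

  input → output
  FALSE → TRUE
  TRUE  → FALSE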

This flexibility means that we can build networks of nodes (hence neural networks). Again, a very simple example follows the code below.

Activation functions can be extremely simple

node <- function(input){
  # NOT: invert the logical input
  !input
}

node(TRUE)
[1] FALSE
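Because each node’s output can feed another node’s input, chaining even these trivial nodes gives a (very small) network - two NOTs in a row cancel out:

two_node_net <- function(input){
  # the output of the first node feeds the second
  node(node(input))
}

two_node_net(TRUE)
[1] TRUE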

Machines: networks of nodes


  • nodes are usually found in networks
  • can produce complex and sophisticated behaviours in a robust way
    (image: ChatGPT-generated illustration)
  • again, non-obvious relationships between structure and function in artificial neural networks (ANNs)

A user supplies some input. That input is fed into one or more input nodes, which process it and produce three outputs that are then fed into a second layer of nodes. Further processing happens in this hidden layer, leading to three outputs that are integrated by a final output node into a single output.
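A sketch of that architecture in R, with hypothetical weights and threshold chosen purely for illustration:

# one input node fans its signal out to three hidden nodes;
# a single output node integrates the hidden layer's outputs
input_node  <- function(x) rep(x, 3)            # fan out three copies
hidden_node <- function(x, w) x * w             # weight the incoming signal
output_node <- function(xs) sum(xs) > 1         # threshold the combined signal

feed_forward <- function(x) {
  hidden <- mapply(hidden_node, input_node(x), c(0.5, 1.0, 1.5))
  output_node(hidden)
}

feed_forward(1)
[1] TRUE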

Several kinds of networks

  • there are lots of ways that neural networks can be arranged
  • our example above = feed-forward
    • all the nodes are connected from left-to-right
  • but more complex architectures - like recurrent neural networks - might have feedback loops and other biological-ish features
  • different numbers of layers
  • lots of different design tendencies since neural nets were first introduced in the 1950s [@rosenblatt1958]
  • even the fanciest current ANNs are mostly architecturally simple

Why ANNs?

  • ANNs can potentially replicate any input-output transformation
  • we do that by a) increasing complexity and b) allowing them to ‘learn’


Different activation functions

  • binary (true/false)
  • continuous
    • linear
    • non-linear (like the sigmoid and ReLU sketched below)
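Two standard continuous activation functions, written out in R (these are the usual textbook definitions, not code from the session):

sigmoid <- function(x) 1 / (1 + exp(-x))   # smooth squash into (0, 1)
relu    <- function(x) pmax(0, x)          # 0 below zero, identity above

sigmoid(0)
[1] 0.5
relu(-2)
[1] 0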

Training in neural networks

  • ANNs can be trained
    • take a dataset
    • split it into training and test parts
      • classify (by hand) the training data
    • then train
      • feed your ANN the training data and evaluate how well it performs
      • modify the ANN based on that evaluation
      • repeat until done/bored/perfect
    • finally, test your model on the held-back test data (fed in without its labels) and evaluate how it does
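A sketch of that train/test split in R (assuming a hypothetical data frame dat with one row per labelled example):

set.seed(42)                                        # make the split reproducible
idx   <- sample(nrow(dat), floor(0.8 * nrow(dat)))  # 80% of rows chosen at random
train <- dat[idx, ]                                 # training data (keep the labels)
test  <- dat[-idx, ]                                # held-back test data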

MNIST

  • a classic dataset
  • recognizing handwritten numbers = actually-important task
  • 60000 labelled training images
  • 10000 test images
  • each is a 28×28 pixel matrix, with grey values encoded as 0-255

MNIST data example

An excerpt from one training image’s pixel matrix (columns V10-V20, grey values 0-255):

V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
0 0 0 0 0 0 0 0 0 0 0
0 0 0 3 18 18 18 126 136 175 26
36 94 154 170 253 253 253 253 253 225 172
253 253 253 253 253 253 253 253 251 93 82
253 253 253 253 253 198 182 247 241 0 0
156 107 253 253 205 11 0 43 154 0 0

Train for MNIST

  • take your training data
  • put together a neural network (number of nodes, layers, feedback, activation functions)
  • run the training data, and evaluate based on labelling
  • modify your neural network, rinse, and repeat
  • when happy, try the held-back test data
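To make this concrete, here is a minimal sketch using the keras R package, assuming mnist is loaded as in the R section below (e.g. via dslabs::read_mnist()). The package choice and architecture are illustrative assumptions, not the session’s prescribed approach:

library(keras)

x_train <- mnist$train$images / 255   # scale grey values from 0-255 to 0-1
y_train <- mnist$train$labels         # integer labels, 0-9

# a small feed-forward network: 784 pixels in, 10 digit classes out
model <- keras_model_sequential() |>
  layer_dense(units = 128, activation = "relu", input_shape = c(784)) |>
  layer_dense(units = 10, activation = "softmax")

model |> compile(
  optimizer = "adam",
  loss = "sparse_categorical_crossentropy",
  metrics = "accuracy"
)

model |> fit(x_train, y_train, epochs = 5, validation_split = 0.1)

# when happy, evaluate on the held-back test images
model |> evaluate(mnist$test$images / 255, mnist$test$labels)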

MNIST examples

  • lots of different examples
  • why do this work with an ANN, rather than with some other tool?
    • generic training strategy - reduces the need for domain knowledge
    • hopefully robust outcomes - giving models able to work across contexts
    • scalable

MNIST in R

An aside here for the R enthusiasts - we can plot the handwritten numbers back out of the data using ggplot():

library(dplyr)
library(ggplot2)
library(glue)

# assumes `mnist` is a list holding train$images and train$labels,
# e.g. as returned by dslabs::read_mnist()

mnist_plot_dat <- function(df) {
  # matrix to pivoted tibble for plotting
  df |>
    as_tibble() |>
    mutate(rn = row_number()) |>
    tidyr::pivot_longer(!rn) |>
    mutate(name = as.numeric(gsub("V", "", name)))
}

mnist_main_plot <- function(df) {
  # tile plot of the pixel grid: darker tile = higher grey value
  df |>
    ggplot() +
    geom_tile(aes(
      x = rn,
      y = reorder(name, -name),
      fill = value
    )) +
    scale_fill_gradient2(mid = "white", high = "black")
}

mnist_plot <- function(n){
  # plot the nth training image, titled with its label
  mnist_plot_dat(matrix(mnist$train$images[n, ], 28, 28)) |>
    mnist_main_plot() +
    ggtitle(glue("Label: {mnist$train$labels[n]}")) +
    theme_void() +
    theme(legend.position = "none")
}

gridExtra::grid.arrange(
  grobs = purrr::map(1:36, mnist_plot),
  nrow = 6,
  top = "Some MNIST examples"
)