Neural nets made ridiculously simple

AI/ML
beginner
Published

September 15, 2025

Previous attendees have said…

  • 8 previous attendees have left feedback
  • 100% would recommend this session to a colleague
  • 100% said that this session was pitched correctly

Three random comments from previous attendees
  • Great introductions to the origin and concept of Neural Networks.
  • Helpful overview
  • A very accessible introduction to a very complicated topic; highly recommended for anyone interested in learning more but unsure where to begin.

Welcome

  • this session is 🌶: for beginners

What’s this session for?

  • neural nets are a core technology for AI/ML systems
  • they’ve been around for decades (and will probably be around for decades to come)
  • they’re also particularly helpful for health & care folk as a way of understanding AI/ML tools in general

What this session won’t do

  • give a general introduction to AI/ML
  • explain how to build a neural net of your very own
  • discuss in any detail the (often formidable) maths of neural nets

Biology: the neurone

  • biological neurones:
    • receive input from upstream neurone(s)
    • process that input in some way
    • generate output(s) in response, and pass them downstream

Biology: activation

Action potential (image, Wikimedia Commons): https://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/Action_potential.svg/778px-Action_potential.svg.png
  • neurones respond to stimuli
    • threshold-y: they only fire once input passes a threshold
    • approximately digital output (on/off)
    • sometimes complex behaviour in how inputs are combined

Biology: networks of neurones

  • neurones are usually found in networks
  • can produce complex and sophisticated behaviours in a robust way
    (image: Leon Benjamin on Flickr)
  • non-obvious relationships between structure and function

Machines: the node

Here’s a simple representation of a node, implemented in code, that we might find in a neural network:

Simple node representation
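
Purely as an illustration (the session's own code example appears below), here is one way such a node could be sketched in R, using a made-up rule that the node fires when the sum of its inputs is positive:

# a toy node: take some inputs, process them, return an output
# (the "fires when the sum of inputs is positive" rule is just an illustrative assumption)
toy_node <- function(inputs) {
  sum(inputs) > 0
}

toy_node(c(0.5, -0.2, 0.1))
[1] TRUE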

Machines: activation functions

Here are some example input:output pairs for our node:

Node input:output pairs
  • there are lots of possible activation functions
  • a simple one: NOT
    • our node outputs TRUE when we input FALSE, and vice versa

This flexibility means that we can build networks of nodes (hence neural networks). Again, a very simple example:

Activation functions can be extremely simple

node <- function(input){
  !input
}

node(TRUE)
[1] FALSE

Machines: networks of nodes

Networks of nodes

Machines: networks of nodes

  • nodes are usually found in networks
  • can produce complex and sophisticated behaviours in a robust way
    (image: Chat GPT nonsense)
  • again, non-obvious relationships between structure and function in artificial neural networks (ANNs)

A user supplies some input. That input is fed into an input node, which processes it and produces three different outputs that are then fed into a second layer of nodes. Further processing happens in this hidden layer, leading to three outputs that are integrated together in a final output node, which processes the outputs of the hidden layer into a single output. A sketch of this structure in code follows.
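
To make this concrete, here is a minimal sketch of that structure in R. The processing rules (multiplying the input, the thresholds) are invented purely for illustration and are not part of the session's own material:

# a toy feed-forward network matching the description above;
# all the processing rules are arbitrary illustrative choices
input_node  <- function(x) c(x, x * 2, x * 3)  # one input in, three outputs out
hidden_node <- function(x) x > 1               # a hidden node fires if its input exceeds 1
output_node <- function(x) sum(x) >= 2         # output fires if at least two hidden nodes fired

network <- function(x) {
  hidden_in  <- input_node(x)                  # input layer
  hidden_out <- sapply(hidden_in, hidden_node) # hidden layer (three nodes)
  output_node(hidden_out)                      # output layer
}

network(0.8)
[1] TRUE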

Several kinds of networks

  • there are lots of ways that neural networks can be arranged
  • our example above = feed-forward
    • all the nodes are connected from left-to-right
  • but more complex architectures - like recurrent neural networks - might have feedback loops and other biological-ish features
  • different numbers of layers
  • lots of different designs have been tried since neural nets were first introduced in the 1950s [@rosenblatt1958]
  • most current ANNs, even fancy ones, are architecturally simple

Why ANNs?

  • ANNs can potentially replicate any input-output transformation
  • we do that by a) increasing complexity and b) allowing them to ‘learn’

Example of hidden layers

How do hidden layers help?

  • all this complexity allows input to be processed differently
  • like neurones, downstream nodes either fire (activate) or do not fire depending on their inputs
  • to understand how that takes place, we need three new ideas (sketched in code after this list):
    • an activation function which determines how a node fires in response to its input
    • a set of input weights which set the strength of each input. Higher-weighted inputs are more likely to activate the node
    • bias = the strength of a node’s output. Higher bias = higher chance of our node’s output firing the downstream node
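
Here is a minimal sketch of a single node built from those three ideas. The weights, bias, and threshold are arbitrary illustrative values, and (as is usual) the bias is added to the weighted sum before the activation function is applied:

# a toy node: weighted sum of inputs, plus a bias, passed through an activation function
weighted_node <- function(inputs,
                          weights = c(0.5, -0.3, 0.8),  # illustrative weights
                          bias = 0.1) {                 # illustrative bias
  activation <- function(x) x > 0                       # a simple binary activation function
  activation(sum(inputs * weights) + bias)
}

weighted_node(c(1, 1, 1))
[1] TRUE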

Different activation functions

  • binary (true/false)
  • continuous
    • linear
    • non-linear (like sigmoid, ReLU)
  • different activation = different node behaviour = different network functions

Sigmoid activation

ReLU activation
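
Both of these activation functions are tiny to write in R; the plots above are presumably just these functions drawn over a range of inputs:

# sigmoid: squashes any input into the range 0-1
sigmoid <- function(x) 1 / (1 + exp(-x))

# ReLU (rectified linear unit): zero for negative inputs, the input itself otherwise
relu <- function(x) pmax(0, x)

sigmoid(c(-5, 0, 5))
[1] 0.006692851 0.500000000 0.993307149
relu(c(-5, 0, 5))
[1] 0 0 5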

Training in neural networks

  • ANNs can be trained
    • take a dataset
    • split it into training and test parts
      • classify (by hand) the training data
    • then train
      • feed your ANN the training data and evaluate how well it performs
      • modify the ANN based on that evaluation
      • repeat until done/bored/perfect
    • finally, test your model with your unlabelled test data and evaluate
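
A minimal sketch of the splitting step in R, using the built-in iris data purely as a stand-in dataset:

# split a dataset into training and test parts (an illustrative 80/20 split)
set.seed(42)
train_rows <- sample(nrow(iris), size = 0.8 * nrow(iris))

training_data <- iris[train_rows, ]   # used (with labels) for training
test_data     <- iris[-train_rows, ]  # held back for evaluation

nrow(training_data)
[1] 120
nrow(test_data)
[1] 30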

Training is about weights and biases

  • varying weights and biases changes how our network responds to input

  • you might start with random weights/biases, or with everything set to 1, or something more complicated

  • training works by varying weights and biases across the network, and evaluating the overall response across the training data; each full pass through the training data is called an epoch

LeCun initialization: draw the starting weights at random, with their spread scaled by the number of inputs to the node, so nodes with many inputs start with smaller weights.

Xavier initialization: similar to the LeCun method, but scales the spread by both the number of inputs and the number of outputs; it is the default in libraries such as Keras.

He initialization: scales the spread by the number of inputs in a way designed to suit ReLU activation functions, which makes it popular in deep learning.
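
As a rough sketch, the three schemes differ mainly in how they scale the spread of the random starting weights; the formulas below are one common (normal-distribution) formulation, with the layer sizes made up for illustration:

# illustrative starting weights for a node with n_in inputs in a layer with n_out outputs
n_in  <- 784
n_out <- 128

lecun_weights  <- rnorm(n_in, mean = 0, sd = sqrt(1 / n_in))            # scale by inputs
xavier_weights <- rnorm(n_in, mean = 0, sd = sqrt(2 / (n_in + n_out)))  # scale by inputs and outputs
he_weights     <- rnorm(n_in, mean = 0, sd = sqrt(2 / n_in))            # scale by inputs, suits ReLU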

Example uses

Handwriting recognition with MNIST

  • a classic dataset
  • recognizing handwritten numbers = actually-important task
  • 60000 labelled training images
  • 10000 test images
  • each is a 28*28 pixel matrix, with grey values encoded as 0-255

MNIST data example

V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
0 0 0 0 0 0 0 0 0 0 0
0 0 0 3 18 18 18 126 136 175 26
36 94 154 170 253 253 253 253 253 225 172
253 253 253 253 253 253 253 253 251 93 82
253 253 253 253 253 198 182 247 241 0 0
156 107 253 253 205 11 0 43 154 0 0
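
If you want to get hold of this data in R yourself, one option (and an assumption on my part about where the matrix above came from) is the {dslabs} package:

# one way to load MNIST in R: a list with train/test images and labels
mnist <- dslabs::read_mnist()

dim(mnist$train$images)      # 60000 rows, one 784-pixel (28*28) image per row
length(mnist$train$labels)   # 60000 labels, one per image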

Train for MNIST

  • take your training data
  • put together a neural network (number of nodes, layers, feedback, activation functions)
  • run the training data, and evaluate based on labelling
  • modify your neural network, rinse, and repeat
  • when happy, try the unlabelled test data
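
As a very rough sketch, here is what that loop might look like using the {keras} package (one option among many). The layer sizes, activation functions, and number of epochs are arbitrary illustrative choices, and the mnist object is assumed to be the list loaded earlier:

library(keras)

# scale pixel values to 0-1 and one-hot encode the labels
x_train <- mnist$train$images / 255
y_train <- to_categorical(mnist$train$labels, 10)

# a small feed-forward network: 784 inputs -> 128 hidden nodes (ReLU) -> 10 outputs (softmax)
model <- keras_model_sequential() |>
  layer_dense(units = 128, activation = "relu", input_shape = 784) |>
  layer_dense(units = 10, activation = "softmax")

model |> compile(
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)

# each pass through the training data is one epoch; evaluate and adjust as you go
model |> fit(x_train, y_train, epochs = 5, validation_split = 0.2)

# when happy, try the held-back test data
x_test <- mnist$test$images / 255
y_test <- to_categorical(mnist$test$labels, 10)
model |> evaluate(x_test, y_test)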

MNIST examples

  • lots of different examples
  • why do this work in an ANN, rather than in some other tool?
    • generic training strategy - reduces need for domain knowledge
    • hopefully robust outcomes - giving models able to work across contexts
    • scalable

MNIST in R

An aside here for the R enthusiasts - we can plot the handwritten numbers back out of the data using ggplot():

library(dplyr)
library(ggplot2)
library(glue)

# assumes `mnist` is a list with train$images (one 784-pixel image per row)
# and train$labels, e.g. as returned by dslabs::read_mnist()

mnist_plot_dat <- function(df) {
  # matrix to pivoted tibble for plotting
  df |>
    as_tibble() |>
    mutate(rn = row_number()) |>
    tidyr::pivot_longer(!rn) |>
    mutate(name = as.numeric(gsub("V", "", name)))
}

mnist_main_plot <- function(df) {
  # plot one image as a tile map, darker tiles for higher pixel values
  df |>
    ggplot() +
    geom_tile(aes(
      x = rn,
      y = reorder(name, -name),
      fill = value
    )) +
    scale_fill_gradient2(mid = "white", high = "black")
}

mnist_plot <- function(n) {
  # plot the nth training image, titled with its label
  mnist_plot_dat(matrix(mnist$train$images[n, ], 28, 28)) |>
    mnist_main_plot() +
    ggtitle(glue("Label: {mnist$train$labels[n]}")) +
    theme_void() +
    theme(legend.position = "none")
}

gridExtra::grid.arrange(grobs = purrr::map(1:36, mnist_plot), nrow = 6, top = "Some MNIST examples")

X-ray anomaly detection

A surprise while reviewing the content