Abstract: Hands-on, closed box: Discover what's inside for you, and how we coped.
Train and test an AI – and see how we learned!
The purpose of this interactive conference workshop is to provide participants with hands-on experience training and testing an AI model. This workshop is designed for individuals with little to no prior knowledge of AI and machine learning, but who are interested in learning about the basics of how these technologies work.
During the workshop, participants will be guided through the process of training a simple AI model using a popular machine learning framework. They will learn how to input data and adjust model hyperparameters in order to improve the model's performance. After the model has been trained, participants will have the opportunity to test it on a variety of sample data and evaluate its accuracy. They will also be introduced to common metrics for evaluating the performance of machine learning models, such as precision, recall, and F1 score.
Throughout the workshop, there will be ample opportunity for participants to ask questions and engage with the facilitators and other participants. By the end of the workshop, participants will have a better understanding of the basics of AI and machine learning, and will have gained practical experience working with a trained AI model.
This abstract was written by ChatGPT, prompted and selected by James and Bart
Outline
First half: We'll spend the first half of the workshop ...
Break: You're welcome to keep working over the break, to take a moment and return as the sessions start – or to leave. You're also welcome to drop in for the second half; we'll help you set up, and you should find plenty of information on this page.
Second half: In the second half, we'll look at ...
End: You'll have tried training an AI, you'll have tried testing that AI, you'll have worked with others in the group. Maybe you'll carry on with one or both!
Collaboration tools
We have a shared Miro board for this workshop. You can edit it now, and we'll lock it after the workshop so that you can refer to it.
We'll work in Teachable Machine. You'll need to use a laptop for this: it seems not to work well from handheld devices. Aaargh.
Using Teachable Machine
Teachable Machine runs in your browser, taking friction-free input from your camera and microphone. It runs best in Chrome or Firefox on a laptop. It's built on TensorFlowJs and can export the models you train. You train it on files, or on live input from your camera and microphone. It can be trained to classify images, poses, or sounds – you'll pick the project type when you start to make a model. You can open an existing project (cloud or local) and save to the cloud.
In use, you add classes, then add samples to each class, then train, then see how the trained model responds to input. Using a webcam makes this process swift and intuitive, giving fast feedback in what is often a slow, iterative process. It's good to play!
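We'll stay in the browser during the workshop, but since TM can export what you train, here's a minimal sketch of using an exported model afterwards. It assumes the Keras export with its default file names (keras_model.h5, labels.txt), a placeholder image file, and TM's usual 224×224, [-1, 1] image preprocessing – check against your own export before trusting it.

```python
# Minimal sketch: classify one image with a Teachable Machine model exported as Keras.
# File names and preprocessing below are assumptions based on TM's default export.
import numpy as np
from PIL import Image
from tensorflow.keras.models import load_model

model = load_model("keras_model.h5", compile=False)            # exported model file
class_names = [line.strip() for line in open("labels.txt")]    # exported class labels

# Prepare one image the way the TM image model expects: 224x224 RGB, scaled to [-1, 1].
image = Image.open("sample.jpg").convert("RGB").resize((224, 224))   # placeholder image
data = (np.asarray(image, dtype=np.float32) / 127.5) - 1.0
data = np.expand_dims(data, axis=0)            # shape (1, 224, 224, 3)

probabilities = model.predict(data)[0]         # one probability per class
for name, p in zip(class_names, probabilities):
    print(f"{name}: {p:.2f}")
```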
Sample size caveat
To paraphrase Teachable Machine's own help text: it splits your samples into two sets, labelled training and test. Most of the samples are used to train the model how to correctly classify new samples into the classes you've made. The rest are never used to train the model; after the model has been trained on the training samples, they are used to check how well the model performs on new, never-before-seen data.
We reckon the split may differ from one training run to the next, and we don't know whether it changes from epoch to epoch.
This has two implications: 1) use three pictures, and it'll train on two and test its model with the remaining one. Weirdness abounds on this edge. 2) your model may be very different on different training runs, especially if your small sample set has diverse samples.
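To get a feel for why tiny sample sets are weird, here's a rough sketch of an 85/15 split applied to three pictures. The 85/15 figure is TM's own; the exact rounding and the reshuffle on every training run are our assumptions, not documented behaviour.

```python
# Rough illustration of an 85/15 train/test split on a small sample set.
import random

def split(samples, train_fraction=0.85, seed=None):
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)                              # a fresh shuffle per training run (our assumption)
    n_train = int(len(shuffled) * train_fraction)      # floor; TM's exact rounding is unknown to us
    return shuffled[:n_train], shuffled[n_train:]

pictures = ["pic1", "pic2", "pic3"]
train, test = split(pictures)
print(len(train), "training samples,", len(test), "test sample(s)")
# With 3 pictures: int(3 * 0.85) = 2, so 2 train and 1 tests -- the reported
# "accuracy per class" then rests on a single sample. Run the split again and
# a different picture may land in the test set, which is one reason two
# training runs on the same data can look different.
```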
Training Parameters
Open 'Training: advanced' to see these settings.
Epochs
: One epoch means that each and every sample in the training dataset has been fed through the training model at least once. If your epochs are set to 50, for example, it means that the model you are training will work through the entire training dataset 50 times. Generally the larger the number, the better your model will learn to predict the data. You probably want to tweak (usually increase) this number until you get good predictive results with your model.
Batch Size
: A batch is a set of samples used in one iteration of training. For example, let's say that you have 80 images and you choose a batch size of 16. This means the data will be split into 80 / 16 = 5 batches. Once all 5 batches have been fed through the model, exactly one epoch will be complete. You probably won't need to tweak this number to get good training results.
Learning Rate
: Be careful tweaking this number! Even small differences can have huge effects on how well your model learns.
(text from Teachable Machine)
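Here's a small sketch of how batches and epochs relate, using the numbers from the batch-size example above (80 images, batch size 16, 50 epochs). The loop is illustrative only – it isn't anything TM exposes.

```python
# Relationship between samples, batch size and epochs (illustrative numbers).
samples = list(range(80))      # stand-ins for 80 training images
batch_size = 16
epochs = 50

batches_per_epoch = len(samples) // batch_size       # 80 / 16 = 5 batches
for epoch in range(epochs):
    for b in range(batches_per_epoch):
        batch = samples[b * batch_size:(b + 1) * batch_size]
        # ...one training step on this batch would happen here...
    # after 5 batches, every sample has been seen once: one epoch is complete

print(batches_per_epoch, "batches per epoch,",
      batches_per_epoch * epochs, "training steps in total")    # 5 and 250
```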
Training Metrics
Accuracy per class
Accuracy per class is calculated using the test samples. Check out the vocab section to learn more about test samples.
Confusion Matrix
A confusion matrix summarizes how accurate your model's predictions are. You can use this matrix to figure out which classes the model gets confused about. The y axis (Class) represents the class of your samples. The x axis (Prediction) represents the class that the model, after learning, guesses those samples belong to. So, if a sample’s Class is "Muffin" but its Prediction is "Cupcake", that means that after learning from your data, the model misclassified that Muffin sample as a Cupcake. This usually means that those two classes share characteristics that the model picks up on, and that particular "Muffin" sample was more similar to the "Cupcake" samples.
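Here's a small sketch of how such a matrix is assembled from a model's predictions. The Muffin/Cupcake counts are invented purely to show the layout.

```python
# Building a confusion matrix from (true class, predicted class) pairs.
from collections import Counter

classes = ["Muffin", "Cupcake"]
results = [  # (Class, Prediction) for each test sample -- invented data
    ("Muffin", "Muffin"), ("Muffin", "Cupcake"), ("Muffin", "Muffin"),
    ("Cupcake", "Cupcake"), ("Cupcake", "Cupcake"), ("Cupcake", "Muffin"),
]

counts = Counter(results)
print("Class \\ Prediction:", *classes)
for actual in classes:
    row = [counts[(actual, predicted)] for predicted in classes]
    print(f"{actual:>8}:", row)
# The off-diagonal cells (a Muffin predicted as Cupcake, and vice versa)
# are the misclassifications described above.
```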
Accuracy per Epoch
: Accuracy is the percentage of classifications that a model gets right during training. If your model classifies 70 samples right out of 100, the accuracy is 70 / 100 = 0.7. If the model's prediction is perfect, the accuracy is one; otherwise, the accuracy is lower than one.
Loss per Epoch
: Loss is a measure for evaluating how well a model has learned to predict the right classifications for a given set of samples. If the model's predictions are perfect, the loss is zero; otherwise, the loss is greater than zero. To get an intuitive sense of what this measures, imagine you have two models: A and B. Model A predicts the right classification for a sample but is only 60% confident of that prediction. Model B also predicts the right classification for the same sample but is 90% confident of that prediction. Both models have the same accuracy, but model B has a lower loss value.
(text from Teachable Machine)
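To make accuracy and loss concrete, here's a small sketch using the hypothetical models A and B from the loss description above: both classify the sample correctly, so accuracy is identical, but the log loss rewards B's higher confidence.

```python
# Accuracy vs loss for the hypothetical models A (60% confident) and B (90% confident).
import math

def accuracy(predictions, labels):
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

def cross_entropy_loss(confidence_in_true_class):
    # Log loss for one correctly-labelled sample:
    # 0 when the model is 100% confident, growing as confidence drops.
    return -math.log(confidence_in_true_class)

print(accuracy(["Muffin"], ["Muffin"]))     # 1.0 -- the same for both models
print(cross_entropy_loss(0.60))             # model A: ~0.51
print(cross_entropy_loss(0.90))             # model B: ~0.11
# Same accuracy, but model B's lower loss reflects its higher confidence.
```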
Vocabulary
Training samples
: (85% of the samples) are used to train the model how to correctly classify new samples into the classes you’ve made.
Test samples
: (15% of the samples) are never used to train the model, so after the model has been trained on the training samples, they are used to check how well the model is performing on new, never-before-seen data.
Underfit
: a model is underfit when it classifies poorly because the model hasn't captured the complexity of the training samples.
Overfit
: a model is overfit when it learns to classify the training samples so closely that it fails to make correct classifications on the test samples.
Epochs
: One epoch means that every training sample has been fed through the model at least once. If your epochs are set to 50, for example, it means that the model you are training will work through the entire training dataset 50 times.
(text from Teachable Machine)
Training an AI
Samples, precision, recall, F1 – and how that matches to TM
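Teachable Machine reports accuracy per class and a confusion matrix rather than precision, recall, or F1 – but you can compute those yourself from confusion-matrix counts. A minimal sketch, with invented counts for one class:

```python
# Precision, recall and F1 for one class, from confusion-matrix style counts.
def precision_recall_f1(true_positives, false_positives, false_negatives):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented counts for the class "Muffin": 8 muffins correctly called Muffin,
# 2 cupcakes wrongly called Muffin, 1 muffin wrongly called Cupcake.
p, r, f1 = precision_recall_f1(true_positives=8, false_positives=2, false_negatives=1)
print(f"precision {p:.2f}, recall {r:.2f}, F1 {f1:.2f}")    # 0.80, 0.89, 0.84
```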
Exercises
These exercises aren't to be done in order. Pick one that appeals to you and your group. If you can't agree, switch groups.
We aim to do two exercise sessions in the workshop, with group sharing after each. We aim to have several groups on different exercises.
1 – Test a pre-Trained AI
We trained Teachable Machine to distinguish between different shapes, and different colours.
Propose some general principles of testing that reflect specific examples of what you've found while testing this model. List principle and example.
Share surprises.
Steps
- Open the model below.
- Look at the training data to build an expectation of the data that has been used to train.
- Judge the model's ability to recognise colour and shape, using images from the testing data folder – how might you expect the model to classify them? How does it classify them? Are these images useful? Try some from the training data folder.
- Make your own images to try. Work with things you expect it to classify correctly, and use your insights to find situations where it classifies something wrongly.
- As a tester, can you generalise some principles of ways to verify that the model classifies correctly? Some principles to help you understand how it might be classifying incorrectly?
- As a tester, think about how the training data might need to change to be 'better'. Better how?
Sources
Teachable Machine model: https://drive.google.com/file/d/1ib-t9MPBPPOl1IWxI8es9Y5puFKzJgc4/view?usp=sharing
Training data used in training this model: https://drive.google.com/drive/folders/1cpj5UX3Ovs9kfTB1Qu-plFAykwpz2Jq0?usp=drive_link
Images we used to test the model: https://drive.google.com/drive/folders/1lLKAttJ_PQReJTOHTNNw4c6TILiD1cIy?usp=drive_link
Extend exercise / insights
Not in order – pick an interesting one.
- Look at the training metrics in ?? – what do they seem to tell you?
- Try turning on the camera and testing with drawn pictures in the camera feed.
- Retrain with the same data and parameters. Look at the training metrics, and at the model behaviour. Does your newly-trained model have the same qualities? How well can you tell?
- Re-train with additions to / changes to / removals from the training data. Look at the training metrics, and at the model behaviour. How does your new model appear to differ?
- Re-train with changes to training parameters. Look at the training metrics, and at the model behaviour. How does your new model appear to differ?
- Can you include the 'testing' data in the training dataset and re-train? Judge your newly-trained model against the provided one.
Open Questions
There is a sense of positive / negative (and, by implication, false positive / false negative) in the training. Does that exist in the model?
This model is trained on colours and shapes – why might that be useful? Why might it be a problem?
Would the model be more useful if it had a class for "don't recognise this colour / shape" ? How might you train that?
1a – Compare two pre-Trained AIs
We've trained two instances of Teachable Machine to distinguish between different shapes, and different colours. The two instances have been trained on the same data, but have different parameters.
List ways that the two models differ in training parameters, and in behaviour.
Give examples of the ways that the information from training metrics reflect the behaviours.
Propose ways that you might use the training parameters, and the training metrics, to make a 'better' model.
Steps
- Use the 'testing' data to assess the similarities and differences in the two models.
- Look at the differences in the training metrics. What might they mean? Can you see any links to the behaviours of the models?
Stuff
Teachable Machine: Instance A / Instance B (links to follow)
Training data (links to follow)
Test data (links to follow)
Options
- Retrain one or both of the models on the same data, with the same training parameters. Compare the models with each other – do they continue to differ in the same ways that you saw before?
- Adjust the training parameters
2 – Train without Bias
: to be written alongside 3
here's stuff
Steps
things
- first
- second
- third
Stuff
what goes here?
example models
Options
to extend exercise
to extend insights
3 – Train with deliberate Bias
: to be written alongside 2
here's stuff
Steps
things
- first
- second
- third
Stuff
what goes here?
example models
Options
to extend exercise
to extend insights
4 – Test Outcome of Unbalanced Training
here's stuff
Steps
things
- first
- second
- third
Stuff
what goes here?
example models
Options
to extend exercise
to extend insights
5 – Train an AI with several classes
Teachable Machine's examples typically have two classes. Introducing more brings in more ways that an input can be mis-classified. We'll explore that here.
Describe your set of classes.
List ways that your trained model mis-classifies inputs.
Propose general principles of how one might test for these mis-classifications.
Steps
- Decide on your different classes – pick something with more than two options. Try to keep the number small enough that you can complete the task of training and working with the model in the time available. Consider how the classes differ, and how you'll reflect the difference in your data.
- Make a new Teachable Machine, and train it. Optionally, use your skills to tweak the training parameters.
- Try it out, looking for mis-classifications.
- Save your model and data and link them to the Miro board for others to try.
Stuff (none)
what goes here?
example models
Options (none)
to extend exercise
to extend insights
6 – Train an AI with Overlapping Classes in the data
For example, Square and Rectangle are overlapping classes: all Squares are Rectangles.
here's stuff
Steps
things
- first
- second
- third
Stuff
what goes here?
example models
Options
to extend exercise
to extend insights
7 – How can a Tool Help in Testing an AI
Describe the tool in use
List specific insights that the tool led you towards
Steps
Tool 1 – the training metrics, analysis of the input and training parameters
Tool 2 – Vipul's tool AIEnsured
- Select your model (list of ours below / list of new on the Miro board)
- Use the tool
- ? Tool 1 – check the input for biases, look at the parameters, and the metrics from the training.
- ? Tool 2 – extract the TensorFlowJs from TM, and give it to AIEnsured (how??)
- Explore what the tool says – what from its information seems useful? Of the useful information, what seems true? How does the true, useful information help? What would you try next?
- Try the next thing.
Stuff
List of tools / link to board
Link to VK's tool
Links to several things in the parameters / metrics
Options (none)
to extend exercise
to extend insights
8 – How Training Size and Variety affects Outcome
here's stuff
Steps
things
- first
- second
- third
Stuff
what goes here?
example models
Options
to extend exercise
to extend insights
9 – Non-image Classification
Teachable Machine can be trained on several kinds of input – pictures, poses, sounds. We imagine that many groups will be using images – in this exercise, we look at the differences in training and testing these other kinds of input.
Poses are images, too, of course – but in this model, the AI is trained to always treat the image as a picture of a body.
Describe the model you're building, show the data
List insights from training and testing your model
Propose principles, based on similarities and differences you've seen with other models built in the workshop.
Steps
things
- first
- second
- third
Stuff
what goes here?
example models
Options
to extend exercise
to extend insights
10 – Metrics and Parameters
While Teachable Machine is exceptionally limited, it computes several typical metrics from training, and provides several parameters to influence training.
List the training metrics the Teachable Machine provides, and the parameters it uses. Do any of the metrics help you judge the model?
Describe how you've been able to use one or more of the metrics / parameters.
Propose ways that testers might use one or more of the metrics.
Steps
things
- Take an existing model (LIST NEEDED)
- Check the parameters and the training metrics.
- Retrain with the same parameters – do the metrics stay the same?
- Observe the metrics that change over training. Retrain, and observe differences.
- What do the parameters and metrics purport to tell you about the training?
- What do the parameters and training metrics purport to tell you about the product?
- Does the product's behaviour reflect what the training metrics indicate?
Stuff
what goes here?
example models
Options
to extend exercise
to extend insights
Background: Learning strategies (and tactics...)
JL – REWRITE TO REFLECT 2023
As Bart and James worked together, they identified different ways that they approached learning.
- Authority-first – follow the book, ask the expert
- Promise-driven – commit to doing something you don't yet know how to do
- Confusion-driven – try to understand the part you recognise as a part, yet understand the least
- Foundation-driven – work from what you already know, and expand outwards
- Literature survey – what are the key words? Go search, building a collection of core vocabulary (words and concepts). Do they mean different things to different groups? What are the core articles / sites / authors / groups / magazines / books / exercises / metaphors?
- Ask publicly for help – get comfortable with your own ignorance and curiosity, attract people who want to help, reward their commitment with your progress.
- Aim to teach / write – teaching and writing both require your mind to engage with the subject in a reflective, more-disciplined way
- Value-driven – find and deliver something of value to someone
- Trial-and-error – thrash about, reflect on what happened, repeat with control, thrash more.
There's nothing valuable under here – just no need to show you half-thought-through bits that didn't make this page.
ONLY SPRUE, FROM HERE ON
Learning Strategies
Conversation – how have you been learning?
5-10 minute facilitated chat
Exercise – pick an approach
15 minutes work, group or solo, + debrief
Consciously pick a learning strategy / tactic that you'd like to try here.
Make that choice clear to yourself (and to those around you). Perhaps say whether you're choosing something familiar or whether it's a stretch.
Move on with your exploration of mutation testing.
Debrief: Stories and surprises about learning, 5-10 minutes facilitated chat.