Making Sense with Exploratory Testing

A workshop for TestScouts' conference, TestCoast 2023 in Gothenburg on September 21


Background

Paths: You may find two ways through this workshop: using it to try out a bunch of different frameworks (not my direct aim), and working on the test-adjacent parts such as note-taking. Either path is reasonable – you may find I'm dealing with the other path from time to time.

What notes should I keep? Keep the notes you usually keep – be conscious that you're keeping notes on your testing (to help you test and to share with peers) and notes on the workshop (to help you learn from the workshop and perhaps to look over later). Should you keep them separately? That's your decision.

Test adjacent? We're not exploring or testing anything important. We're exploring things that will let us experience some of what goes around and through exploratory testing – particularly notes and exploratory frameworks.

Notes vs frameworks: I believe that the framework you use affects the notes you keep, and your notes influence how you use a framework. So they're not neatly separable, and they don't need to be.

Testing, play, exploration and judgement: Exploration is play with purpose. Exploratory testing is exploration with judgement.

Models and judgement: We'll build models. Sometimes, you'll judge your model, not the thing you've explored.

For EuroSTAR 2013, in Gothenburg, I ran a workshop called Making Sense of Exploratory Testing.

Outline: Making Sense with Exploratory Testing

Sharpen your testing skills by bringing structure and focus to your exploratory testing in this half-day hands-on workshop.

In the workshop, you will:

  • build cohesive models which reliably capture your thoughts and actions – and the system’s reactions. 
  • use a range of approaches from experimental science to gain insights into your own preferred and effective methods of exploratory test design, observation and recording. 
  • share your approaches with the group, to expand your testing range, and to build on your existing testing, organising and communication skills. 


Come prepared to test. We’ll test things we think we know, and things we’ve never seen. We’ll look for trouble, diagnose problems, build models when we don’t have requirements, and will use simple tools that you already know to design thousands of bulk tests and to analyse their results. You’ll show what you’ve done, talk about how and why you’ve made your testing decisions, and you’ll learn from your workshop peers.

Logistics

Location and times

IHM Business School, Gothenburg. Workshop starts 8:30 (I think) and runs until 12 with a break. More details on the meetup page. I'll update this when I'm more sure.

Equipment

This is a hands-on workshop: you'll be testing. To test, you'll need a device with a browser.

My exercises will run on most phones or tablets. You'll find deeper access to exercises on more-capable devices, so bring a laptop if you want to get deeper into the systems. You'll also need something to keep notes on. Ideally, whatever you usually use.

Some of you will get by with a phone and a pencil. Others might want to bring a pair of laptops.

What will happen

I'm planning six exercises, three in the first half, three in the second half, and a break in the middle:

First half: Revelation | Notes | Show Your Work

Second half: Bulk interactions | Analysis | Diagnosis

For most of the exercises, I'll introduce an idea, you'll work with that idea in mind, you'll reflect on what happened, and share your insights.

Most work will involve interacting with software. That software (and some tools to test it) runs in-browser, to make it swift and easy to interact with. We'll be looking to test the underlying system, rather than how it looks in the browser.

I'll generally ask you to reflect on what you did in the exercises either on your own or with the people at your table; I hope that individuals and groups will share collective insights with everyone, either in person, or on the Miro board.

At the end of each piece of work, you'll have tried some ideas, seen how others reacted, and I hope that you'll be inspired to make a worthwhile change to nail down what you've learned.

Collaboration tools

To share with the room in a semi-permanent way, we have a shared Miro board. You can edit it now, and we'll lock it after the workshop so that you can refer to it.

If you want to share your name and likeness, go to the board and add yourself to the People section of the board.

Early? On time? Waiting for the workshop to start?

You could...
* Read some background.
* Go play with this.
* Talk to someone.
* Add yourself to the People section of the board.
* Prime your mind by looking over stuff on Note Types, Making Notes, Experiment Types.


Exercises

1 – Revelation

In this exercise, we'll play with something that reveals itself slowly, and think about the differences between working alone and working collaboratively. The purpose of the exercise is to have a collective experience that we can talk about.

💭
Shareable outcome
How we trigger, manage, respond to and communicate new information

Exercise

Stand up.

Open RasterReveal2 on a handheld device (or on your laptop, if you must) and play with it.

Walk about. Talk (loudly?) about what you're seeing. Find people who are doing something similar. Exchange names.

Work together.

FRAMEWORK: Focus Puller

Discover the overall thing. Dig into details. Find ways that they connect.

If feeling introspective: Where does your instinct take you? Do you follow or resist?

Output: Two different yet connected perspectives.

When making notes in this framework, I tend to keep a narrative of what I'm doing, and use colours to indicate what grouping I'm in.

This framework goes well with different pairs: private / public, looking for trouble / verifying value, business / technical, design / implementation, look / tech. You'll think of more.

Reflect

Talk together:

  • How did you find each other?
  • How did "working together" work?
  • Did you collectively have a method for revealing the images?
  • What was similar, in your approaches? How did you differ?

Share

Add one or more insights to the Miro board.

I'll ask for volunteers to share insights with the room.

2 – Notes

In this section, we'll test a very simple system with no states, a single input, and a deterministic output, in a familiar domain. We'll make notes as we go. The purpose of the exercise is to get us talking about notes: how and why we make them, and who we make them for.

💭
Shareable outcome
How we capture and communicate about our exploration, our systems, and our decisions

Exercise 2.A

In groups, talk about whether you make notes as you test. If you make notes, what do you use your notes for? If you don't, how do you satisfy those needs? What medium do note-takers use? What information do note-takers keep – and what do they discard?

James will write up some common note types and purposes, and we'll see how those are reflected in the room.

Exercise 2.B

FRAMEWORK: Input / Output

  • List the 'inputs'. List the 'outputs'.
  • What groups can you see in the outputs, and in the inputs? Note: for converter, grouping will help you see that the output is made of several parts.
  • How are they connected?

When making notes in this framework, I tend to keep three columns: for inputs, connections and outputs. I fill in inputs and outputs first and together, rearrange for groups, and then think about, test and record connections. Pick your ordering to suit you.

You might also think about what the groups and connections tell you about the system, and how it is processing inputs / building outputs.
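If you prefer typed notes, here's a minimal sketch of that three-column structure in Python. The values are placeholders of my own, not real converter behaviour – fill in and regroup rows as you test.

```python
import csv

# A minimal, typed version of the three-column notes: one row per probe.
# The example values are placeholders, not real converter behaviour.
rows = [
    # input, connection (guess / grouping), output
    ("0",   "lowest value tried - a boundary?", "?"),
    ("699", "same group as 0?",                 "?"),
    ("700", "first value of the next group?",   "?"),
]

with open("io_notes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["input", "connection", "output"])
    writer.writerows(rows)
```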

Explore converter for 5 minutes, making notes as you go. Work solo or in small groups. Share your notes with your peers. Put your notes on the Miro board if you want to share with the room.

James will ask for volunteers to talk about how they kept notes, what they kept, and what purpose their notes could serve.

Share

We'll take time to record our insights on the Miro board.

3 – Show Your Work

In this exercise, we'll explore something without the usual context of requirements or (many) expectations. While this is an interesting and useful skill, the purpose of this exercise is to show our work to each other.

💭
Shareable outcome
How we describe our testing to other testers

FRAMEWORK: Handholds

What are the parts? What are their names?

What similarities can you observe? What differences?

What behaviours are reliable? What are unreliable?

Where are the edges of the system you are testing?

What can help you understand the system, apart from the UI?

-> How can you simplify your exploration?

I've written more on this at Handholds Framework.

When I make notes, I tend to make several lists, starting with headings of what I'm thinking of (as above – I usually start with discernible parts, reliable behaviours, alternate routes), writing things as I go, and then stopping to look back at the lists to see what I can spot.

Pick one of the puzzles below.

Explore it for 10 minutes: the purpose of your exploration is to discover what the puzzle does so that you can describe an underlying system in 1-3 sentences.

Keep any notes you might need to help you summarise your exploration. You'll have a chance to share your notes, which will be a different artefact from the summary of puzzle behaviour that you're working towards as you test. Be conscious of the difference.

Reflect

Take a few minutes to look over your notes and to decide how you'd summarise your work (no need to describe what the puzzle does). Practise with your peers. Post your notes on the board, indicating whether you'd be interested in summarising your testing to the group.

Share

I'll ask for up to 5 volunteers to give a 1-minute summary of their testing. I'll ask the audience to capture what worked well for them in the summaries, and to add those insights to the Miro board.

Puzzles

Puzzle 13
Puzzle 26b
Puzzle 34
Puzzle 35

4 – Bulk

In this exercise, we'll return to converter – this time with tools which enable swifter exploration. The purpose of this exercise is to see how our approach to notes works with a less direct, more automated / generative approach.

💭
Shareable outcome
How we adapt our notes to an alternative style

FRAMEWORK: Bulk Testing

  • Work with sets of data, tracking the ways that you iterate through observations and changes.

When I make notes, I keep a decision-based narrative of what I'm trying, what I might be looking for, and what's odd. I mark my notes to show me where I might have places to dig deeper, and return to those as I test.

Please note: on mobile, you might get more from the slider than the generation tools, as you may not get the input and result fields to line up, resulting in unnecessary confusion.

This is the same converter, with tools that allow several inputs to be submitted as a batch and that provide data covering several transitions in the output. Explore converter, using the generators, for 10 minutes, making notes as you go. Work solo or in small groups. Share your notes with your peers. Put your notes on the Miro board if you want to share with the room.
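If you'd like to see the bulk idea as code rather than in the page's generator tools, here's a rough sketch: sweep a range of inputs, keep the input/output pairs, and scan for transitions. `convert()` is a stand-in of my own, not the workshop's converter or its tools.

```python
# Sketch: submit a batch of inputs, keep the results, and scan for transitions.
# convert() is a stand-in for the system under test, not the workshop's converter.
def convert(value: int) -> str:
    return "low" if value < 700 else "high"   # placeholder behaviour so the sketch runs

inputs = list(range(0, 2001, 50))              # a sweep across a range, in even steps
results = [(i, convert(i)) for i in inputs]

# Note where the output changes between adjacent inputs - places to dig deeper.
for (a, out_a), (b, out_b) in zip(results, results[1:]):
    if out_a != out_b:
        print(f"output changes somewhere between {a} and {b}: {out_a!r} -> {out_b!r}")
```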

Reflect

James will ask for volunteers to talk about how they adjusted their note-keeping to manage exploration in bulk, and will invite commentary on how one might scale or change one's record-taking if using bulk or generated data while exploring.

Share

Listeners will record insights on the Miro board.

5 – Analysis

In this exercise, we'll take a collection of observations, and try to understand how they might reflect some simple underlying mechanism. The purpose of this exercise is to give you a shared experience of turning observations into a model.

💭
Shareable outcome
How we work with notes to make a testable model of a system

To do that, you'll need to give your mind the chance to recognise patterns and meanings, so that it can offer you potential hypotheses. You need to recognise and articulate those, see how they fit the observations, and from that experiment either build different hypotheses or stop.

FRAMEWORK: Reframe and Regroup

Understand the way your information is sorted. Try other ways of ordering – explore the possible meaning of a particular order.

Look for groups in input and output – filter on those groups, look for outliers and equivalences.

If the labels for the observations come from the system you're testing, do they have patterns and groupings? Consider different labels, aiming for clarification where there might be confusion.

I don't tend to make notes as I shuffle data. Perhaps I should. I do highlight notes as I read / re-read them, calling out information I can use, and areas that I want to look into further.

This exercise involves Machine R. You might explore it via the UI, if you choose to. For the purpose of this exercise, you're encouraged to step into an analysis of the following complete set of observations of all possible inputs.

Observations

Observations are labelled to match the underlying IDs in the UI.

The outputs (Lamp1-Lamp3) are deterministic, depend only on the four inputs (A1, A2, B1, B2), and have no dependencies on any prior state or history.

A1  A2  B1  B2  |  Lamp3  Lamp2  Lamp1
-   -   -   -   |  -      -      -
x   -   -   -   |  -      -      on
-   x   -   -   |  -      on     -
x   x   -   -   |  -      on     on
-   -   x   -   |  -      -      on
x   -   x   -   |  -      on     -
-   x   x   -   |  -      on     on
x   x   x   -   |  on     -      -
-   -   -   x   |  -      on     -
x   -   -   x   |  -      on     on
-   x   -   x   |  on     -      -
x   x   -   x   |  on     -      on
-   -   x   x   |  -      on     on
x   -   x   x   |  on     -      -
-   x   x   x   |  on     -      on
x   x   x   x   |  on     on     -
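If you'd rather shuffle the observations in code than on paper, a sketch along these lines (my transcription of the table above, in Python) gives you something to re-sort and filter. The two groupings shown are just a starting point – try orderings of your own.

```python
# Transcription of the observation table: inputs in the order A1 A2 B1 B2,
# 'x' = on, '-' = off; the digits name the lamps that were lit ("21" = Lamp2 and Lamp1).
RAW = [
    ("----", ""),   ("x---", "1"),  ("-x--", "2"),  ("xx--", "21"),
    ("--x-", "1"),  ("x-x-", "2"),  ("-xx-", "21"), ("xxx-", "3"),
    ("---x", "2"),  ("x--x", "21"), ("-x-x", "3"),  ("xx-x", "31"),
    ("--xx", "21"), ("x-xx", "3"),  ("-xxx", "31"), ("xxxx", "32"),
]

# Regroup: which input patterns light the same set of lamps?
by_lamps = {}
for inputs, lamps in RAW:
    by_lamps.setdefault(lamps or "none", []).append(inputs)
for lamps, patterns in sorted(by_lamps.items()):
    print(f"lamps {lamps:>4}: {', '.join(patterns)}")

# Reorder: sort by how many inputs are on, and eyeball the lamp patterns.
for inputs, lamps in sorted(RAW, key=lambda row: row[0].count("x")):
    print(inputs, "->", lamps or "none")
```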

Reflection

Compare your notes with others on your table, or if you worked together, with other tables.

Sharing

How did you move from observations to (some sort of) model?

Put your insights up on the Miro board, and +1 any insights that you share with others. Once we have a set, I'll ask for comments.

6 – Diagnosis

The purpose of this exercise is to experience how different it is to explore something with a known problem, looking for its causes.

💭
Shareable outcome
How we refine a report to build a model

We'll use Machine M(b) for this exercise. The Machine simulates a collection of crashes, each triggered by user input. Once crashed, you will need to 'reset' to try again. The machine will show only one crash at a time, and you can choose which to work on.

FRAMEWORK: Principles of Diagnosis

  • Reduce steps to necessary and sufficient.
  • Start at the closest cause and work backwards.
  • Sequence and history matter.
  • Some triggers (time, environment) are not under your control.

I don't tend to make narrative notes from diagnosis: my notes from diagnostic work tend to be sparse, then very detailed – but I'll often keep something running to capture the screen or my actions, to let me see what I've been doing. The useful outcome of a diagnosis has more to do with a helpful simplification than with the process of discovery. Your context may be different.
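The first principle – reduce steps to necessary and sufficient – can of course be done by hand. If you like to see it as an algorithm, here's a rough sketch; `reproduces()` is a placeholder for however you'd replay steps against the machine, not something the workshop provides.

```python
from typing import Callable, List

def reduce_steps(steps: List[str], reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop steps one at a time, keeping only those needed to reproduce the crash."""
    kept = list(steps)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if reproduces(candidate):
            kept = candidate          # step i wasn't needed
        else:
            i += 1                    # step i is (so far) necessary
    return kept

# Hypothetical usage: this imaginary crash needs both 'open valve' and 'press start'.
demo = lambda steps: "open valve" in steps and "press start" in steps
print(reduce_steps(["wiggle dial", "open valve", "tap gauge", "press start"], demo))
# -> ['open valve', 'press start']
```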

In this exercise, we'll start by going one crash at a time, so we're all working on the same thing. If you've got an insight, shout it out. If you want to go see someone's kit, walk over.

Reflect and Share

Put insights on the Miro board as you have them. We'll take time at the end of the exercise to collate and edit, and will talk briefly about key groupings.

Wrap-up

💡
The purpose of this wrap-up is:
to help several insights stick in your mind
to give you a collection to return to in the coming days and weeks.

Look back over your testing notes, and your workshop notes.

What insights would you like to remember? Add two (or more) to the Miro board, and +1 if you're adding a duplicate.

We'll briefly review – I particularly want to hear from people who were reminded of an insight that they were in danger of forgetting!

Thank you!

If you'd like to take these exercises back to your colleagues, please do!

This page will contain live links to the exercises for at least a month, and the Miro board will stay up for around a year.

I may add teaching notes to this page – subscribe (free) to see them, and to encourage me to make more available.

Note Types

Here are a few note types. This isn't intended to be definitive, but to help us to imagine different patterns and purposes.

Chronological

Chronological notes are written in sequence, as stuff happens. Lab notebooks, crime notebooks, court notes, and journalists' shorthand are chronological.

Handwritten exploratory testing notes are made as you go. They tell the story of your testing – and may force readers to read them as a story. They're often annotated with signs, arrows, and callouts for events, which may be written afterwards with hindsight.

Chronological notes are great for triggering your memory, especially if handwritten. You can find yourself recognising, after a period of concentration, that you've made no notes for that period. This is useful to notice, and hard to notice with more thread-based notes. I tend to scan handwritten notes so that they can be kept, and tag them so that I can search.

Antipatterns: If you're writing chronological notes in an outliner or text tool, you can't freely annotate, and readers can't trust the ordering. If you're writing chronological notes after the event, you're (often) kidding yourself.

Hierarchical / Mind-mapped (with trunk)

Hierarchical notes are both more free, and more structured, than chronological notes.

Notes made in an outliner tool or mindmap tend to be hierarchical – you may start a new indentation group for interesting things as you find them, and see what organisation appears. You can usefully rearrange and sometimes link as you go.

Antipatterns: using a text editor – it doesn't keep hierarchies well, and if you paste stuff in from your target system, it will throw away everything but text. An aversion to moving things around often indicates that you're not keeping the explicit information you need, but instead making it implicit in the outline – which means you can't use the outline for thinking. If everyone's in the same mindmap, it swiftly becomes huge and unreadable – and mindmaps are pretty unreadable at the best of times (unless the writer is using them as a describing tool).

Free-form pictures / no trunk mindmaps

Sometimes, you don't want a hierarchy at all. That suits times when you're exploring things like state transitions, making maps, or doing work where you've decided not to use a model yet.

Scribbling on blank paper is fine, but working with notes on a wall, or using Miro, Omnigraffle or Scapple, suits me better, as I can reorganise as I find stuff out. Collective work suits this approach well, and I hope that our Miro board is a good example.

Antipattern: Ending up with a heap is no fun. You'll find that direction, size and colour develop meaning earlier than you might expect – and inconsistencies in those emergent rules make it hard to reorganise as the picture grows.

Tagged

Notes made with tags and backlinks (e.g. Roam / Notion) can have very rich emergent structure, and suit thread-based exploration. You need to know the tool and its oddnesses around paste and search – and you might find that you can't use web-based versions in many organisations.

These are complex tools, and suited for deep exploration over several sessions. Search and filtering become crucial parts of the tool and the note.

Antipatterns: assuming that OneNote will do this well.

Automated

Brendan Connolly and others have investigated Jupyter notebooks and Observable sheets to capture notes and results and to make exploration and re-exploration interactive. Perhaps this is the future...

Automatic

As with Converter and Machine M, it's possible to capture your actions and system reactions with scripts, then keep those as a record. Add parseable logs and environmental records, and you start to get something that takes off.

You need to get good with searches and filters to find useful patterns (patterns matter more than records, and are harder to find) in the vast pile of details.
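As a rough illustration of that kind of automatic record – not something wired into converter or Machine M – here's one way to log each action and reaction as a parseable line that you can search and filter later:

```python
import json, time

LOG_PATH = "session.jsonl"   # one JSON object per line: easy to grep, filter and diff

def record(action: str, reaction: str, **context) -> None:
    """Append a timestamped action/reaction pair, plus any environmental context."""
    entry = {"t": time.time(), "action": action, "reaction": reaction, **context}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Hypothetical usage while exploring:
record("submit 700", "shows 'high'", build="demo", browser="firefox")
record("submit -1", "error banner", build="demo", browser="firefox")

# Later, look for patterns in the pile - e.g. everything that mentions an error.
with open(LOG_PATH) as f:
    oddities = [json.loads(line) for line in f if "error" in line]
print(len(oddities), "records mention an error")
```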

Making Notes

This is from the distant past...

What to Record
What to track while exploring a system. From 2004.

Contents of your notes

Not exhaustive, but relatively memorable.

  • Fixed at the start: Information about you, time, place
  • Found Facts: input, output, screenshots, logs
  • About you: Actions, Decisions
  • About the target: Insights, Hypotheses, Ideas, Leads
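If your notes are typed, those categories can double as a structure. A minimal sketch – the field names are my own reading of the list above, not a standard:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class SessionNote:
    # Fixed at the start
    tester: str
    started: datetime
    place: str
    # Found facts, and notes about your own actions and decisions
    facts: List[str] = field(default_factory=list)      # inputs, outputs, screenshots, logs
    actions: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    # About the target
    insights: List[str] = field(default_factory=list)   # hypotheses, ideas, leads

# Hypothetical usage:
note = SessionNote(tester="me", started=datetime.now(), place="workshop")
note.actions.append("swept converter inputs 0-2000 in steps of 50")
note.insights.append("output seems to change somewhere between 650 and 700")
```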

Audience and Purpose

Some exploration notes are written simply as a place to store details which would otherwise clutter the mind of the writer – they're more written than read. They're still useful as a record, typically by the writer, typically within a short timeframe. They end up in the bin if you're organised, as undecipherable clutter if not.

Audience: the writer. Purpose: mind assistance. Persistence: until the end of the next day.

Some exploration notes are written as evidence, to be shared with peers. They're typically detailed, though may leave out necessary information that might be expected to be common knowledge. They don't necessarily tell a story, and while they're typically searchable, they may only be findable by a limited set of people. Examples include bug reports and test session reports.

Audience: the team and surrounding teams. Purpose: a single place for salient shared details. Persistence: until the end of the next release.

Some testing notes are written for long-term storage. They may include narratives of exploration as proof that work has been undertaken, parts of file systems, long log files and database images containing configuration and transaction data. The audience is typically unknown, and the information may be retained rather longer than the people who made it. Such notes are also, typically, more written than read. They take time to make that could otherwise be spent on more immediate concerns, so it's worth asking whoever owns the budget for their expectations. I've been asked only a couple of times to go back over old notes, and my audit colleagues have typically said that they're more concerned with actions taken after finding problems than they are with any exploration notes covering how the problem was found in the first place.

Audience: necessarily unknown. Purpose: audit, maybe. Persistence: until lost.

Experiment Types

Here is a short and non-definitive list of different types of experiments.

Classroom physics often looks like this: different weights on different wires, waves in fluids of different densities. Measurements are made of a few known qualities, in as regular a way as possible, while a few factors are changed by small increments. Gets interesting when the results are analysed, showing trends and a potential simplifying model.

Similar to testing across ranges and combinations, and to performance testing.

Observe results of a known process

Chemistry classes ask experimenters to follow a process in order to learn something or to demonstrate it. Typically has an expected outcome, and gets interesting when that outcome is not reached, as experimenters dig into what might have gone 'wrong' – typically with their experiment rather than the model.

Similar to confirmatory testing and regression testing.

Close observation over populations

Medical experiments, with wide and careful observation of the subject under controlled conditions, potentially using multi-factor analysis, tight control over the cohort, and protocol to reduce noise and bias.

Particle physics uses huge quantities of dull experiments, and custom tools to look for rare instances with unexpected outcomes.

Used to look for unexpected problems in something which may be judged to be effective or valuable in the lab, and for the edges of usefulness of existing models.

Similar to bulk testing.

Find factors

Where the phenomenon is known, but the causes are not certain. Experimenters may have a hypothesis, and may be seeking to refute it more than to confirm it. May involve many observations with no difference, while the examined factors are ruled in or out. May involve capturing a lot of data, and throwing out all the expected stuff. May need many experiments, with reverse analysis on the one that goes odd. Gets interesting when an action has an unexpected effect, or does not have the expected effect, and so seeks to establish or refute a model.

Similar to diagnosis in software testing.

No influence

Astrophysics has no way to influence most of the universe, so must seek out its subjects from the world. Tends to need observations and specialised instruments to observe known things ever-more finely, or over inhuman times. May need all-new observatories, if a hypothesis implies that something unmeasured is happening.

Similar to testing by observing the live system and its data.

I may put teaching notes in the subscriber area...
