Exploration II – Work (BBC 23Q2)

Open Miro frame for day 3. Return to central page

As delivered:
We covered the topics, but did not get to the exercise 'should be | actually is'.
Antony G put the following link into the chat: Dave Snowden on Risk and Resilience

Ways To Explore a System (and how to stop)

We'll start by expanding something we did earlier in the week.

Exercise – How do you Explore What a System Does?

20 mins: 10 mins to add and analyse, 10 mins to debrief

On Day 1, we made a pile of notes answering this question. I've put a copy of that into Day 3's area on the Miro board.

Take 5 minutes to build the pile:

  • Look at the collection of ways to explore a system.
  • Add new items to the board (One way per note please – I've split a couple of notes which had sequences)

Take 5 minutes to arrange the pile

  • In groups (drop into a breakout room) or individually, look for stuff which goes together.
  • When you see a group, draw and label a circle for that grouping, and put the sticky notes in.
  • If something you want for a circle has moved to another, duplicate it and put it where you want it. Draw a line between the two if you like, to show that something exists in two places.
  • If items look sequential, link the chain with arrows.
  • Let the groupings and their contents talk to you. If something's missing from a grouping, add it.

When activity subsides, we'll gather as a whole and talk.

We'll look for differences between collections, and for iterables (things which can be treated as lists).

For background, I've written a short and general list of ways.

Stopping

Coverage is a measure of done-ness. Some work makes you more done – projects like to see coverage increasing. Coverage can go down, as well as up: some changes mean that you need to do more to cover the situation, and some discoveries show that you're not as done as you thought. In testing systems, coverage tends to mean approaching a predictable and iterable 'done'.

'Done' might mean that you've exercised all options of an iterable, or it might mean that you've run out of time. Both are reasonable.

  • if you can define the work well, or can reliably achieve what you're seeking, you might aim for an achievement, and see if you can reduce the spend. Confirmatory testing falls into this planning style. Coverage gets talked about.
  • if you don't know what you're looking for, you might aim for a spend, and see how much you can do with it. Exploratory testing falls into this planning style. Budget gets talked about.

Some exploration has clear iterables – especially in limited areas. Example: Puzzle 29 is deterministic and has a limited set of 16 possible inputs. For anything deterministic and limited but bigger (you'll often find huge sets with combinations of configuration, state, business rules), you'll hope to use a tool to build the data or triggers, and you'll want to collect the output in a way that you can explore with a tool (or your eyes...)
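
As an illustration (not part of the exercise), here's a minimal sketch of using a tool to build and run a set of combinations, then collect the output for later exploration. The dimensions, their values and the system_under_test function are all hypothetical – substitute whatever your system actually needs.

```python
import csv
import itertools

locales = ["en-GB", "cy-GB", "gd-GB"]               # assumed configuration dimension
user_types = ["anonymous", "signed-in"]             # assumed state dimension
consent_states = ["accepted", "rejected", "unset"]  # assumed business-rule dimension

def system_under_test(locale, user_type, consent):
    """Placeholder: trigger the system with one combination and capture its output."""
    return f"{locale}/{user_type}/{consent}"        # stand-in result

# Build every combination, run each one, and collect the output somewhere
# a person (or another tool) can explore later.
with open("combination_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["locale", "user_type", "consent", "result"])
    for combo in itertools.product(locales, user_types, consent_states):
        writer.writerow([*combo, system_under_test(*combo)])
```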

Some exploration has almost-iterables. Example: inputs can be hit with the Big List of Naughty Strings or similar. Working through those might be automated for speed, or for verification.
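
A minimal sketch of what that automation could look like, assuming a local copy of blns.txt from the Big List of Naughty Strings repository (https://github.com/minimaxir/big-list-of-naughty-strings); submit_to_field is a hypothetical stand-in for however you drive your input point.

```python
def submit_to_field(value):
    """Placeholder: drive the input under test and return what came back."""
    return {"input": value, "accepted": True}   # assumed shape of the outcome

surprises = []
with open("blns.txt", encoding="utf-8") as f:
    for line in f:
        candidate = line.rstrip("\n")
        if not candidate or candidate.startswith("#"):
            continue                            # skip comments and blank separator lines
        outcome = submit_to_field(candidate)
        if not outcome["accepted"]:
            surprises.append(outcome)           # keep rejections for a human to review

print(f"{len(surprises)} inputs rejected or misbehaved")
```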

If you're lucky, the repetitive work of pushing through combinations has already been done for you in large-scale confirmatory automated tests. You can read or run the tests, and you'll have an idea about what classes of weirdness you might engage in exploration. Or you can tweak them into novel cases, and use your observations to find surprises.

However, plenty of exploratory work in software systems does not have anything tangible to iterate over, does not have useful limits to the iteration, or has iterables which themselves offer false reassurance of done-ness.

Here, we have to see exploration as an investment. Or a gamble, if you can use that word in your organisation without being drummed out of the engineers. Handily, plenty of project managers understand gambles all too well, even if they're surprised or dismayed to be asked to support it in a field they imagine might be more easily-determined.

So: the ultimate way to stop is to run out of opportunity. And that means that one way to respond to "how long will it take" is "how long can you afford", which leads, in happy circumstances, to conversations about priorities, values and risks.

Variety and when to stop

We'll use a simulation to show how varying approaches can be a viable way to sustain the rate of discoveries.

Simulation characteristics

  • The 'system' has many things to be found. We, on the outside, can see what's been found, and what remains.
  • The work has a budget, representing how often it can look.
  • Each thing has a fixed chance of being found (for a given technique).
  • There are several independent techniques.
  • Each thing found has a consequential cost. These obey a power law: 90% 'cost' 1, 9% cost 10, 0.9% cost 100, 0.09% cost 1000 and so on. Unfound things pass their cost on. (A rough sketch of this simulation follows below.)
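
Here is a rough, illustrative sketch of that simulation in Python – it is not the Flash original, and the bug counts, probabilities and budgets are assumptions chosen only to show the mechanism of budget, power-law costs and independent techniques.

```python
import random

def make_bugs(n=500):
    """Each bug gets a power-law cost: 90% cost 1, 9% cost 10, 0.9% cost 100, and so on."""
    bugs = []
    for _ in range(n):
        cost = 1
        while random.random() < 0.1:    # each extra factor of ten is ten times rarer
            cost *= 10
        bugs.append({"cost": cost, "found": False})
    return bugs

def run(bugs, techniques, budget, p_find=0.02):
    """Spend the budget one look at a time, cycling through the techniques.

    Each technique can only ever find its own (random) subset of the bugs,
    which is why varying techniques can sustain the rate of discovery."""
    reachable = {t: {id(b) for b in bugs if random.random() < 0.5} for t in techniques}
    found_cost = 0
    for spend in range(budget):
        technique = techniques[spend % len(techniques)]
        for bug in bugs:
            if not bug["found"] and id(bug) in reachable[technique] and random.random() < p_find:
                bug["found"] = True
                found_cost += bug["cost"]
                break                   # one look finds at most one thing
    missed_cost = sum(b["cost"] for b in bugs if not b["found"])
    return found_cost, missed_cost

random.seed(1)
print("one technique:   ", run(make_bugs(), ["A"], budget=300))
random.seed(1)
print("three techniques:", run(make_bugs(), ["A", "B", "C"], budget=300))
```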

Exercise: Simulation – Budget and Bugs

10 minutes – I'll run these on my machine (it's in Flash...)

I'll ask you, after spending some of the budget, whether you think it's worth spending more.

We'll try different approaches: one tester with one technique, several testers with the same technique, several testers with different techniques, one tester switching technique.

The experiments I have are:

  • 3.00: one tester, one tactic
  • 3.01: six testers, all taking the same tactic
  • 3.02: six testers, all taking different approaches
  • 4.01: one tester switching between 10 techniques
  • 4.02: three testers with different switching approaches

Expectations and Deliverables

Exercise: should be | actually is

5-10 minutes

Look for the oval containing should be | actually is

Above it, there should be a set of stickies, each listing something you might use while testing a system.

Collectively, sort the stickies into either side of the oval. For example, executable files actually exist; SLAs describe what the system should be.

If you find something fits both, or if you find something on the wrong side, duplicate it and make a note.

Debrief – can we see patterns? Can we see iterables on both sides? What else might we add?


‼️
CHANGES
I’ve cut the scope of this big session – do read if you want to. We'll put some of this into a later session.

Metrics

I keep track of time – time spent and time planned.

I keep track of how the tester feels about the part of the system they tested. A proxy for that is how much time they think they might need to 'complete' testing.

I keep track of charters added, and the planned sessions that we weren't able to do.

I keep track of (for instance) functional area, technological target, and risk profile of a charter, to help me slice the list, and to help me describe the work.
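
By way of illustration, a minimal sketch of the kind of record those metrics could live in – the field names and values are assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class CharterRecord:
    title: str
    functional_area: str            # tags used to slice the list and describe the work
    technological_target: str
    risk_profile: str
    minutes_planned: int = 0        # time planned
    minutes_spent: int = 0          # time spent
    minutes_to_feel_done: int = 0   # proxy for how the tester feels about the area
    notes: list = field(default_factory=list)

records = [
    CharterRecord("Explore consent flows", "sign-in", "web front end", "high", 60, 45, 90),
    CharterRecord("Naughty strings on search", "search", "API", "medium", 30, 30, 10),
]

# Slice by risk profile to describe where the time went.
high_risk = [r for r in records if r.risk_profile == "high"]
print(sum(r.minutes_spent for r in high_risk), "minutes spent on high-risk charters")
```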

Session-based Testing – Charters, TimeBoxes and Reviews

A box failing to contain a lightning bolt, and a combined award and feedback loop

Exploration is:

  • Open-ended
  • Centred on learning


One way to manage the work is with Session-based Testing. SBT tries to:

1) Limit scope and time by

  • prioritising via a list of sessions,
  • focussing with a charter
  • timeboxing

2) Enable (group and individual) learning with

  • regular feedback
  • good records

Sessions and Charters

Many circles overlapping an open-ended box

A session is a unit of time. It is typically 10-120 minutes long; long enough to be useful, short enough to be done in one piece. A session is done once.

A charter is a unit of work. It has a purpose. A charter may be done over and over again, by different people. When planned, it's often given a duration – the duration indicates how much time the team is prepared to spend, not how long it should take.

The team may plan to run the same session several times during a testing period, with different people, or as the target changes.

Anyone can add a new charter – and new charters are often added. When a tester needs to continue an investigation, but wants to respect the priorities decided earlier, they will add a charter to the big pile of charters, then add that charter somewhere in the list of prioritised charters – which may bump another charter out.
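
A small, illustrative sketch of that add-and-bump mechanism – the function name, the capacity and the charters are all made up.

```python
def add_charter(prioritised, new_charter, position, capacity):
    """Insert new_charter at the given priority position; anything past capacity drops off."""
    prioritised.insert(position, new_charter)
    bumped = prioritised[capacity:]
    del prioritised[capacity:]
    return bumped

plan = ["consent flows", "search errors", "export limits"]
bumped = add_charter(plan, "follow up on the timeout seen in session 12", position=1, capacity=3)
print(plan)    # ['consent flows', 'follow up on the timeout seen in session 12', 'search errors']
print(bumped)  # ['export limits']
```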

In Explore It, Elisabeth Hendrickson suggests

A simple three part template
Target: Where are you exploring? It could be a feature, a requirement, or a module.
Resources: What resources will you bring with you? Resources can be anything: a tool, a data set, a technique, a configuration, or perhaps an interdependent feature.
Information: What kind of information are you hoping to find? Are you characterizing the security, performance, reliability, capability, usability, or some other aspect of the system? Are you looking for consistency of design or violations of a standard?

I've found it helpful to consider a charter with a starting point, a way to generate or iterate, and a limit (which may work with the generator), and to explicitly set out my framework of judgement.
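
A minimal sketch of a charter captured as a data structure, following the three-part template plus the extra fields above; every name and value here is illustrative.

```python
from dataclasses import dataclass

@dataclass
class Charter:
    target: str          # where you are exploring
    resources: str       # what you bring: tools, data, techniques, configurations
    information: str     # what kind of information you hope to find
    starting_point: str = ""
    generator: str = ""  # how you'll generate or iterate over cases
    limit: str = ""      # when to stop (often tied to the generator)
    judgement: str = ""  # the framework you'll judge observations against

example = Charter(
    target="the export feature",
    resources="Big List of Naughty Strings, a throwaway database",
    information="whether exports survive awkward data intact",
    starting_point="a freshly-seeded database",
    generator="one naughty string per field, one field at a time",
    limit="60 minutes, or the first data-corrupting failure",
    judgement="an exported file re-imports without loss",
)
print(example.target, "->", example.information)
```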

Charter Starters

Use these to give shape to early exploratory test efforts.

These are unlikely to be useful charters on their own – but they may help to provoke ideas, clarify priority, and broaden or refine context.

  • Note behaviours, events and dependencies from switch-on to fully-available
  • Map possible actions from whatever reasonably-steady state the system stabilises at after switch-on. Are you mapping user actions, actions of the system, or actions of a supporting system?
  • How many different ways can the system, or a component within that system, stop working (i.e. move from a steady, sustainable state to one unresponsive to all but switch-on)? Try each when the system is working hard - use logs and other tools to observe behaviours.
  • Pre-design a complex real-world scenario, then try to get through it. Keep track of the lines of enquiry / blocked routes / potential problems, and chase them down.
  • What data items can be added (i.e. consumable data)? Which details of those items are mandatory, and which are optional? Can any be changed afterwards? Is it possible to delete the item? Does adding an item allow other actions? Does adding an item allow different items to be added? What relationships can be set up between different items, and what exist by default? Can items be linked to others of the same type? Can items be linked to others of different types? Are relationships one-to-one, many-to-one, one-to-many, many-to-many? What restrictions and constraints can be found?
  • Try none-, one-, two-, many- with a given entity relationship
  • Explore existing histories of existing data entities (that keep historical information). Look for bad/dirty data, ways that history could be distorted, and the different ways that history can be used (basic retrieval against time, summary, time-slice, lifecycle).
  • Identify data which is changed automatically, or actions which change based on a change in time, and explore activity around those changes.
  • Respond to error X by pressing ahead with action.
  • Identify potential nouns and verbs - i.e. what actions can you take, and what can you act upon? Are there other entities that can take action? What would their nouns and verbs be? Are there tests here? Are there tools to allow them?
  • Identify scope and some answers to the following: In what ways can input or stimulus be introduced to the system under test? What can be input at each of those points? What inputs are accepted or rejected? Can the conditions of acceptance or rejection change? Are some points of inputs unavailable because they're closed? Are some points of input unavailable because the test team cannot reach them? Which points of input are open to the user, and which to non-users? Are some users restricted in their access?
  • Identify scope and some answers to the following: In what ways can the system produce output or stimulate another system? What kinds of information is made available, and to what sort of audience? Is an output a push, a pull, a dialogue? Can points of output also accept input?
  • Explore configuration, or administration interfaces. Identify environmental and configuration data, and potential error/rejection conditions.
  • Consider multiple-use scenarios. With two or more simultaneous users, try to identify potential multiple-use problems. Which of these can be triggered with existing kit? If triggered, what problems could be seen with existing kit, and what might need extra kit? Try to trigger the problems that are reachable with existing kit.
  • Explore contents of help text (and similar) to identify unexpected functionality
  • Assess for usability from various points of view - expert, novice, low-tech kit, high-tech kit, various special needs
  • Take activity outside normal use to observe potential for unexpected failure; fast/slow action, repetition, fullness, emptiness, corruption/abuse, unexpected/inadequate environment.
  • Identify ways in which user-configurable technology could affect communication with the user. Identify ways in which user-configurable technology could affect communication with the system.
  • Pass code, data or output through an automated process to gain a new perspective (i.e. HTML through a validation tool, a website through a link mapper, strip text from files, dump data to Excel, job control language through a search tool)
  • Go through Edgren’s “Little Black Book on Test Design”, Whittaker's "How to Break..." series, Bach's heuristics, Hendrickson's cheat sheet, Beizer-Vinter's bug taxonomy, your old bug reports, novel tests from past projects to trigger new ideas.

Writing charters takes practice. A single charter often gives a scope, a purpose and a method (though you may see limits, goals, pathologies and design outlines). You could approach it by considering...

  • Existing bugs – diagnosis
  • Known attacks / suspected exploits / typical pathologies
  • Demonstrations – just off the edges
  • Questions from training and user assessment

A collection of charters works together, but should always be regarded as incomplete. You're investing resources as you work on whatever you choose to be best, not trying to complete a necessary set. A collection for a given purpose (to guide testing for the next week, say) is selected for that purpose and is designed to be incomplete.

Exercise: Writing Charters

20 minutes, groups or individually

  • Pick a subject – preferably your own system. If you want a new one, I suggest https://flowchart.fun/
  • Write at least four charters – one to explore in a new way, one to investigate a known bug, one to search for surprises, one to work through a list
  • We'll talk about those charters

Further work – prioritise and run the sessions.

Timeboxing

An arrow filled with many smaller arrows

When entering a session, the tester chooses their approach to match the time; a 10-minute session would have different activities from a 60-minute session.

Awareness of time is important: you may follow your charter, yet want to pause after a while and think about alternatives. You might want to take time at the end to wrap up your notes. If you're a tester who

Regular review

Notes taken during the session are primarily for the tester.

To help the team to learn about the system from each other's explorations, the tester will review their session with a colleague while it is still in their mind. Records of the session will help the review. The business of review can also help the testers to share and improve their exploratory approaches.

Distributed peer reviews are more helpful to the team, and have fewer bottlenecks, than reviews with a single person (e.g. a Test Lead).

The team may choose to bolster that learning by keeping session records – and may impose processes and standards to the archive. This can be both helpful and costly.
