
Workshop Costs

Articles Jun 9, 2025 (Jun 10, 2025)

tl;dr – less than a coffee morning

Costs for the API and environments (under $20), emissions (<1kg CO2), power (1.5kWh), water (5.6l).

On June 4, I ran a longish workshop where I gave a big crowd of testers unlimited access to my Anthropic and ChatGPT API keys. Each participant had a chunk of a server with pre-installed software, VSCode for the web, Simon Willison's LLM, webpages served via Flask and Nginx, and a handful of Python and JavaScript tools. It took months to set up, but people want to know what it cost to deliver.

That's a good question to answer. Here are my thoughts:

Imagining limitations

I wanted the workshop to be virtual, because I didn't want anyone downloading or installing anything. I wanted it to be low-bandwidth, because conference wifi. I chose VSCode in the browser, on DigitalOcean servers, set up with Ansible before the workshop. My DO account lets me run around 50 servers; earlier experiments indicated that a reasonably sized server stayed responsive with 5-8 people, and seemed to have headroom for rather more.

Our space had 11 tables. I gave each table a group: each group shared a server, and each server was set up with 10 environments.

The API keys for the workshop were all from my Anthropic / OpenAI accounts, which imposed different limitations.

Participants (mostly) used Anthropic's Claude 3.5 Sonnet. My account allows 1000 requests / min and 80K input tokens / 16K output tokens per minute (from Rate limits) for Claude 3.5. I thought we were unlikely to hit the request limit, but it seemed likely that we'd hit the token limits.

Prior to the workshop, I reckoned a typical request in my workshop would have 2000-2500 input tokens and ~400 output tokens (from token-calculator.net).

So, each minute, my workshop might squeeze 40 input requests from 80K input tokens, and 40 outputs from 16K output tokens.
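That arithmetic, spelled out (a sketch; the per-request token figures are my estimates above, not measurements):

```python
# Anthropic rate limits for my account, Claude 3.5 Sonnet
INPUT_TOKENS_PER_MIN = 80_000
OUTPUT_TOKENS_PER_MIN = 16_000
REQUESTS_PER_MIN = 1_000

# My estimated typical workshop request
INPUT_TOKENS_PER_REQUEST = 2_000
OUTPUT_TOKENS_PER_REQUEST = 400

input_limited = INPUT_TOKENS_PER_MIN // INPUT_TOKENS_PER_REQUEST     # 40 requests/min
output_limited = OUTPUT_TOKENS_PER_MIN // OUTPUT_TOKENS_PER_REQUEST  # 40 requests/min

# whichever limit bites first caps throughput
max_requests_per_min = min(input_limited, output_limited, REQUESTS_PER_MIN)
```

Both token limits happen to bite at the same point, well below the 1000-request limit.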

EuroSTAR told me we'd be 88 people. They'd be running scripts rather than making requests by hand, and each script would make up to three requests, typically over 60-90 seconds. If everyone set off at the same time, that might mean an expected peak of around 200 requests in a minute, and a less-likely peak somewhere north of that. I needed to spread out the peaks somehow, or guide people to expect failure – I had a spiel, and a trick with a timer.
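In numbers (a sketch, assuming every script fires all three of its requests within its window):

```python
participants = 88
requests_per_script = 3
total_requests = participants * requests_per_script  # 264 requests in flight

# Each script spreads its requests over 60-90 seconds, so if everyone
# started a script at once, requests arriving in any one minute:
fast_scripts = total_requests * 60 / 60  # 264/min if every script takes 60s
slow_scripts = total_requests * 60 / 90  # 176/min if every script takes 90s
```

The expected peak of ~200 requests/minute sits between those two cases, well above the 40-request token budget.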

Actual limitations

I ultimately forgot to do either. I went from table to table throughout the workshop, and only one or two people mentioned that they had seen a limit. All the groups seemed to get stuck in.

After the workshop, I saw that over 400 requests had been rate-limited: 366 on input tokens and 47 on output tokens. My participants may not have expected failure, but they did accept failure.

LLM Costs

Sonnet 3.5 is $3/M in, $15/M out; a 2000-token input costs 0.6¢, and a 400-token output costs another 0.6¢. One participant wrote:

having attempts linked to tokens with a tangible cost makes me feel more reticent to keep spamming the remake command. (or it would if I were the one paying ;)  )

According to usage stats, my Anthropic account used about 2.8M input tokens and maybe 0.4M output tokens on June 4. That's about $15 total. Anthropic billed me $13 for Jun 4 – let's call it a tenner. For those of you in the room, most of the groups used $1-2 of Claude, but group05, 07 and 10 used $0.5 or less, while group09 used $2.11.
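The per-request and day-total sums, as a sketch (token counts are from my usage stats, rounded):

```python
# Claude 3.5 Sonnet pricing, dollars per million tokens
PRICE_IN = 3.0
PRICE_OUT = 15.0

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a request (or a whole day) at Sonnet 3.5 rates."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

per_request = cost_usd(2_000, 400)        # 0.6c in + 0.6c out = $0.012
day_total = cost_usd(2_800_000, 400_000)  # $8.40 in + $6.00 out = $14.40
```

$14.40 rounds to the "about $15" above; the $13 bill suggests the rounded token counts are slightly generous.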

The workshop had access to OpenAI, but GPT4o-mini was poor at rewriting code to pass tests, and GPT-4 was s l o w. Total OpenAI usage seemed to be around 50 requests across 8 groups, using 0.4M tokens. Thirsty group09 had their share, but thrifty groups05, 07 and 10 didn't seem to make a dent. Perhaps their participants used their own tokens...

Workshop participants will remember that we ran out of (Anthropic) tokens after an hour or so. Handily, that was easy to find out about, and fast to fix. It wouldn't have happened at all if I'd remembered to top up the account beforehand, or had published the realtime cost on screen as planned.

LLM Consumption

Reading How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference, I find that Claude 3.7 Sonnet consumes 2.7 Wh for a query with 1k tokens input / 1k output, and 5.5Wh for 10k in / 1.5k out. From the same paper, it consumes around 20ml water and emits around 2.5g CO2 for the larger query.

Taking those values and scaling (by 2.8e6 / 10e3 = 280), one might imagine that the workshop consumed 5.5Wh × 280 ≈ 1.5kWh of energy and 5.6l of water, emitting 700g of CO2.
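As a sketch (per-query figures from the paper above; treating my 2.8M input tokens as 280 of the paper's 10k-in / 1.5k-out queries is a crude assumption):

```python
# Per-query footprint for Claude 3.7 Sonnet at 10k tokens in / 1.5k out
# (from "How Hungry is AI?")
ENERGY_WH = 5.5
WATER_ML = 20
CO2_G = 2.5

queries = 2_800_000 / 10_000  # workshop input tokens as paper-sized queries

energy_kwh = queries * ENERGY_WH / 1000  # ~1.5 kWh
water_l = queries * WATER_ML / 1000      # 5.6 litres
co2_g = queries * CO2_G                  # 700 g
```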

Let's put that into personal terms: That's as much liquid as two big supermarket bottles of milk, as much energy as is needed to boil water for 70 cups of tea, as much carbon as is emitted on a swift car ride to the shop. With domestic electricity in the UK around 25p/kWh, perhaps 40p of my £10 LLM cost is power – though only Anthropic knows what the actual bill might be.

This of course ignores the vast costs of training the models in the first place, the copyright theft needed to get the language models good enough to be sensible (ingesting open-source software alone, without natural language, isn't enough), the risks inherent in training the lunatic chatbots to make us satisfied, and far more that I don't (yet) comprehend.

Environments

My environments were DigitalOcean droplets. They're around 4¢ / hour. I had 15 of them, and they ran for around 8 hours each – that's $5 or so.
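For completeness, the same back-of-envelope style (droplet price rounded to 4¢/hour):

```python
droplets = 15
hours_each = 8
price_per_hour_usd = 0.04  # rounded droplet rate

droplet_total = droplets * hours_each * price_per_hour_usd  # ~$4.80, "$5 or so"
```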


James Lyndsay

Getting better at software testing. Singing in Bulgarian. Staying in. Going out. Listening. Talking. Writing. Making.
