Making Democracy Accessible: A Data Pipeline for Dutch Council Meetings

Every Dutch municipality is legally required to publish its council meetings. Amsterdam posts them all: agendas, vote results, transcripts. Anyone can look. In practice, almost no one does; I certainly haven’t in nearly 10 years of citizenship. A single session fills fifty pages of procedural Dutch, legal references, document numbers pointing to other documents, and parliamentary formulas no one has bothered to explain since they were written. The information is public, but getting anything out of it takes more effort than most residents will spend. It’s a case that underlines that availability does not equal accessibility.

AI020 is Amsterdam’s annual AI hackathon, run by Amsterdam AI around problems the city actually has. During the 2026 edition, AISO challenged teams to take a raw Dutch council meeting and bridge the gap between availability and accessibility. In the next paragraphs we’ll discuss our solution, hurdles and lessons learned.

What we built

A citizen opens the app and sees a calendar. Dates with council meetings on record are marked. Click one, and you get a page for that session: a plain-Dutch paragraph summarising the whole meeting, followed by decision cards. Each card shows the outcome (aangenomen, passed; verworpen, rejected; hamerstuk, nodded through without debate), a vote count, and any amendments or motions with their own nested outcomes.

Every card has a Bekijk bron (view source) button. Open it, and you see the passage from the official document that backs the card’s content. Every claim points to a specific text in a specific document. If you want to know whether the new cycle path got approved or how your representative voted on the housing proposal, this is the two-minute read the raw transcript doesn’t offer. This is how I actually found out about the expansion of the Brasapark near my house.

Landing page. The calendar on the left marks every date that has a council meeting on record. Clicking a date scrolls the right-hand list to that day.
A plain-Dutch paragraph at the top summarises the whole meeting: the two-minute read for a resident who wasn’t in the room.
One decision card. The outcome badge is the first thing a resident sees. Amendments and motions get their own nested badges so you can see what survived the debate and what didn’t.
Every card has a “Bekijk bron” (view source) button. Expanding it shows the verbatim passage from the official document that the summary was drawn from. This is the anti-hallucination contract made visible.

The Open Raadsinformatie (ORI) API already publishes meeting data for Amsterdam and dozens of other municipalities. The data exists; the problem is what it looks like when you pull it.

The data problem: getting session information

The ORI API is built on Elasticsearch, a search and indexing engine. Think of it as a database tuned for text search: fast retrieval across thousands of documents by date, type, or keyword. ORI uses it to index every meeting record across the municipalities it covers.

We hit three undocumented behaviors in our first hour: the documentation did not match what the API actually returned. This led to time-consuming trial and error.

The attachment field has three possible types. On a Meeting document, attachment can be a string (a single document ID), an array of strings (multiple IDs), or absent entirely. The API documentation doesn’t mention this. We discovered it empirically when our code crashed on a single-attachment meeting. The fix was a normalization function that wraps any value in an array before iterating. The solution was simple; the hard part was tracing the crash back to its root cause.
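A minimal Python sketch of that kind of normalization. It is illustrative rather than the hackathon code: the function name `normalize_attachments` and the sample IDs are made up for the example.

```python
from typing import Any


def normalize_attachments(value: Any) -> list[str]:
    """Return the attachment field as a list, whatever shape it arrived in."""
    if value is None:
        # Field absent on the Meeting document.
        return []
    if isinstance(value, str):
        # A single document ID delivered as a bare string.
        return [value]
    # Already an array of IDs; copy it so callers can mutate it safely.
    return list(value)


# Hypothetical meeting records showing the three shapes.
meetings = [
    {"id": "m1", "attachment": "doc-1"},
    {"id": "m2", "attachment": ["doc-2", "doc-3"]},
    {"id": "m3"},
]

for meeting in meetings:
    print(meeting["id"], normalize_attachments(meeting.get("attachment")))
```

Checking for a string before iterating matters in Python: a bare string is itself iterable, so looping over it directly would yield single characters instead of one ID.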

parent_id queries return zero results. Logically, you’d expect to find all agenda items belonging to a meeting by querying parent_id: <meeting_id>. The query runs without error and returns nothing. We only found the correct path by accident when we noticed a field called agenda.@list sitting quietly on the Meeting object the whole time. Following those IDs leads you to the AgendaItem documents, which in turn carry their own attachment fields.
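The traversal could look roughly like the sketch below. Two assumptions are baked in and may not match the real response shape: that the agenda item IDs sit under `meeting["agenda"]["@list"]`, and that a lookup function (here the hypothetical `fetch_document`) returns a document by ID or `None`. An in-memory dictionary stands in for the API.

```python
from typing import Callable, Optional


def as_list(value) -> list:
    """Wrap a missing, single, or multiple value into a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def collect_attachment_ids(
    meeting: dict,
    fetch_document: Callable[[str], Optional[dict]],
) -> list[str]:
    """Gather attachment IDs from a meeting and from its agenda items.

    Assumes agenda item IDs live under meeting["agenda"]["@list"];
    fetch_document is a hypothetical lookup returning a document by ID.
    """
    ids = as_list(meeting.get("attachment"))
    agenda = meeting.get("agenda") or {}
    for item_id in as_list(agenda.get("@list")):
        item = fetch_document(item_id)
        if item is None:
            continue  # Skip agenda items that cannot be resolved.
        ids.extend(as_list(item.get("attachment")))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(ids))


# Toy in-memory store standing in for the API.
store = {
    "item-1": {"attachment": "doc-2"},
    "item-2": {"attachment": ["doc-3", "doc-2"]},
}
meeting = {
    "attachment": "doc-1",
    "agenda": {"@list": ["item-1", "item-2", "item-3"]},
}
print(collect_attachment_ids(meeting, store.get))
```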

Date fields require full ISO datetime strings. Submitting 2026-04-08 as a date filter causes a parse exception. The field expects 2026-04-08T00:00:00Z.
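One way to avoid the parse exception is to build the filter from a `date` object and always emit the full datetime form. The sketch below uses the standard Elasticsearch `range` clause; the field name `start_date` is an assumption for illustration and should be replaced with whatever date field the index actually exposes.

```python
from datetime import date, datetime, time, timedelta, timezone


def to_iso_datetime(day: date) -> str:
    """Format a calendar date as midnight UTC in full ISO datetime form."""
    midnight = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return midnight.strftime("%Y-%m-%dT%H:%M:%SZ")


def day_range_filter(day: date, field: str = "start_date") -> dict:
    """Build an Elasticsearch range clause covering one whole day.

    The default field name is hypothetical.
    """
    return {
        "range": {
            field: {
                "gte": to_iso_datetime(day),
                "lt": to_iso_datetime(day + timedelta(days=1)),
            }
        }
    }


print(day_range_filter(date(2026, 4, 8)))
```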

The attachments themselves are typically uitslagen (vote result summaries): short documents listing which items were voted on, what was decided, and whether anyone dissented. They are not verbatim notulen (meeting minutes). This matters for what the AI can extract: uitslagen tell you what was decided, but rarely why or what was said in the debate.

The diagram below shows how we work around the three-type problem and the agenda traversal:

Every council meeting links to dozens of supporting documents. This diagram shows how we work around three quirks in the data format, none of them documented, to reach the actual document IDs. The orange box is the fallback path we only discovered by accident.

The token budget: from 2M characters to 40K

Once we had the document IDs, the next problem was sheer information volume. A busy session can pull in a hundred documents or more, with combined text running past two million characters. Claude Sonnet 4.6 has a 200K token context window, but sending everything would be slow and expensive, and most of it is irrelevant. An uitslagen document runs two to four pages: meeting date, agenda items, vote outcomes. That’s the decision record. A supporting memo on a zoning regulation might run thirty pages and mention no decisions at all.

We sort by filename first. Documents whose names contain verslag, notulen, or uitslagen jump ahead of memos, appendices, and background reports. Among those front-runners, longer wins, because a longer uitslagen usually covers more agenda items.

From there we walk the sorted list, accumulating characters until we hit 40,000. Truncation lands on the last paragraph break before the cap, so we never cut mid-sentence. Each document keeps its filename as a header in the prompt so the model can attribute claims to a specific source.
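The selection and truncation steps can be sketched as follows. This is an illustration under a few assumptions: each document is a dict with `name` and `text` keys, headers are not counted against the budget, and a document with no paragraph break before the cap is dropped rather than cut mid-sentence. None of those details are confirmed above.

```python
import re

PRIORITY = re.compile(r"verslag|notulen|uitslagen", re.IGNORECASE)
BUDGET = 40_000  # character cap


def sort_key(doc: dict) -> tuple[int, int]:
    """Priority filenames first; among those, longer text first."""
    if PRIORITY.search(doc["name"]):
        return (0, -len(doc["text"]))
    return (1, 0)  # stable sort keeps the original order for the rest


def truncate_at_paragraph(text: str, limit: int) -> str:
    """Cut text to at most limit characters, ending on a paragraph break."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n\n", 0, limit)
    # No paragraph break before the cap: drop the text instead of
    # cutting mid-sentence (a choice made for this sketch).
    return text[:cut] if cut > 0 else ""


def build_prompt(docs: list[dict], budget: int = BUDGET) -> str:
    """Concatenate documents under filename headers until the budget is spent."""
    parts, used = [], 0
    for doc in sorted(docs, key=sort_key):
        remaining = budget - used
        if remaining <= 0:
            break
        body = truncate_at_paragraph(doc["text"], remaining)
        if not body:
            continue
        parts.append(f"### {doc['name']}\n{body}")
        used += len(body)  # headers are not counted in this sketch
    return "\n\n".join(parts)


docs = [
    {"name": "memo_bestemmingsplan.pdf", "text": "M" * 50},
    {"name": "Uitslagen_raad.pdf", "text": "Besluit A.\n\nBesluit B.\n\nBesluit C."},
    {"name": "Notulen_raad.pdf", "text": "Kort."},
]
print(build_prompt(docs, budget=30))
```

With the tiny budget in the example, the uitslagen file goes first and is cut at a paragraph break, the short notulen file still fits, and the memo is left out.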

The result in practice: four to six documents with the densest decision content, passed to Claude as a single structured prompt.

A council meeting can produce hundreds of documents. Most are background reading (policy memos, appendices) with no decisions in them. This step cuts the pile to the handful that do, so the AI isn’t wading through irrelevant material.

Structured extraction with Claude

We make one call to Claude Sonnet 4.6 with structured output via json_schema. The schema is a strict contract: additionalProperties: false at every nesting level rejects any field the model tries to invent rather than silently letting it through.
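To make the idea of a strict contract concrete, here is a much-reduced schema written as Python dictionaries, plus a small helper that checks every object level forbids extra properties. The field names other than `source_text` and the outcome values are illustrative guesses, not the schema used in the project.

```python
AGENDA_ITEM = {
    "type": "object",
    "additionalProperties": False,
    "required": ["title", "outcome", "source_text"],
    "properties": {
        "title": {"type": "string"},
        "outcome": {
            "type": "string",
            "enum": ["aangenomen", "verworpen", "hamerstuk"],
        },
        "source_text": {"type": "string"},
    },
}

MEETING_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["municipality", "date", "agenda_items"],
    "properties": {
        "municipality": {"type": "string"},
        "date": {"type": "string"},
        "agenda_items": {"type": "array", "items": AGENDA_ITEM},
    },
}


def is_strict(schema: dict) -> bool:
    """Check that every object level in the schema forbids extra properties."""
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            return False
        children = list(schema.get("properties", {}).values())
    elif schema.get("type") == "array":
        children = [schema.get("items", {})]
    else:
        children = []
    return all(is_strict(child) for child in children)


print(is_strict(MEETING_SCHEMA))
```

A check like `is_strict` is cheap to run in a test suite, which helps when a new nested object is added later and the flag is forgotten.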

The schema captures meeting metadata (municipality, date, parties present with seat counts), agenda items with vote outcomes and vote counts, amendments, motions, resident impact, and source_text. It also captures commitments (who promised what by when) and processing metadata: which document IDs were included, whether the fallback was triggered, and a completeness note.

We build the system prompt from three parts. Base instructions establish the task. A plain-language guide sets B1-level Dutch writing constraints: short sentences, active voice, no jargon without explanation. A reference document on Dutch municipal government covers roles (wethouder, griffier, fracties), decision types (hamerstuk vs. debat), and the difference between a motie (a non-binding request to the council) and an amendement (a proposed change to the wording of a decision). Without it, the model confused wethouders with griffiers and treated moties as amendementen. With it, the extractions held up.

The most important field is source_text. Every agenda item must include a verbatim excerpt from the input document that supports the summary. If no relevant text exists, the field is empty. The model is explicitly instructed never to invent a source. Every claim in the summary has to trace back to a named passage in a named document.
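An instruction in the prompt is not a guarantee, so one possible safeguard is to check after extraction that each `source_text` really occurs in the input. The sketch below is such a check under assumptions: it is not described as part of the pipeline above, the `title` field is hypothetical, and whitespace is collapsed so that line wrapping in the source document does not cause false alarms.

```python
import re


def normalize_ws(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def unsupported_items(agenda_items: list[dict], documents: list[str]) -> list[str]:
    """Return titles of items whose source_text is not found in any document.

    Items with an empty source_text are skipped: empty means "no source".
    """
    haystacks = [normalize_ws(doc) for doc in documents]
    failures = []
    for item in agenda_items:
        quote = normalize_ws(item.get("source_text", ""))
        if quote and not any(quote in hay for hay in haystacks):
            failures.append(item["title"])
    return failures


documents = ["De raad besluit het voorstel\naan te nemen."]
items = [
    {"title": "Fietspad", "source_text": "De raad besluit het voorstel aan te nemen."},
    {"title": "Woningbouw", "source_text": "De raad verwerpt het voorstel."},
    {"title": "Hamerstuk", "source_text": ""},
]
print(unsupported_items(items, documents))
```

Items flagged this way could be hidden, re-extracted, or shown without a source button, depending on how strict the product needs to be.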

On vote data: we built a two-pass extraction, a cross-checking validator, and a backfill step for per-party vote tallies before we thought to check whether the ORI actually publishes per-party vote tallies. Dear reader, it does not. The uitslagen rarely mention them either. What we shipped is a careful funnel for data that wasn’t there. The pipeline will be ready the day we find something to feed it.

We send the selected documents to Claude with strict instructions on what to extract and how to format it. “Structured output” means Claude returns a machine-readable record with specific fields for decisions, votes, and sources, rather than a summary paragraph. That’s what lets us display each item as a decision card rather than a wall of text.

Architecture choices

The system has two parts that share only a SQLite file: a data pipeline that does the expensive work, and a web UI that renders what the pipeline produced.

The pipeline is a CLI script, not a web endpoint. It fetches from ORI, calls Claude, and writes to SQLite. Each meeting takes about thirty seconds, which is fine for batching and bad for a live demo. So we pre-processed this year’s available council meetings. Before the presentation we ran the pipeline over four months of Amsterdam council meetings, put every session in the database, and dropped page-load latency to whatever SQLite takes to answer a keyed read.

Keeping all the AI work behind the database means the pipeline is re-runnable without touching the UI. Tune the prompt, add a schema field, or swap to a newer model: the pipeline runs again and new summaries appear on the next request. No frontend redeploy. The two parts don’t need to coordinate.

The UI just reads structured rows and renders decision cards, vote outcomes, and commitments in plain Dutch. It doesn’t know about ORI or call Claude. The civic problem lives here; the technical complexity lives upstream.
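The split can be illustrated with Python's built-in `sqlite3` module. The table layout and function names below are assumptions made for the example (one row per meeting, with the extracted record stored as JSON); the real schema may well differ.

```python
import json
import sqlite3
from typing import Optional


def init_db(conn: sqlite3.Connection) -> None:
    """Create the single table shared by the pipeline and the UI."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meetings ("
        "meeting_id TEXT PRIMARY KEY, meeting_date TEXT, payload TEXT)"
    )


def save_meeting(
    conn: sqlite3.Connection, meeting_id: str, meeting_date: str, record: dict
) -> None:
    """Pipeline side: insert or overwrite, so re-runs replace old summaries."""
    conn.execute(
        "INSERT OR REPLACE INTO meetings VALUES (?, ?, ?)",
        (meeting_id, meeting_date, json.dumps(record, ensure_ascii=False)),
    )
    conn.commit()


def load_meeting(conn: sqlite3.Connection, meeting_id: str) -> Optional[dict]:
    """UI side: one keyed read, no ORI or model calls."""
    row = conn.execute(
        "SELECT payload FROM meetings WHERE meeting_id = ?", (meeting_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")  # a file path in real use
init_db(conn)
save_meeting(conn, "m1", "2026-04-08", {"summary": "eerste versie"})
save_meeting(conn, "m1", "2026-04-08", {"summary": "na prompt-aanpassing"})
print(load_meeting(conn, "m1"))
```

Because the write is an upsert keyed on the meeting ID, running the pipeline again simply replaces the stored record, and the reader picks up the new version on its next query.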

The app is split into two independent pieces. The pipeline does the heavy lifting: fetching documents, calling Claude, storing results. The website just reads from that store and shows it. The website feels instant because all the AI processing happened in advance.

What still needs work, and how we’d solve it

Five hours is long enough to prove the idea and short enough to leave real gaps.

Document selection

Currently this relies on pattern matching through a regex (/verslag|notulen|uitslagen/i) that works when filenames follow conventions. A document named Besluitenlijst_vergadering_20260408.pdf gets missed. The fix is a retrieval step in front of Claude: a RAG layer that ranks every document by semantic similarity to “decisions, votes, and motions from this meeting” and puts the top matches into the prompt. With semantic retrieval, the Besluitenlijst file gets picked up on content regardless of what it’s called.
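A rough sketch of what such a ranking step could look like. We have not built this, so treat every name here as hypothetical. To keep the example runnable without a model, `toy_embed` is only a word-count stand-in that measures word overlap, not meaning; a real retrieval layer would pass in an actual embedding function instead.

```python
import math
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding: lowercase word counts. Swap in a real model."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(
    docs: list[dict],
    query: str,
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 5,
) -> list[dict]:
    """Return the top_k documents whose text is most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        docs, key=lambda d: cosine(embed(d["text"]), query_vec), reverse=True
    )
    return ranked[:top_k]


docs = [
    {"name": "Bijlage_3.pdf", "text": "achtergrond bij het bestemmingsplan"},
    {
        "name": "Besluitenlijst_vergadering_20260408.pdf",
        "text": "besluiten stemmingen moties aangenomen verworpen",
    },
]
query = "besluiten stemmingen en moties van deze vergadering"
print([d["name"] for d in rank_documents(docs, query, top_k=1)])
```

The ranking looks only at document text, so the filename plays no role in whether a decision list is selected.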

Token budget

Right now it’s a flat cap. For a meeting with a hundred or more documents, anything past 40K gets dropped. The better approach: summarize each document individually (each summary is short and cheap), then pass all the per-document summaries into a final synthesis pass. Nothing gets dropped, and the summaries are reusable if the same document shows up in multiple sessions.
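In outline, the two-stage approach might look like this. It is a sketch of the proposal, not something we have built: `summarize` and `synthesize` are placeholders for model calls, and the cache is a plain dictionary keyed on a hash of the document text so that a document seen in an earlier session is not summarized again.

```python
import hashlib
from typing import Callable


def summarize_meeting(
    docs: list[dict],
    summarize: Callable[[str], str],
    synthesize: Callable[[str], str],
    cache: dict[str, str],
) -> str:
    """Summarize each document once, then synthesize all the summaries."""
    sections = []
    for doc in docs:
        key = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if key not in cache:
            # One model call per document not seen before.
            cache[key] = summarize(doc["text"])
        sections.append(f"### {doc['name']}\n{cache[key]}")
    return synthesize("\n\n".join(sections))


# Stub functions standing in for the two model calls.
def fake_summarize(text: str) -> str:
    return text.split(".")[0] + "."


def fake_synthesize(text: str) -> str:
    return text


cache: dict[str, str] = {}
docs = [
    {"name": "a.pdf", "text": "Besluit A. Veel details."},
    {"name": "b.pdf", "text": "Besluit B. Nog meer details."},
]
print(summarize_meeting(docs, fake_summarize, fake_synthesize, cache))
```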

Vote data

The biggest honest gap. The ORI uitslagen record decisions but not per-party vote calls. Verbatim minutes (notulen) or video transcripts sometimes include the breakdown the uitslagen omit. A production pipeline would cross-reference all three: uitslagen for outcomes, notulen for debate context, video transcript for vote calls. The vote validator we built is ready for richer input; the bottleneck is upstream.

Accessibility

Right now the summaries are Dutch-only. Amsterdam has large Turkish- and Arabic-speaking communities, plus residents who read English more comfortably than legalistic Dutch. None of them can use what we built. Adding English, Turkish, and Arabic translations probably reaches more residents than anything else we could ship next. The schema is field-by-field, so translation is a post-processing pass and the pipeline stays the same.
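A post-processing pass of that kind could walk the extracted record and translate only the free-text fields. The sketch below is hypothetical: the set of field names is a guess, `translate` stands in for whatever translation service is used, and `source_text` is deliberately left untouched so the quoted passage stays verbatim.

```python
from typing import Any, Callable

# Hypothetical names of the free-text fields worth translating.
TEXT_FIELDS = {"summary", "title", "resident_impact"}


def translate_record(record: Any, translate: Callable[[str], str]) -> Any:
    """Return a copy of record with the listed text fields translated.

    Everything else, including source_text, is carried over unchanged.
    """
    if isinstance(record, dict):
        result = {}
        for key, value in record.items():
            if key in TEXT_FIELDS and isinstance(value, str):
                result[key] = translate(value)
            else:
                result[key] = translate_record(value, translate)
        return result
    if isinstance(record, list):
        return [translate_record(item, translate) for item in record]
    return record


record = {
    "summary": "de raad stemde over het fietspad",
    "agenda_items": [
        {
            "title": "fietspad",
            "outcome": "aangenomen",
            "source_text": "Het voorstel wordt aangenomen.",
        }
    ],
}
# str.upper stands in for a real translation call.
print(translate_record(record, str.upper))
```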

Frontend

Thin for now: no search, no topic or neighbourhood filter, no mobile layout. These are UI investments that don’t touch the pipeline. The data structure already supports them.

Real-time processing

This would need an upload endpoint and status polling. The extraction logic is already modular, so that part is done.

Summary

The technical problem turned out to be solvable in five hours. That’s not the hard part.

The hard part is trust. A resident reading an AI summary of a government decision is taking it on faith. The source_text field is the answer to that: every claim points to a specific passage in a specific document. Not “the model said so” but “the document says so, and here is the text”. A single misattributed decision, reported as fact, would be enough to make the tool worse than useless for civic purposes. The anti-hallucination contract is load-bearing.

Five hours was enough to show that the gap between “technically public” and “actually readable” is closeable. The ORI API covers dozens of Dutch municipalities, not just Amsterdam. The same pipeline runs on Utrecht, Rotterdam, or any municipality in the network. The more that plug in, the more residents get access to decisions that previously lived inside a PDF no one would read.

At The Orange Force, we build software that connects people to systems that affect them. This hackathon was a good reminder that some of the most useful systems are already running; they just need someone or something to make them readable.

If you’d like a demonstration of what we built, if you work on civic tech or open data, or if you represent a municipality that wants to make council meetings accessible to its residents, send us a message at info@theorangeforce.com.

Built at the AI020 Hackathon, April 16, 2026, Amsterdam. Team: Beshoy Guirges, Vasilisa Soboleva, Trong Tran.
