Eighth Conference on Digital Humanities and Digital History · C²DH, University of Luxembourg

Beyond Keywords

AI-Mediated Access to the Islam West Africa Collection through an MCP Server and Agent Skill

Scan for these slides: slides.frederickmadore.com

The Islam West Africa Collection (IWAC)

An open archive, built by one historian

Fifteen years of fieldwork and library work — Benin, Burkina Faso, Côte d'Ivoire, Togo — turned into an open-access collection.

  • Mostly newspaper clippings and Islamic publications: magazines, brochures, tracts
  • Behind it, a personal library of 30,000+ Zotero items
  • Online at islam.zmo.de
In numbers
  • 14,700+ documents
  • 28.1M words
  • 68,122 pages
  • 13 document types
  • 9,315 audiovisual minutes
  • 864 references

Six countries · mostly French — also Arabic & Hausa · open access since 2023

The Islam West Africa Collection (IWAC)

An overview of the collection

The Islam West Africa Collection homepage at islam.zmo.de — featured items, a search box, and browse-by-country links

The "El Hadj" problem

Why keyword search falls short

The IWAC faceted search interface — year slider, type, newspaper and language filters

The faceted search of the IWAC.

Keyword search has built-in problems. I mitigated what I could:

  • Spelling variants → authority files merge 20+ forms
  • Forming a query → autocomplete and facets guide you
  • Polysemy → "hadj" = pilgrimage and the "El Hadj" honorific
The real gap: the search box assumes you already know what you are looking for.
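
How an authority file merges spelling variants can be sketched as follows. This is a minimal illustration, not the IWAC implementation: the variant list, the `fold` helper and the `canonical` function are all assumptions made for the example.

```python
import unicodedata
from typing import Optional

# Hypothetical authority file: canonical form -> known spelling variants.
# The variants are illustrative and not taken from the IWAC authority records.
AUTHORITY = {
    "El Hadj": ["El Hadj", "El-Hadj", "Elhadj", "El Hadji", "Alhaji", "El Hâdj"],
}


def fold(text: str) -> str:
    """Lowercase the text, strip accents, and drop spaces and hyphens."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.lower().replace("-", "").replace(" ", "")


# Reverse index: folded variant -> canonical form.
VARIANT_INDEX = {
    fold(variant): canon
    for canon, variants in AUTHORITY.items()
    for variant in variants
}


def canonical(term: str) -> Optional[str]:
    """Return the canonical form of a known variant, or None if unknown."""
    return VARIANT_INDEX.get(fold(term))
```

Merging variants fixes spelling only. It does not resolve polysemy: the sketch cannot tell the honorific from the pilgrimage.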

Why not just ask AI?

Generic answers, dubious sources

A Gemini chat answering a question about Islam in Kpalimé, Togo in the 1990s — a long, fluent reply citing obscure sources

Gemini on Islam in Kpalimé: generic and dubiously sourced. See the chat ↗

  • People now ask AI directly, not a search box
  • Chatbots often invent citations — worse for Africa, with far less training data
  • Web search helps, but the sources are often random
  • The IWAC? Crawled by AI bots, yet barely used and badly cited

Two layers: plumbing and intelligence

The MCP server — plumbing

Structured, read-only access to the collection that any AI assistant can call.

The agent skill — intelligence

The methodology for using it well — a historian's method, written down.

The claim to hold onto: the skill is where the real contribution lives.
github.com/fmadore/iwac-mcp-server

Why a server, not just a chatbot?

An MCP server connects an AI assistant to one outside system. Like an app to your accounts:

Gmail · Google Calendar · Google Drive · Slack · GitHub · IWAC
RAG

Retrieval-augmented generation. Proven for GLAM, but it chunks records, and retrieves by a similarity the user never sees.

MCP

Ingests nothing. The data stays put; the server holds the logic; the model just asks and answers. modelcontextprotocol.io
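
What "the model just asks" looks like on the wire can be sketched as a JSON-RPC 2.0 message, which is the framing MCP uses for tool calls. The tool name `search_articles` and its arguments are hypothetical; the real tool names are listed in the server's repository.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialise an MCP "tools/call" request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message, ensure_ascii=False)


# Hypothetical tool name and filters, for illustration only.
raw = build_tool_call(1, "search_articles", {"place": "Abidjan", "decade": "1990s"})
print(raw)
```

The request carries only a tool name and filters. No collection data is copied into the model beforehand.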

Agent skills

What is a skill.md?

A skill is a methods handbook for a machine reader — loaded only when a request matches its purpose.

  • Competence by reading, not retraining
  • Versioned, reviewed, refined — a prompt is typed once and lost
  • An open standard across agents — no vendor lock-in, like MCP
  • Born in coding agents — the format fits any domain
my-skill/
├── SKILL.md          # Required: metadata + instructions
├── scripts/          # Optional: executable code
├── references/       # Optional: documentation
├── assets/           # Optional: templates, resources
└── ...               # Any additional files or directories

The standard layout of an agent skill — SKILL.md is the only required file.
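
The "metadata + instructions" split inside SKILL.md can be sketched with a small reader. This assumes the metadata is a frontmatter block of simple `key: value` lines between two `---` markers, with fields such as `name` and `description`; the helper `read_frontmatter` and the sample file contents are made up for the example.

```python
import tempfile
from pathlib import Path


def read_frontmatter(skill_dir: Path) -> dict:
    """Return the key/value pairs from the frontmatter at the top of SKILL.md."""
    lines = (skill_dir / "SKILL.md").read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing marker: everything after it is instructions
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("frontmatter block is not closed")


# Build a throwaway skill directory so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "my-skill"
    skill.mkdir()
    (skill / "SKILL.md").write_text(
        "---\n"
        "name: my-skill\n"
        "description: Example skill for illustration.\n"
        "---\n"
        "# Instructions\n",
        encoding="utf-8",
    )
    meta = read_frontmatter(skill)

print(meta)
```

An agent can read this short metadata block to decide whether a request matches the skill, and load the full instructions only when it does.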

The IWAC MCP server

  • "Newspaper articles on Islam in Abidjan, in the 1990s" → articles: place · period
  • "Islamic publications on secularism in Togo and Benin" → publications: subject · country
  • "Academic references on Muslim women in Burkina Faso" → references: subject · country

One plain-language request; the model picks the tools and combines the filters.

22 tools · 6 subsets
  • Read-only: search, full text, authority records, sentiment, stats
  • Verifiable: every call a lookup, no AI at query time
  • Traceable: every result links to its canonical record
  • One click: installs in Claude Desktop
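
"Every call a lookup" and "every result links to its canonical record" can be sketched as a plain filter over records. The records, URLs and the `search_articles` signature below are placeholders, not the server's actual data or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Article:
    title: str
    place: str
    year: int
    url: str  # link to the canonical record


# Placeholder records; the titles and URLs are invented for the example.
ARTICLES = [
    Article("Example article A", "Abidjan", 1994, "https://example.org/item/1"),
    Article("Example article B", "Abidjan", 2003, "https://example.org/item/2"),
    Article("Example article C", "Lomé", 1996, "https://example.org/item/3"),
]


def search_articles(place: Optional[str] = None,
                    year_from: Optional[int] = None,
                    year_to: Optional[int] = None) -> list:
    """Read-only lookup: apply each filter that was given, change nothing."""
    hits = []
    for article in ARTICLES:
        if place is not None and article.place != place:
            continue
        if year_from is not None and article.year < year_from:
            continue
        if year_to is not None and article.year > year_to:
            continue
        hits.append({"title": article.title, "year": article.year, "url": article.url})
    return hits


# "Newspaper articles on Islam in Abidjan, in the 1990s" as combined filters.
results = search_articles(place="Abidjan", year_from=1990, year_to=1999)
print(results)
```

The same filters always return the same records, so a reader can rerun any call and check the result against the linked record.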

Search by meaning, not words

1 · Ask: your question

« Islam et laïcité au Burkina Faso »

Plain language — and any language works.

2 · Embed: turned into coordinates

An embedding model (Gemini) maps the text to 768 numbers — one point in a space of meaning.

laïcité → [0.07, -0.12, 0.93, …]

Every article was mapped the same way, in advance.

3 · Match: nearest points win

Cosine similarity ranks all 12,000+ articles by how close they sit — closest in meaning, not in wording.
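
The three steps above can be sketched with toy vectors. Real embeddings have 768 dimensions and come from the embedding model; the three-dimensional vectors and document ids below are invented so the arithmetic stays readable.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec: list, doc_vecs: dict) -> list:
    """Return (doc_id, score) pairs, closest in meaning first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors computed "in advance", standing in for the article embeddings.
DOCS = {
    "article-secularism": [0.10, -0.10, 0.90],
    "article-pilgrimage": [0.90, 0.20, 0.10],
    "article-football": [-0.50, 0.80, -0.20],
}
QUERY = [0.07, -0.12, 0.93]  # toy stand-in for the embedded question

ranking = rank(QUERY, DOCS)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

The score depends on the angle between vectors, not on shared words, which is why a query can match an article that never uses its terms.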

The IWAC skill in motion

A five-phase research method

Two depths, chosen up front:

  • Brief (default): a quick scan and a few close reads
  • Extended: the full five-phase analysis below

  1. Scoping: what does the collection even hold for this question, before any keyword?
  2. Systematic searching: French + transliteration variants (Tabaski = Eid al-Adha); log every search, including null results
  3. Deep reading: full text and sentiment, article by article
  4. Triangulation: cross-check articles, publications, references, index entries
  5. Synthesis: findings with source attribution and confidence grades
The point: twenty-two tools give reach. The "judgement" lives in the skill.
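
Phase 2, searching every variant and logging null results, can be sketched as below. The skill itself is written instructions rather than code, so this only illustrates the bookkeeping; the toy index, document ids and helper names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SearchLog:
    """Keeps every search that was run, including those with no hits."""
    entries: list = field(default_factory=list)

    def record(self, query: str, hits: int) -> None:
        self.entries.append({"query": query, "hits": hits})

    def null_results(self) -> list:
        return [e["query"] for e in self.entries if e["hits"] == 0]


def run_variants(variants: list, search, log: SearchLog) -> list:
    """Search each variant, log it, and merge hits without duplicates."""
    seen = {}
    for variant in variants:
        results = search(variant)
        log.record(variant, len(results))
        for doc_id in results:
            seen[doc_id] = None  # dict keys keep first-seen order
    return list(seen)


# Toy index standing in for the collection; ids are invented.
TOY_INDEX = {
    "Tabaski": ["doc-1", "doc-2"],
    "Aïd el-Kébir": ["doc-2", "doc-3"],
}

log = SearchLog()
merged = run_variants(
    ["Tabaski", "Aïd el-Kébir", "Eid al-Adha"],
    lambda query: TOY_INDEX.get(query, []),
    log,
)
print(merged)
print(log.null_results())
```

Keeping the empty searches in the log lets the synthesis state what was looked for and not found.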

What a server never could

Discipline, disclosure — and a new role

  • Linguistic discipline — search the French corpus in French, whatever language the question is in
  • Bias disclosure — every synthesis states the skew: francophone tilt, missing Arabisants, thin Niger and Nigeria
  • Evidential discipline — claims tagged primary / secondary / AI-derived; absence of evidence ≠ evidence of absence
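
Evidential tagging and bias disclosure can be sketched as a small data structure. The three tags and the skew statement come from the bullets above; the class names, the `render` helper and the placeholder claims are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    AI_DERIVED = "AI-derived"


@dataclass(frozen=True)
class Claim:
    text: str
    evidence: Evidence
    source_url: Optional[str]  # None when no record backs the claim


def render(claims: list, skew: str) -> str:
    """One tagged line per claim, followed by the collection's stated skew."""
    lines = []
    for claim in claims:
        source = claim.source_url or "no record linked"
        lines.append(f"[{claim.evidence.value}] {claim.text} ({source})")
    lines.append(f"Collection skew: {skew}")
    return "\n".join(lines)


# Placeholder claims and URL, for illustration only.
claims = [
    Claim("Placeholder claim read directly in an article.",
          Evidence.PRIMARY, "https://example.org/item/1"),
    Claim("Placeholder claim inferred by the model.",
          Evidence.AI_DERIVED, None),
]
report = render(claims, "francophone tilt, missing Arabisants, thin Niger and Nigeria")
print(report)
```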

The curator's method, in full

iwac-mcp · SKILL.md


The whole file: 190 lines, open source at github.com/fmadore/iwac-mcp-server

From a plain question to cited sources

The IWAC MCP server installed as an extension in Claude Desktop — read-only, ~22 tools, no API key for the core tools

Installed as a Claude Desktop extension — read-only, no API key for the core tools.

One real question, and the skill runs the method live.

  • Scopes the collection, then searches it in French
  • Reads the strongest hits in full
  • Answers with every claim linked to its IWAC record

Two saved runs below — press ↓

Brief — the default depth

claude.ai/share/d37cbcb6… Full chat ↗
A Claude conversation: asked in English what the Islam West Africa Collection says about secularism in Côte d'Ivoire, the iwac-mcp skill reads its method files and then offers a Brief-or-Extended depth choice, defaulting to Brief

Claude Sonnet 4.6 · high effort

Extended — the full method

claude.ai/share/2bdf2009… Full chat ↗
A Claude conversation: asked about Islam in Kpalimé, Togo, the user answers 'extended', and the iwac-mcp skill begins its five-phase analysis with Phase 1 scoping

The same question we put to Gemini earlier, now answered from the IWAC.

The stakes

Whose infrastructure? African collections and AI extraction

GLAM-E Lab report by Michael Weinberg, June 2025 — 'Are AI Bots Knocking Cultural Heritage Offline?'

39 of 43 heritage institutions hit by AI-bot traffic spikes (Weinberg, GLAM-E Lab, 2025); Wikimedia: 65% of its costliest traffic is bots. Read it ↗

  • The default is extraction — public institutions and open-access labour subsidising commercial AI
  • MCP, a third path — the collection stays put; the institution decides what the AI sees
Yékú's paradox (2026): un-digitised, an African archive is invisible to AI; digitised, it's exposed to scraping. The incomplete corpus becomes the single story, the colonial library rebuilt in the training set. Read the essay ↗

Conclusion

A real opening for public access — not a panacea: it finds what you ask for, not the serendipity of the stacks.

Next: a small open-source model on the IWAC server, answering on the website itself — no install, no account. github.com/fmadore/iwac-mcp-server
