Expansion
Michael B. Currie has a long-term vision for the expansion of Curriepedia into the most comprehensive, high-conviction genealogical and historical knowledge graph of humanity ever attempted, grounded exclusively in primary and "low-background-steel" (pre-AI-era) evidentiary sources.
Ultimate Vision
Curriepedia aims to create a tightly linked global graph where the vast majority of humanity is connected through strong, verifiable genealogical traces. Rather than the broad but often shallow approach of large collaborative trees like Geni.com, Curriepedia prioritizes **depth, source density, and evidentiary rigor**.
Every person, place, organization, and event becomes a node in a richly layered network. Relationships are established through overlapping evidence: birth, marriage, and death records; land ownership; newspaper mentions; photographs; and institutional documents. The ultimate goal is a planetary-scale "map of humanity" where individuals are anchored to real historical reality, creating a resilient backbone of truth that can serve as a counterweight to synthetic information and AI-generated content in the decades ahead.
Why Thunder Bay is the Ideal Starting Nexus
Thunder Bay (and its predecessor cities of Port Arthur and Fort William) offers an exceptionally strong foundation for this work due to several unique characteristics:
- Geographic isolation: Located on the north shore of Lake Superior, the region is 8 hours from any other city, so the individuals who lived there were likely to stay put, and their records are unlikely to overlap with records elsewhere.
- Demographic stability: Strong ethnic communities (Finnish, Italian, Ukrainian, Scottish, etc.) created dense, multi-generational family clusters that are easier to reconstruct; the population has stayed almost the same since amalgamation in 1970.
- Excellent documentation: The area has unusually rich archival holdings, including nearly continuous newspaper coverage since the 1870s, detailed city directories, land records, church registers, and institutional archives.
- Manageable scale: Roughly 300,000 unique individuals lived in the Thunder Bay area between 1885 and 2000, a number large enough to be meaningful yet small enough to allow high-density sourcing.
These factors make Thunder Bay one of the best "seed crystals" for building a high-conviction human graph.
Core Methodology
Curriepedia begins with fragmented offline sources and reconstructs lives with rigorous sourcing:
- Land records: Every individual possible is tied to specific land parcels (lots, concessions, and legal descriptions) through historical patents, tax rolls, and deeds. This grounds people in physical space and time.
- Newspapers: Full runs of the *Thunder Bay Sentinel*, *Fort William Daily Times-Journal*, *Port Arthur News-Chronicle*, and successor papers.
- Censuses and vital records: Canadian censuses, birth/marriage/death indexes.
- Institutional and academic sources: Books, reports, and articles published by the Thunder Bay Historical Museum Society and related organizations.
- Photographs and ephemera: From the Museum's collection of over 500,000 images.
All articles maintain strict sourcing standards using only pre-2022 (preferably pre-1956) material to preserve signal quality from the "low-background steel" of the pre-AI era.
Total Raw Volume Estimate:
- Text pages to process: 600,000 – 1,000,000+ (newspapers + directories + books + archives).
- Photos: 500,000+ (many usable for LIFE PHOTO feature).
- Storage (after scanning/OCR): 5–15 TB raw (high-res scans + OCR text + metadata). Compressed/processed: much less.
AI Cost Estimate: For 200,000 entities, roughly 1 trillion input tokens and 140 billion output tokens, costing $600,000 – $1.4 million, or $100k – $200k with on-premises AI and other optimizations.
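The estimate above can be sanity-checked per entity. The following is a minimal back-of-envelope sketch that uses only the figures quoted in the estimate; the variable names are illustrative, and no per-token price is assumed.

```python
# Per-entity breakdown of the AI cost estimate, using only the quoted figures.
ENTITIES = 200_000
INPUT_TOKENS = 1_000_000_000_000    # ~1 trillion input tokens
OUTPUT_TOKENS = 140_000_000_000     # ~140 billion output tokens
COST_LOW_USD, COST_HIGH_USD = 600_000, 1_400_000

input_per_entity = INPUT_TOKENS // ENTITIES
output_per_entity = OUTPUT_TOKENS // ENTITIES
cost_per_entity = (COST_LOW_USD / ENTITIES, COST_HIGH_USD / ENTITIES)

print(f"{input_per_entity:,} input tokens per entity")
print(f"{output_per_entity:,} output tokens per entity")
print(f"${cost_per_entity[0]:.2f} to ${cost_per_entity[1]:.2f} per entity")
```

In other words, the estimate implies about 5 million input tokens and $3 to $7 of inference per entity before any optimization.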
Additional Non-AI Costs:
- Scanning/microfilm digitization: $50k–$200k+ (professional services or grants).
- Storage + hosting: $5k–20k/year.
- Developer time for agents/graph: Significant but fundable.
The Limits of Current Legal Barriers
Today, the primary remaining obstacle to the widespread creation of comprehensive human biography databases and advanced facial-recognition services is not a lack of data or insufficient technology, but rather the existing legal and regulatory framework.
Governments in Canada, the European Union, and parts of the United States have placed significant restrictions on large-scale scraping of biometric data, unconsented aggregation of personal information, and the commercial sale of facial-recognition tools. These laws currently function as the last meaningful brake on the construction of planetary-scale identity graphs.
However, this legal position is structurally untenable over the next 10–20 years. As AI inference costs continue their steep decline and become effectively negligible for even ambitious private actors, the economic barriers that once made such projects impractical will disappear. What is currently expensive, legally risky, and therefore rare will become cheap, technically trivial, and therefore inevitable.
In this respect, today’s privacy and biometric laws risk becoming as quaint and ineffective as anti-spam legislation (such as Canada’s CASL or the U.S. CAN-SPAM Act) has become: formally still on the books, yet largely powerless to stop the flood once the underlying economics shifted. Determined actors — whether state intelligence services, large technology firms, or sophisticated private entities — will simply operate from permissive jurisdictions, use decentralized or offshore infrastructure, or pressure governments to relax enforcement in the name of national security, economic competitiveness, or public safety.
Curriepedia’s approach is a direct response to this coming reality. By building a transparent, high-conviction, publicly verifiable graph anchored in primary historical sources rather than secret or opportunistic scraping, we create a constructive, community-governed alternative to the inevitable wave of lower-quality, less accountable systems.
Phased Expansion Approach
Phase 1: Foundation (1803–1900)
- Focus exclusively on the pre-1901 period.
- Import and create stub entries for every named individual in available censuses.
- Cross-reference with early newspapers, land records, and city directories.
- Build initial graph of relationships, businesses, streets, and organizations.
Phase 2: Dense Core (1900–1956)
- Expand using full newspaper runs, land records, and Historical Society publications.
- Create detailed articles for people with multiple independent sources.
- Map complex relationship networks: family, employment, ethnic associations, land ownership, and social organizations.
- Volunteers from the Thunder Bay Historical Museum Society review articles and assist with entity de-duplication and verification.
Phase 3: Modern Extension and Global Linkage
- Carefully extend coverage post-1956 where strong evidence exists and copyright allows.
- Link Thunder Bay residents outward through migration patterns to other regions of Canada and the world.
- Integrate Wikipedia's approximately 1 million biographical articles as high-value connector nodes.
- Establish bidirectional links with major trees such as Geni where high-confidence matches exist.
Phase 4: Global Expansion
- Expand to other high-conviction, densely interrelated cities to accumulate more nodes.
- Eventually expand to as many people as possible, covering both the roughly 8 billion people alive today and the historical population.
- Open question: we may not need to cover all people, since many may add little to the global knowledge graph. Or is numerical completeness important to resist Sybil attacks? This is unresolved.
- Expand to include other primary sources, including user submissions, so the knowledge graph can get tighter and tighter, like Wikipedia but hyperlocal and with no rules on "significance" etc.
- Truth API (see below), a paid service to check whether something is true against our knowledge graph
- People can check things like "is this a real person?" "is this picture actually real, or is it generated by AI?" "is this story true?"
Technical Infrastructure
- All primary source scans and documents are stored in Dropbox with permanent links.
- Every primary source document is cryptographically hashed.
- Hashes are recorded on a public blockchain (Ethereum or Solana) to create an immutable, forgery-resistant record of provenance.
- This blockchain hierarchy ensures the integrity of the underlying evidence base even as the knowledge graph grows.
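As a rough illustration of the hashing step, the sketch below computes a SHA-256 digest for a document and wraps it in a small provenance record. The choice of SHA-256, the record fields, and the `example-source-001` identifier are assumptions for illustration; the text does not specify a hash function or record format, and the on-chain anchoring step is not shown.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

def provenance_record(source_id: str, data: bytes) -> dict:
    """Build a hypothetical provenance record for one primary-source file."""
    return {
        "source_id": source_id,       # illustrative identifier, not a real catalogue key
        "sha256": sha256_hex(data),
        "size_bytes": len(data),
    }

# Placeholder bytes standing in for a scanned page.
scan = b"placeholder bytes standing in for a scanned page"
record = provenance_record("example-source-001", scan)
print(json.dumps(record, indent=2))
```

Changing a single byte of the scan yields a different digest, which is what makes a published hash usable as a tamper check.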
Note on Copyright: Many 20th-century sources remain under copyright. Where full texts cannot be made publicly available, articles link to the Thunder Bay Museum or Library catalog entries for purchase or in-person consultation. Public domain and openly licensed materials are provided directly.
Significance
By starting with a densely documented, contained population like Thunder Bay and methodically expanding outward through genuine familial, economic, and migratory connections, Curriepedia seeks to become the high-conviction nexus for an eventual global human graph.
This can be the foundation of an attempt to document a growing core of humanity with deep, multi-source evidentiary rigor, creating a permanent, trustworthy layer of historical truth that can withstand the coming challenges of the AI era. In that era, every threat surface will face an unlimited flood of AI-generated "humans" on message boards, in elections, and in our historical records, making it impossible to distinguish truth from fabrication without a stable, cryptographically verifiable knowledge graph to use as a reference.
How to Monetize a Public Good?
A white-pages-style site like Curriepedia is inherently a public good and thus tricky to monetize. Once high-quality, well-sourced articles exist, they benefit everyone: descendants, historians, researchers, and AI systems seeking ground-truth human data. This mirrors the old White Pages and Who's Who directories that once existed in every home before privacy norms made them socially unacceptable, even though people now voluntarily share far more personal information on social media.
This public-good nature is both a strength and a challenge. Ancestry, Geni, and similar platforms have followed a monetization strategy of restricting access to historical records, building successful businesses by turning overlapping public records into private "treasure hunts". A single marriage certificate can be "discovered" and paid for separately by dozens or hundreds of descendants, month after month. AI makes this model even more outdated, since machines can accurately attach source documents to the correct individuals in seconds, yet these platforms have little incentive to eliminate the friction that drives their subscriptions.
Curriepedia takes the opposite approach: make the core knowledge freely available while building sustainable value on top of it. We lean into the public-good nature of human directories. That creates a major problem, however: how is the project funded, and where does the monetization come from?
The Blockchain of Truth Layer
To solve the monetization and authority problem inherent in public goods, Curriepedia will maintain a cryptographic hash tree (Merkle tree) of all primary sources and entity records. Every scanned document, newspaper article, land record, and photograph used in an article will be hashed. These hashes are then anchored on a public blockchain (likely Ethereum or Solana for cost and ecosystem reasons) in the form of non-fungible tokens (NFTs) that represent verifiable provenance.
This creates something Wikipedia has never attempted: a single, canonical, tamper-evident root of truth for historical human data.
How it works technically:
- Each entity (person, street, business, etc.) has a master record containing cryptographic hashes of all supporting primary sources.
- These hashes form a Merkle tree, allowing efficient verification that no source has been altered.
- Periodic "anchor transactions" are published to the blockchain.
- Anyone can independently verify that the current state of Curriepedia matches the blockchain record.
- New sources or corrections can only be added to the canonical record through Curriepedia's governance process (initially controlled by the project team, later potentially including vetted volunteers from historical societies).
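The Merkle-tree mechanism described here can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 and a simple "duplicate the last node on odd levels" rule; the function names are hypothetical, and a production design would likely add domain separation between leaf and internal hashes. Only the resulting root would need to be published in an anchor transaction.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level: list[bytes]) -> list[bytes]:
    """Hash adjacent pairs; the caller guarantees an even number of nodes."""
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])      # duplicate the last node on odd levels
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool is True if the sibling is on the right."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1              # neighbour within the same pair
        proof.append((level[sibling], sibling > index))
        level = _next_level(level)
        index //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

# Placeholder byte strings standing in for three source documents of one entity.
sources = [b"census page", b"land patent", b"newspaper clipping"]
root = merkle_root(sources)
proof = merkle_proof(sources, 1)
print(verify(b"land patent", proof, root))           # proof for an unaltered source
print(verify(b"land patent (edited)", proof, root))  # same proof, altered source
```

A verifier needs only the source file, its short proof, and the anchored root, not the full archive.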
Gatekeeping the Record:
Yes, we can (and must) act as gatekeepers. While anyone can fork the public data and create their own copy, only additions that go through Curriepedia's verification workflow can receive an official Curriepedia hash and blockchain anchor. This maintains our position as the trusted root. Other copies will always be viewed as derivative. The stronger the network effect becomes (the more external sites, researchers, AI models, and institutions cite and link to Curriepedia entries), the more valuable and authoritative the canonical version becomes. No other copy will have the same density of verified sources, community review (e.g., Thunder Bay Historical Museum Society volunteers), or blockchain-backed provenance.
Monetization Through Trust
Because Curriepedia controls the canonical verification layer, several sustainable revenue streams become possible without hiding the public good:
- Paid verification and certification services
- Premium enrichment (adding new sources or corrections to the official record)
- API access for high-trust queries
- LIFE PHOTO and advanced visualization features
- Institutional licensing for AI companies and researchers who want guaranteed provenance
The more the public good grows and is referenced across the internet, the stronger the network effect, and the more valuable it becomes to attach new information to the official, trusted hash tree. In this way, Curriepedia can remain maximally open and truthful while still building a viable, self-sustaining project — one that counters the coming flood of low-quality, synthetic, and opportunistic human data with a high-conviction alternative.
The more Curriepedia becomes a recognized public good, the more valuable it grows as a trusted repository of truth. This creates powerful network effects: as adoption increases, more people and organizations will want to anchor their own documents, research, and claims to the Curriepedia hash tree.
Curriepedia Truth API
On top of this foundation, Curriepedia will offer an **AI-powered verification layer** and public **Truth API**. This system can:
- Analyze proposed new sources (news articles, documents, photographs, etc.) submitted by users or external parties.
- Cross-reference them against the existing high-conviction knowledge graph.
- Automatically criticize, validate, or flag inconsistencies using multiple dimensions (chronological conflicts, location mismatches, relationship contradictions, source reliability, etc.).
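One of the dimensions listed above, chronological conflicts, is simple enough to sketch. The example below is an assumption-laden illustration rather than the project's actual checker: the `PersonRecord` and `Claim` types, the `REQUIRES_LIVING` set, and the sample person are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Events assumed to require the person to be alive (an illustrative choice).
REQUIRES_LIVING = {"marriage", "census", "employment"}

@dataclass
class PersonRecord:
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None

@dataclass
class Claim:
    person: str
    event: str
    year: int

def chronological_flags(record: PersonRecord, claim: Claim) -> list[str]:
    """Return human-readable conflicts between a claim and the recorded lifespan."""
    flags: list[str] = []
    if claim.event not in REQUIRES_LIVING:
        return flags                      # e.g. an obituary may postdate death
    if record.birth_year is not None and claim.year < record.birth_year:
        flags.append(f"{claim.event} in {claim.year} precedes birth in {record.birth_year}")
    if record.death_year is not None and claim.year > record.death_year:
        flags.append(f"{claim.event} in {claim.year} follows death in {record.death_year}")
    return flags

# Hypothetical record and claims, for illustration only.
record = PersonRecord("Example Person", birth_year=1880, death_year=1935)
print(chronological_flags(record, Claim("Example Person", "marriage", 1941)))
print(chronological_flags(record, Claim("Example Person", "marriage", 1905)))
```

Location and relationship checks would follow the same pattern: compare the proposed claim against what the graph already records and report each specific conflict.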
This functions as a modern, vastly more capable **Snopes-like service** — but grounded in dense, structured, historically verified data rather than ad-hoc web searching. Other businesses and AI developers will be able to build checker tools, fact-verification agents, and research assistants on top of the Curriepedia Truth API.
This approach delivers far superior **Retrieval-Augmented Generation (RAG)** compared to the open web. The plain internet is full of uncorrelated claims, synthetic content, propaganda, and low-quality data. In contrast, Curriepedia offers a dense, internally consistent, cryptographically anchored graph of real human lives and relationships — providing AIs with reliable context and contradiction detection that is currently unavailable at this scale.
In this way, Curriepedia evolves from a historical knowledge base into critical infrastructure for truth in the AI era.
Competitors and Similar Efforts
Curriepedia operates in a space with several established players, but differs significantly in philosophy and approach:
- Ancestry.com and MyHeritage: Large commercial platforms focused on private family trees and subscription-based access to historical records. They prioritize user engagement through perpetual "treasure hunts" rather than creating a comprehensive public knowledge graph.
- Geni.com: A collaborative global family tree that allows broad user editing. While it has extensive coverage, it suffers from inconsistent sourcing quality and duplication issues.
- WikiTree: A free, community-driven genealogical wiki with strict sourcing guidelines. It is one of the closest models but remains mostly focused on collaborative editing rather than dense, high-conviction regional clusters or blockchain-anchored verification.
- Wikipedia: Excellent for notable individuals, but covers only a tiny fraction of humanity and is not optimized for deep genealogical relationships or local history.
- Modern AI/Blockchain Efforts: Projects such as OriginTrail (a decentralized knowledge graph with NFT-based provenance) and various fact-checking or content-verification protocols are building infrastructure layers for verifiable data. Most function as general-purpose “proof checkers” for narratives rather than owning the underlying chain of atomic facts.
Curriepedia’s Truth API goes further: it functions as a Lean-style proof checker for knowledge creation itself. Every proposed sentence, article, or document can be rigorously tested against the canonical NFT chain of primary facts. While most sources are narratives built on top of facts, Curriepedia maintains the bottom-layer NFT chain of facts (which need not all be true). Using antifragile techniques — such as betweenness centrality within the graph and “low-background-steel” recency weighting — false or poorly supported claims are naturally isolated and de-emphasized. This creates a far more robust foundation for verifiable journalism, legal documents, academic work, and AI reasoning than any existing effort.
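For readers unfamiliar with the metric, betweenness centrality counts how often a node lies on shortest paths between other nodes. The sketch below is a plain-Python version of Brandes' algorithm for an unweighted, undirected graph. How Curriepedia would turn such scores into weighting is not specified in the text, so the example graph and the `unsupported_claim` node are purely illustrative.

```python
from collections import deque

def betweenness(graph: dict[str, set[str]]) -> dict[str, float]:
    """Betweenness centrality (Brandes' algorithm) for an undirected, unweighted graph.

    `graph` maps each node to its neighbours; every neighbour must also be a key.
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                                   # nodes in order of discovery
        preds = {v: [] for v in graph}               # shortest-path predecessors
        paths = {v: 0 for v in graph}                # number of shortest paths from s
        dist = {v: -1 for v in graph}
        paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                                 # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):                    # accumulate dependencies
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical graph: a short chain of linked people plus one unconnected claim.
graph = {
    "person_a": {"person_b"},
    "person_b": {"person_a", "person_c"},
    "person_c": {"person_b"},
    "unsupported_claim": set(),
}
scores = betweenness(graph)
print(scores)
```

In this toy graph the middle node scores highest and the unconnected node scores zero, which is the kind of signal the text suggests could be used to de-emphasize isolated claims.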
It also leads to a powerful network effect that will make the underlying API business extremely valuable, funding the public good efforts and making the business sustainable.
The Mechanism of the Moat: Physical Scarcity in a Digital World
The most formidable competitive moat in the AI era will not be built on compute or algorithms, which are rapidly commoditizing, but on verified, non-synthetic data.
Curriepedia’s moat relies on "low-background-steel"—pre-1956 physical primary sources. A foundational AI model company (like OpenAI or Anthropic) cannot brute-force the creation of 19th-century land registries or physical newspaper microfilms through pure compute. They are allergic to the messy, physical logistics required to digitize and hash local archives. By doing the grinding physical-to-digital translation, Curriepedia captures a scarce resource. The Web of Trust, anchored to physical reality, becomes mathematically immune to the synthetic generation that foundational models rely on.
Network Effects and the "Oracle Problem"
Blockchains have historically struggled with the "Oracle Problem"—how does a smart contract know what is actually happening in the real physical world? Curriepedia essentially builds an Oracle for human history and identity.
This triggers a powerful, multi-sided network effect:
- The Supply Side: Genealogists, historical societies, and individuals contribute primary sources, seeking the permanence of the cryptographic ledger.
- The Graph Density: As the graph grows, the Web of Trust strengthens. The more interconnected the nodes (Thunder Bay linking to global migration patterns), the harder it is for a malicious actor to squat on an identity or inject a false node.
- The Demand Side: Foundational AI models, autonomous agents, and fact-checking systems require a baseline of reality to function.
As the density of the graph increases, the reputational and operational cost for an AI model not to check its outputs against the Curriepedia Truth API becomes too high. It becomes the definitive standard for "humanness."
Economic Rents via RAG (Retrieval-Augmented Generation)
In an ecosystem where AI agents are constantly generating claims, answering queries, and performing research, each unverified output is essentially an informational liability. To resolve these liabilities, AIs must reconcile their outputs against a trusted asset base.
This is where the economic rents are extracted. If an LLM needs to know whether a persona is a real historical figure, whether a photograph is a genuine artifact of the 1920s, or if a documented lineage is accurate, it cannot rely on the open web. It must ping the Truth API.
Because the underlying data cannot be easily replicated by competitors, Curriepedia can charge a micro-transactional "toll" for every RAG query. When multiplied by the billions of automated inference cycles occurring daily, the API becomes a highly lucrative, rent-extracting toll road for reality verification.
The Bottlenecks and Systemic Risks
While the theoretical ceiling is astronomical, the practical execution faces steep friction:
- The Cold Start Problem: The Thunder Bay "seed crystal" is a highly controlled environment. To force major AI labs to integrate the Truth API, the graph must achieve critical mass. Scaling the rigorous, human-in-the-loop verification out of isolated pockets into complex, global, low-documentation regions will strain the governance model.
- Adversarial Incentives: The moment a system becomes the central determinant of "humanness" or "truth," the incentive to attack it skyrockets. State actors, identity thieves, and sophisticated adversarial AIs will attempt to poison the Web of Trust. The cryptographic hashing secures the provenance of a document, but the system will require ruthless defensive mechanisms to ensure a forged physical document isn't hashed into the canonical root in the first place.
Ultimately, the market potential is defined by a simple dynamic: as the cost of generating synthetic fiction drops to zero, the premium on cryptographically verified, physical-world truth approaches infinity. Establishing the definitive API for that truth is a winner-take-all proposition.
Path to Validation and Funding
While the Ultimate Vision above describes the long-term destination, the path to get there requires a sequenced, fundable execution plan. This section outlines the cheapest credible validation strategy, the true nature of the moat, and the bootstrapping approach that takes Curriepedia from a personal MediaWiki project to critical infrastructure for the AI era.
The True Moat: Network Effect, Not Physical Sources
It is tempting to describe Curriepedia's competitive advantage as the digitization of "low-background-steel" primary sources — the physical-to-digital translation of pre-1956 microfilms, land patents, and church registers that foundational AI models cannot brute-force their way around. This is the bootstrap, not the moat.
The instructive analogy is Facebook. Facebook's initial wedge was the credibility of Harvard exclusivity, the .edu requirement, and digitized yearbooks. None of these are remembered today as the source of Facebook's defensibility. Within a few years the moat was simply that Facebook was where everyone else was. The bootstrap was abandoned; the network effect endured.
The same logic applies here. Primary-source digitization establishes the initial credibility required for institutions, researchers, and AI systems to begin citing Curriepedia. Once a critical mass of citations exists — Wikipedia articles linking to Curriepedia entries, AI agents routing verification queries through the Truth API, journalists referencing Curriepedia hashes in published work — the moat becomes reference-of-reference. Other systems trust Curriepedia because the systems they trust trust Curriepedia. This is the same dynamic that made Wikipedia, Google, and Facebook structurally unassailable, and it is independent of the underlying data quality past a minimum credibility threshold.
The strategic implication: the physical-source work matters intensely during bootstrapping, because nothing else can establish credibility that frontier AI labs cannot replicate. But the long-term defensibility is the citation graph, not the document graph.
The Bootstrap: Why Physical Sources Still Come First
The reason the bootstrap must be primary-source-based, despite the moat ultimately being network-effect-based, is that no other starting point produces credibility that foundational model companies cannot trivially erase.
Building on Wikimedia Commons or other already-digital corpora would mean competing on the same substrate that OpenAI, Anthropic, and Google have already ingested into pretraining. Differentiation evaporates the moment a new model releases. By contrast, an OCR'd 1911 Fort William church register, hashed and anchored on-chain, is a corpus that:
- Does not appear in any frontier model's training data.
- Cannot be replicated without the same physical-logistics grind.
- Provides ground-truth answers to questions no current AI can answer correctly.
This asymmetry is what makes the seed-stage demo possible at all.
Cheapest Credible Validation: The Family Corpus Benchmark
The fastest path to a fundable demonstration uses the approximately 1,000 genealogical and biographical articles already written on Curriepedia covering the Currie family, Newman family, Cooke family, Johnstone family, and related lineages. This corpus has four properties that are difficult to manufacture deliberately:
- Outside the training data. These articles are not indexed prominently by Google and contain primary-source content that frontier models have not ingested.
- Densely interlinked. Genealogical data is inherently a graph. Answers to questions like "which of person X's grandchildren married into family Y" require multi-hop traversal that vector-based retrieval (the architecture underlying ChatGPT search, Perplexity, and most RAG systems) handles poorly or not at all.
- Verifiable by the author. The benchmark can be graded without external evaluators.
- Already sourced. Provenance work is largely complete.
Three-Tier Benchmark Structure
The benchmark consists of approximately 200 factual questions across three tiers:
- Tier 1 — Single-hop facts
Direct lookups answerable from a single document. Example: "What was [ancestor]'s occupation in the 1911 Canadian census?" Frontier models will refuse or hallucinate; Curriepedia answers with a citation to the hashed source. This tier proves coverage.
- Tier 2 — Multi-hop graph traversals
Questions requiring traversal of two or more edges in the knowledge graph. Example: "Which descendants of [ancestor] lived on Concession 5, Lot 11 in Mersea Township between 1880 and 1903?" This is where standard vector retrieval structurally fails and a true knowledge graph wins. This tier proves architectural superiority, not just data superiority.
- Tier 3 — Adversarial / negative claims
Plausible but false statements generated by a frontier model, mixed with true ones. The system must flag which are wrong and cite the contradicting evidence. This tier proves verification capability — the actual product surface of the Truth API.
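To make the Tier 2 idea concrete, here is a minimal sketch of a multi-hop query of the form "which of person X's grandchildren married into family Y". The dictionaries stand in for typed graph edges, and every name in them is a hypothetical placeholder rather than real Curriepedia data.

```python
# Toy edge tables standing in for typed relationships in the knowledge graph.
children = {
    "ancestor": ["child_a", "child_b"],
    "child_a": ["grandchild_1", "grandchild_2"],
    "child_b": ["grandchild_3"],
}
spouse = {"grandchild_1": "spouse_1", "grandchild_3": "spouse_3"}
family = {"spouse_1": "Family Y", "spouse_3": "Family Z"}

def grandchildren(person: str) -> list[str]:
    """Two hops along child edges."""
    return [gc for child in children.get(person, []) for gc in children.get(child, [])]

def grandchildren_married_into(person: str, family_name: str) -> list[str]:
    """Four hops in total: child, child, spouse, family membership."""
    return [
        gc for gc in grandchildren(person)
        if family.get(spouse.get(gc)) == family_name
    ]

print(grandchildren("ancestor"))
print(grandchildren_married_into("ancestor", "Family Y"))
```

The answer depends on chaining several explicit edges, which is why this tier is framed as a test of graph structure rather than of text similarity.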
Expected Results
The hypothesized comparative output, packaged as a single VC-ready table:
| Metric | Frontier Model A | Frontier Model B | Frontier Model C | Curriepedia Truth API |
|---|---|---|---|---|
| Tier 1 accuracy | ~5% | ~6% | ~3% | >90% |
| Tier 2 accuracy | ~0% | ~0% | ~0% | >75% |
| Tier 3 adversarial F1 | ~12% | ~14% | ~9% | >85% |
| Avg. citations per answer | ~0.2 (often fabricated) | ~0.4 | ~0.3 | 3+ (cryptographically hashed) |
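The Tier 3 row reports an F1 score, the harmonic mean of precision and recall over the statements a system flags as false. A minimal sketch of that calculation follows; the statement IDs are made up for illustration and are not benchmark results.

```python
def f1_score(flagged: set[str], actually_false: set[str]) -> float:
    """F1 for the task of flagging false statements."""
    true_positives = len(flagged & actually_false)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(flagged)        # flagged items that were really false
    recall = true_positives / len(actually_false)    # false items that were caught
    return 2 * precision * recall / (precision + recall)

# Hypothetical run: the system flags s1-s3, while s2-s5 are the false statements.
flagged = {"s1", "s2", "s3"}
actually_false = {"s2", "s3", "s4", "s5"}
print(round(f1_score(flagged, actually_false), 3))
```

Here precision is 2/3 and recall is 1/2, giving an F1 of 4/7.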
The "wow moment" is reproducible live: an investor poses a question about a real Currie or Newman ancestor, frontier models flail, Curriepedia answers with a linked primary-source PDF and on-chain hash. This is the unfakeable demo.
Bootstrapping the Network Effect
Once the benchmark establishes initial credibility, the seed-stage objective shifts from "build the moat" (impossible at this stage) to "ignite the smallest viable citation loop." Three concrete tactics:
Wikipedia Citation Injection
Wikipedia has a chronic primary-source shortage in local history, particularly for regions like Northwestern Ontario. Systematically adding Curriepedia URLs as citations to Wikipedia articles covering Thunder Bay, the Lakehead, and related topics achieves two things simultaneously:
- Imports Wikipedia's institutional trust into the Curriepedia citation graph.
- Injects Curriepedia URLs into the next generation of frontier-model pretraining corpora, which is the most efficient route into AI memory currently available.
Target: 100 Wikipedia articles citing Curriepedia within 12 months of seed funding.
Institutional Endorsement
Formal partnerships with the Thunder Bay Historical Museum Society, the Ontario Genealogical Society, and analogous institutions in adjacent regions import institutional credibility cheaply. The pattern is identical to Facebook leveraging Harvard: a single high-trust endorser is worth more than years of organic growth. The endorsement framing is that Curriepedia is the canonical cryptographic hash root for the institution's digitized holdings — not a competitor, but an immutable backup with verifiable provenance.
Initial AI Lab Integration
A single free integration with a smaller AI lab — a vertical search company, a fact-checking platform, or a mid-tier foundation model provider — produces a logo and a real production use case worth far more than the foregone revenue. This is the "first non-author citation" that turns the project from a personal wiki into infrastructure.
Curriepedia Truth API
(Expanded from the brief mention in How to Monetize a Public Good above.)
The Truth API is the commercial layer that funds the public-good knowledge graph. It exposes the Curriepedia entity graph and cryptographic provenance chain as a queryable service for AI agents, publishers, researchers, and (in the long term) consumer applications.
Core Capabilities
- Fact verification: Given a natural-language claim, return a verdict (supported / contradicted / insufficient evidence) with citations to the underlying hashed primary sources.
- Entity resolution: Given a name, date, or place, return the canonical Curriepedia entity node and its surrounding graph context.
- Adversarial claim detection: Identify the specific clause within a longer document that contradicts the canonical record, with the contradicting source attached.
- Provenance verification: Given an image, document, or excerpt, return whether its hash matches a known primary source and what entities it is associated with.
- Graph traversal queries: Multi-hop relationship queries that vector retrieval cannot answer reliably.
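The fact-verification capability implies a three-way verdict with attached citations. The sketch below shows one possible shape for such a response. All type names, field names, the toy fact store, and the placeholder hash are assumptions, and the step of parsing a natural-language claim into a structured triple is skipped.

```python
from dataclasses import dataclass, field
from enum import Enum

class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    INSUFFICIENT = "insufficient evidence"

@dataclass(frozen=True)
class Citation:
    source_id: str     # hypothetical identifier of a hashed primary source
    sha256: str        # hex digest of that source

@dataclass
class VerificationResult:
    claim: str
    verdict: Verdict
    citations: list[Citation] = field(default_factory=list)

# Toy fact store: (subject, predicate) -> (recorded value, citation).
# The entry and its all-zero hash are placeholders, not real data.
FACTS = {
    ("example_person", "occupation_1911"): (
        "carpenter",
        Citation("example-census-1911", "0" * 64),
    ),
}

def verify_claim(subject: str, predicate: str, value: str) -> VerificationResult:
    """Compare a structured claim with the fact store and return a verdict."""
    claim = f"{subject} {predicate} = {value}"
    known = FACTS.get((subject, predicate))
    if known is None:
        return VerificationResult(claim, Verdict.INSUFFICIENT)
    recorded, citation = known
    verdict = Verdict.SUPPORTED if recorded == value else Verdict.CONTRADICTED
    return VerificationResult(claim, verdict, [citation])

print(verify_claim("example_person", "occupation_1911", "carpenter").verdict.value)
print(verify_claim("example_person", "occupation_1911", "farmer").verdict.value)
print(verify_claim("someone_else", "occupation_1911", "farmer").verdict.value)
```

Returning "insufficient evidence" as a distinct verdict keeps the service from treating missing records as contradictions.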
Initial Target Customers
The seed-stage customer is not the end consumer. It is the developer or company that has an AI hallucination problem today:
- AI-powered journalism and research platforms that need provenance for historical claims.
- Foundation-model-based document writing tools that need iterative fact-checking against ground truth.
- Genealogy platforms (Ancestry.com, MyHeritage, Geni.com) that benefit from a neutral third-party verification layer.
- AI agents performing autonomous research tasks where unverified outputs become legal or reputational liabilities.
- Wikipedia editors and other open-knowledge contributors seeking to validate proposed edits.
Pricing follows standard usage-based API economics — per-query for low-volume integrations, negotiated rates for institutional licensing. A representative early-stage example might be $0.001 per verification query at low volumes, scaling down with usage.
Architecture
The Truth API sits on three layers:
- The canonical entity graph. The Curriepedia MediaWiki instance, with every entity (person, place, organization, event) represented as a structured node with typed relationships.
- The provenance layer. Every primary source document hashed and anchored on-chain (Ethereum or Solana), with Merkle-tree linkage from entity records to underlying evidence.
- The inference layer. An LLM-based query interpreter that maps natural-language claims to graph queries, retrieves supporting or contradicting evidence, and returns structured verdicts with citations.
The inference layer is the only component that could be replicated by competitors with sufficient effort. The first two layers are the structural moat during the bootstrap phase, and the citation network is the structural moat thereafter.
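The Merkle-tree linkage in the provenance layer can be sketched as follows. This is a minimal illustration, not the project's implementation: the use of SHA-256, the convention of duplicating the last node on odd-sized levels, the function names, and the placeholder document bytes are all assumptions. Only the resulting root would need to be anchored on-chain; the on-chain step itself is not shown.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    """Pair up adjacent hashes; duplicate the last one if the count is odd."""
    if len(level) % 2 == 1:
        level = level + [level[-1]]
    return [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Compute the Merkle root over a non-empty list of leaf hashes."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Return [(sibling_hash, sibling_is_left), ...] from the leaf up to the root."""
    level = list(leaves)
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf, proof, root):
    """Recompute the root from a leaf hash and its proof."""
    current = leaf
    for sibling, sibling_is_left in proof:
        current = sha256(sibling + current) if sibling_is_left else sha256(current + sibling)
    return current == root


# Placeholder bytes standing in for digitized primary sources.
documents = [b"census page scan", b"land record scan", b"newspaper clipping scan"]
leaves = [sha256(d) for d in documents]
root = merkle_root(leaves)
proof = merkle_proof(leaves, 2)
is_valid = verify_proof(leaves[2], proof, root)
is_tampered_valid = verify_proof(sha256(b"altered scan"), proof, root)
```

With this structure, a client holding a document and a short proof can confirm that the document belongs to the anchored set without downloading every other source. A production design would likely add domain separation between leaf and interior hashes, which this sketch omits.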
Long-Term TAM: The Consumer Truth Filter
Note: This section describes the long-term Total Addressable Market and is not the initial commercial target. The seed-stage focus is the B2B Truth API for AI agents and developer tools as described above. The consumer vision is included here to contextualize the eventual scale of the opportunity.
The Vision
As LLM inference costs continue their multi-order-of-magnitude decline, it becomes economically feasible for individuals to subscribe to a service that verifies every token they consume — every news article, every social media post, every spoken word in a podcast or video, every advertisement, every claim that crosses their attention.
The service would function as a passive filter operating across all media surfaces:
- Browser extension layer: Highlights claims in real time on news sites, social media, and Wikipedia, with inline verdicts and citations.
- Audio/video layer: Transcribes spoken content from podcasts, broadcast television, and conversations, flagging claims against the Truth API.
- Reading layer: Integration with e-readers and document viewers to verify factual claims in books, PDFs, and emails.
- Ambient layer: Eventually, AR/glasses-based verification of signage, presentations, and in-person speech.
Token Volume Estimate
A rough estimate of the daily input token volume required to verify the media diet of a single "modern data human":
| Channel | Daily volume estimate | Approximate input tokens |
|---|---|---|
| Reading (articles, books, email) | 10,000–50,000 words | 13,000–65,000 |
| Social media / messaging feeds | 15,000–35,000 words | 20,000–50,000 |
| Spoken content heard (conversations, podcasts) | ~16,000 words | ~21,000 |
| Video / television (transcript-equivalent) | 2–4 hours | 25,000–100,000 |
| Ambient signage, UI text, advertisements | varies | 5,000–10,000 |
| Total per person per day | | ~150,000–250,000 |
At approximately 200,000 input tokens per person per day, annual per-person volume is approximately 73 million tokens. At current low-tier model pricing (roughly $0.25 per million input tokens for Haiku-class models in 2026), the raw inference cost is approximately $18 per person per year for single-pass verification — a tractable consumer price point.
Multiplied across a global addressable population of approximately 5 billion media-consuming adults, the total addressable market for token-level verification is approximately $90 billion annually in raw inference spend at the 200,000-token estimate (roughly $70–115 billion across the 150,000–250,000 token range), before considering the value capture of the Truth API toll on top of that inference layer.
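The arithmetic behind the 200,000-token estimate can be reproduced with a short calculation. The inputs are the figures quoted in the text above; the constant and function names are illustrative.

```python
# Inputs taken from the estimate above.
TOKENS_PER_DAY = 200_000            # input tokens per person per day
PRICE_PER_MILLION_USD = 0.25        # low-tier model price per million input tokens
ADDRESSABLE_ADULTS = 5_000_000_000  # media-consuming adults


def annual_tokens(tokens_per_day: int) -> int:
    """Tokens consumed per person per year."""
    return tokens_per_day * 365


def annual_cost_usd(tokens_per_day: int, price_per_million: float) -> float:
    """Raw single-pass inference cost per person per year."""
    return annual_tokens(tokens_per_day) / 1_000_000 * price_per_million


def market_usd(tokens_per_day: int, price_per_million: float, population: int) -> float:
    """Total annual raw inference spend across the addressable population."""
    return annual_cost_usd(tokens_per_day, price_per_million) * population


per_person_tokens = annual_tokens(TOKENS_PER_DAY)
per_person_cost = annual_cost_usd(TOKENS_PER_DAY, PRICE_PER_MILLION_USD)
total_market = market_usd(TOKENS_PER_DAY, PRICE_PER_MILLION_USD, ADDRESSABLE_ADULTS)

print(f"{per_person_tokens:,} tokens per person per year")
print(f"${per_person_cost:.2f} per person per year")
print(f"${total_market / 1e9:.2f} billion per year")
```

Changing `TOKENS_PER_DAY` or `PRICE_PER_MILLION_USD` shows how sensitive the total is to the per-person volume estimate and to model pricing, both of which are rough inputs.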
Why This Is TAM and Not Roadmap
The consumer truth filter is the eventual destination, but it is not the seed-stage target, for four reasons:
- Unit economics are still marginal. Even at $18/year per person of raw inference, the all-in cost (including Truth API queries, infrastructure, UX) is closer to $50–100/year, requiring meaningful consumer willingness-to-pay before unit economics close.
- The reference graph must exist first. Verifying every token consumed by a human is only useful if there is a sufficiently dense ground-truth graph to verify against. That graph does not yet exist at sufficient density outside narrow domains.
- B2B unit economics are dramatically better today. A single foundation model lab or AI-agent platform paying for institutional Truth API access produces revenue per query that is 100–1000× higher than consumer per-query value, with far lower customer acquisition cost.
- The consumer vision depends on the B2B vision succeeding first. Until the Truth API is the de facto verification layer for AI agents and developer tools, no consumer product built on top of it has a credible value proposition.
The correct sequencing is therefore: family corpus benchmark → Thunder Bay vertical slice → Truth API for AI developers → cross-city expansion → AI agent verification becomes default → consumer truth filter becomes the natural surface area on top of an already-trusted infrastructure layer.
Objections and Counterarguments
Critics of large-scale human knowledge graph projects like Curriepedia raise important concerns around privacy erosion, legal risks, defensibility, and societal impact. Below are the strongest versions of these objections, followed by direct responses based on the project’s design.
1. Privacy Norms
Objection: This is a massive violation of privacy!
Response: Privacy norms are largely arbitrary. They have shifted dramatically over time and still differ from place to place. In the United States, bodycam footage posted to YouTube shows innocent witnesses and bystanders in their most embarrassing moments. Sweden treats individual tax records as public information, so anyone can look up another person's income. Land records are freely available in many countries and very hard to obtain in others. For decades it was completely normal, and effectively the default, for the white pages to list everyone's phone number and home address and for that directory to be delivered to every household in town. People will adjust to this new system just as rapidly. Furthermore, Curriepedia does not provide a live feed of where people are, so the security risk is lower. The number of stalkers or attackers who would harm someone simply because they know an address is evidently small: this information was accessible for decades, and it was rarely used that way. Vendors in Hollywood even sold maps of celebrities' home addresses so that tourists could drive past.
2. Societal Harm
Objection: Even though privacy norms have shifted in the past — such as bodycam videos appearing widely on YouTube or public salary disclosures in Sweden and Ontario — further large-scale aggregation of biographical, relational, and potentially sensitive data risks serious harm. This could enable widespread surveillance, doxxing, stalking, or authoritarian misuse. Quantitative increases in data accessibility often create qualitative societal problems, especially when private actors build permanent, easily queryable databases rather than relying on temporary public records.
Response: Curriepedia begins with historical records (core focus on pre-1901, cautious extension into the early-to-mid 20th century) drawn from verifiable public-domain and archival sources such as censuses, newspapers, and land records. It is not designed for real-time surveillance of living people. By emphasizing transparency, rigorous sourcing, and community governance, the project aims to create a single high-quality public reference that can actually reduce reliance on lower-quality or malicious uses of the same information. Privacy norms evolve, and a well-constructed public knowledge base can be part of that responsible evolution.
3. Legality and Disruptive Precedent
Objection: Many successful platforms (Uber, Airbnb, Sci-Hub, Napster) started in legal gray areas. This does not mean the approach is ultimately acceptable: they often caused significant disruption and backlash before regulations caught up. Simply assuming norms will shift or that “the right people can still make money” risks prolonged legal fights and undermines trust. Offloading risky features like facial recognition to third parties does not resolve the underlying ethical and legal exposure.
Response: The project is deliberately structured for long-term compliance: heavy use of public domain material, linking rather than hosting copyrighted content, and a phased rollout that prioritizes institutional partnerships and historical archives. Rather than operating in gray areas for shock value, Curriepedia positions itself as a constructive public benefit project built on transparent, source-grounded methodology. Successful precedents like Wikipedia show that rigorously documented public knowledge resources can thrive within existing legal frameworks.
4. Defensibility Against Scraping and Cloning
Objection: Any openly available corpus will be scraped and cloned. Competitors could run their own “Truth API” on an older copy, undercutting pricing and features. Network effects may not develop quickly enough, and charging for contributions to the chain could be difficult to enforce or could be gamed.
Response: While raw data can be partially copied, the real moat lies in ongoing high-quality curation, cryptographic provenance (Merkle-tree hashing of primary sources), and strong network effects. New contributions, verifications, and density naturally concentrate on the live, actively maintained graph with the best brand, institutional backing, and citation quality. Charging for premium chain additions or high-volume API usage leverages these effects, because stale clones quickly fall behind in accuracy, completeness, and credibility. Real-world digitization efforts (archives, museums, partnerships) further raise the bar for meaningful replication.
In summary, while these concerns are serious, Curriepedia’s historical focus, cryptographic anchoring, institutional approach, and B2B monetization model are designed to address them by building durable public infrastructure rather than engaging in pure data extraction.
Summary of Next Steps
- Complete the three-tier benchmark against the existing ~1,000 family-history articles and publish results.
- Build a minimum-viable Truth API endpoint capable of answering Tier 1, 2, and 3 questions with hashed source citations.
- Develop a demonstrator Chrome extension that highlights and verifies claims about Thunder Bay and Currie/Newman family content on news sites, Wikipedia, and Facebook — not as a consumer product, but as a credibility artifact showing the Truth API in production use.
- Begin systematic Wikipedia citation injection for Thunder Bay and Lakehead region articles.
- Secure formal endorsement from the Thunder Bay Historical Museum Society as canonical hash root for digitized holdings.
- Land one free AI lab integration to produce a reference customer logo.
- Raise seed funding against the benchmark results, the working Truth API, and a credible plan to scale the Thunder Bay vertical to 300,000 entities and replicate the model across additional "seed crystal" cities.