How Jose M. Plehn Plans to Make AI Accountable
Artificial intelligence keeps producing spectacle—chatbots that sound human, models that write code, agents that reason across domains. But beneath the fireworks is the problem that actually determines whether any of this is durable: can we trust the data these systems are built on? For Jose M. Plehn, Ph.D.—an academic turned data entrepreneur—the answer will decide whether AI strengthens democratic institutions or corrodes them. Plehn, the founder of BrightQuery, the creator of OpenData.org, and a board member of the AI Alliance, argues that the future of AI won’t be won by whoever stacks the most parameters, but by whoever can prove that their data is real.
He likes to put it bluntly: “AI will only be as honest as its sources.” BrightQuery, which he launched in 2019, was built around that premise. The company ingests legal, regulatory, and tax filings from more than 100,000 jurisdictions worldwide and turns them into structured, machine-readable datasets. What emerges is a high-fidelity economic graph linking hundreds of millions of entities—companies, individuals, places—anchored in verifiable public records. It is exactly the kind of factual infrastructure large language models now need if they are to reduce hallucinations, mitigate bias, and produce answers that can be cited rather than merely asserted.
At the center of what Plehn is advocating is provenance: knowing where data came from, how it was transformed, under what authority it was published, and what its permissible uses are. Without provenance, even highly capable models operate in what he calls an “epistemic fog.” His view tracks with emerging global norms, from the FAIR principles—Findable, Accessible, Interoperable, Reusable—to the data-provenance frameworks promoted by groups such as the Data & Trust Alliance and OASIS Open. It’s an insistence on chain-of-custody thinking applied to information.
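In practice, the provenance Plehn describes amounts to structured metadata that travels with every record: its source, the authority behind it, the transformations applied, and its permissible uses. A minimal sketch of such a record in Python, with entirely hypothetical field names and values chosen for illustration (this is not BrightQuery's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvenanceRecord:
    """A minimal chain-of-custody entry for one piece of data."""
    source: str                   # where the data came from (e.g., a regulatory filing)
    authority: str                # under whose authority it was published
    retrieved: str                # when the record was fetched (ISO 8601 date)
    transformations: tuple = ()   # ordered processing steps applied
    license: str = "unspecified"  # permissible uses

# A hypothetical entry for one filing-derived data point.
record = ProvenanceRecord(
    source="SEC Form 10-K, Acme Corp, fiscal 2023",
    authority="U.S. Securities and Exchange Commission",
    retrieved="2024-03-01",
    transformations=("parsed XBRL", "normalized currency to USD"),
    license="public record",
)

# Any downstream claim can point back at this record instead of a bare assertion.
print(record.source)
```

The point of the sketch is the shape, not the fields: once every datum carries an object like this, "where did that number come from?" becomes a lookup rather than an investigation.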
“Every claim made by a model should be traceable to a verifiable record,” he has said in public forums. BrightQuery’s collaboration with the National Secure Data Service (NSDS), a U.S. federal effort to make government data secure, shareable, and usable across agencies, is meant to operationalize that idea. If federal data can be shared with provenance intact, then AI systems built on top of it can be audited, reproduced, and ultimately trusted.
By 2024, as the industry was busy fighting over what counts as “open” and whether closed-weight models could ever be trustworthy, the Open Source Initiative stepped in to clarify that openness in AI must include transparent datasets and metadata about their origins. Plehn’s thinking slots neatly into that shift. Within the AI Alliance, he has helped shape the Open Trusted Data Initiative (OTDI), which convenes IBM, Meta, Hugging Face, and other players to build datasets with lineage that can be inspected and licenses that are unambiguous.
What’s taking shape is an AI paradigm that prioritizes factuality—systems that can cite, verify, and justify their outputs. That’s subtly different from the prevailing “responsible AI” talk, because it makes responsibility testable. “If you can’t trace the data,” Plehn argues, “you can’t trust the result.” A model that can tell you not only what it concluded but also where the underlying data came from is far more compatible with public oversight, procurement rules, and journalistic scrutiny.
Then, in 2025, Plehn pushed the idea further with OpenData.org, a nonprofit initiative that releases portions of BrightQuery’s global entity graph for open use. He could have walled it off and sold access to the highest bidder. Instead, he framed verified data as civic infrastructure. Journalists, researchers, NGOs, and policymakers can now tap the same bedrock factual records that regulators and major financial institutions rely on. During the launch, Plehn said that “open data is just as vital to democracy as the right to free speech”—a line that makes clear he sees information provenance not as a technical nicety, but as part of democratic practice.
OpenData.org is already feeding into collaborative efforts with the United Nations Global Network of Data Officers and Statisticians, which is trying to raise data-capacity standards across governments. The shared premise: you cannot have trustworthy AI in a vacuum. You need shared, inspectable evidence that everyone—public agencies, watchdogs, the press—can see.
Part of why Plehn can move in these different spheres is that his career spans them. His doctoral work was in computational economics; he later taught quantitative modeling at MIT, UC Berkeley, and UCLA before shifting to applied data systems. That mix of theory, pedagogy, and practice makes him unusually fluent in the technical, ethical, and regulatory layers of AI governance. In Washington, he works with the Data Foundation, a nonpartisan organization that promotes evidence-based policymaking and the better use of federal data.
At a 2025 Data Foundation panel, Plehn offered what might be the cleanest expression of his worldview: “AI without verified data is like science without peer review.” It is an argument for institutional rigor over hype. Scientific claims are credible because they can be checked; AI claims, in his view, should be held to the same standard. That is how, he says, you preserve the “health of our digital society.”
If BrightQuery and its public-interest offshoots succeed, provenance will stop being an afterthought and become a first-class property of AI pipelines. Every dataset, every model, every synthetic answer will carry an auditable chain of custody. That vision overlaps with IBM’s and Meta’s AI Alliance work on standardized data cards, transparency metrics, and lifecycle audits, as well as with the research underway at MIT’s Data + AI Lab on explainability and factual verification.
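One common way to make a chain of custody auditable is to hash each pipeline event together with the hash of the one before it, so the full lineage can be independently recomputed and any tampering surfaces immediately. A toy sketch of that idea (a generic hash chain, not a description of BrightQuery's or the AI Alliance's actual mechanisms; all stage names are hypothetical):

```python
import hashlib
import json

def chain_step(prev_hash: str, event: dict) -> str:
    """Hash the previous link together with this event, forming an append-only chain."""
    payload = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()

# Each pipeline stage appends a link; altering any earlier stage changes every later hash.
h0 = "genesis"
h1 = chain_step(h0, {"stage": "ingest", "source": "regulatory filing"})
h2 = chain_step(h1, {"stage": "normalize", "tool": "parser-v2"})
h3 = chain_step(h2, {"stage": "publish", "dataset": "entity-graph-slice"})

# An auditor who replays the same events from the same starting point
# reproduces the same final hash, verifying the lineage independently.
replayed = chain_step(
    chain_step(
        chain_step("genesis", {"stage": "ingest", "source": "regulatory filing"}),
        {"stage": "normalize", "tool": "parser-v2"},
    ),
    {"stage": "publish", "dataset": "entity-graph-slice"},
)
assert replayed == h3
```

The design choice worth noting is that verification needs no trusted intermediary: anyone holding the event log can recompute the chain, which is what makes the audit trail compatible with outside scrutiny by regulators or journalists.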
Skeptics will note that this is hard to do on a global scale. Regulatory filings arrive late and in different formats; some jurisdictions barely publish at all; and the notion of fully comprehensive coverage is still aspirational. Plehn doesn’t deny any of that. His strategy is incremental: make provenance useful, make it convenient, and make it economically advantageous. Once that happens, he thinks, the market and the public sector will begin to expect it—even demand it.
As AI matures, the conversation is drifting away from “How big is your model?” to “How strong is your evidence?” The most durable players over the next decade may not be the ones with the flashiest generative demos, but the ones who can prove their systems’ claims. What Plehn is demonstrating through BrightQuery and OpenData.org is that transparency can scale—that the same verified data can serve commercial analytics and also fuel public-interest journalism, watchdogging, and policymaking.
Plenty of people talk about “trustworthy AI.” Very few build the plumbing that trust actually rests on: clean inputs, documented provenance, repeatable queries, audit trails. Jose M. Plehn is trying to build exactly that. If he’s right, the next social contract between data, algorithms, and democracy won’t be written in code first—it will be written in records we can all verify.