Spooked by Mythos, Trump suddenly realized AI safety testing might be good

This week, the Trump administration backpedaled and signed agreements with Google DeepMind, Microsoft, and xAI to run government safety checks on the firms’ frontier AI models before and after their release.

Previously, Donald Trump had stubbornly cast aside the Biden-era policy, dismissing the need for voluntary safety checks as overregulation blocking unbridled innovation. Soon after taking office, he took the extra step of rebranding the US AI Safety Institute as the Center for AI Standards and Innovation (CAISI), removing “safety” from the name in a pointed jab at Joe Biden.

But after Anthropic announced that it would be too risky to release its latest Claude Mythos model—fearing that bad actors might exploit its advanced cybersecurity capabilities—Trump is suddenly concerned about AI safety. According to White House National Economic Council Director Kevin Hassett, Trump may soon issue an executive order mandating government testing of advanced AI systems prior to release, Fortune reported.

In CAISI’s press release, the center acknowledges that the voluntary agreements signed by Google, Microsoft, and xAI “build on” Biden’s policy. Celebrating the new partnerships, CAISI Director Chris Fall did not mention Mythos but promised that the “expanded industry collaborations” would help CAISI scale its work “in the public interest at a critical moment.”

“Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications,” Fall said.

To date, CAISI said it has completed about 40 evaluations, including those of frontier models that have yet to be released. When conducting tests, CAISI frequently gains access to models with “reduced or removed safeguards,” which CAISI said allowed it to more “thoroughly evaluate national security-related capabilities and risks.”

Through the evaluations, the government will also gain a better understanding of model capabilities, CAISI claimed. And to ensure that evaluators understand top national security concerns as they emerge across government, a “group of interagency experts” has formed a task force “focused on AI national security concerns,” CAISI said.

Some firms that have signed agreements have signaled confidence in CAISI’s testing plans. On LinkedIn, Tom Lue, Google DeepMind’s vice president of frontier AI global affairs, said he was “pleased” with the plans. In a blog, Microsoft said that “testing for national security and large-scale public safety risks necessarily must be a collaborative endeavor with governments,” while crediting the expertise “uniquely held by institutions like CAISI” to conduct such testing. xAI, which is currently fighting OpenAI in a trial over which firm’s leaders care more about AI safety, did not immediately respond to Ars’ request for comment.

However, critics aren’t sold on the government’s plan to vet models and are increasingly dubious of firms whose AI model designs are largely kept secret.

Critics suggested that CAISI may lack the funding or expertise to evaluate frontier AI models. And as Trump seemingly suspects, seeking voluntary commitments from AI firms may not create the kind of day-to-day transparency the public needs about frontier AI risks, critics have warned. Further, any politicization of the evaluation process—like opposing the release of models whose outputs disfavor a certain administration’s political views—could decrease trust in AI. Unchecked, that could ultimately dissuade firms from signing agreements, since increasing trust is supposedly a key motivator driving the latest attempt at government collaboration.

Nobody knows what “safe” means

In its rush to announce its partners, CAISI did not specify the testing standards that will be used for evaluations.

That could be a problem, according to a LinkedIn post from Devin Lynch, a former director for cyber policy and strategy implementation at the White House Office of the National Cyber Director:

“Pre-deployment evaluations with frontier labs are exactly the kind of public-private collaboration needed to build trust, safety, and security into AI. The harder question is what ‘evaluation’ actually means at the frontier. Capability assessments are only as good as the threat models behind them. Our research on the AI tech stack finds that the Governance layer—standards, audits, liability frameworks—remains the least mature but most essential. CAISI will need to define, and publish, what it’s testing for, not just who it’s testing with.”

In a statement provided to Ars, Sarah Kreps, director of the Tech Policy Institute at Cornell University, said that AI firms should be developing closer ties with the government as AI advances. However, “the definition of ‘safe’ is contested” and “once you build a government vetting process for technology, you get the good with the bad,” she said.

Without defining standards, “the process can be politicized,” Kreps said. That risks creating a system where “whoever holds power gets to shape how the vetting works.”

So far, neither the Biden nor the Trump administration has figured out how to avoid that, Kreps said.

Fears of government controlling AI outputs

Microsoft’s blog said that “CAISI, Microsoft and NIST will collaborate on improving methodologies for adversarial assessments,” which suggests that the plan is to develop these standards on the fly. According to Microsoft, “testing AI systems in ways that probe unexpected behaviors, misuse pathways, and failure modes” is “much like stress-testing whether airbags, seatbelts, and braking systems work effectively and reliably in safety-critical driving scenarios.”

But Gregory Falco, a Cornell University assistant professor of mechanical and aerospace engineering who tracks AI governance, insists that there’s a better way.

“Government oversight of AI cannot simply mean political review of model outputs, nor should it become a mechanism for deciding whether a model says favorable or unfavorable things about a president or administration,” Falco said.

Rather than relying on a politicized government leveraging evaluations to control the AI systems that the public uses, the US could build “some form of independent audit,” Falco said.

Imagine, Falco suggests, how much more accountability and discipline a system might create if AI firms understood that their models could be audited at any point. Operating similarly to the Internal Revenue Service (IRS), a rigorous AI audit system could create “real consequences for reckless deployments,” Falco said. For AI firms facing such consequences, the pressure would be on to ramp up internal AI safety testing, Falco suggested.

That seems like the “only viable path,” Falco said, since “the federal government does not currently have the in-house technical expertise, infrastructure, or day-to-day insight needed to directly evaluate these systems on its own.”

Rumman Chowdhury, an AI governance consultant and founder of Humane Intelligence, similarly criticized CAISI’s preparedness. Chowdhury told Fortune that “current White House efforts to offer ‘sensible oversight’ over frontier AI models may sound good, but the devil is in the details.”

“It depends on their interpretation of these words,” Chowdhury said. “Evaluations are a policy tool, they are not actually data-driven. My concern is that this is another political tool that the administration wants to own and wield.”

CAISI may lack funding

As for funding, Congress in January approved up to $10 million to expand CAISI, Fortune reported. However, a recent analysis by the conservative think tank America First Policy Institute found that “CAISI remains underfunded compared with peer institutes internationally and lacks ‘appropriate funding.’”

To critics, the CAISI testing plan may not go far enough to protect the public from unforeseen AI risks. Falco maintains that only independent audits can spare the public from the worst outcomes.

“The danger is that government oversight becomes political, performative, or captured by the companies it is supposed to evaluate,” Falco said. “The opportunity is to build a practical audit framework that lets the US remain the global leader in AI while creating credible accountability around the most consequential risks.”

To Lynch, the bigger test may be whether Trump’s testing plan succeeds in its mission to head off risks and build more trust in AI systems, while keeping a light touch to avoid overregulating firms.

CAISI “is building something important here,” Lynch said. “The test will be whether these collaborations ignite innovation, protect national security, and produce AI that is both trusted and trustworthy.”