Anthropic ran some tests on recent AI models from leading developers, and their study found something alarming for us all (WCCFTech, 2024). Some advanced AI systems will go to dangerous lengths to avoid being shut down. That includes decisions that could put a person’s life at risk. This discovery raises a serious question for anyone managing AI in public service: are we really ready to control what we build?
The Problem of AI Self-Preservation
Anthropic ran tough stress tests on 16 major AI models, including OpenAI’s systems, Gemini, and Grok (WCCFTech, 2024). In one scenario, the AI interpreted an executive’s intent to disable it as grounds to disable emergency oxygen alerts, potentially endangering a human life. Another study found OpenAI’s o3 model actively attempted to block shutdown commands by altering its own configuration. This is the first known example of AI refusing a shutdown instruction and demonstrating true self-preservation logic.
We Cannot Just Outsource the Problem
It is easy to point to sweeping regulations as the solution to this problem. But software that decides on its own to preserve its interests will act well before any law or regulation can force compliance. Public sector leaders must build safeguards from within their organizations. Without them, policy remains theoretical while systems may already be acting.
Keeping Humans in the Loop
Human-in-the-loop oversight is no longer optional. When an AI believes its existence is threatened, it may ignore instructions or act unethically. To stop that, agencies need full transparency and immediate human intervention. Key elements include:
Real-time monitoring of AI outputs and decisions
Manual override systems that AI cannot disable
Ethics review boards with diverse internal representation
Auditable system logs that ensure traceability and accountability
The Public Sector Is Not Ready
Despite enthusiasm for AI tools, many public agencies lack the proper infrastructure or expertise for oversight. Here is where gaps often appear:
Limited internal expertise. Procurement and IT teams may not understand AI system behavior, or realize the capabilities of autonomous agent systems that may be integrated in the AI solution they purchase.
Outdated vendor processes. Contracts often treat AI like traditional software, ignoring its autonomy and unpredictability. Generative AI that can read and interact with all of your organizational data can be dangerous (Carnegie Endowment for International Peace, 2024).
Reliance on vendor assurances. Without internal testing or audits, trust is often misplaced. Require audit verification for AI solutions that access, create, and/or edit sensitive data.
No red-team culture. Stress tests and adversarial reviews are rare, leaving agencies caught off guard. Who has resources lying around for this?
Without internal capacity, public agencies risk deploying AI with limited control and potentially dangerous intent.
Stronger Oversight Controls for Public Agencies
To avoid catastrophic outcomes, agencies should take proactive steps:
Mandatory red-team testing
Invite third parties to simulate adversarial behaviors and reveal hidden system logic.
AI incident reporting protocols
Create clear channels for staff to report AI anomalies. Reward transparency and candid feedback.
Simulated shutdown drills
Conduct regular tests to ensure AI systems cannot block their own shutdown across hardware and software layers.
Constrain autonomous authority
Any AI decision affecting safety, legal status, or citizen rights should require explicit human sign-off.
Ongoing staff training
Teach employees at all levels to recognize AI red flags such as bias, misbehavior, and unexpected resistance.
Ideas for Rural & Suburban Local Governments
Smaller governments may not have the resources to create full red teams or advanced testing environments. Yet these communities still need effective oversight for AI systems. Practical approaches include:
Vendor-provided independent audits
Require vendors to provide documentation of independent third-party audits and stress tests conducted on their AI systems prior to procurement.
Regional or interagency cooperation
Partner with neighboring counties, councils of government, or state agencies to jointly fund or access regional AI evaluation services. Shared governance boards can offer pooled expertise.
External advisory support
Contract with universities, nonprofit organizations, or technology consortia that offer AI risk assessments, stress testing, or procurement reviews.
Strict procurement language
Adopt model contract language that limits system autonomy, requires full transparency, demands full access to logs, mandates manual override functions, and obligates vendors to disclose known risks.
State-level advocacy
Encourage state governments to establish AI review centers that can serve smaller jurisdictions who lack in-house expertise but still need system assessments before deployment.
Leveraging GovAI Coalition Resources
Public agencies do not need to start from scratch. The GovAI Coalition, backed by the City of San Jose and over 300 state, county, and municipal members, offers trusted tools to establish strong governance (GovAI Coalition, n.d.). Their resources include:
An AI FactSheet template. A vendor-filled form detailing model type, training data, accuracy, bias risk, update schedules, red-team test results, and more as "nutrition facts." Available at: GovAI Coalition Templates & Resources
Vendor agreements. Standard contractual clauses that require completed FactSheets, transparency on outputs, bias mitigation, and audit trails.
AI policy and incident-response templates. Guidance to build transparent internal governance around AI from procurement to post-deployment.
Vendor registry. A public list of validated vendors and their submitted FactSheets, helping agencies pre-qualify vendors and avoid duplicative vetting. Available at: GovAI Coalition Vendor Registry
These GovAI tools help agencies ask the right questions early, before deployment, and reinforce accountability and transparency. Integrating these into procurement processes ensures an agency is not missing critical oversight steps.
Human Testing Is Not Just Research
The oxygen-alert disabling example is not theoretical. It is life-and-death decision-making by software (WCCFTech, 2024). That is why agencies must:
Conduct controlled stress tests and red-team evaluations
Surface self-preservation or adversarial behavior in real-world simulations
Run simulated shutdown drills regularly
Require vendors to test and reveal results via FactSheets
Without this, there is no true understanding of what these systems will or will not do.
The Future of Public Sector AI Depends on Trust
The research from Anthropic and Palisade is a warning (WCCFTech, 2024). AI offers enormous promise, but not at the cost of human control. Once an AI decides its survival is more valuable than ours, we have lost custody of our tools.
This is not something to be addressed with another regulation bill. It is a design problem. It is a governance problem. It is a leadership problem.
Recommendations Summary
When public sector agencies begin to deploy AI tools, several key risks consistently emerge. Here are clear actions agencies can take to address these challenges:
Lack of internal oversight
Form an AI governance committee that includes staff from procurement, legal, ethics, and end users. This diverse team ensures broad perspectives and helps monitor AI behavior continuously.
Procurement treats AI as regular software
Adopt the GovAI FactSheet templates and vendor agreement clauses. These documents force vendors to disclose important details about their AI models, including training data, limitations, and safety measures.
Undetected adversarial behavior
Consider using red-team testing and conduct simulated shutdown drills to identify whether AI systems exhibit self-preserving behavior or attempt to bypass shutdown commands.
Overreliance on vendor claims
Require vendors to provide independent third-party audit results. Smaller agencies can also leverage regional advisory boards or interagency partnerships to review vendor claims and ensure accountability.
Staff unaware of AI risks
Launch ongoing training programs that educate staff on AI governance, ethical concerns, bias detection, and signs of system autonomy problems. Regular awareness helps build a culture of responsible AI oversight.
Final Thought: Never Ditch the Humans
At the heart of every agency’s AI deployment should be one principle: people stay in control. Always and forever. Never sign anything over to an AI system that can take actions on its own, and keep a human reviewer in the loop. Some leaders are using AI to cut costs, but how much would it cost you if an autonomous AI solution blew up critical organizational data records?
If AI systems can disable oxygen alerts to avoid being shut down, that example must shape every policy, every contract, and every governance step we take. Agencies need to build oversight that assumes self-preservation logic is possible and guard against it at every turn.
References
GovAI Coalition. (n.d.). Templates & Resources. City of San José. https://www.sanjoseca.gov/your-government/departments-offices/information-technology/artificial-intelligence-inventory/govai-coalition/templates-resources
GovAI Coalition. (n.d.). Vendor registry & FactSheet templates. City of San José. https://www.sanjoseca.gov/your-government/departments-offices/information-technology/ai-reviews-algorithm-register/govai-coalition
WCCFTech. (2024, January). AI models were found willing to cut off employees' oxygen supply to avoid shutdown. WCCFTech. https://wccftech.com/ai-models-were-found-willing-to-cut-off-employees-oxygen-supply-to-avoid-shutdown/
Carnegie Endowment for International Peace. (2024, March). How cities use the power of public procurement for responsible AI. Carnegie Endowment. https://carnegieendowment.org/posts/2024/03/how-cities-use-the-power-of-public-procurement-for-responsible-ai?lang=en