As of June 4, 2024, an unprecedented constellation of high‑level AI governance forums has begun to shape the strategic environment in which militaries operate. Nation‑states and major developers have met at Bletchley Park, in Seoul, and in bilateral fora to articulate principles, commit to safety testing, and establish nascent institutions for model evaluation. These gatherings have produced useful rhetorical consensus but also exposed deep practical tensions between public safety, operational secrecy, and the logic of competition.
The basic outputs of these forums matter for defence in several concrete ways. First, they signal the contours of acceptable behaviour for state and non‑state actors. The Bletchley Declaration and related chair summaries set out an ambition for shared, risk‑based approaches to frontier AI and for state‑led evaluation of powerful models before release. Such statements, even when non‑binding, create expectations that will inform alliance politics, export controls, and procurement requirements.
Second, fora like the Seoul summit have pushed companies toward voluntary safety commitments and toward cooperation with public testbeds and safety institutes. When developers agree to subject models to independent evaluation, they change the information environment on which defence planners rely. That said, voluntary commitments are uneven and constrained by competitive secrecy and commercial incentives. The Seoul commitments are a useful start, but they do not substitute for operationally relevant verification regimes.
Third, domestic governance instruments alter the acquisition and fielding calculus. The United States Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence and allied national actions create new regulatory expectations for documentation, risk assessment, and sharing of safety test data with government entities. Militaries will therefore face layered requirements: compliance with defence ethics and test regimes plus adherence to civilian regulatory frameworks governing the same technologies.
These normative developments intersect with existing military doctrine and ethics. The United States Department of Defense has had a set of AI ethical principles since 2020 and is pursuing an implementation pathway that emphasizes governance, trustworthy testing, and workforce adaptation. NATO and allied organisations have articulated similar principles for responsible use in defence contexts. Translating high‑level commitments into operational rules for weapons and support systems remains the central technical and legal challenge.
Beyond doctrine lie harder realities. The Science commentary by leading AI researchers highlights the possibility of rapid capability leaps and argues that governance is not keeping pace with technical risk. That argument cuts directly to the military domain. If frontier models acquire capabilities that materially affect command, sensing, or autonomy, states will confront a set of questions they are poorly prepared to answer: how to verify capability claims; how to test models under contested electromagnetic, cyber, and information environments; and how to trust outputs under adversarial manipulation. The paper therefore reinforces the need for robust, technically credible testing regimes and for governance that can respond to fast changes in capability.
A persistent tension runs through contemporary governance efforts. Public fora encourage transparency, yet operational security and competitive advantage push defence programmes toward classification. Similarly, private actors are incentivised to limit disclosure of model internals and training data while regulators and safety institutes ask for openness sufficient to evaluate risk. The result is an uneasy compromise: state‑led testing and safety institutes that conduct evaluations through trusted channels. These structures can work, but they must be designed to provide militarily relevant assurance without handing adversaries a road map to exploitation. The Bletchley discussions explicitly contemplated state‑led model evaluation and the creation of safety institutes as an intermediary step.
For militaries, the governance turn implies four operational priorities.
1) Invest in rigorous testing, evaluation, verification and validation (TEV&V) capacity. AI safety forums are producing technical standards and shared testbeds. Defence organisations must be present inside those institutions and must build their own TEV&V capabilities that account for contested operational conditions and adversarial inputs. Public‑private test partnerships will matter only if their protocols are stress‑tested against realistic attack surfaces; a minimal sketch of the kind of clean‑versus‑perturbed comparison involved appears after this list.
2) Operationalise ethical and responsible principles through procurement and contracts. High‑level principles from the DoD and allied bodies are necessary but insufficient. Ethics must be written into contracts as measurable requirements: traceability, explainability thresholds, failure‑mode documentation, and the right to independent verification. The DoD Responsible AI pathway already points in this direction, and defence acquisition authorities should harden these pathways into gating criteria.
3) Harmonise alliance standards where possible, and plan for strategic divergence where not. Forums like Bletchley and Seoul create political momentum for common standards. NATO partners and other allies should treat shared evaluation frameworks and compatible TEV&V as force multipliers. When allies diverge, interoperability and liability issues will follow, especially in coalition operations that integrate AI‑assisted decision aids or autonomous systems.
4) Retain a sober posture on arms race dynamics and dual use. Governance forums are not disarmament forums. They are places where states and firms define responsible behaviour. Militaries must therefore prepare both to adopt improved safety practices and to deter adversarial misuse. This requires investment in detection, attribution and resilience against AI‑enabled deception, misinformation, and autonomous weapons employment. The Science commentary on extreme risks underscores that prevention and resilience must proceed in parallel.
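To make the stress‑testing point in the first priority concrete, here is a minimal, purely illustrative sketch in Python. It uses a toy linear model and synthetic data (not any real defence system, dataset, or evaluation protocol) to contrast accuracy on clean inputs with accuracy under a bounded worst‑case perturbation, the kind of clean‑versus‑adversarial delta a TEV&V report would be expected to surface.

```python
# Illustrative sketch only: a toy robustness check contrasting clean and
# adversarially perturbed evaluation of a linear classifier. All names and
# data here are hypothetical stand-ins for an operational test set.
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a fixed linear scorer w.x + b deciding between two classes.
w = rng.normal(size=8)
b = 0.1

def predict(x: np.ndarray) -> int:
    """Return the predicted class (0 or 1) for a single input vector."""
    return int(x @ w + b > 0)

# Synthetic evaluation set, labels generated from the same scorer plus noise.
X = rng.normal(size=(200, 8))
y = (X @ w + b + rng.normal(scale=0.5, size=200) > 0).astype(int)

def accuracy(X: np.ndarray, y: np.ndarray) -> float:
    """Fraction of examples the model classifies correctly."""
    return float(np.mean([predict(x) == yi for x, yi in zip(X, y)]))

def perturb(X: np.ndarray, y: np.ndarray, eps: float) -> np.ndarray:
    """Worst-case L-infinity perturbation of size eps for a linear scorer:
    push each input against its true label along sign(w)."""
    direction = np.sign(w)  # gradient of the score with respect to the input
    push = np.where(y[:, None] == 1, -1.0, 1.0) * direction
    return X + eps * push

clean_acc = accuracy(X, y)
adv_acc = accuracy(perturb(X, y, eps=0.3), y)
print(f"clean accuracy:     {clean_acc:.2f}")
print(f"perturbed accuracy: {adv_acc:.2f}")
print(f"robustness gap:     {clean_acc - adv_acc:.2f}")
```

A real evaluation would substitute operational data, realistic threat models, and red‑team‑generated inputs for the toy perturbation used here; the point is only that "stress‑tested against realistic attack surfaces" implies reporting a measurable gap between benign and adversarial performance, not a single benchmark score.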
Finally, a normative observation. Governance fora have a civic function. They create public standards that, over time, can become expectations embedded in law and custom. Militaries frequently view such developments as constraints. They should instead view them as shapeable parameters. If defence establishments ignore forums and leave standard setting to technocrats or private industry, the resulting rules will be ill‑fitted to the realities of combat and command. Conversely, if militaries treat governance solely as a compliance burden, they will miss an opportunity to lead in building verification regimes and to demonstrate how responsibility and operational effectiveness can be mutually reinforcing.
In short, AI governance forums are more than theatre. They are nascent instruments of global ordering that will affect the life cycle of military systems from lab to battlefield. The work ahead is not to choose between safety and advantage. It is to integrate safety‑first technical practices into military acquisition, to build alliance testbeds that produce actionable assurance, and to sustain diplomatic channels that reduce incentives for reckless capability races. Without that integration, governance will remain a set of declarations rather than a durable architecture for limiting harm while preserving legitimate defence needs.