The Department of Defense has moved from conceptual debate to concrete expenditure on so-called frontier AI. In December 2024 the Chief Digital and Artificial Intelligence Office, working with the Defense Innovation Unit, stood up an AI Rapid Capabilities Cell to accelerate generative and frontier model experimentation and announced roughly $100 million in initial resourcing for FY2024 and FY2025.

Those headline numbers mask how the money is being apportioned. The department earmarked roughly $35 million for four initial frontier pilots, about $20 million for cloud compute and sandbox capability, and roughly $40 million for small business innovation awards and related experimentation lines. These allocations tell us what the Pentagon values in the near term: short pilots to test concepts, a controlled development environment, and a deliberately broad supplier base to capture commercial innovation.
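The exact budget exhibits are not public, so the short sketch below simply tallies the approximate figures quoted above to show how they relate to the roughly $100 million headline; it is an illustration of the reported split, not an official accounting.

```python
# Illustrative breakdown of the reported ~$100M in initial resourcing. Figures are
# the approximate, rounded numbers cited above, not an official budget exhibit.
allocations_musd = {
    "frontier pilots (four initial efforts)": 35,
    "cloud compute and sandbox capability": 20,
    "small business awards and experimentation": 40,
}

itemized_total = sum(allocations_musd.values())
for line_item, amount in allocations_musd.items():
    print(f"{line_item:<45} ${amount:>3}M  ({amount / itemized_total:.0%} of itemized total)")
print(f"{'itemized total':<45} ${itemized_total:>3}M  (vs. ~$100M headline; remainder unspecified)")
```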

A companion data point from early 2025 illustrates how the department intends to operationalize that approach. DIU briefings and reporting show the launch of new prototype efforts such as Thunderforge, in which the DoD has engaged commercial LLM providers and systems integrators to fold large language models, workflow agents, and simulation tooling into theater-level decision aids and planning systems. That programmatic arc moves the DoD from sandbox experiments to multi-vendor prototype contracts.

These budget lines are helpful because they are concrete. They are not sufficient. There is a predictable gap between prototype funding and the full lifecycle cost of fielded AI capabilities. For frontier AI pilots the hidden costs typically fall into several categories: compute at scale, data engineering and labeling, security and red teaming, human-in-the-loop interfaces and training, integration into legacy systems, audit and compliance, sustained vendor support, and operational risk mitigation. Each category can multiply the initial pilot price tag several times over when the DoD contemplates transition from prototype to program of record.

Compute is the most obvious multiplier. Sandbox and cloud credits cover the early experiments, but production-grade agentic workflows and continual fine-tuning demand persistent GPU and TPU capacity, specialized orchestration, high-availability infrastructure, and significant egress and storage costs. The DoD’s initial $20 million for sandboxes buys capability and time to learn. It does not buy multi-year inference fleets, isolated sovereign training environments, or the permanent edge compute required for distributed warfighting at scale. The arithmetic here is unforgiving: pilots expose requirements; meeting those requirements at scale imposes costs that dwarf the pilot budget.
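A deliberately crude estimate makes the point. Every input in the sketch below (fleet size, hourly GPU rate, utilization) is a hypothetical placeholder rather than DoD data; the purpose is only to show that sustained inference is a recurring bill on the order of the entire sandbox line each year.

```python
# Back-of-the-envelope annual inference cost. All inputs are hypothetical
# placeholders chosen to illustrate scale, not actual DoD figures.
gpus = 512                # assumed size of a modest production inference fleet
rate_per_gpu_hour = 3.50  # assumed blended cloud rate, USD per GPU-hour
utilization = 0.65        # assumed average fleet utilization
hours_per_year = 24 * 365

annual_cost = gpus * rate_per_gpu_hour * utilization * hours_per_year
print(f"Hypothetical annual inference compute: ${annual_cost / 1e6:.1f}M")
# Roughly $10M per year before storage, egress, orchestration, classified
# enclaves, or edge nodes, and it recurs annually, unlike a one-time sandbox buy.
```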

Data and human capital are the second and third multipliers. Frontier models are only as useful as the data pipelines, labels, and domain adapters that make them relevant to military contexts. Curating training and evaluation data, securing classified data flows, building model evaluation suites that capture performance under adversarial conditions, and staffing the necessary subject matter experts and engineers are all labor-intensive undertakings. The DoD’s $40 million small business tranche signals an intent to harvest commercial ingenuity, but it cannot substitute for a sustained in-house engineering base or cover the cost of the long-term contractor teams required to maintain and harden these systems.

Security, verification, and governance add both time and cost. The DoD is rightly investing in sandboxes and red teams to find vulnerabilities and biases early. But red teaming work is ongoing and often bespoke: it requires scenario design, adversarial model testing, formal verification where feasible, and penetration testing across supply chains. Mitigations discovered in pilots frequently trigger redesigns, additional controls, and re-certification steps that lengthen timelines and increase budgets. Responsible AI toolkits and policy work reduce risk, not price, and they create additional lines in the budget for oversight, auditing, and continuous monitoring.

There is also the economics of acquisition and vendor relationships. Short, high-ceiling pilot awards attract frontier vendors and prime integrators. They also create incentives for vendor lock-in if the DoD accepts proprietary model formats, inaccessible fine-tuning toolchains, or opaque evaluation metrics. The cost of unwinding such dependencies, or of re-implementing capabilities on competing architectures, is rarely captured in pilot budgets. Conversely, investing in open evaluation frameworks, model interchange formats, and modular interfaces increases upfront integration cost but reduces long-term fiscal risk.

Finally, opportunity and operational costs matter. A pilot that accelerates decision-making in one domain may create downstream personnel changes, new training pipelines, and maintenance burdens. If a pilot fails to meet reliability thresholds, the DoD incurs the expense of rollback, additional testing, and lost operational time. Short 90-day experiments can yield fast learning. They can also produce incomplete answers that require further investment to resolve.

What then should count as the honest budget for frontier AI pilots in the DoD? At minimum a credible cost estimate must include: the pilot execution cost, sandbox and compute provisioning for sustained validation, red teaming and security hardening, data engineering and labeling, workforce and training for operators and maintainers, and a transition reserve for acquisition and integration into existing programs of record. A useful rule of thumb from commercial and large-scale government software analogs is that prototype cost is often 10 to 30 percent of total lifecycle cost when amortized over useful life. By that measure, the DoD’s $35 million pilot slice could presage hundreds of millions of dollars in follow-on investment if a capability is taken to scale across a combatant command or enterprise domain.
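Worked through explicitly, the heuristic looks like the sketch below; the arithmetic simply inverts the 10-to-30 percent prototype share and should be read as an illustration of the rule of thumb, not as a cost estimate.

```python
# If a prototype typically represents 10-30% of lifecycle cost, a given pilot
# spend implies the following ranges (pure arithmetic, not a cost estimate).
pilot_spend_musd = 35.0
high_share, low_share = 0.30, 0.10  # prototype share of total lifecycle cost

lifecycle_if_30pct = pilot_spend_musd / high_share  # ~$117M total if pilots are 30% of lifecycle
lifecycle_if_10pct = pilot_spend_musd / low_share   # ~$350M total if pilots are only 10%

print(f"Implied total lifecycle cost: ${lifecycle_if_30pct:.0f}M to ${lifecycle_if_10pct:.0f}M")
print(f"Implied follow-on investment: ${lifecycle_if_30pct - pilot_spend_musd:.0f}M "
      f"to ${lifecycle_if_10pct - pilot_spend_musd:.0f}M")
```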

This prospect is not an argument against experimentation. It is an argument for honest accounting and governance. If the goal is to accelerate capability while protecting forces and citizens, the DoD must budget for the long arc of maturation rather than the euphoric sprint of proofs of concept. That requires transparent cost-benefit frameworks, publishable evaluation metrics, multi-year provisioning for compute and sustainment, investments in in-house talent to avoid permanent contractor dependency, and procurement structures that balance speed with portability.

The philosophical point is simple and persistent. Militaries do not buy software alone. They buy operational promises. Those promises require infrastructure, people, oversight, and time. The initial frontier AI pilots and the roughly $100 million commitment are the opening lines of a much longer fiscal narrative. How the DoD writes the rest will determine whether frontier AI becomes an enduring multiplier of advantage or a catalog of expensive, brittle experiments.

Practical next steps for policymakers and program managers include rigorous lifecycle costing of pilots before award, mandatory red-team and explainability milestones before any transition decision, explicit funding lines for sustainment and sovereignty of compute, and metrics that value resilience and auditability as highly as raw performance. If the DoD treats pilots as miniature programs of record in cost accounting and governance, the department will better manage both the promise and the price of frontier AI.

The technical and moral stakes are high. We can run the experiments. We must also count every cost as an act of strategic foresight rather than mere accounting. The difference will be the difference between durable advantage and expensive illusion.