The Department of Defense has moved from rhetorical embrace of frontier artificial intelligence to concrete budgetary commitments. In mid-2025 the Chief Digital and Artificial Intelligence Office awarded prototype other transaction agreements (OTAs) with ceilings of roughly $200 million apiece to leading frontier AI firms. These awards sit alongside an organizational effort, the AI Rapid Capabilities Cell, resourced with roughly $100 million to run pilots and build experimental infrastructure. The headline numbers matter because they bundle together several distinct cost categories that will determine whether the pilots produce durable capability or become expensive exercises in procurement theater.
A single ceiling or line item is not a complete accounting. The publicly posted contract notices show an immediate fiscal obligation of about $2 million against some of these prototype agreements while leaving most of the ceiling as available, contingent funding to be drawn as prototypes progress. That pattern is typical of rapid prototyping using OTAs because it preserves flexibility. Yet the small initial obligations also mask the real tail of costs that follow a prototype once a capability enters evaluation, hardening, certification, and sustainment.
The largest and most underappreciated budget driver is compute. Training and continual experimentation with frontier models consumes orders of magnitude more GPU time and energy than standard software development. Public and industry analyses in 2024 and 2025 consistently showed that training frontier models can require tens of millions, and for the largest designs potentially hundreds of millions, in pure compute cost alone over a model development cycle. Those compute costs are compounded by expensive inference provisioning when models are run at scale, and by the nontrivial price of secure, accredited enclaves if classified or sensitive data are involved. Any DoD accounting that treats the OTA ceiling as the full cost of delivery is incomplete without an explicit line for compute and secured compute envelopes.
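The scale of these compute costs is easy to check with back-of-envelope arithmetic. The sketch below is purely illustrative; the GPU counts, durations, and prices are hypothetical placeholders, not actual vendor pricing or DoD figures.

```python
# Back-of-envelope estimate of frontier-model compute costs.
# All figures are hypothetical placeholders for illustration only.

def training_compute_cost(gpu_count, hours, usd_per_gpu_hour):
    """Raw accelerator cost for one training run."""
    return gpu_count * hours * usd_per_gpu_hour

def inference_cost_per_year(requests_per_day, usd_per_1k_requests):
    """Steady-state serving cost once a model is fielded."""
    return requests_per_day * 365 * usd_per_1k_requests / 1000

# Illustrative run: 10,000 GPUs for 90 days at $2 per GPU-hour.
train = training_compute_cost(10_000, 90 * 24, 2.0)   # $43.2M for one run
serve = inference_cost_per_year(1_000_000, 15.0)      # ~$5.5M per year

print(f"training: ${train / 1e6:.1f}M, inference: ${serve / 1e6:.1f}M/yr")
```

Even with modest assumptions, a single training run lands in the tens of millions, and repeated experimentation multiplies that figure, which is why a prototype ceiling alone understates total cost.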
Beyond compute there are sizable programmatic and human costs. Verification, validation, red teaming, and adversarial testing are labor intensive and require cross-disciplinary teams. Cybersecurity hardening, continuous monitoring for model degradation, legal and policy reviews, and training for users further inflate the total cost of ownership. The AI Rapid Capabilities Cell explicitly contemplated digital sandboxes and incremental experiments to surface vulnerabilities and operational fit, but building and operating those sandboxes costs money and time. The operational imperative to avoid fielding brittle or unsafe capabilities means the DoD will pay for repeated cycles of testing before any scaled deployment.
Contracting with frontier firms also introduces vendor cost structures into the DoD balance sheet. Prototype OTAs give access to elite engineering talent and proprietary models. That access is valuable, but those firms price not only for direct labor and compute, but for intellectual property, support, and indemnity risk allocation. The ceiling values on the OTAs tell the public what the DoD is willing to authorize; they do not disclose margins, subcontractor pass-throughs, or the expected ratio of prototype to production spend. Independent reporting and analysis in 2025 already flagged both optimism about rapid gains and serious concern about hallucinations, data dependencies, and the operational limits of agentic workflows. Those concerns create a foreseeable downstream budget: fixes to model behavior, additional human-in-the-loop controls, and conservative operational constraints.
We must also factor in opportunity cost. Hundreds of millions routed to frontier AI prototyping and experimental infrastructure are funds not immediately available for legacy modernization, munitions, training ranges, or maintenance. That is not an argument against investment. It is an argument for disciplined portfolio management, where pilots are chosen for expected mission utility per dollar, and where success criteria include measurable reductions in personnel time spent on low-value tasks or demonstrable improvements in decision speed or accuracy. Pilots that yield only incremental convenience but create large sustainment burdens will be net negatives.
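The utility-per-dollar selection rule above can be made concrete. The sketch below ranks candidate pilots by the dollar value of personnel time saved against five-year total cost; every pilot name, hourly rate, and dollar figure is invented for illustration.

```python
# Sketch: ranking candidate pilots by expected mission utility per dollar.
# All names and numbers are invented for illustration only.

pilots = [
    # (name, est. analyst-hours saved per year, 5-year cost incl. sustainment)
    ("logistics triage assistant", 120_000, 18_000_000),
    ("intel summarization tool",    40_000, 30_000_000),
    ("maintenance log search",      60_000,  6_000_000),
]

HOURLY_VALUE = 80  # assumed fully burdened dollars per analyst-hour

def utility_per_dollar(hours_saved, total_cost):
    """Dollar value of time saved per dollar of total cost."""
    return (hours_saved * HOURLY_VALUE) / total_cost

ranked = sorted(pilots, key=lambda p: utility_per_dollar(p[1], p[2]),
                reverse=True)
for name, hrs, cost in ranked:
    print(f"{name}: {utility_per_dollar(hrs, cost):.2f} value per dollar")
```

Under these invented numbers, the cheapest pilot outranks the flashiest one: a low-cost search tool returning $0.80 of saved time per dollar beats a larger program returning far less, which is exactly the discipline the portfolio argument demands.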
How should the DoD account for these costs in practice? First, require transparent total-cost-of-ownership estimates during pilot design that include compute, secure infrastructure, red teaming, workforce training, and projected sustainment through a five-year horizon. Second, structure prototype funding as staged investments tied to verifiable operational milestones and clear exit criteria. Third, mandate independent third-party cost and risk audits for any prototype that relies on external frontier models and for any plan to operationalize those models on production networks. Finally, build explicit mechanisms to capture efficiency gains and to reallocate savings; hard metrics reduce the temptation to expand pilot programs indefinitely because they look promising in isolation.
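The first two recommendations can be sketched together: a total-cost-of-ownership estimate built from the cost categories named above, with funding released only as milestones are verified. The category amounts, milestone names, and release fractions below are all hypothetical assumptions, not a real program structure.

```python
# Sketch: five-year TCO estimate with staged, milestone-gated funding.
# All dollar figures, milestones, and fractions are hypothetical.

FIVE_YEAR_CATEGORIES = {
    "compute (training + inference)":    40_000_000,
    "secure/accredited infrastructure":  12_000_000,
    "red teaming and V&V":                8_000_000,
    "workforce training":                 5_000_000,
    "sustainment and monitoring":        15_000_000,
}

STAGES = [  # (milestone, fraction of total released on verification)
    ("prototype demo in sandbox",             0.10),
    ("independent red-team report",           0.25),
    ("accreditation on production network",   0.40),
    ("fielded with monitoring in place",      0.25),
]

def staged_releases(total):
    """Dollars released at each verified milestone; unmet stages release $0."""
    return [(milestone, round(total * frac)) for milestone, frac in STAGES]

total = sum(FIVE_YEAR_CATEGORIES.values())  # $80M in this illustration
for milestone, dollars in staged_releases(total):
    print(f"{milestone}: ${dollars / 1e6:.1f}M")
```

The design point is that most of the money sits behind the verification milestones: in this sketch only 10 percent is released before an independent red-team report exists, which operationalizes "staged authority to spend."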
The moral accounting matters as much as the budgetary accounting. Frontier AI carries epistemic risk. It can create plausible outputs that are false. It can entrench invisible biases. It can alter responsibility chains inside command nodes when "the model said so" becomes an acceptable justification for action. Those are not line items that can be relegated to a separate ethics office and declared resolved. They must be folded into procurement requirements, contracting terms, and cost models. A penny saved by skipping adversarial testing can explode into millions in reputational and legal liability when a flawed output leads to an operational failure.
In short, the headline ceilings granted to frontier AI firms and the seed funding to the AI Rapid Capabilities Cell are the start of a conversation about cost, not the final entry in an accounting ledger. If DoD leadership wants these pilots to yield transformational advantage rather than expensive artifacts, it must insist on rigorous, multidimensional cost estimates, staged authority to spend, and institutional incentives that prefer measurable mission improvement over novelty. Without that discipline the DoD risks paying market rates for commercial R&D while inheriting the long tail of sustainment and governance costs that governments are poorly positioned to manage at scale.