Discussion about this post

Jack Shanahan:

This is very helpful, thanks.

As someone who’s generally optimistic about the integration of AI for national security, I consider this post must-read material for everyone weighing the rapid integration of frontier models (or most LLMs generally) into military or intelligence operations.

Absent the kind of oversight and governance capable of addressing each of the five critical criteria, the current “go fast and break things” attitude is fraught. To say the least.

Om Prakash Pant:

Retail deployments usually discover the accuracy-reliability gap after go-live, not before.

A product recommendation agent that hits 85% accuracy in a demo looks deployable. Then the consistency failures start in production: same customer, same question, a different answer on different days.

The calibration problem compounds it: agents that don't surface uncertainty push wrong answers confidently instead of handing off to a human. Neither shows up in POC evaluations because the benchmarks measure task completion, not how failures behave.

That's usually where retail AI projects quietly stall.
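The handoff behavior described above can be sketched as a simple confidence gate: the agent only answers when its (calibrated) confidence clears a threshold, and escalates to a human otherwise. This is a minimal illustration, not any particular vendor's implementation; the `AgentResponse` type, `route` function, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff; in practice this is tuned against a calibration set.
CONFIDENCE_THRESHOLD = 0.8

@dataclass
class AgentResponse:
    answer: str
    confidence: float  # model-reported (ideally calibrated) probability

def route(response: AgentResponse) -> str:
    """Return the agent's answer only when it is confident enough;
    otherwise hand off to a human instead of pushing a guess."""
    if response.confidence >= CONFIDENCE_THRESHOLD:
        return response.answer
    return "HANDOFF_TO_HUMAN"

# A confident answer passes through; an uncertain one is escalated.
print(route(AgentResponse("Order ships in 2 days", 0.93)))   # answer returned
print(route(AgentResponse("Maybe next Tuesday?", 0.41)))     # escalated
```

The gate only helps if the confidence scores are actually calibrated — which is exactly the property the comment notes that POC benchmarks never measure.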
