Humans required: Why AI coding still needs human experts to work most effectively
As agencies explore and adopt AI-assisted coding tools, there’s a growing body of research on how to use them most effectively. The findings are beginning to converge on a view that cuts against a prevailing assumption in how these tools get deployed.
The typical assumption is that AI agents are largely self-sufficient; point them at a codebase, give them a task, and they’ll figure out what they need. New research suggests this is incorrect, and incorrect in a specific way that matters, especially for organizations working with specialized, institutional knowledge—like government agencies.
Three recent studies tell a remarkably consistent story: the value that AI agents can deliver depends substantially on the quality of human-authored guidance given to them. Where that guidance is absent, or left to the AI agents to generate themselves, performance suffers. Where it’s present and well-crafted by people who actually understand the domain, the gains can be substantial.
A summary of recent research on agent instructions
A recent study by ETH Zurich and LogicStar.ai tested whether AGENTS.md files, which guide AI coding agents, actually improve performance. The researchers ran four agents against 138 GitHub issues under three conditions (no instructions, AI-generated instructions, and developer-written instructions), and the results were mixed.
AI-generated files reduced task success by 3% and increased token usage by 20%, largely because they were overly comprehensive and introduced redundant requirements. When other documentation was removed from the AI agent’s context, these instruction files became helpful, which points to redundancy as the underlying issue. Developer-written files, which benefited from human editorial judgment about what to include and what to omit, offered a modest improvement in performance.
A second study, by Singapore Management University and Heidelberg University, focused more narrowly on the operational efficiency of AGENTS.md files. Researchers observed that AI tools using an AGENTS.md file finished tasks more quickly, supporting the ETH study’s finding that AI tools do reference context files and that those files can enable a faster commitment to a specific coding approach.
A third study, by Stanford, CMU, Berkeley, and others, examined AI “skills files,” which are used to share structured procedural knowledge with AI agents. Testing seven AI models on 84 tasks across 11 domains under three conditions (no skills, human-authored skills, and self-generated skills), researchers found that curated, human-authored skills boosted task success by 16 percentage points on average.
However, the impact varied significantly across domains. And where models generated their own skills files instead of relying on human-written or human-curated ones, performance dropped by 1.3 points on average.
What this research tells us
These three studies, taken together, establish something that practitioners building AI-assisted workflows should take seriously: AI agents cannot generate the specialized knowledge they need to work effectively in specialized domains. Humans have to provide it.
This isn’t a criticism of AI capabilities in general. On tasks where models have strong training coverage—general software engineering, common programming patterns, widely-documented processes—the benefits of external guidance are muted because the models already know a good bit about what to do. The gap shows up at the edges of that coverage. And for many organizations, the most important work happens at exactly those edges.
Government agencies are a clear example. Benefits eligibility. Tax calculations. Licensing workflows. Regulatory compliance. Accessibility implementation. Security controls. These processes are highly specific to particular jurisdictions, subject to interpretation by domain experts, and often documented in ways that aren’t publicly indexed.
For example, an AI agent may know that FedRAMP authorization is required for cloud services or that NIST controls need to be implemented for a solution. But it won’t know exactly how your agency has interpreted those requirements, which controls you’ve implemented and how, what your current authorization boundaries look like, or where your specific security posture creates constraints that affect how code gets written and deployed. That institutional knowledge lives with your staff, not in an AI model.
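To make that concrete, here is a hypothetical sketch of what a few lines of that institutional knowledge might look like once captured in an agency’s AGENTS.md or skills file. The authorization boundary, tooling, and control interpretations below are invented for illustration, not drawn from any real agency:

```markdown
## Security and deployment constraints (illustrative example)

- New services deploy inside our existing FedRAMP Moderate authorization
  boundary. Flag any new external SaaS dependency for ISSO review before
  writing integration code against it.
- We satisfy NIST 800-53 AC-2 (account management) through our internal
  account-service API; never create or modify user accounts directly in
  application code.
- Auditable events (AU-2) must go through the shared logging library so
  they reach our SIEM; do not add ad hoc log statements for them.
```

Guidance at this level of specificity is exactly what a model cannot reconstruct on its own, because it reflects decisions your agency has made rather than facts about the world.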
These aren’t obscure edge cases; they’re the operating reality of government technology work, and they represent exactly the kind of human knowledge the research shows AI agents need to function effectively in specialized environments.
What good guidance looks like
The research also has useful things to say about how to write effective agent context files and skills files, and several of its findings run counter to conventional wisdom.
More is not better. The skills benchmark study found that tasks with four or more skills files saw smaller gains than tasks with only two or three, and that comprehensive skills documentation actually hurt the performance of AI tools rather than helping.
The ETH study found the same pattern in AGENTS.md files: instructions that aren’t essential add cognitive load that degrades AI tool outputs. The instinct when writing these files is to be thorough. The research we now have says that thoroughness can be counterproductive.
Orientation doesn’t help. A common recommendation for AGENTS.md files is to include a codebase overview that shows the agent the directory structure and describes what each component does. The ETH study tested this specifically and found that codebase overviews don’t help agents find relevant files any faster. Agents are already good at exploring code repositories. What they can’t infer is methodology: specific workflows to follow, constraints to observe, and behaviors to exhibit in edge cases. That’s what the human-authored guidance should contain.
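As a rough illustration of that distinction (not a template from any of the studies), compare two hypothetical AGENTS.md entries; the directory names and the `make test-unit` command are invented for the example:

```markdown
<!-- Lower value: the agent can discover this on its own -->
## Project structure
- src/api/ holds the REST endpoints
- src/models/ holds the database models
- tests/ holds unit and integration tests

<!-- Higher value: methodology the agent can't infer -->
## Workflow and constraints
- Run `make test-unit` before proposing any change. Integration tests need
  credentials you don't have, so skip them and say so in your summary.
- Schema changes always go through a new migration file; never edit an
  existing migration.
- If a requirement is ambiguous, stop and ask rather than guessing.
```

The first entry restates what an agent can learn by listing the repository; the second captures working agreements that exist only in your team’s heads.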
Self-generation is unreliable. Both the ETH study and the skills benchmark found that AI-generated guidance is either neutral or harmful compared to not providing guidance at all. The mechanism is the same in both cases: models can’t generate knowledge they don’t have, and they tend to fill the gap with plausible-sounding but imprecise procedures that agents then feel obligated to follow.
Human authorship isn’t just preferable; for specialized domains, it’s essentially the only way to produce guidance that improves the outputs of AI tools.
The implication for organizations using AI coding tools
Effective AI use, especially in specialized areas like government, hinges on human-authored guidance. Producing that procedural knowledge is a necessary investment, not an optional one.
The challenge for organizations with deep, undocumented institutional knowledge is the real work required: identifying, encoding, and maintaining this domain-specific expertise for AI agents. This task falls to domain experts, typically not general technical staff or an AI tool itself.
However, the hidden opportunity is that this captured knowledge is extremely valuable. Studies also show that smaller AI models with high-quality, specialized “skills files” can outperform larger models without them. Organizations documenting their unique processes are effectively extending their AI capabilities without waiting for new AI tool releases.
AI adoption discussions often focus on models, tools, or automation. Research suggests the most critical variable is none of these, but rather the specialized human guidance. It’s the quality of human knowledge captured in the guidance documents that most directly shapes how the AI tools perform.
That knowledge exists in every organization. The question is whether it gets used.
At Ad Hoc, we are developing techniques to employ AI-assisted coding tools most effectively to meet the needs of government agencies. These AI tools are powerful and getting better by the day, but adopting them successfully requires more than just access to the technology. To get the most out of them, agencies need partners who have deep experience not only in designing, developing, and deploying modern digital solutions, but also in leveraging new AI tools to accelerate and enhance that process responsibly.
Unlike firms that have a proprietary AI tool to license or legacy tooling and processes to rebrand, Ad Hoc is focused on your success. We don’t benefit when agencies simply consume more tokens without making real progress on their goals. Our expertise is, and always has been, helping our government partners build great digital solutions. That’s as true in the AI era as it has always been.
If you have questions or need help developing your agency’s strategy for using AI tooling, we can help.