Why Automated Accessibility Tools Find Different Issues
Not all automated accessibility tools are created equal. Some return false positives, others miss critical issues entirely, and no tool can catch everything. So why do the results vary so much, and what does this mean for your accessibility strategy? We’ll break it down in more detail below.
Author: Jeff Curtis, Sr. Content Manager
Published: 02/24/2026
Image: A webpage featuring a property listing with multiple images, a reserve button, and branding. Error icons labeled Tool A, B, and C are scattered around the page.
If you run the same webpage through two different automated accessibility tools, you’ll likely get two very different results. To understand why, it helps to understand how these tools work, because the variation isn’t random. It’s a direct result of differences in how these tools are built.
How Automated Accessibility Testing Works
Automated accessibility testing tools scan a webpage’s code and evaluate it against predefined rules, most of which are built around the Web Content Accessibility Guidelines (WCAG). The process is essentially a series of binary checks: Does this image have an alt attribute? Does this button have an accessible label? Does this text meet the minimum color contrast ratio?
The internal logic that governs these checks is called a rule engine. Rule engines are what make automated testing fast and scalable. They can evaluate thousands of pages quickly. But they’re constrained to issues that are programmatically determinable: problems that can be identified by inspecting code alone. Whether an image’s alt text is actually descriptive, or whether a form is genuinely intuitive to navigate using a screen reader, falls outside what any rule engine can assess. Those questions require human judgment.
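To make the idea concrete, here is a minimal sketch of one such rule, a check for images missing an alt attribute, written with Python’s standard-library HTML parser. The class and function names are illustrative, not any real tool’s API:

```python
from html.parser import HTMLParser

class ImgAltRule(HTMLParser):
    """One rule in a hypothetical rule engine: flag <img> tags with no alt attribute."""
    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            self.violations.append(attrs.get("src", "<no src>"))

def check_img_alt(html: str) -> list:
    rule = ImgAltRule()
    rule.feed(html)
    return rule.violations

html = '<img src="hero.jpg" alt="Property exterior"><img src="logo.png">'
print(check_img_alt(html))  # only logo.png is flagged
```

Note the binary nature of the check: an image with alt="image123.jpg" would pass, even though that text describes nothing. Whether the alt text is actually useful is exactly the kind of question a rule engine cannot answer.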
That limitation is fundamental to understanding both what automated tools are good at and where their results will naturally diverge.
Four Reasons Tools Return Different Findings
1. Rule Interpretation
WCAG success criteria aren’t always black-and-white. Some are precise and easy to automate consistently. Others involve judgment calls that different tool developers resolve differently, meaning two tools can both claim to test for the same criterion and still return different results on identical content.
2. DOM Rendering
Tools differ in how they interact with a page’s Document Object Model (DOM), the structure a browser uses to render a webpage. Some tools scan static HTML. Others simulate a browser environment to capture content that loads dynamically after the initial page render.
A tool that only reads static HTML may miss accessibility issues that only appear after user interaction: an error message triggered by a form submission, a modal that opens on click, or content loaded asynchronously.
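This gap can be shown with a toy example: the same check run against the static HTML as served, and against a snapshot of the DOM after a script has injected content. Both snapshots are hand-written strings here; a real browser-based tool would capture the second one from a headless browser:

```python
from html.parser import HTMLParser

class UnlabeledImgCounter(HTMLParser):
    """Counts <img> tags without an alt attribute."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.count += 1

def count_violations(dom_snapshot: str) -> int:
    counter = UnlabeledImgCounter()
    counter.feed(dom_snapshot)
    return counter.count

# Static HTML as served: one labeled icon, no violations yet.
static_html = '<form><img src="ok.png" alt="Status"></form>'

# DOM after a failed submit: a script injected an error icon with no alt text.
rendered_dom = static_html + '<img src="error.png"><p>Invalid email</p>'

print(count_violations(static_html))   # 0: a static scanner sees nothing
print(count_violations(rendered_dom))  # 1: a browser-based scanner catches it
```

Both tools ran the same rule; they simply saw different versions of the page.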
3. Thresholds and Scoring
Different tools apply different thresholds for what constitutes a violation. Some are conservative, flagging potential issues even when the evidence is ambiguous. Others only flag clear, confirmed violations. This means a tool that returns fewer issues isn’t necessarily more accurate. It may simply be less willing to flag ambiguous cases, and whatever it doesn’t report, your team won’t know to fix.
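Color contrast illustrates how the same measurement can produce different findings. The contrast ratio itself is defined precisely in WCAG 2.x (relative luminance of the lighter color plus 0.05, divided by that of the darker plus 0.05, with a 4.5:1 minimum for normal text at Level AA), but tools differ in how close to the threshold they are willing to warn. The "borderline band" below is invented for illustration:

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color like '#767676'."""
    def channel(c: int) -> float:
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)

def contrast_ratio(fg: str, bg: str) -> float:
    """Ratio of lighter to darker luminance, each offset by 0.05."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# Gray text on white sits just above the 4.5:1 Level AA minimum.
ratio = contrast_ratio("#767676", "#ffffff")

strict_flag = ratio < 4.5        # strict tool: flag only confirmed failures
conservative_flag = ratio < 5.0  # conservative tool: also warn near the line
print(round(ratio, 2), strict_flag, conservative_flag)
```

Here the strict tool reports a clean result while the conservative one raises a warning, on the exact same colors.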
4. WCAG Coverage Scope
No tool tests every WCAG success criterion. The criteria each tool includes — and excludes — directly shape what it can find. A tool with narrower coverage will consistently return fewer findings than one with broader coverage, even when both are scanning the same content under the same conditions. This is the ceiling problem: a tool can only find what it's designed to look for.
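Coverage differences reduce to simple set arithmetic. The WCAG criterion numbers below are real, but which tool covers which is invented to illustrate the point:

```python
# Hypothetical rule coverage, keyed by WCAG success criterion.
tool_a = {"1.1.1", "1.4.3", "2.4.4", "4.1.2"}  # broader coverage
tool_b = {"1.1.1", "4.1.2"}                    # narrower coverage

only_a = tool_a - tool_b
print(sorted(only_a))  # any issue under these criteria is invisible to Tool B
```

No matter how carefully Tool B runs its scans, issues under criteria 1.4.3 and 2.4.4 can never appear in its reports.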
What Independent Research Reveals About Detection Variation
Independent research conducted by Adience in January 2026 evaluated five automated accessibility tools against identical content across six well-known websites.
The findings confirmed what the mechanics suggest: tools do not detect the same issues, even under controlled conditions.
Some tools returned no findings around certain WCAG conformance issues on pages where others identified multiple valid problems. The gaps were most pronounced at WCAG Level AA, the conformance level most relevant to legal compliance and regulatory standards, and at Level AAA.
The research also found meaningful differences in the breadth of WCAG success criteria that each tool could automatically detect, with the top-performing tool (AudioEye) identifying up to 2.5 times more unique criteria than the others. In practical terms, that means the ceiling of what any given tool can find is lower than what most teams assume, and that ceiling varies significantly depending on which tool you’re using.
The most important thing to understand: a tool that detects fewer issues doesn’t mean your site is more accessible. It may simply not be finding as much.
What Automated Testing Can and Can’t Do
Even a tool with broad WCAG coverage hits a limit: it can only evaluate what’s in the code. It can confirm that alt text exists, but it can’t determine if that text is useful. It can flag a missing label, but it can’t tell you whether the error message that appears after a form is submitted actually helps users of all abilities correct the mistake.
That’s why automated testing is best understood as a first pass: efficient at finding issues at scale, but incomplete by nature. Manual testing by human experts, including people with disabilities using the latest assistive technology, is what validates whether content is genuinely usable in practice, and whether fixes applied to automated findings actually resolve the underlying barrier.
How to Evaluate Automated Accessibility Tools
Given that tools vary in rule interpretation, thresholds, and WCAG coverage, a few things are worth keeping in mind when comparing tools or interpreting results:
Look beyond issue counts: A tool returning fewer findings may simply be looking at less. When evaluating tools, look at the breadth of WCAG criteria each one covers, not just the volume of findings it returns.
Consider coverage across compliance levels: Not all tools test equally across WCAG conformance levels. Since Level AA is the benchmark used for most accessibility laws, it’s worth understanding where any tool’s coverage becomes inconsistent or drops off entirely.
Be skeptical of a clean report: A tool that returns very few findings on a complex page isn’t necessarily confirming that everything is fine. It may just be working with a narrower set of rules. Independent research like the Adience report demonstrates that detection gaps are real and measurable, even when tools are tested under identical conditions.
Don’t Just Find Issues. Understand Them.
Automated accessibility testing is only as valuable as the insights it provides. Knowing that tools detect different issues under identical conditions, and understanding why, changes how you interpret results, how you evaluate tools, and ultimately how confident you can be in your accessibility approach.
The goal shouldn’t be a clean report. It should be content that works for every user, regardless of how they access it.
If you’re ready to go beyond surface-level scanning, AudioEye combines powerful automation with expert human testing and continuous monitoring to identify 89-253% more WCAG issues, giving you a fuller picture of your accessibility gaps.
Ready to see AudioEye in action?