
13% of WCAG 2.2 AA criteria (7 of 55) can be flagged with mostly accurate detection — these are technical, measurable criteria like color contrast, page titles, and page language where scans rarely produce false positives, though human review is still needed to verify context and implementation.
45% of criteria (25 of 55) are partially detectable — scans can detect part of a WCAG success criterion, but can easily miss a crucial component of conformance.
42% of criteria (23 of 55) cannot be detected at all — these involve subjective evaluation of quality, meaning, and user experience that automated tools cannot assess.
Thus, automated scans — whether they include AI or not — are helpful for instantly flagging several potential issues, but they are not conclusive of accessibility and require human review. Obviously, scans cannot replace an accessibility audit.
Accessibility scans are very limited. The problem is many people think scans are audits or “ADA compliance checkers.”
They’re not.
Scans are software that can flag several accessibility issues for review.
So exactly how many accessibility issues can scans catch?
Scans can flag several issues, but often these are only partial flags, and everything still needs manual review because scans aren’t conclusive. Here’s a complete breakdown of the issues (WCAG success criteria) that scans can flag:
| Success Criterion | Detection Category | Explanation |
|---|---|---|
| 1.1.1 Non-text Content | Partially Detectable | Scans can identify missing alt attributes on images, but cannot evaluate whether the alt text is meaningful, accurate, or appropriate for the context. Human review needed to verify quality. |
| 1.2.1 Audio-only and Video-only | Not Detectable | Cannot automatically verify if transcripts exist, are accurate, or are properly linked. Requires manual checking of media content and associated text alternatives. |
| 1.2.2 Captions (Prerecorded) | Not Detectable | Tools cannot verify caption presence, accuracy, synchronization, or completeness. Manual review of video content required. |
| 1.2.3 Audio Description | Not Detectable | Cannot automatically determine if audio descriptions are present or if visual information is adequately conveyed through audio. |
| 1.2.4 Captions (Live) | Not Detectable | Live caption quality, accuracy, and presence cannot be tested automatically. Requires real-time human evaluation. |
| 1.2.5 Audio Description (Prerecorded) | Not Detectable | Tools cannot verify if audio descriptions exist or evaluate their quality and completeness for video content. |
| 1.3.1 Info and Relationships | Partially Detectable | Can identify some structural issues like missing form labels, improper heading hierarchy, and missing table headers, but cannot verify all semantic relationships are properly conveyed. |
| 1.3.2 Meaningful Sequence | Partially Detectable | Can detect DOM order and compare to visual presentation, but cannot determine if the sequence actually preserves meaning. |
| 1.3.3 Sensory Characteristics | Not Detectable | Cannot identify instructions that rely solely on shape, size, visual location, orientation, or sound. Requires human evaluation of content. |
| 1.3.4 Orientation | Partially Detectable | Can detect if content adapts to orientation changes, but may miss specific functionality that becomes unavailable or edge cases. |
| 1.3.5 Identify Input Purpose | Mostly Accurate | Can verify autocomplete attributes are present and properly formatted, though human review needed to confirm values match actual field purposes. |
| 1.4.1 Use of Color | Not Detectable | Cannot determine if color is the only method used to convey information, indicate actions, or distinguish elements. Requires visual inspection. |
| 1.4.2 Audio Control | Partially Detectable | Can identify autoplaying media elements but cannot verify if proper pause/stop controls are available and functional. |
| 1.4.3 Contrast (Minimum) | Mostly Accurate | Can measure contrast ratios accurately for standard text/background combinations, but may miss text over images, gradients, or dynamic content. |
| 1.4.4 Resize Text | Partially Detectable | Can test zoom functionality up to 200%, but may miss content overlap, truncation, or functionality loss in specific contexts. |
| 1.4.5 Images of Text | Partially Detectable | Can identify some images containing text through OCR or file analysis, but may miss decorative text or complex graphics. |
| 1.4.10 Reflow | Partially Detectable | Can detect responsive behavior at different viewports, but cannot fully verify no content or functionality is lost during reflow. |
| 1.4.11 Non-text Contrast | Mostly Accurate | Can measure UI component contrast ratios reliably for standard elements, though complex graphics and states need verification. |
| 1.4.12 Text Spacing | Partially Detectable | Can determine text spacing adjustments and check for visible issues, but may miss subtle content loss or overlap. |
| 1.4.13 Content on Hover or Focus | Partially Detectable | Can detect tooltip and hover content presence, but cannot fully verify dismissibility, persistence, and hoverable characteristics. |
| 2.1.1 Keyboard | Partially Detectable | Can identify some keyboard accessibility issues like missing focus handlers, but cannot test all interactive functionality. |
| 2.1.2 No Keyboard Trap | Partially Detectable | Can detect some keyboard traps in common patterns, but requires manual testing to verify all navigation paths. |
| 2.1.4 Character Key Shortcuts | Not Detectable | Cannot automatically identify single-character keyboard shortcuts or verify if they can be disabled or remapped. |
| 2.2.1 Timing Adjustable | Partially Detectable | Can identify session timeouts and time limits, but cannot verify if proper extension or adjustment mechanisms exist. |
| 2.2.2 Pause, Stop, Hide | Partially Detectable | Can detect moving, blinking, or auto-updating content, but cannot verify if adequate controls are provided. |
| 2.3.1 Three Flashes | Partially Detectable | Can identify potential flashing content and measure flash rates, but may require manual verification for edge cases. |
| 2.4.1 Bypass Blocks | Mostly Accurate | Can reliably detect skip links or landmarks presence, though functionality and correct targeting require verification. |
| 2.4.2 Page Titled | Mostly Accurate | Can verify title element exists and identify duplicates, but cannot evaluate if titles are meaningful and descriptive. |
| 2.4.3 Focus Order | Partially Detectable | Can detect tab order and compare to DOM order, but cannot determine if the sequence is logical for understanding. |
| 2.4.4 Link Purpose (In Context) | Partially Detectable | Can flag suspicious link text like “click here” or “read more,” but cannot evaluate if context provides clarity. |
| 2.4.5 Multiple Ways | Not Detectable | Cannot automatically determine if multiple navigation methods exist (search, sitemap, navigation menu, etc.). |
| 2.4.6 Headings and Labels | Partially Detectable | Can identify missing headings and labels, but cannot evaluate if they accurately describe content or purpose. |
| 2.4.7 Focus Visible | Partially Detectable | Can detect if focus indicators exist and measure visibility, but may miss custom implementations or specific states. |
| 2.4.11 Focus Not Obscured (Minimum) | Partially Detectable | Can check whether a focused element remains at least partially visible in the viewport, but cannot reliably account for sticky headers, overlays, and other content that may cover the focused item. |
| 2.5.1 Pointer Gestures | Not Detectable | Cannot identify multi-point or path-based gestures or verify if single-pointer alternatives exist. |
| 2.5.2 Pointer Cancellation | Not Detectable | Cannot test down-event activation or verify if proper cancellation mechanisms are implemented. |
| 2.5.3 Label in Name | Partially Detectable | Can compare visible text with accessible names, but may miss complex cases or image-based labels. |
| 2.5.4 Motion Actuation | Not Detectable | Cannot identify motion-based features or verify if alternative input methods and disable options exist. |
| 2.5.7 Dragging Movements | Not Detectable | Cannot identify drag functionality or verify if single-pointer alternatives are provided. |
| 2.5.8 Target Size | Mostly Accurate | Can measure static element dimensions reliably, though responsive designs and actual touch areas need verification. |
| 3.1.1 Language of Page | Mostly Accurate | Can verify lang attribute presence and format, though confirming the declared language matches content requires review. |
| 3.1.2 Language of Parts | Partially Detectable | Can identify lang attributes on elements, but cannot determine if all language changes are properly marked. |
| 3.2.1 On Focus | Partially Detectable | Can detect some context changes on focus, but cannot identify all types of unexpected behavior. |
| 3.2.2 On Input | Partially Detectable | Can identify automatic form submissions and some context changes, but requires manual verification. |
| 3.2.3 Consistent Navigation | Not Detectable | Cannot automatically compare navigation consistency across multiple pages. Requires human evaluation. |
| 3.2.4 Consistent Identification | Not Detectable | Cannot determine if components with same functionality are identified consistently throughout the site. |
| 3.2.6 Consistent Help | Not Detectable | Cannot verify if help mechanisms appear in consistent locations across pages. |
| 3.3.1 Error Identification | Partially Detectable | Can detect error message presence and association, but cannot evaluate clarity or helpfulness. |
| 3.3.2 Labels or Instructions | Partially Detectable | Can identify missing labels and basic instructions, but cannot evaluate adequacy or clarity. |
| 3.3.3 Error Suggestion | Not Detectable | Cannot verify if error messages include helpful suggestions for correction. |
| 3.3.4 Error Prevention | Not Detectable | Cannot test if legal/financial transactions have proper confirmation, review, or reversal mechanisms. |
| 3.3.7 Redundant Entry | Not Detectable | Cannot determine if previously entered information is auto-populated or made selectable. |
| 3.3.8 Accessible Authentication | Not Detectable | Cannot evaluate if authentication avoids cognitive function tests or provides adequate alternatives. |
| 4.1.1 Parsing | Mostly Accurate | Can identify HTML validation errors reliably, though impact on assistive technology varies. (Note: Obsolete in WCAG 2.2) |
| 4.1.2 Name, Role, Value | Partially Detectable | Can identify missing ARIA attributes and roles, but cannot verify if they accurately represent component purpose and state. |
| 4.1.3 Status Messages | Partially Detectable | Can detect ARIA live regions and role attributes, but cannot verify if status messages are properly announced. |
Note: Although there is significant overlap, not all scans are the same. Some scans include issues that others do not.
How Automated Scans Work
Automated accessibility scans analyze your website’s code by reading HTML elements, measuring CSS properties, and checking for specific technical patterns. Think of them as sophisticated programs that look through your website’s code and flag anything that matches their list of known accessibility issues.
Examples of scans include:
- Google Lighthouse
- WAVE by WebAIM
- AXE by Deque
- Powermapper
These tools work similarly to spell checkers in word processors: the software can flag “there” when you meant “their,” but it takes human judgment to know which correction applies. Likewise, scans can tell you an image is missing alternative text, but they cannot tell you whether the alt text that does exist actually describes the image correctly.
This fundamental characteristic shapes everything about automated scanning. The tools excel at finding what’s missing or measuring what’s present, but they cannot evaluate meaning, context, or user experience.
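For example, a rule-based engine such as axe-core (the engine behind the AXE tool listed above) can be driven from a script. Below is a minimal sketch, assuming a Node.js project with the puppeteer and @axe-core/puppeteer packages installed; the exact API may differ between versions.

```ts
import puppeteer from "puppeteer";
import { AxePuppeteer } from "@axe-core/puppeteer";

async function scanPage(url: string): Promise<void> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);

  // Run the axe rule set against the rendered DOM.
  const results = await new AxePuppeteer(page).analyze();

  // Each violation is a known technical pattern that matched;
  // every one of them still needs human review to confirm impact.
  for (const violation of results.violations) {
    console.log(`${violation.id} (${violation.impact}): ${violation.nodes.length} element(s)`);
  }

  await browser.close();
}

scanPage("https://example.com").catch(console.error);
```

Everything the script prints is a flag for review, not a confirmed WCAG failure, and everything it does not print is simply outside what the rules can see.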
What Scans Do Well
Automated scans provide real value in terms of speed and volume. They can process thousands of pages in minutes, flagging issues that would take humans weeks to identify manually.
Missing Elements Detection
Scans excel at flagging missing code elements. When an image lacks an alt attribute, a scan flags it immediately. When a form field lacks a label, the scan catches it. When a page lacks a title element, it gets flagged. These binary checks—something is either present or absent—represent the strongest capability of automated scanning.
For example, if your HTML contains:
<img src="team-photo.jpg">Every automated scan will flag this missing alt attribute. This detection is reliable and consistent.
However, if your HTML contains:
<img src="team-photo.jpg" alt="photo">The scan sees the alt attribute exists and typically will not flag an issue, even though “photo” provides no useful information to someone using a screen reader.
Mathematical Measurements
Scans can perform mathematical calculations reliably. Color contrast represents the best example — scans measure the color values of text and backgrounds, calculate the ratio, and flag combinations that fall below WCAG thresholds.
When scanning finds white text on a light gray background, it calculates a contrast ratio of 1.6:1 and flags it as failing WCAG’s required 4.5:1 ratio. This mathematical evaluation is consistent and accurate for simple scenarios.
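The underlying formula is fully specified by WCAG, which is why this check is so dependable in simple cases. Here is a minimal sketch of the calculation, assuming plain six-digit hex colors with no transparency or background images:

```ts
// WCAG 2.x contrast ratio for two solid sRGB colors, e.g. "#767676".
// Each channel is linearized, then weighted into a relative luminance.
function relativeLuminance(hex: string): number {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const channel = parseInt(hex.slice(i, i + 2), 16) / 255;
    return channel <= 0.03928
      ? channel / 12.92
      : Math.pow((channel + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// White text on a mid gray: roughly 4.54:1, just above the 4.5:1 AA threshold.
console.log(contrastRatio("#ffffff", "#767676").toFixed(2));
```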
However, scans often miss contrast issues when:
- Text appears over background images
- Gradients create varying contrast levels
- Dynamic content changes colors on interaction
- CSS pseudo-elements add overlays
- Text appears in non-HTML contexts like Canvas or SVG
Code Validation
Automated scans effectively identify HTML syntax errors, duplicate IDs, improper nesting, and malformed tags. They can parse your code and flag structural issues that might affect assistive technology.
For instance, scans will flag:
```html
<div><span></div></span> <!-- Improper nesting -->
<div id="header"></div>
<div id="header"></div> <!-- Duplicate ID -->
```
These technical validations help maintain clean code structure, though the actual impact on users varies considerably.
The Three Categories of Detection Reliability
Understanding automated scanning requires recognizing three distinct categories of detection reliability. Each WCAG success criterion falls into one of these categories based on how accurately scans can identify issues.
Category 1: Mostly Accurate Detection
About 13% of WCAG criteria can be flagged with high accuracy. These criteria involve measurable, technical requirements where scans rarely produce false positives. However, even these “mostly accurate” flags require human review to confirm the context and verify the implementation works correctly.
Success criteria in this category include:
1.4.3 Contrast (Minimum) – Scans accurately measure contrast ratios for standard text and backgrounds, though they may miss complex scenarios like text over images or gradients.
2.4.1 Bypass Blocks – Scans reliably detect the presence of skip links or landmark regions, though human review must verify they actually function and navigate to the correct location.
2.4.2 Page Titled – Scans can confirm title elements exist and identify duplicates, though humans must evaluate if titles are descriptive and meaningful.
3.1.1 Language of Page – Scans verify the lang attribute exists and uses valid language codes, though review is needed to confirm the declared language matches the actual content.
1.3.5 Identify Input Purpose – Scans can detect autocomplete attributes on form fields, though verification is needed to ensure the values match the actual field purposes.
1.4.11 Non-text Contrast – Scans measure contrast for user interface components, though complex graphics and states may require additional review.
2.5.8 Target Size – Scans can measure interactive element dimensions, though responsive designs and touch targets need verification.
Even with mostly accurate detection, human evaluation confirms whether the technical fix actually improves accessibility. A skip link might exist but skip to the wrong location. A page title might be present but provide no useful information. The scan correctly identifies the technical aspect, but the accessibility impact requires human judgment.
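As a rough sketch of where the machine-checkable part ends, consider how little a page-title rule can actually verify; the helper below is hypothetical.

```ts
// Hypothetical page-title rule: presence and non-emptiness are
// machine-checkable; whether "Home" or "Untitled Document" actually
// describes the page is not.
function checkPageTitle(doc: Document): "missing" | "empty" | "present" {
  const title = doc.querySelector("title");
  if (!title) return "missing";
  if (!title.textContent?.trim()) return "empty";
  // A scan stops here. "Present" is not the same as "descriptive".
  return "present";
}

console.log(checkPageTitle(document));
```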
Category 2: Partially Detectable
Approximately 45% of WCAG criteria fall into the partially detectable category. Here, scans can flag potential issues but with varying reliability. False positives occur on occasion, and, more often, actual issues go unflagged (false negatives).
These criteria typically involve elements that require understanding context or evaluating multiple factors. For example:
1.1.1 Non-text Content – Scans flag missing alt attributes reliably but cannot evaluate if existing alt text serves an equivalent purpose.
1.3.1 Info and Relationships – Scans can identify some structural issues like missing form labels but miss many semantic relationship problems.
2.1.1 Keyboard – Scans can detect some keyboard accessibility patterns but cannot test all functionality thoroughly.
2.4.4 Link Purpose – Scans flag generic link text like “click here” but cannot determine if surrounding context provides clarity.
In this category, scans serve as initial indicators that something might need attention. A flagged issue might be a genuine problem, a false positive, or might miss the actual accessibility issue entirely.
Category 3: Not Detectable
Roughly 42% of WCAG criteria cannot be detected through automated scanning at all. These criteria require human judgment about quality, appropriateness, accuracy, or user experience.
Undetectable criteria include:
1.2.2 Captions – Scans cannot verify if captions exist, are synchronized, or accurately represent audio content.
1.2.5 Audio Description – Scans cannot determine if audio descriptions adequately convey visual information.
2.4.5 Multiple Ways – Scans cannot assess if sites provide multiple navigation methods like search, sitemap, and menus.
3.2.3 Consistent Navigation – Scans cannot compare navigation consistency across different pages.
3.3.3 Error Suggestion – Scans cannot evaluate if error messages provide helpful correction guidance.
These criteria involve aspects of accessibility that require understanding meaning, evaluating quality, or assessing user experience—capabilities beyond current automated technology.
Scans vs. Audits
The distinction between automated scanning and comprehensive audits is fundamental to understanding accessibility evaluation. Scans provide a technical review of code against known patterns. Audits provide expert evaluation of actual accessibility and user experience, using the Web Content Accessibility Guidelines (WCAG) as a baseline.
A scan might return a “100%” score because no technical flags were triggered, while an audit report identifies 100+ accessibility issues. The scan checked what it could measure; the audit evaluated everything.
Comprehensive audits involve:
- Expert review of all WCAG success criteria
- Testing with actual assistive technologies
- Evaluating user paths and task completion
- Assessing content clarity and understandability
- Verifying consistency across the experience
- Identifying issues beyond WCAG requirements
Where a scan might take seconds, an accessibility audit requires hours or days depending on site complexity. Where a scan produces a list of technical flags, an audit provides prioritized recommendations based on actual user impact.
The Role of AI in Evolving Scan Capabilities
Artificial intelligence has begun enhancing automated scanning capabilities, though fundamental limitations remain. AI accessibility scans can now perform more sophisticated analyses than traditional rule-based scans.
Modern AI-enhanced scans can:
- Evaluate alt text quality using natural language processing
- Identify potentially confusing content patterns
- Suggest more descriptive link text
- Detect complex interaction patterns
- Flag inconsistent navigation structures
However, AI scanning still cannot determine if content truly serves user needs. An AI might recognize that alt text says “graph showing sales data” and suggest adding specific data points, but it cannot know whether those details are relevant for the page’s purpose or if the graph should be marked decorative.
The promise of AI lies not in replacing human evaluation but in making scans more intelligent about what they flag for human review. Instead of flagging every “click here” link, AI can evaluate surrounding context to reduce false positives. Instead of just detecting missing labels, AI can suggest appropriate label text based on field context.
These improvements make scans more useful for practitioners who understand their limitations. AI reduces noise in scan results, making human review more efficient, but it does not eliminate the need for expert evaluation.
Yet, AI scanning introduces new risks that traditional rule-based scans avoid. While traditional scans are deterministic and consistent (they either detect an issue or they don’t), AI scans can hallucinate problems that don’t exist or confidently suggest incorrect fixes.
A traditional scan will reliably flag a missing alt attribute every time, but an AI scan might incorrectly interpret code patterns, suggest inappropriate ARIA implementations, or misunderstand the context of interactive elements. This unpredictability means AI scans require even more careful human verification than traditional scans, as you’re not just checking for missed issues but also for fabricated ones.
Scans as Tools for Practitioners
For accessibility professionals who understand their limitations, automated scans are useful tools. They provide rapid initial assessment, identify patterns across large sites, and catch obvious technical issues that might otherwise be missed.
Experienced practitioners use scans to:
- Quickly assess the technical accessibility baseline
- Identify systemic issues needing attention
- Reduce the number of issues before launch by scanning in staging
- Monitor for regression after making fixes
- Provide metrics for tracking progress
- Focus human evaluation efforts efficiently
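Regression monitoring in particular is often wired into the test suite so that newly introduced, scan-detectable issues fail the build. Below is a minimal sketch, assuming Jest with a jsdom environment and the jest-axe package.

```ts
import { axe, toHaveNoViolations } from "jest-axe";

expect.extend(toHaveNoViolations);

// Hypothetical regression test: fails if axe flags new violations in
// this fragment. Passing only means no *detectable* issues regressed,
// not that the component is accessible.
test("signup form has no detectable axe violations", async () => {
  document.body.innerHTML = `
    <form>
      <label for="email">Email</label>
      <input id="email" type="email" autocomplete="email" />
      <button type="submit">Create account</button>
    </form>
  `;

  const results = await axe(document.body);
  expect(results).toHaveNoViolations();
});
```

A passing run covers only what the rules can flag; the rest still needs the human evaluation described above.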
The key is understanding what scans tell you and what they do not. A clean scan report does not mean your site is accessible. A long list of flags does not mean every issue is real or important. Scans provide data that requires interpretation.
Practitioners who use scans effectively understand that flagged issues are starting points for investigation. They know to look beyond what scans report to find the roughly 42% of criteria that cannot be detected automatically. They recognize that fixing flagged issues without understanding their context might not improve actual accessibility.
Scan-Based Accessibility Platforms
As we wrote about in our review of accessibility platforms, most are scan-based. This means the entire platform, including all analytics, progress reports, and data visualizations, is based on scan results.
As a result, many organizations base their entire accessibility project on a scan, sometimes knowingly, many times unknowingly.
The problem is that virtually all of these organizations are striving to make their websites and other digital assets WCAG 2.1 AA or WCAG 2.2 AA conformant while simultaneously targeting a 100% score within the platform (a scan score).
This is why we built Accessibility Tracker to be an audit-based platform, so you can accurately track your progress toward WCAG conformance.
Insights
Automated accessibility scans are useful tools when their capabilities and limitations are understood. They excel at flagging technical patterns and missing elements but cannot evaluate meaning, quality, or user experience. Even the most accurate scan results require human verification to confirm actual accessibility impact.
The numbers reveal clear limitations: roughly 13% of WCAG criteria can be flagged with mostly accurate results, 45% can be partially detected with varying reliability, and 42% cannot be detected at all.
As AI enhances scanning capabilities, tools will become more sophisticated in what they flag for review, but the fundamental need for human evaluation remains.
Frequently Asked Questions
Even with a scan, do I still need an audit for WCAG conformance?
Yes. This is why we recommend clients do not purchase a scan (you’ll need the audit regardless). That said, you can still use scans for free to start learning and to get a feel for your overall accessibility.
What percentage of accessibility issues can scans actually catch?
0% of WCAG success criteria can be conclusively verified by scans alone. Scans can reliably flag roughly 13% of criteria and partially flag another 45%, while approximately 42% of criteria cannot be detected at all.
Can I rely on scans that return 100% Score?
No. A clean scan score only means no technical patterns were flagged. Only 13% of WCAG 2.2 AA success criteria can be reliably flagged.
Are AI-powered scans significantly better than traditional scans?
AI-powered scans are more sophisticated in identifying potential issues and reducing false positives, but they still cannot evaluate subjective qualities like content clarity, error message helpfulness, or whether captions accurately represent audio. AI improves scan usefulness but does not eliminate the need for human evaluation. However, AI can also make mistakes and errors in judgment, which can undo some of that added utility.
How should organizations use automated scans effectively?
Scans should only be used by practitioners who understand their limitations and by organizations that want to continuously monitor for new issues flagged by scans. Scans also serve as a way to get an overall feel for web accessibility (provided a developer hasn’t purposefully eliminated only those issues that get flagged by scans).
Why do different scanning tools produce different results?
Each scanning tool uses different detection rules, algorithms, and heuristics. Some tools are more aggressive in flagging potential issues, while others are more conservative.
Can combining automated scans plus screen reader testing provide complete WCAG coverage?
No, combining screen reader testing with a scan is still wholly insufficient. Issues like poor color contrast, missing captions, small touch targets, complex gestures, and confusing instructions can all pass screen reader testing while still failing WCAG. Also, some screen readers can read past certain technical issues while others cannot.
If AI improves, will automated scans eventually detect all WCAG issues?
While AI is making scans more sophisticated in identifying patterns and reducing false positives, the fundamental limitation remains: many WCAG criteria require human judgment about meaning, quality, and user experience. AI cannot determine if video captions accurately represent dialogue, if instructions make sense to users, or if navigation is truly consistent. AI improves what scans flag for review but cannot eliminate the need for human evaluation of context and appropriateness.
What’s the risk of relying solely on automated scanning for accessibility?
The risk is significant. Fixing scan-flagged issues alone will not result in WCAG conformance, and WCAG 2.1 AA conformance is either required or a best practice under many laws and regulations concerning digital accessibility.
Do expensive enterprise scanning tools detect significantly more issues than free tools?
No. The AXE scan is free to use and reliably flags issues to the extent that they can be flagged.
Enterprise tools often provide better reporting, integration features, and workflow management, but their core detection capabilities are similar to free tools. All automated scanners face the same fundamental limitations—they cannot evaluate meaning, quality, or user experience. The price difference reflects convenience features and support rather than more issues flagged.