Not yet. While large language models (LLMs) like ChatGPT, Google Gemini, and Meta’s Llama, along with accessibility-specific artificial intelligence (AI) tools, are already quite impressive, we’re still a ways off from AI being able to fully conduct a website accessibility audit.
In this guide, we’ll use ChatGPT as the AI representative and compare ChatGPT’s current capabilities with the WAVE automated scan from WebAIM and the good old-fashioned, fully manual audits from Accessible.org. This will give you context for all three and help you understand the key differences.
In a nutshell, AI can already check for several accessibility issues, but there are practical and technical limitations to its efficiency and effectiveness. And although AI’s capabilities exceed those of the WAVE scan, AI isn’t nearly as quick, user-friendly, or reliable as WAVE.
And, of course, this means that AI hasn’t replaced Accessible.org audit services just yet.
Let’s explain further.
What is an Audit?
Before we continue, we must define what an audit is so we’re on the same page.
A website accessibility audit is a formal, manual evaluation of a website’s accessibility. Typically, the Web Content Accessibility Guidelines (WCAG) 2.1 AA or WCAG 2.2 AA will be used as a baseline standard to grade the website.
The purpose of an audit is to identify accessibility issues that might negatively impact the experience of a user with a disability.
Informally, many people refer to audits as ADA website audits because they are concerned about lawsuits over website accessibility.
Can AI Audit a Website?
No, ChatGPT can’t mimic human, manual evaluation.
Can ChatGPT find accessibility issues if you feed it code?
Absolutely, but there are several important notes on ChatGPT’s ability to find accessibility issues.
Errors
If you’ve used any LLM for long enough, you’re well aware that AI makes errors and that remains true in terms of finding accessibility issues.
For the sake of simplicity, we’ll lump the following as errors in this section:
- being outright wrong
- oversights
- missing issues
- making stuff up
- hallucinations
- not factoring in “obvious” considerations
- applying prompts too literally
- not following prompts closely enough
And the list goes on. The point here is that AI routinely makes mistakes, which means any ChatGPT website accessibility audit would be prone to false positives, false negatives, and missed issues galore.
Processing Limitations
There are limits to how much code ChatGPT can effectively parse through, whether you upload it or copy and paste it.
ChatGPT performs quite well when you give it a single task with a limited focus, as opposed to just saying “here’s some code, tell me the problems.” Practically, this means you’re better off pasting smaller snippets of code, one snippet at a time.
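To make the "one snippet at a time" advice concrete, here is a minimal sketch of one way to cut a page into smaller pieces before pasting them into ChatGPT. It uses only Python's standard `html.parser` module. The names `SnippetSplitter` and `split_into_snippets`, and the choice of which tags to split on, are our own assumptions for illustration, not part of any tool mentioned in this guide. HTML comments are dropped, and the markup is rebuilt rather than copied byte for byte.

```python
from html.parser import HTMLParser


class SnippetSplitter(HTMLParser):
    """Collects the markup of each chosen element (e.g. nav, form) as its own snippet."""

    def __init__(self, target_tags):
        # Keep entities such as &amp; untouched so snippets stay close to the source.
        super().__init__(convert_charrefs=False)
        self.target_tags = set(target_tags)
        self.snippets = []
        self._current_tag = None  # tag name of the snippet being collected
        self._depth = 0           # nesting depth of that same tag name
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if self._current_tag is None and tag in self.target_tags:
            self._current_tag = tag
            self._depth = 0
            self._parts = []
        if self._current_tag is not None:
            if tag == self._current_tag:
                self._depth += 1
            self._parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img /> never get a separate end tag.
        if self._current_tag is not None:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._current_tag is None:
            return
        self._parts.append(f"</{tag}>")
        if tag == self._current_tag:
            self._depth -= 1
            if self._depth == 0:
                self.snippets.append("".join(self._parts))
                self._current_tag = None

    def handle_data(self, data):
        if self._current_tag is not None:
            self._parts.append(data)

    def handle_entityref(self, name):
        if self._current_tag is not None:
            self._parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._current_tag is not None:
            self._parts.append(f"&#{name};")


def split_into_snippets(html, target_tags=("nav", "form", "table")):
    """Return one markup string per matching element, in document order."""
    splitter = SnippetSplitter(target_tags)
    splitter.feed(html)
    splitter.close()
    return splitter.snippets
```

Each string returned by `split_into_snippets(page_html)` can then be pasted into its own conversation turn, so the model only has one component to reason about at a time.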
Technical Limitations
An essential part of a website accessibility audit is not just inspecting the visual appearance and code of the website, but testing the website using a keyboard and screen reader.
We need to test to make sure that not only is everything technically in place, but also practically works and behaves as it should.
AI can’t truly help us with testing yet.
Practical Limitations
LLMs also aren’t able to make a lot of the “dynamic” considerations that a human auditor will.
Sometimes, when auditing a website, an auditor needs to take several different considerations into account all at once. Humans can roll these considerations into one continuous thought, but AI is currently more linear in how it thinks.
For example, an Accessible.org audit will typically include a few accessibility recommendations as best practices, even if the website is technically conformant with the relevant success criteria. In a theoretical world where ChatGPT could currently identify all instances of WCAG 2.1 AA nonconformance, the LLM likely wouldn’t add extra recommendations for optimal accessibility and possible legal considerations.
Prompt Engineering
Another difficulty is you have to know what to ask of ChatGPT if you want the best result.
For example, consider a prompt such as:
“Tell me all of the WCAG 2.1 AA issues in this code.”
This prompt will produce middling, if not poor-quality, results. There are 50 WCAG 2.1 AA success criteria, and ChatGPT simply won’t strictly evaluate your code against each success criterion. Rather, the grading against WCAG will be uneven and overgeneralized.
However, if you laser-focus your prompt on whether a snippet of code meets a single success criterion, you’ll have much better results.
This means one problem for the average website owner trying to make their website ADA compliant is that they need preliminary knowledge to fully leverage AI’s capabilities.
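As a rough illustration of a narrowly focused prompt, the sketch below builds one prompt per success criterion for a single snippet. The `CRITERIA` table holds only three real WCAG 2.1 success criteria as examples (it is not the full list of 50), and the function name `build_prompt` and the exact prompt wording are our own assumptions, not a tested or recommended template.

```python
# A tiny sample of WCAG 2.1 success criteria (A and AA); not the full set.
CRITERIA = {
    "1.1.1": "Non-text Content",
    "1.4.3": "Contrast (Minimum)",
    "2.4.4": "Link Purpose (In Context)",
}


def build_prompt(criterion_id, snippet):
    """Build a prompt that asks about exactly one success criterion for one snippet."""
    name = CRITERIA[criterion_id]  # raises KeyError for criteria not in the sample table
    return (
        f"Evaluate the following HTML only against WCAG 2.1 success criterion "
        f"{criterion_id} {name}. Do not comment on any other success criterion. "
        f"State whether the snippet passes, fails, or cannot be determined from "
        f"code alone, and explain why.\n\n{snippet}"
    )
```

Calling `build_prompt("1.1.1", '<img src="logo.png">')` yields a prompt scoped to a single criterion and a single snippet. Knowing which criterion to ask about is the preliminary knowledge the paragraph above refers to.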
ChatGPT vs. WAVE
What is WAVE?
WAVE (Web Accessibility Evaluation Tool) is a tool developed by WebAIM that uses automation to scan a webpage using rule sets. The software returns errors and alerts for any instances where the code doesn’t meet conditions defined by the rule sets.
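To show what "scanning against rule sets" means in general terms, here is a toy rule-based checker written with Python's standard `html.parser` module. It is not WAVE's implementation, and the two rules (an `img` without an `alt` attribute, an `html` element without a `lang` attribute) are simply ones we picked for illustration. The names `RuleScanner` and `scan` are hypothetical.

```python
from html.parser import HTMLParser


class RuleScanner(HTMLParser):
    """Flags start tags that break one of two hard-coded example rules."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (line_number, message) tuples

    def handle_starttag(self, tag, attrs):
        attr_names = {name for name, _ in attrs}
        line, _offset = self.getpos()  # position of the tag being parsed
        # Rule 1: every img needs an alt attribute (an empty alt is allowed).
        if tag == "img" and "alt" not in attr_names:
            self.findings.append((line, "img element has no alt attribute"))
        # Rule 2: the html element should declare a language.
        if tag == "html" and "lang" not in attr_names:
            self.findings.append((line, "html element has no lang attribute"))


def scan(html):
    """Return every rule violation found in the given markup, in document order."""
    scanner = RuleScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

A checker like this is fast and repeatable, but it can only say whether a condition in the code is met. It has no way to judge whether the alt text that is present actually describes the image.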
Differences Between AI and WAVE
Unsurprisingly, artificial intelligence is more intelligent than a scan in its ability to find accessibility issues. However, just because AI is capable of more doesn’t mean it’s actually better than WAVE in terms of performance.
Whether WAVE or ChatGPT is better at finding accessibility issues depends on what you mean by finding accessibility issues / what you’re trying to accomplish.
In a straight-up scan vs. scan comparison, WAVE wins easily. However, when it comes to more in-depth, intelligent evaluation, ChatGPT obviously takes the championship belt.
Here’s an illustration of intelligence: You can’t ask WAVE to evaluate whether a specific ARIA attribute was the best choice.
But tilting the see-saw back the other way, ChatGPT won’t return a reliable list of almost all of the issues that can be flagged by automation within a few seconds. ChatGPT also won’t highlight the issues on your website and educate you on accessibility as you look through the issues.
ChatGPT vs. Accessible.org
If you’ve read this far, you already know how this story ends.
Virtually all of our clients come to us because they want an ADA website compliance audit and there is no AI tool on the market that even comes remotely close to being able to evaluate a website based on WCAG 2.1 AA or any other technical standard.
For all of the reasons listed above and more, it’s just not happening.
Now we’re confident that some really nice AI tools will eventually make their way onto the market, but will they practically yield any benefit to customers?
As mentioned, our clients are interested in ADA compliance. Essentially, this means they don’t want to get sued over website accessibility.
If a tool comes out that can reliably audit for 40% of accessibility issues (rather than 25% currently), how much of a practical difference does that make for clients who still need services to ensure their website is ADA compliant?
Obviously, part of the value would depend on what the additional accessibility issues were, but our point is that attention to detail is critical and a scan that isn’t conclusive won’t have significant practical value for the majority of website owners.
Accessibility Software
Which leads to our next point: accessibility software is really more for accessibility professionals than it is for website owners.
There are various tools, including software, applications, and AI, that really do aid us in finding accessibility issues. But we work in accessibility, so we know the strengths and weaknesses of these tools and how to leverage them.
In contrast, many website owners don’t fully understand the tools they use. For example, many people think that scans like Google Lighthouse and WAVE are ADA website compliance checkers and that all they have to do is get their errors down to 0 and they’re finished with accessibility.
And going back to AI prompting, people who use LLMs might not know enough about digital accessibility to extract the right information from AI.
Summary
AI is already wild in what it can do, and its capabilities are advancing at a rapid pace. But as of 2024, asking AI to audit a website’s accessibility is unrealistic.
Look out for breakthroughs in artificial general intelligence (AGI) because this is when AI will quickly turn a corner in accessibility.
As for new accessibility AI tools, including possible fine-tuned LLMs, we may not see a momentous breakthrough before AGI takes over. Even if a big release is announced, it would likely be rushed to market with early customers essentially being beta testers.
And some of the first issues they would find would be with the AI results themselves.