How to Use ChatGPT to Create an Accessible Text Transcript

Most AI transcript services only give you the text of your video or audio file. That’s not enough to create a WCAG-conformant text transcript.

Whether it’s:

  • riverside.fm
  • restream.io
  • turboscribe.ai
  • aurisai.io
  • trint.com
  • otter.ai

The common theme is “we’ll transcribe audio to text.” That’s text transcription, not a text transcript.

For an accessible text transcript, you need to include:

  • Speaker identification when there are multiple speakers
  • Description of relevant non-speech sounds (music, meaningful background noises)
  • Description of important visual information for video content
  • Text shown on screen that isn’t spoken aloud

And, of course, the transcript needs to be accurate. Oh, and formatting so that the text isn’t in one block is also important.
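
As a rough illustration of the items listed above, here is a minimal Python sketch that renders a transcript with speaker labels, bracketed non-speech descriptions, and paragraph breaks. The `Entry` class, the `render` function, and the bracket-label convention are assumptions made for this example; WCAG does not prescribe a specific layout.

```python
from dataclasses import dataclass

# Hypothetical labels for the non-speech items an accessible transcript needs.
LABELS = {
    "sound": "Sound",
    "visual": "Visual",
    "on_screen_text": "On-screen text",
}


@dataclass
class Entry:
    kind: str          # "speech", "sound", "visual", or "on_screen_text"
    text: str
    speaker: str = ""  # only used when kind == "speech"


def render(entries):
    """Render entries as plain text, one paragraph per entry."""
    paragraphs = []
    for entry in entries:
        if entry.kind == "speech":
            paragraphs.append(f"{entry.speaker}: {entry.text}")
        else:
            # Non-speech information goes in brackets with a label.
            paragraphs.append(f"[{LABELS[entry.kind]}: {entry.text}]")
    # A blank line between paragraphs keeps the text out of one block.
    return "\n\n".join(paragraphs)
```

Calling `render` with a speech entry followed by a sound entry produces two paragraphs: one starting with the speaker’s name and one in brackets.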

Most of the AI transcription tools are now fairly accurate (a manual review is still needed), but the absolutely essential item that is missing is speaker identification: we need to assign the text to each speaker.

For a single-host podcast with no guests, there’s nothing to assign, so problem solved.

For audio or video with multiple speakers, we need to add in those speakers.

Here’s our ChatGPT (or Gemini) hack for assigning speakers:

  1. Give ChatGPT the transcript with the speakers assigned to their initial dialogue
    • the more text you assign (e.g., 5 rounds of speaking), the better the accuracy is
  2. Prompt ChatGPT to continue assigning the speakers based on the conversation flow and patterns

With two people, this is very accurate.

With three people, the accuracy decreases, but you’ll still at least have placeholder names that you can manually replace.

Let’s take the example of this NBA playoff preview YouTube video podcast from The Zach Lowe Show. Here, it’s just Zach Lowe and guest Bill Simmons discussing the upcoming NBA playoffs.

This transcript would be fairly easy for AI to pick up on because Zach Lowe introduces the show, then transitions to Bill and Bill picks up from there.

There can come a point where the speaker assignment gets muddied. For example, let’s say Zach interrupts Bill and it’s unclear who continues speaking.

This happens in many discussions, which is why it’s so important to manually review the AI output.
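
One small aid for that manual review is a script that points you to paragraphs with a missing or unexpected speaker label. The sketch below assumes the AI output uses a "Name: dialogue" layout with blank lines between paragraphs; the function name `flag_for_review` is hypothetical. It cannot tell you whether a known speaker was assigned the wrong line, so it does not replace reading the transcript.

```python
import re

# Assumed layout: each paragraph starts with "Name: " (label up to 40 chars).
LABEL = re.compile(r"^([^:\n]{1,40}):\s")


def flag_for_review(labeled_text, expected_speakers):
    """Return (paragraph_index, reason) pairs that need a human look."""
    flagged = []
    paragraphs = [p.strip() for p in labeled_text.split("\n\n") if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        match = LABEL.match(paragraph)
        if match is None:
            flagged.append((index, "no speaker label"))
            continue
        label = match.group(1).strip()
        if label not in expected_speakers:
            # Catches placeholder names such as "Speaker 3".
            flagged.append((index, f"unexpected label: {label}"))
    return flagged
```

Placeholder names and unlabeled paragraphs are the easy cases to catch this way. Interruptions and crosstalk still need a person listening to the audio.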

Still, AI will save us 90% of the time we used to spend manually creating transcripts (even if we had the text transcription).

Note that The Zach Lowe Show recently started up on The Ringer Podcast Network on Spotify, so there is an audio-only version of this podcast, which means a text transcript is needed for WCAG conformance.

ChatGPT Prompt

Here’s a prompt you can use for creating a text transcript:

I need help assigning speakers to this transcript. The conversation involves [number] speakers: [list names].

Here are the first few exchanges with speakers already assigned to help you understand their speaking patterns:

[Speaker 1]: [Their initial dialogue – include at least 5 lines/paragraphs]

[Speaker 2]: [Their initial dialogue – include at least 5 lines/paragraphs]

[etc. for additional speakers]

Now continue assigning speakers to the rest of this transcript while maintaining their distinct speaking patterns and conversation flow:

[Paste remaining unassigned transcript here]

Format your response with clear speaker labels before each section of dialogue and maintain paragraph breaks for readability.
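
If you reuse this prompt often, you could fill in the bracketed placeholders with a short script. Here is a minimal Python sketch; the function name `build_speaker_prompt` and its parameters are assumptions for illustration, and the wording simply mirrors the prompt above.

```python
def build_speaker_prompt(speakers, labeled_turns, unlabeled_text):
    """Fill in the prompt template.

    speakers: list of speaker names
    labeled_turns: list of (name, dialogue) pairs you assigned by hand
    unlabeled_text: the rest of the transcript, with no speakers assigned
    """
    names = ", ".join(speakers)
    # Each hand-labeled turn becomes "Name: dialogue" in its own paragraph.
    seed = "\n\n".join(f"{name}: {text}" for name, text in labeled_turns)
    return "\n\n".join([
        f"I need help assigning speakers to this transcript. "
        f"The conversation involves {len(speakers)} speakers: {names}.",
        "Here are the first few exchanges with speakers already assigned "
        "to help you understand their speaking patterns:",
        seed,
        "Now continue assigning speakers to the rest of this transcript "
        "while maintaining their distinct speaking patterns and "
        "conversation flow:",
        unlabeled_text.strip(),
        "Format your response with clear speaker labels before each "
        "section of dialogue and maintain paragraph breaks for readability.",
    ])
```

Pass in the names, the turns you labeled yourself (five or more per speaker, per the advice above), and the remaining text, then paste the returned string into ChatGPT.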

ChatGPT Project

If you’re continually creating text transcripts for one or a few shows, creating a Project in ChatGPT can significantly improve efficiency and accuracy. Here’s how to set up and optimize a transcript project:

Set Up

  1. Create a new ChatGPT Project for your specific show or podcast
    • Name it clearly (e.g., “The Zach Lowe Show Transcripts”)
    • Add a description explaining its purpose
  2. Upload reference materials:
    • Previous accurately labeled transcripts from the same show
    • Show notes that include guest information
    • Brief descriptions of regular hosts
  3. Create a “Speaker Profile” file with:
    • Common phrases or verbal tics for each regular speaker
    • Topics each speaker typically covers
    • Their role on the show (host, co-host, regular guest)

Optimize Workflow

  1. Start with a baseline prompt (see the ChatGPT Prompt section above)
  2. Save successful prompts as templates in your project for reuse with future episodes
  3. Process transcripts in segments if they’re particularly long (30-minute chunks work well)
  4. After each successful transcript, save the completed version in your project to build the AI’s knowledge base
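
For step 3, a plain-text transcript usually has no timestamps, so one way to approximate 30-minute segments is to split on a word budget. The sketch below assumes roughly 150 spoken words per minute (about 4,500 words per 30 minutes), which is only a ballpark, and assumes paragraphs are separated by blank lines. The function name `split_transcript` is hypothetical.

```python
def split_transcript(text, max_words=4500):
    """Split a transcript into chunks without breaking paragraphs.

    max_words=4500 is an assumed stand-in for about 30 minutes of speech.
    A single paragraph longer than max_words becomes its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = []
    count = 0
    for paragraph in paragraphs:
        words = len(paragraph.split())
        # Start a new chunk when this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current = []
            count = 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

When you process the chunks in order, it can help to start each new chunk with the last few labeled turns from the previous one so the speaker pattern carries over.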

Project Knowledge Power

When you load knowledge into a Project, ChatGPT can use those files as context to work out:

  • How to recognize recurring hosts based on their speech patterns
  • Common topics and vocabulary specific to your show
  • The typical flow and format of your episodes
  • Which speaker is more likely to reference certain topics or people

Each transcript you process and save in the Project improves ChatGPT’s accuracy for future transcripts, creating a positive feedback loop that saves you more time with each new episode.

When a new episode features returning guests, ChatGPT will already have profiles for their speaking styles, making speaker identification much more accurate than starting from scratch each time.

Long Transcripts

The embedded podcast episode is over two hours long, which makes for a very long transcript. Putting a transcript in the YouTube or podcast description wouldn’t be optimal anyway, and the character limits there rule it out.

The obvious answer is to host the transcript elsewhere. Using the case of The Zach Lowe Show, The Ringer could host all of their transcripts on TheRinger.com, but they may not want to for multiple reasons. One is that it could interfere with site search results. Another is SEO.

The semi-good news is there appear to be two services that host The Bill Simmons Podcast transcripts (HappyScribe.com and Podscribe.com), but these transcripts have errors and, crucially, no speaker assignments.

Try reading a podcast transcript with no speaker assignments. It doesn’t work – you don’t know who’s saying what. This is not only a usability problem, it’s an access problem for people who are deaf or hard-of-hearing.

Transcript Host

This is one of the big reasons we created Transcript Host: to give people a place to host all of their text transcripts.

And it’s not just hosting – you have full control over your transcript inside the dashboard so you can format it as you’d like for optimal usability and accessibility.

From there, all you need to do is link to the transcript from your show notes and you’ve got a WCAG-conformant podcast.

If you’d like a dedicated host for managing your transcripts, you can sign up at TranscriptHost.com.

Kris Rivenburgh

Kris has helped thousands of people with accessibility and compliance. Clients range from small businesses to governments to corporations.