You are on lesson 6 of 11 in the course Path 1: Captions.

Module 1.2: Custom Vocabularies: Getting Names Right

Why vocabularies matter

Automated speech recognition handles everyday words well but struggles with uncommon terms specific to your community. Council member names like "Aakanksha Chatterjee," local street names like "Wojciechowski Avenue," specialized terms like "stormwater bioswales," and abbreviations like "ARPA" are often captioned incorrectly. "Chatterjee" might become "chattery." "Wojciechowski" could appear as "voyage chef ski."

Custom vocabularies solve this problem by teaching MediaScribe the specific words, names, and phrases used in your meetings.

What vocabularies improve

Focus your vocabulary on terms that sound distinct but aren't in the system's standard dictionary:

People's names are the most common use case. Council members, department heads, and frequent speakers deserve accurate representation. "Siobhan O'Malley" shouldn't appear as "she vaughn oh malley."

Local place names include streets, neighborhoods, parks, and buildings that help residents understand what's being discussed.

Technical and specialized terms vary by department. Planning commissions use "conditional use permits." Public works discusses "bioretention systems." Budget meetings reference "millage rates."

Acronyms and abbreviations like "FEMA," "CARES Act," or "RFP" often appear as individual letters or nonsense words without vocabulary entries.

Grant programs and funding sources need exact official names for the record.

Vocabularies improve recognition of specific words but don't fix audio quality problems or grammatical errors. They work best when words are pronounced clearly and consistently.

Building an effective vocabulary

Start small and expand based on actual caption errors you observe.

Gather terms from real meetings

Watch caption output and note which terms consistently appear incorrectly. Keep a running list during meetings. When "sanitation" captions as "san station" repeatedly, that's a vocabulary candidate.
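If you keep your running list in a spreadsheet or text file, a short script can show which terms were mis-captioned often enough to be worth adding. The following is a minimal sketch in Python. The note format (what the caption showed, what was actually said) and the `min_count` threshold are assumptions made for illustration, not part of MediaScribe.

```python
from collections import Counter

# Hypothetical running notes: (what the captions showed, what was actually said).
observed_errors = [
    ("san station", "sanitation"),
    ("chattery", "Chatterjee"),
    ("san station", "sanitation"),
    ("voyage chef ski", "Wojciechowski"),
    ("san station", "sanitation"),
]

def vocabulary_candidates(errors, min_count=2):
    """Return intended terms that were mis-captioned at least min_count times."""
    counts = Counter(intended for _shown, intended in errors)
    return [term for term, n in counts.most_common() if n >= min_count]

candidates = vocabulary_candidates(observed_errors)
print(candidates)
```

Lowering `min_count` to 1 lists every term you noted. A higher threshold keeps the vocabulary focused on errors that recur.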

Review existing documents for terms to include:

  • Council or board member names from official rosters

  • Department head names and frequent public commenters

  • Street names from your jurisdiction's maps

  • Park and facility names

  • Current project names from capital improvement plans

  • Grant program names and technical terms from ordinances

Format entries correctly

MediaScribe vocabularies use two columns: Content and Sounds Like.

The Content column shows exactly how you want the term to appear. Use proper capitalization and spacing. Enter "Aakanksha Chatterjee," not "aakanksha chatterjee."

The Sounds Like column is optional but powerful. Use it when spelling doesn't match pronunciation. For "Siobhan," enter "shiv-awn" or "shuh-von." For "Schuylkill," enter "skoo-kill." Write how it sounds in plain English.

You can enter multiple sounds-like variations for the same term. If "Wojciechowski" gets pronounced as both "voy-cheh-kow-ski" and "woy-jeh-kow-ski," enter both.

Organize by meeting type

Create preset-specific vocabularies:

  • City Council preset: Council member names, city departments, major street names, and current project names

  • Planning Commission preset: Planning terms, applicant names, property locations, zoning classifications

  • Board of Education preset: Board member names, school names, curriculum terms

  • Public Works preset: Engineering terms, contractor names, infrastructure project names

Setting up vocabularies in MediaScribe

Vocabularies are configured at the preset level.

Access and add entries

  1. Navigate to Settings > Presets

  2. Click Edit on the preset you want to configure

    [Screenshot: MediaScribe Settings page on the Presets tab, showing a list of presets with edit and delete icons, highlighted by a red arrow.]

  3. Select the Vocabulary tab

  4. Type how you want the word displayed in the Content field

  5. Optionally, type phonetic spellings in the Sounds Like field

  6. Click the Plus button to add the entry

    [Screenshot: Edit Preset “School Committee” Vocabulary tab showing a list of custom phrases with display text and pronunciation variants.]

You can add up to 1,000 words or phrases per vocabulary. To add multiple pronunciation variations, type an additional sounds-like entry and click Plus again.
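If you draft a vocabulary outside MediaScribe, you can check it against the limit before entering or importing it. The sketch below assumes the limit applies to distinct Content values, so extra rows for the same term do not count twice. The lesson does not say how MediaScribe counts, so treat that as an assumption. The function names are hypothetical.

```python
MAX_TERMS = 1000  # limit stated in this lesson

def count_terms(rows):
    """Count distinct Content values; several rows may share one term."""
    return len({content for content, _sounds_like in rows})

def fits_limit(rows, max_terms=MAX_TERMS):
    """True when the number of distinct terms is within the limit."""
    return count_terms(rows) <= max_terms

# Example rows as (Content, Sounds Like) pairs.
rows = [
    ("Wojciechowski Avenue", "voy-cheh-kow-ski"),
    ("Wojciechowski Avenue", "woy-jeh-kow-ski"),
    ("millage rate", ""),
]
print(count_terms(rows), fits_limit(rows))
```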

Click the Trash icon to remove an entry and all its sounds-like variations. Click Save when finished.

CSV import/export

For larger vocabularies, use CSV import and export.

[Screenshot: Vocabulary settings screen showing Import CSV and Export CSV buttons highlighted with arrows.]

Click Export CSV to download your vocabulary as a CSV file. This creates a backup and lets you edit in a spreadsheet.

The CSV has two columns: Content and Sounds Like. For multiple pronunciations, add multiple rows with the same Content value but different Sounds Like entries.
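As an illustration of that layout, here is a minimal Python sketch that turns a term list into rows, one row per pronunciation. It assumes the file starts with a header row reading "Content,Sounds Like". Export a vocabulary from MediaScribe first and match whatever header it actually produces. The `vocabulary` dictionary and function names are hypothetical.

```python
import csv
import io

# Hypothetical term list: Content -> list of sounds-like variations.
vocabulary = {
    "Wojciechowski Avenue": ["voy-cheh-kow-ski", "woy-jeh-kow-ski"],
    "Schuylkill River": ["skoo-kill"],
    "millage rate": [],  # no sounds-like value needed
}

def to_rows(vocab):
    """Expand each term into one (Content, Sounds Like) row per variation."""
    rows = []
    for content, variants in vocab.items():
        if not variants:
            rows.append((content, ""))
        for variant in variants:
            rows.append((content, variant))
    return rows

def to_csv_text(vocab):
    """Render the vocabulary as CSV text with an assumed header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Content", "Sounds Like"])  # assumed header names
    writer.writerows(to_rows(vocab))
    return buffer.getvalue()

csv_text = to_csv_text(vocabulary)
print(csv_text)
```

Using the `csv` module rather than joining strings by hand means a term that contains a comma is quoted correctly.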

[Screenshot: Spreadsheet showing a vocabulary list mapping “sounds like” abbreviations to full department and council names.]

Click Import CSV to upload a vocabulary file. Import replaces your current vocabulary, so export first to preserve existing entries.
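Because import replaces the current vocabulary, one safe pattern is to export, merge your new rows into the exported file, and import the merged result. The Python sketch below shows the merge step on in-memory text. It assumes the exported file has a "Content,Sounds Like" header row, and the function names are hypothetical.

```python
import csv
import io

def read_rows(csv_text):
    """Parse CSV text into (Content, Sounds Like) pairs (header names assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["Content"].strip(), (row["Sounds Like"] or "").strip())
        for row in reader
    ]

def merge_rows(existing, additions):
    """Keep existing rows in order, then append additions not already present."""
    seen = set(existing)
    merged = list(existing)
    for row in additions:
        if row not in seen:
            seen.add(row)
            merged.append(row)
    return merged

# Stand-in for the contents of an exported file.
exported_text = (
    "Content,Sounds Like\n"
    "Schuylkill River,skoo-kill\n"
    "ARPA,ar-pah\n"
    "millage rate,\n"
)
additions = [
    ("ARPA", "ar-pah"),  # already present, skipped
    ("Siobhan O'Malley", "shiv-awn oh-malley"),
]
merged = merge_rows(read_rows(exported_text), additions)
print(merged)
```

Keep the original export untouched as a backup and write the merged rows to a new file for import.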

Example vocabulary template

Here's a sample vocabulary showing proper formatting for different term types:

Content | Sounds Like
Aakanksha Chatterjee | ah-kahn-shah chat-er-jee
Wojciechowski Avenue | voy-cheh-kow-ski
Wojciechowski Avenue | woy-jeh-kow-ski
ARPA | (blank)
ARPA | ar-pah
Community Development Block Grant | (blank)
Schuylkill River | skoo-kill
bioswales | (blank)
bioswales | bio-swales
planned unit development | (blank)
millage rate | (blank)
Siobhan O'Malley | shiv-awn oh-malley
Siobhan O'Malley | shuh-von oh-malley

Notice that some rows have no sounds-like value (Community Development Block Grant, planned unit development, millage rate, and the first ARPA and bioswales rows). This is fine when a term is spelled the way it sounds. The content term itself helps MediaScribe recognize it.

Some terms appear on more than one row (Wojciechowski Avenue, ARPA, Siobhan O'Malley), one row per sounds-like variation. This increases recognition accuracy when different speakers pronounce terms differently.

Maintaining your vocabulary

Schedule quarterly reviews to:

  • Add names of new officials or department heads

  • Remove names of people who have left and terms for completed projects

  • Add technical terms from recent meetings

  • Update pronunciations for terms still appearing incorrectly

Keep exported CSV files as backups before making major changes.
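A backup is also useful for reviewing what a quarterly update changed. The following Python sketch compares two sets of rows, for example a backup export and the current export, and lists what was added and removed. The rows are (Content, Sounds Like) pairs, and the sample data and function name are hypothetical.

```python
def diff_rows(before, after):
    """Return (added, removed) rows between two vocabulary snapshots."""
    before_set, after_set = set(before), set(after)
    added = sorted(after_set - before_set)
    removed = sorted(before_set - after_set)
    return added, removed

# Hypothetical snapshots: last quarter's backup and the current vocabulary.
backup = [
    ("Aakanksha Chatterjee", "ah-kahn-shah chat-er-jee"),
    ("Community Development Block Grant", ""),
]
current = [
    ("Aakanksha Chatterjee", "ah-kahn-shah chat-er-jee"),
    ("Siobhan O'Malley", "shiv-awn oh-malley"),
]
added, removed = diff_rows(backup, current)
print("Added:", added)
print("Removed:", removed)
```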

Testing vocabulary effectiveness

Monitor your next few meetings after adding terms. If a term still appears incorrectly, check your entry:

  • Is the Content spelling exactly what you want?

  • Does the Sounds Like entry match actual pronunciation?

  • Have you included all common pronunciation variations?

Listen carefully to how words are actually pronounced in meetings, not how you think they should sound.
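If you can get the caption text for a meeting as plain text, you can also check it for the wrong forms you noted earlier. This Python sketch counts how often each known mis-caption still appears. The `known_errors` mapping and the sample transcript are hypothetical, and how you obtain the transcript text from MediaScribe is outside what this lesson covers.

```python
import re

# Hypothetical mapping: Content term -> wrong forms seen in earlier captions.
known_errors = {
    "Chatterjee": ["chattery"],
    "Wojciechowski": ["voyage chef ski"],
}

def remaining_errors(transcript, errors):
    """Return {term: count} for wrong forms still present in the transcript."""
    found = {}
    for term, wrong_forms in errors.items():
        total = 0
        for wrong in wrong_forms:
            # Whole-word, case-insensitive match of the wrong form.
            pattern = r"\b" + re.escape(wrong) + r"\b"
            total += len(re.findall(pattern, transcript, flags=re.IGNORECASE))
        if total:
            found[term] = total
    return found

transcript = (
    "Council member chattery moved the item. "
    "Chatterjee then spoke about Wojciechowski Avenue."
)
print(remaining_errors(transcript, known_errors))
```

A term that still shows up here is a prompt to revisit its Sounds Like entries. A script like this does not replace reading the captions.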

Vocabularies improve automated caption accuracy but don't eliminate the need for human review. Always review captions for accuracy before publishing final transcripts.

Common vocabulary scenarios

New city council: Three new council members joined, and their names appear incorrectly in captions. Add each name to your city council preset vocabulary with sounds-like entries for non-standard pronunciations.

Major infrastructure project: A multi-year project name comes up in every meeting. Add the complete project name, contractor names, and street names in the project area to relevant presets.

Grant program discussions: Federal grant acronyms caption as random words or individual letters. Add both the full grant program names and their acronyms. For acronyms, include sounds-like entries that cover both ways they are said: pronounced as a word (for example, "ar-pah" for ARPA) and spelled out letter by letter.

Technical planning terms: Specialized terms like "conditional use permit" sometimes appear incorrectly. Add planning-specific terms to your planning commission preset for consistent formatting.

Best practices

Start with names. Personal names show the most noticeable improvement and matter most to the people being captioned.

Add terms based on actual errors. Watch real caption output instead of anticipating problems. This prevents vocabulary bloat.

Keep pronunciations simple. Write sounds-like entries in plain English. "shiv-awn" works better than formal phonetic notation.

Use preset-specific vocabularies. Targeted vocabularies work better than one giant list for all meetings.

Update quarterly. Quarterly reviews work for most organizations. Update more often during staff turnover or major projects.

Share across your organization. Make sure everyone knows where to find vocabulary files and how to suggest additions.


MediaScribe integration

MediaScribe's vocabulary feature integrates directly into the preset configuration system, allowing you to create meeting-specific vocabularies that automatically load when you select that preset. The vocabulary interface supports up to 1,000 terms per preset and includes CSV import/export for managing large vocabularies or sharing them across multiple presets. Vocabularies work with both live captioning and post-production transcripts, improving accuracy across your entire accessibility workflow.

To configure vocabularies for your presets, navigate to Settings > Presets, select the preset you want to edit, and choose the Vocabulary tab.


What's next

With vocabularies configured, your captions will be more accurate for community-specific terms. Next steps: configure speaker profiles, review caption styling options, and set up preset configurations that include appropriate vocabularies for each meeting type.