Feedback: speech-to-text-pre-recorded-audio-filler-words

Original URL: https://www.assemblyai.com/docs/speech-to-text/pre-recorded-audio/filler-words
Category: speech-to-text
Generated: 05/08/2025, 4:25:26 pm


Technical Documentation Analysis: Filler Words Feature

This documentation covers a single feature (filler word detection and removal) but lacks crucial context, output examples, and user guidance. The code examples are comprehensive across multiple languages, but the conceptual explanation is minimal.

Problem: The documentation jumps straight into technical details without explaining what filler words are or why this feature matters.

Recommendations:

# Filler Words Detection
Filler words (also called disfluencies) are speech hesitations like "um," "uh," and "hmm" that speakers use while thinking or pausing. AssemblyAI automatically removes these from transcripts by default to provide cleaner, more readable text.
## When to use this feature
- **Keep filler words** when analyzing speech patterns, creating subtitles, or studying natural speech
- **Remove filler words** (default) for clean transcripts, meeting notes, or content creation
## How it works
AssemblyAI's speech recognition model identifies and filters out common filler words during transcription. You can control this behavior using the `disfluencies` parameter.

Problem: The list appears incomplete and doesn’t explain language variations or detection confidence.

Recommendations:

## Detected Filler Words
The following filler words are automatically removed by default:
**Hesitations:** "um", "uh", "ah"
**Confirmations:** "mhm", "uh-huh", "hm"
**Thinking sounds:** "hmm", "huh"
**Minimal responses:** "m"
> **Note:** Detection accuracy may vary based on audio quality, speaker accent, and context. Some instances might be missed or incorrectly identified as filler words.

Problem: Users can’t see the actual impact of enabling/disabling this feature.

Recommendations:

## Example Output
**With filler words removed (default):**

I think we should go to the store and buy some groceries for tonight’s dinner.

**With filler words included (`disfluencies: true`):**

Um, I think we should, uh, go to the store and buy some groceries for, hmm, tonight’s dinner.
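
The before/after difference can also be reproduced locally. The sketch below is only an illustration: `strip_fillers` and `FILLERS` are hypothetical names, and the regular-expression approach is an assumption made for this example, not how AssemblyAI's model removes disfluencies.

```python
import re

# Filler words named in this review. "m" is left out on purpose: as a bare
# token it would also match the "m" in contractions such as "I'm".
# "uh-huh" is listed first so it is tried before the shorter "uh".
FILLERS = ("uh-huh", "mhm", "hmm", "huh", "um", "uh", "ah", "hm")

# A filler as a whole word, together with any comma and spaces around it.
_FILLER_RE = re.compile(
    r",?\s*\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Return text with filler words and their surrounding commas removed."""
    cleaned = _FILLER_RE.sub(" ", text)
    cleaned = re.sub(r"\s+([.!?])", r"\1", cleaned)    # no space before end punctuation
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()  # collapse doubled spaces
    return cleaned[:1].upper() + cleaned[1:]           # restore sentence capitalization


raw = ("Um, I think we should, uh, go to the store and buy "
       "some groceries for, hmm, tonight's dinner.")
print(strip_fillers(raw))
# I think we should go to the store and buy some groceries for tonight's dinner.
```

A word-list filter like this cannot tell a hesitation from a meaningful use of the same token, which is one reason the accuracy caveats in the documentation matter.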

4. Inadequate Language Support Information

Problem: The accordion is easy to miss, and there’s no explanation of why only English variants are supported.

Recommendations:

## Language Support
Currently available for English variants only:
- Global English (`en`)
- Australian English (`en_au`)
- British English (`en_uk`)
- US English (`en_us`)
> **Coming soon:** Support for additional languages is in development. [Request a language →](link-to-feedback)
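
A client could check the language code before asking for filler words. This is a minimal sketch built only from the four codes listed above; `supports_disfluencies` is a hypothetical helper, and the normalization of case and hyphens is an assumption for convenience, not documented API behavior.

```python
# Language codes listed in this review as supporting filler-word output.
SUPPORTED_LANGUAGE_CODES = frozenset({"en", "en_au", "en_uk", "en_us"})


def supports_disfluencies(language_code: str) -> bool:
    """Return True if the normalized language code is in the supported list."""
    # Assumption: treat "EN-UK" and "en_uk" as the same code.
    normalized = language_code.strip().lower().replace("-", "_")
    return normalized in SUPPORTED_LANGUAGE_CODES


for code in ("en_us", "EN-UK", "es"):
    print(code, supports_disfluencies(code))
```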

Problems:

  • No explanation of the `disfluencies` parameter
  • Missing error handling context
  • No guidance on when to use each approach

Recommendations:

Add parameter documentation:

## Configuration
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `disfluencies` | boolean | `false` | When `true`, includes filler words in transcript. When `false` (default), removes filler words for cleaner text. |

Improve code example introduction:

## Implementation
Set `disfluencies: true` in your transcription configuration to preserve filler words in the output:
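
As one possible illustration, the sketch below builds such a request with only the Python standard library. It assumes the REST endpoint `https://api.assemblyai.com/v2/transcript` and an API key sent in the `authorization` header. `build_transcript_request` is a hypothetical helper, and the key and audio URL are placeholders.

```python
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"  # assumed endpoint


def build_transcript_request(
    api_key: str, audio_url: str, disfluencies: bool = False
) -> urllib.request.Request:
    """Build (but do not send) a transcription request."""
    # Reject strings such as "true" so a typo cannot silently change the setting.
    if not isinstance(disfluencies, bool):
        raise TypeError("disfluencies must be a boolean")
    payload = {"audio_url": audio_url, "disfluencies": disfluencies}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


request = build_transcript_request(
    "YOUR_API_KEY", "https://example.com/audio.mp3", disfluencies=True
)
print(request.data.decode("utf-8"))
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Building the request separately from sending it makes it easy to confirm that `disfluencies` is present in the payload before any network call is made.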

Problem: No guidance on best practices, limitations, or troubleshooting.

Recommendations:

## Best Practices
- **For meeting transcripts:** Keep default (`disfluencies: false`) for cleaner notes
- **For speech analysis:** Enable (`disfluencies: true`) to preserve natural speech patterns
- **For accessibility:** Consider your audience; some readers prefer filler words for context
## Limitations
- Detection accuracy depends on audio quality and speaker clarity
- Some words may be incorrectly classified as filler words in certain contexts
- Very quiet or unclear filler words might not be detected
## Troubleshooting
**Filler words appearing when they shouldn't?**
- Check that `disfluencies` is set to `false` (default)
- Verify your audio quality meets [minimum requirements](link)
**Missing expected filler words?**
- Ensure `disfluencies` is set to `true`
- Consider that very quiet hesitations may not be detected
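
Both troubleshooting checks can be automated roughly as follows. This is a sketch under stated assumptions: `count_fillers` and `check_transcript` are hypothetical helpers, the word list comes from this review, and `m` is omitted because a bare `m` would also match contractions such as "I'm".

```python
import re
from collections import Counter

# Filler words named in this review (a local list, not an API constant).
FILLER_WORDS = ("uh-huh", "mhm", "hmm", "huh", "um", "uh", "ah", "hm")
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in FILLER_WORDS) + r")\b",
    re.IGNORECASE,
)


def count_fillers(text: str) -> Counter:
    """Count whole-word filler occurrences, case-insensitively."""
    return Counter(m.group(0).lower() for m in _FILLER_RE.finditer(text))


def check_transcript(text: str, disfluencies: bool) -> str:
    """Compare transcript text with the `disfluencies` value that was requested."""
    found = count_fillers(text)
    if found and not disfluencies:
        return "unexpected_fillers"  # fillers present although removal was requested
    if not found and disfluencies:
        return "no_fillers_found"    # none detected; the audio may contain none
    return "consistent"


print(check_transcript("Um, hello.", disfluencies=False))
```

A `no_fillers_found` result is not necessarily an error, since the speaker may simply not have hesitated.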

Current structure issues:

  • Abrupt start without introduction
  • Language support buried in accordion
  • No logical flow from concept to implementation

Recommended structure:

1. Feature introduction and purpose
2. How it works (brief technical explanation)
3. Language support (prominent placement)
4. Configuration options
5. Before/after examples
6. Code implementation
7. Best practices and limitations
8. Troubleshooting
9. Related features (cross-links)

Add related documentation links:

## Related Features
- [Speech Recognition Accuracy](link) - Improve overall transcription quality
- [Custom Vocabulary](link) - Handle domain-specific terms
- [Punctuation & Formatting](link) - Additional text cleaning options

Summary of changes:

  1. Add a 2-3 sentence introduction explaining what filler words are
  2. Include a simple before/after example at the top
  3. Move language support out of the accordion to make it more visible
  4. Add a troubleshooting section with common issues
  5. Include cross-links to related features

These changes should improve user understanding and may reduce support requests, while keeping the existing comprehensive code examples.