Feedback: speech-to-text-pre-recorded-audio-improving-transcript-accuracy

Original URL: https://assemblyai.com/docs/speech-to-text/pre-recorded-audio/improving-transcript-accuracy
Category: speech-to-text
Generated: 05/08/2025, 4:25:22 pm


Technical Documentation Analysis & Recommendations

Major Issues Requiring Immediate Attention

  • No authentication setup instructions - Users don’t know how to obtain or format <YOUR_API_KEY>
  • No prerequisites section - Missing required dependencies, account setup, or SDK installation
  • Incomplete model comparison - No explanation of when to use universal/nano vs slam-1
  • Missing error handling details - What specific errors might occur and how to handle them

Add Missing Sections:

## Prerequisites
- AssemblyAI account and API key
- Required dependencies: requests (Python) or axios (JavaScript)
- Supported audio formats and file size limits

## Model Comparison
| Model | Best For | Accuracy | Speed | Fine-tuning |
|-------|----------|----------|-------|-------------|
| slam-1 | Domain-specific, high accuracy needs | Highest | Slower | Yes (keyterms) |
| universal | General purpose | Good | Fast | Limited |
| nano | Real-time, cost-sensitive | Basic | Fastest | No |

Current Problem: The keyterms example uses medical terms but the audio URL suggests sports content.

Fix with Domain-Matched Examples:

# Medical audio example
data = {
    "audio_url": "https://assembly.ai/medical_consultation.mp3",
    "speech_model": "slam-1",
    "keyterms_prompt": ["differential diagnosis", "hypertension", "Wellbutrin XL 150mg"],
}

# Sports audio example
data = {
    "audio_url": "https://assembly.ai/sports_injuries.mp3",
    "speech_model": "slam-1",
    "keyterms_prompt": ["ACL tear", "physical therapy", "sports medicine", "rehabilitation"],
}

Insert Best Practices Section:

## Best Practices for Keyterms
### Effective Keyterm Selection
**Good examples:**
- Technical terms: "API endpoint", "machine learning"
- Proper nouns: "JavaScript", "MongoDB"
- Domain jargon: "differential diagnosis", "accounts payable"
**Avoid:**
- Common words: "the", "and", "very"
- Overly long phrases (>6 words)
- Duplicates or near-duplicates
### Optimization Tips
- Start with 10-20 most critical terms
- Test and iterate based on results
- Use actual terminology from your domain
- Include variations and abbreviations
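The selection rules above could be applied automatically before a request is sent. The following is a minimal sketch, not part of the reviewed documentation: `clean_keyterms`, `COMMON_WORDS`, and `MAX_WORDS_PER_TERM` are hypothetical names, the stop list contains only the three words given as examples above, and only exact case-insensitive duplicates are detected (near-duplicates would still need manual review).

```python
# Illustrative stop list: only the common words named in the guidance above.
COMMON_WORDS = {"the", "and", "very"}
# Phrase-length limit taken from the "Overly long phrases (>6 words)" rule.
MAX_WORDS_PER_TERM = 6


def clean_keyterms(terms):
    """Return (kept, dropped); dropped maps each rejected term to a reason."""
    kept, dropped, seen = [], {}, set()
    for raw in terms:
        term = " ".join(raw.split())  # collapse repeated whitespace
        key = term.lower()            # case-insensitive comparison key
        if not term:
            dropped[raw] = "empty"
        elif key in COMMON_WORDS:
            dropped[raw] = "common word"
        elif len(term.split()) > MAX_WORDS_PER_TERM:
            dropped[raw] = "more than 6 words"
        elif key in seen:
            dropped[raw] = "duplicate"
        else:
            seen.add(key)
            kept.append(term)
    return kept, dropped
```

Returning the rejected terms with a reason, rather than silently discarding them, makes it easier to follow the "test and iterate" tip: the caller can log what was dropped and adjust the list.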

Add Error Handling & Validation:

import requests
import time
import os

# Better authentication handling
API_KEY = os.getenv('ASSEMBLYAI_API_KEY')
if not API_KEY:
    raise ValueError("Please set ASSEMBLYAI_API_KEY environment variable")
headers = {"authorization": API_KEY}

# Input validation
def validate_keyterms(keyterms):
    if len(keyterms) > 1000:
        raise ValueError("Maximum 1000 keyterms allowed")
    for term in keyterms:
        if len(term.split()) > 6:
            raise ValueError(f"Term '{term}' exceeds 6 word limit")
    return True

# Usage with validation
keyterms = ['differential diagnosis', 'hypertension', 'Wellbutrin XL 150mg']
validate_keyterms(keyterms)
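The validation above covers input errors only. Request and job failures could be handled along the following lines. This is a sketch under stated assumptions: `transcribe`, `TranscriptionError`, `BASE_URL`, and the default polling interval and timeout are hypothetical choices for illustration, and the endpoint path and the `completed` / `error` status values should be checked against the current AssemblyAI API reference. The `session` argument is any object with requests-style `post` and `get` methods, such as `requests.Session()`.

```python
import time

# Assumed base URL of the REST API; verify against the official reference.
BASE_URL = "https://api.assemblyai.com/v2"


class TranscriptionError(RuntimeError):
    """Raised when the job reports an error or polling times out."""


def transcribe(session, api_key, payload, poll_interval=3.0, timeout=600.0):
    """Submit a transcription job and poll until it finishes.

    HTTP-level failures surface through raise_for_status(); job-level
    failures and timeouts surface as TranscriptionError.
    """
    headers = {"authorization": api_key}  # raw key, no 'Bearer' prefix

    response = session.post(f"{BASE_URL}/transcript", json=payload, headers=headers)
    response.raise_for_status()
    transcript_id = response.json()["id"]

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        poll = session.get(f"{BASE_URL}/transcript/{transcript_id}", headers=headers)
        poll.raise_for_status()
        result = poll.json()
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise TranscriptionError(result.get("error", "unknown error"))
        time.sleep(poll_interval)  # job not finished yet; wait and retry

    raise TranscriptionError(f"Timed out waiting for transcript {transcript_id}")
```

With the requests library installed, a call might look like `transcribe(requests.Session(), API_KEY, data)`. Passing the session in, rather than importing requests inside the function, keeps the polling logic easy to exercise with a stub in tests.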

Add Troubleshooting Section:

## Troubleshooting
### Common Issues
**"Keyterms not improving accuracy"**
- Ensure terms actually appear in your audio
- Verify you're using slam-1 model
- Try more specific/technical terminology
**"Hitting keyword limits"**
- Count total words, not phrases (each word counts toward 1000)
- Prioritize most critical terms
- Use lowercase when possible (saves tokens)
**"Authentication errors"**
- Verify API key format: no 'Bearer' prefix needed
- Check key permissions in dashboard
- Ensure account has sufficient credits
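For the first issue above, one rough way to spot keyterms that may not be spoken in the audio is to check which of them never show up in the returned transcript text. The helper below is a hypothetical sketch (`missing_keyterms` is not an AssemblyAI function). A term missing from the transcript may be absent from the audio or may simply have been misrecognized, so the result is a list of candidates for manual review, not proof.

```python
import re

# Matches runs of lowercase letters, digits, and apostrophes.
_WORD = re.compile(r"[a-z0-9']+")


def _normalize(text):
    """Lowercase the text and reduce it to space-separated word tokens."""
    return " ".join(_WORD.findall(text.lower()))


def missing_keyterms(transcript_text, keyterms):
    """Return the keyterms that never appear as whole words in the transcript."""
    haystack = f" {_normalize(transcript_text)} "
    missing = []
    for term in keyterms:
        needle = _normalize(term)
        # Padding with spaces enforces whole-word matching.
        if needle and f" {needle} " not in haystack:
            missing.append(term)
    return missing
```

For example, `missing_keyterms(result["text"], keyterms)` could be run on a completed transcript, where `result` is the parsed JSON response and `keyterms` is the list that was sent with the request.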

Recommended Section Order:

1. Prerequisites & Setup
2. Model Selection Guide
3. Basic Usage (slam-1)
4. Advanced: Fine-tuning with keyterms
5. Best Practices
6. Alternative Models (universal/nano)
7. Troubleshooting

Clarifications Needed:

  • Explain the “1000 limit” with concrete examples of token counting
  • Define “multi-modal architecture” or link to explanation
  • Clarify “related terminology” with before/after examples
  • Add performance metrics (accuracy improvements, processing time)

Cross-References to Add:

  • Link to audio format requirements
  • Reference pricing differences between models
  • Connect to real-time transcription docs
  • Add links to SDK documentation

Implementation Priority:

  1. High Priority: Add prerequisites, fix authentication, improve examples
  2. Medium Priority: Add troubleshooting, best practices, model comparison
  3. Low Priority: Restructure flow, add performance metrics, enhance cross-references

These changes will transform the documentation from a basic API reference into a comprehensive guide that helps users succeed on their first attempt.