Feedback: getting-started-models

Original URL: https://www.assemblyai.com/docs/getting-started/models
Category: getting-started
Generated: 05/08/2025, 4:30:32 pm


# Technical Documentation Analysis: AssemblyAI Models


This documentation provides a good foundation, but it lacks critical technical details and practical guidance that developers need to make informed decisions and implement successfully. Key gaps include:

  • Audio format requirements (file types, sample rates, bit rates, encoding)
  • File size and duration limits for each model
  • Processing time estimates (e.g., “Universal typically processes 1 hour of audio in 2-3 minutes”)
  • Memory and bandwidth requirements for streaming
  • Rate limits and concurrent request limits
  • No code examples showing how to select models in API calls
  • Missing authentication setup information
  • No error handling examples or common error codes
  • Webhook configuration for async processing, if applicable (both illustrated in the sketch after this list)
  • Benchmark comparisons between models on common use cases
  • Actual accuracy metrics beyond WER ranges
  • Latency measurements in real-world scenarios
  • Performance degradation factors (background noise, accents, audio quality)
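
To illustrate the last two gaps above, the page could include something like the following; a minimal sketch assuming the Python SDK's `webhook_url` config option, `submit()` for async jobs, and `TranscriptStatus` for error checks:

```python
import assemblyai as aai

aai.settings.api_key = "your-api-key"

# Async flow: submit() returns immediately; results are delivered to the webhook.
config = aai.TranscriptionConfig(
    webhook_url="https://example.com/assemblyai-webhook",  # hypothetical endpoint
)
job = aai.Transcriber(config=config).submit("path/to/audio.mp3")

# Sync flow: transcribe() blocks until done, so check status before using the text.
transcript = aai.Transcriber().transcribe("path/to/audio.mp3")
if transcript.status == aai.TranscriptStatus.error:
    print("Transcription failed:", transcript.error)
else:
    print(transcript.text)
```
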
"Universal-Streaming" vs "Streaming"
Use consistent naming throughout (recommend "Universal-Streaming")

Current: “Good accuracy (>10% to ≤25% WER)”

Better:

```
Good accuracy (11-25% WER)
- Suitable for: Content analysis, meeting transcription
- Not recommended for: Legal documentation, medical transcription
- Typical use cases: [specific examples]
```

Add a decision tree or flowchart:

```
Start here →
├─ Real-time needed? → Universal-Streaming
├─ English only + highest accuracy? → Slam-1
└─ Multi-language + good accuracy? → Universal
```
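
The same routing could also be expressed as a tiny helper for scripts; a minimal sketch (the function name and returned strings are illustrative, not SDK identifiers):

```python
# Hypothetical helper mirroring the decision tree above.
def choose_model(needs_realtime: bool, english_only: bool) -> str:
    if needs_realtime:
        return "universal_streaming"  # real-time applications
    if english_only:
        return "slam_1"  # highest English accuracy
    return "universal"  # multi-language, good accuracy
```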

Add this example:

```python
import assemblyai as aai

aai.settings.api_key = "your-api-key"

# Configure model selection
config = aai.TranscriptionConfig(
    speech_model=aai.SpeechModel.slam_1,  # or aai.SpeechModel.universal
    language_code="en",  # relevant for the Universal model
)

transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe("path/to/audio.mp3")
print(transcript.text)
```

Add a streaming configuration example:

```python
import assemblyai as aai

aai.settings.api_key = "your-api-key"

def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text, end="\r\n")

transcriber = aai.RealtimeTranscriber(
    on_data=on_data,
    on_error=lambda error: print("Error:", error),
    sample_rate=16_000,
    encoding=aai.AudioEncoding.pcm_s16le,
)
```
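
A short usage note could follow the configuration, e.g. streaming from the microphone (this assumes the SDK's optional extras, installed with `pip install "assemblyai[extras]"`):

```python
# Open the session, stream microphone audio until interrupted, then close.
transcriber.connect()
microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
transcriber.stream(microphone_stream)
transcriber.close()
```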

Replace the current basic table with:

| Feature | Slam-1 | Universal | Universal-Streaming |
| --- | --- | --- | --- |
| Primary Use Case | High-accuracy English | Multi-language batch | Real-time applications |
| Languages | English only | 80+ languages | English + 10 major languages |
| Avg Processing Time | 0.3x audio length | 0.15x audio length | ~300ms latency |
| Best WER | <5% (English) | <10% (top languages) | <12% (real-time) |
| Max File Size | [specify] | [specify] | N/A (streaming) |
| Fine-tuning | Yes | No | No |
| Custom Vocabulary | Yes | Limited | Limited |
| Concurrent Requests | [specify] | [specify] | [specify] |

Add: “New to AssemblyAI? Start with the Universal model - it works out of the box for most use cases.”

Add a setup checklist:

## Before You Start
- [ ] Obtain API key from [dashboard link]
- [ ] Install SDK: `pip install assemblyai`
- [ ] Verify audio format compatibility
- [ ] Review rate limits for your plan
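
A quick smoke test after the checklist would let readers confirm their setup; a minimal sketch (the audio URL is a placeholder):

```python
import assemblyai as aai

aai.settings.api_key = "your-api-key"

# Transcribe a short public file to verify the key and SDK install.
transcript = aai.Transcriber().transcribe("https://example.com/sample.mp3")
print(transcript.status, transcript.text)
```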

Add section:

## Common Issues
- **Poor accuracy?** → Check audio quality, consider Slam-1 for English
- **Slow processing?** → Use Universal for better speed/accuracy balance
- **Streaming dropouts?** → Verify network stability and sample rate

## Quick Model Selection
- **English podcast transcription** → Slam-1
- **Multi-language meeting notes** → Universal
- **Voice assistant integration** → Universal-Streaming
- **Legal/medical documentation** → Slam-1 with fine-tuning
  • Move detailed language list to separate page
  • Keep only top 10-15 languages in main documentation
  • Add language detection capabilities information (see the sketch below)
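
For the language detection item, a short example could anchor the prose; this sketch assumes the Python SDK's `language_detection` flag and reads the detected code from the raw response:

```python
import assemblyai as aai

aai.settings.api_key = "your-api-key"

# Let the API detect the spoken language instead of fixing language_code.
config = aai.TranscriptionConfig(language_detection=True)
transcript = aai.Transcriber(config=config).transcribe("path/to/audio.mp3")
print(transcript.json_response["language_code"])  # detected language
```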

Current: basic links. Better:

## Next Steps
1. **First time?** → [5-minute quickstart tutorial]
2. **Ready to implement?** → [Model selection API guide]
3. **Need customization?** → [Fine-tuning documentation]
4. **Production deployment?** → [Best practices guide]

Additional suggestions:

1. Add an FAQ section addressing common model selection questions
2. Include audio quality guidelines for optimal results with each model
3. Provide a cost calculator for different usage patterns (see the sketch below)
4. Add a model comparison playground link where users can test different models
5. Include a migration guide for switching between models
6. Add monitoring and analytics information for production usage
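
To make the cost calculator suggestion concrete, a sketch of the arithmetic it might expose (the per-hour rates are placeholders, not AssemblyAI pricing):

```python
# Placeholder rates per audio hour; substitute values from the pricing page.
RATES_PER_HOUR = {"slam_1": 0.40, "universal": 0.30, "universal_streaming": 0.50}

def estimate_monthly_cost(model: str, hours_per_month: float) -> float:
    """Estimated monthly spend for a given model and usage volume."""
    return RATES_PER_HOUR[model] * hours_per_month

print(f"${estimate_monthly_cost('universal', 200):.2f}")  # e.g. 200 hours/month
```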

This documentation would benefit significantly from more technical depth, practical examples, and clearer guidance for different user personas (beginners vs. experienced developers).