Feedback: speech-to-text-pre-recorded-audio-custom-spelling

Original URL: https://www.assemblyai.com/docs/speech-to-text/pre-recorded-audio/custom-spelling
Category: speech-to-text
Generated: 05/08/2025, 4:26:00 pm


Analysis & Recommendations for Custom Spelling Documentation

A. Inconsistent API Structure Documentation

Problem: The Python SDK uses a completely different structure than all other languages, creating major confusion.

Fix: Add a clear explanation of the structural differences:

## API Structure Differences
**Python SDK**: Uses dictionary format where:
- Key = desired output spelling
- Value = array of input words to replace
**All other APIs**: Use object/array format where:
- `from` = array of input words to replace
- `to` = desired output spelling
### Python SDK Format
```python
{
    "SQL": ["Sequel", "sequel"],  # Output: input variations
    "DeCarlo": ["decarlo", "Decarlo"]
}
```
### All Other APIs Format
```json
[
  {
    "from": ["Sequel", "sequel"],
    "to": "SQL"
  }
]
```
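Since the two shapes carry the same information, one can be derived from the other. The following is a minimal sketch, assuming only the dictionary and `from`/`to` formats described above; the helper names `to_rest_rules` and `to_sdk_dict` are hypothetical and not part of any SDK.

```python
from typing import Dict, List


def to_rest_rules(sdk_dict: Dict[str, List[str]]) -> List[dict]:
    """Convert {"output": ["input", ...]} into [{"from": [...], "to": "output"}]."""
    return [{"from": list(inputs), "to": output} for output, inputs in sdk_dict.items()]


def to_sdk_dict(rest_rules: List[dict]) -> Dict[str, List[str]]:
    """Convert [{"from": [...], "to": "output"}] back into {"output": [...]}."""
    result: Dict[str, List[str]] = {}
    for rule in rest_rules:
        # Merge rules that share the same output spelling.
        result.setdefault(rule["to"], []).extend(rule["from"])
    return result


sdk_format = {"SQL": ["Sequel", "sequel"], "DeCarlo": ["decarlo", "Decarlo"]}
rest_format = to_rest_rules(sdk_format)
print(rest_format)
```

Keeping a single source of truth in one format and deriving the other may help avoid the reversed-mapping mistakes this section describes.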
#### B. Contradictory Examples
**Problem**: Examples show conflicting mappings (SQL→Sequel vs Sequel→SQL) without explanation.
**Fix**: Use consistent, logical examples throughout:
```markdown
## Consistent Example Set
- Company names: "decarlo" → "DeCarlo"
- Technical terms: "sequel" → "SQL"
- Brand names: "goo gle" → "Google"
```
## Limitations & Constraints
- **Word limits**: Maximum X words per `from` array
- **Mapping limits**: Maximum X custom spelling rules per request
- **Character limits**: `to` value limited to X characters
- **Language considerations**: How custom spelling interacts with different languages
- **Processing priority**: How custom spelling interacts with other features
## Common Use Cases
### Technical Terms
- Database terminology: "sequel" → "SQL", "my sequel" → "MySQL"
- Programming languages: "java script" → "JavaScript"
### Proper Nouns
- Company names: "micro soft" → "Microsoft"
- Personal names: "o'connor" → "O'Connor"
- Geographic locations: "pitts burgh" → "Pittsburgh"
### Acronyms & Abbreviations
- "C E O" → "CEO"
- "A P I" → "API"
## Best Practices
- Test with sample audio before processing large batches
- Use consistent capitalization in `to` values
- Include common variations in `from` arrays
## Quick Reference
| Language | Structure | Key→Value Mapping |
|----------|-----------|-------------------|
| Python SDK | Dictionary | `"output": ["input1", "input2"]` |
| REST APIs | Object Array | `{"from": ["input1"], "to": "output"}` |
## Parameter Summary
- **Case sensitivity**: `to` values are case-sensitive, `from` values are not
- **Word count**: `to` must be a single word; `from` can contain multiple words
- **Array format**: `from` always expects an array, even for single words

Current issues:

  • Inconsistent highlighting
  • Missing error handling in some examples
  • No output examples

Improvements:

### Before/After Output Examples
**Input audio**: "The sequel database and decarlo both work well"
**Without custom spelling**:

“The sequel database and decarlo both work well”

**With custom spelling**:
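```python
import assemblyai as aai

config = aai.TranscriptionConfig()
# Keys are the desired output spellings; values list the inputs to replace.
config.set_custom_spelling({
    "SQL": ["sequel"],
    "DeCarlo": ["decarlo"]
})
```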

Output:

"The SQL database and DeCarlo both work well"
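The before/after behavior can be approximated locally on plain text, which may be useful for checking a rule set before submitting audio. This is a sketch under stated assumptions: `apply_custom_spelling` is a hypothetical helper, and whole-word, case-insensitive regex matching is a stand-in for whatever matching the service actually performs.

```python
import re
from typing import Dict, List


def apply_custom_spelling(text: str, rules: Dict[str, List[str]]) -> str:
    """Locally approximate custom spelling on already-transcribed text."""
    for output, inputs in rules.items():
        for phrase in inputs:
            # Whole-word, case-insensitive match, mirroring the stated rule
            # that `from` values are not case-sensitive.
            pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
            # A function replacement keeps `output` literal (no backslash escapes).
            text = pattern.sub(lambda _match, out=output: out, text)
    return text


before = "The sequel database and decarlo both work well"
after = apply_custom_spelling(before, {"SQL": ["sequel"], "DeCarlo": ["decarlo"]})
print(after)
```

A local check like this only confirms that the rules are written the right way round; it says nothing about how the audio will actually be transcribed.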
### 4. **User Experience Pain Points**
#### A. Add Troubleshooting Section
```markdown
## Troubleshooting
### Common Issues
**Custom spelling not applied**
- ✅ Verify exact spelling matches (case-insensitive for `from`)
- ✅ Check that `to` value contains only one word
- ✅ Ensure `from` is an array format
**Unexpected results**
- ✅ Test with shorter audio clips first
- ✅ Verify JSON structure matches your language's requirements
- ✅ Check for conflicting custom spelling rules
### Debugging Tips
1. Start with simple, single-word replacements
2. Use the transcript confidence scores to verify audio quality
3. Test common variations of your target words
```
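Conflicting rules, one of the issues listed above, can be detected before a request is sent. The sketch below assumes rules in the `from`/`to` format; `find_conflicts` is a hypothetical helper, and lowercasing the `from` phrases is an assumption based on the statement that `from` values are case-insensitive.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_conflicts(rules: List[dict]) -> Dict[str, Set[str]]:
    """Return each `from` phrase that maps to more than one `to` spelling."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for rule in rules:
        for phrase in rule["from"]:
            # Compare lowercased, since `from` matching is case-insensitive.
            targets[phrase.lower()].add(rule["to"])
    return {phrase: tos for phrase, tos in targets.items() if len(tos) > 1}


rules = [
    {"from": ["sequel"], "to": "SQL"},
    {"from": ["Sequel", "my sequel"], "to": "MySQL"},
]
conflicts = find_conflicts(rules)
print(conflicts)
```

Here "sequel" is claimed by both rules, so it is reported along with the two competing output spellings.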
## Input Validation
### Valid Examples ✅
```json
{"from": ["hello world"], "to": "HelloWorld"} // Multi-word to single word
{"from": ["c e o"], "to": "CEO"} // Spelled out acronym
{"from": ["decarlo", "de carlo"], "to": "DeCarlo"} // Multiple variations
```
### Invalid Examples ❌
```json
{"from": "hello", "to": "hi"} // Missing array brackets
{"from": ["hi"], "to": "hello world"} // Multi-word output
{"from": [], "to": "test"} // Empty input array
```
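The example rules above can be checked client-side. The following is a minimal sketch of such a check, assuming the constraints stated in this review (array-typed, non-empty `from`; single-word `to`). The function `validate_rule` is hypothetical, and whether the API itself rejects or silently ignores such rules is not stated here.

```python
from typing import List


def validate_rule(rule: dict) -> List[str]:
    """Return a list of problems; an empty list means the rule looks valid."""
    problems = []
    source = rule.get("from")
    target = rule.get("to")
    if not isinstance(source, list):
        problems.append("`from` must be an array")
    elif not source:
        problems.append("`from` must not be empty")
    elif not all(isinstance(s, str) and s.strip() for s in source):
        problems.append("`from` entries must be non-empty strings")
    # A single word means exactly one whitespace-separated token.
    if not isinstance(target, str) or len(target.split()) != 1:
        problems.append("`to` must be a single word")
    return problems


print(validate_rule({"from": ["c e o"], "to": "CEO"}))
print(validate_rule({"from": "hello", "to": "hi"}))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in a rule set at once.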
### 5. **Additional Recommendations**
#### A. Add Integration Examples
Show how custom spelling works with other features:
```markdown
## Integration with Other Features
Custom spelling can be combined with:
- Speaker diarization
- Custom vocabulary
- Punctuation formatting
- Language detection
## Performance Considerations
- Custom spelling adds minimal processing time
- Large numbers of rules (>100) may impact processing speed
- Consider batching similar audio files with shared custom spelling rules
```
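Sharing one rule set across a batch, and combining it with another feature, might look like the sketch below. It only builds request bodies and sends nothing. The field names `audio_url`, `custom_spelling`, and `speaker_labels` are assumptions that should be checked against the API reference, the URLs are placeholders, and `build_requests` is a hypothetical helper.

```python
from typing import Dict, List

# One rule set, defined once and reused for every file in the batch.
SHARED_RULES = [
    {"from": ["sequel"], "to": "SQL"},
    {"from": ["decarlo"], "to": "DeCarlo"},
]


def build_requests(audio_urls: List[str], rules: List[dict]) -> List[Dict]:
    """Build one request body per audio file, all sharing the same rules."""
    return [
        {
            "audio_url": url,
            "custom_spelling": rules,
            "speaker_labels": True,  # assumed field name for speaker diarization
        }
        for url in audio_urls
    ]


bodies = build_requests(
    ["https://example.com/call-1.mp3", "https://example.com/call-2.mp3"],
    SHARED_RULES,
)
print(len(bodies))
```

Defining the rules once keeps spelling consistent across the whole batch and gives a single place to update when a rule changes.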

Recommended new structure:

  1. Overview & quick examples
  2. API differences explanation
  3. Common use cases
  4. Code examples (organized by complexity)
  5. Limitations & best practices
  6. Troubleshooting
  7. Integration notes

This restructuring would significantly improve user comprehension and reduce implementation errors.