
Feedback: audio-intelligence-key-phrases

Original URL: https://www.assemblyai.com/docs/audio-intelligence/key-phrases
Category: audio-intelligence
Generated: 05/08/2025, 4:32:55 pm



Technical Documentation Analysis: Key Phrases Feature


This documentation provides a comprehensive overview of the Key Phrases feature with good code examples across multiple languages. However, there are several areas for improvement in clarity, structure, and user experience.

Critical Gaps:

  • No explanation of what constitutes a “key phrase” - Users don’t understand the selection criteria beyond a vague “significant words and phrases”
  • Missing pricing information - No mention of costs or usage limits
  • No performance metrics - No information about processing time, accuracy rates, or quality expectations
  • Absent error handling examples - Code shows happy path only
  • No rate limiting information - Missing API usage constraints

Recommended Additions:

## Understanding Key Phrases
Key phrases are extracted based on:
- **Frequency**: How often terms appear in the audio
- **Relevance**: Contextual importance within the content
- **Uniqueness**: Terms that distinguish this content from general speech
- **Length**: Multi-word phrases that convey complete concepts
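The criteria above can be illustrated with a toy scorer. This is a hypothetical sketch, not AssemblyAI's extraction algorithm, which the page does not describe. Frequency is the phrase count in the transcript, uniqueness is a smoothed inverse-document-frequency weight against an assumed `background_docs` list of other transcripts, and length is a small bonus for multi-word phrases. Contextual relevance is left out because it would need a language model. The names `ngrams` and `toy_phrase_scores` are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return every contiguous n-word phrase in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def toy_phrase_scores(transcript: str, background_docs: List[str], max_n: int = 3) -> Dict[str, float]:
    """Hypothetical scorer: frequency x uniqueness x length bonus.

    Not AssemblyAI's algorithm. `background_docs` is an assumed list of
    other transcripts used to judge how distinctive a phrase is.
    """
    tokens = transcript.lower().split()
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))

    # Precompute the phrase sets of each background document, per phrase length.
    bg_sets = [
        {n: set(ngrams(doc.lower().split(), n)) for n in range(1, max_n + 1)}
        for doc in background_docs
    ]

    scores = {}
    for phrase, freq in counts.items():
        n = len(phrase.split())
        # Document frequency: how many background docs contain the phrase.
        df = sum(1 for sets in bg_sets if phrase in sets[n])
        # Uniqueness: smoothed IDF, higher when the phrase is rare elsewhere.
        idf = math.log((1 + len(background_docs)) / (1 + df))
        # Length: mild bonus for multi-word phrases.
        length_bonus = 1 + 0.5 * (n - 1)
        scores[phrase] = freq * idf * length_bonus
    return scores
```

Sorting with `sorted(scores, key=scores.get, reverse=True)[:10]` gives the top candidates. Even this toy version shows why a repeated multi-word phrase can outrank its individual words.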
## Pricing & Limits
- Key Phrases detection costs $X per hour of audio
- Maximum audio file size: X MB
- Processing time: Typically 15-30% of audio duration
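Once the per-hour rate is published, estimating the cost of a file is simple arithmetic. A minimal sketch: `estimate_cost` is a hypothetical helper, and `price_per_hour` stands in for the unfilled $X rate above.

```python
def estimate_cost(duration_seconds: float, price_per_hour: float) -> float:
    """Estimate Key Phrases cost from audio length.

    `price_per_hour` is a placeholder: substitute the published rate.
    """
    hours = duration_seconds / 3600  # billing is stated per hour of audio
    return hours * price_per_hour
```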

Current Issues:

  • The rank score explanation is vague (“greater number means more relevant”)
  • Timestamp units are not specified (milliseconds assumed but not stated)
  • The relationship between count and rank is unclear

Improved Explanations:

### Understanding the Response Fields
| Field | Description | Example |
|-------|-------------|---------|
| `rank` | Relevance score from 0.0 to 1.0; higher values indicate greater importance to the overall content | 0.08 = high relevance |
| `count` | Total occurrences of this phrase in the audio | 3 = phrase appears 3 times |
| `timestamps` | Start/end times in milliseconds where phrase occurs | `start: 3978` = 3.978 seconds into audio |
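To show how the three fields fit together, here is a minimal sketch that turns a raw response into rows with timestamps converted from milliseconds to seconds, sorted by rank. It assumes the v2 REST response shape, with results under `auto_highlights_result.results` and each entry carrying `text`, `rank`, `count`, and a `timestamps` list of `{start, end}` objects. `KeyPhrase` and `parse_key_phrases` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KeyPhrase:
    text: str
    rank: float
    count: int
    spans_seconds: List[Tuple[float, float]]  # (start, end) pairs in seconds


def parse_key_phrases(response: dict) -> List[KeyPhrase]:
    """Convert an assumed auto_highlights_result payload into KeyPhrase rows."""
    result = response.get("auto_highlights_result") or {}
    phrases = []
    for item in result.get("results", []):
        # Timestamps are in milliseconds; divide by 1000 for seconds.
        spans = [(ts["start"] / 1000, ts["end"] / 1000) for ts in item["timestamps"]]
        phrases.append(KeyPhrase(item["text"], item["rank"], item["count"], spans))
    # Highest rank first, since rank is the relevance ordering.
    return sorted(phrases, key=lambda p: p.rank, reverse=True)
```

Sorting uses `rank` because that is the relevance ordering; `count` is kept alongside so the two fields can be compared directly.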

Current Limitations:

  • Only one audio sample (wildfires.mp3) used across all examples
  • No explanation of what makes a good vs. poor candidate for key phrase extraction
  • Missing real-world use cases

Recommended Improvements:

## Use Cases & Examples
### Meeting Analysis
**Input**: Team standup recording
**Expected Key Phrases**: "sprint goals", "blockers", "deadline", "action items"
### Podcast Content
**Input**: Interview about climate change
**Expected Key Phrases**: "carbon emissions", "renewable energy", "policy changes"
### Customer Support
**Input**: Support call recording
**Expected Key Phrases**: "billing issue", "account access", "refund request"
## Sample Audio Characteristics
**Best Results:**
- Clear speech with minimal background noise
- 2+ minutes of content for meaningful phrase extraction
- Structured content (presentations, interviews, meetings)
**Challenging Content:**
- Casual conversations with frequent topic changes
- Heavy accents or poor audio quality
- Very short audio clips (<30 seconds)
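The length guidance above can be turned into a pre-flight check before a file is submitted. A minimal sketch using Python's standard `wave` module, so the duration reader only handles uncompressed WAV files. `wav_duration_seconds` and `duration_advice` are hypothetical helpers, and the 30-second and 2-minute thresholds come from the suggested guidance rather than any documented API limit.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def duration_advice(duration_seconds: float) -> str:
    """Classify a clip using the suggested (not official) length thresholds."""
    if duration_seconds < 30:
        return "challenging: very short clip, expect few or no key phrases"
    if duration_seconds < 120:
        return "borderline: under 2 minutes, results may be sparse"
    return "ok: long enough for meaningful phrase extraction"
```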

Current Structure Issues:

  • FAQ section is buried and hard to scan
  • API reference comes after the code examples but would be more logical before them
  • No clear progression from basic to advanced usage

Recommended Restructure:

# Key Phrases
## Overview
[Brief description and use cases]
## Supported Languages
[Current accordion content]
## How It Works
[Technical explanation of the algorithm]
## API Reference
[Move this section up, before examples]
## Quick Start Guide
[Simplified first example]
## Complete Examples
[Current detailed examples]
## Understanding Results
[Detailed explanation of response format]
## Best Practices
[Optimization tips]
## Troubleshooting
[Common issues and solutions]
## FAQ
[Reorganized with better categorization]

Identified Issues:

a) Configuration Confusion:

# Current - unclear parameter name
import assemblyai as aai

config = aai.TranscriptionConfig(auto_highlights=True)
# Suggest adding clarity in docs:
# Note: auto_highlights=True enables Key Phrases extraction
# This parameter name is maintained for backward compatibility
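The same flag name appears at the REST level, which supports documenting the alias in both places. A minimal sketch, assuming the v2 endpoint `POST https://api.assemblyai.com/v2/transcript`, where `auto_highlights` is the request field and results come back under `auto_highlights_result`. `build_transcript_request` is a hypothetical helper that only builds the JSON body; actually sending it requires an API key in the `authorization` header.

```python
import json


def build_transcript_request(audio_url: str) -> dict:
    """Build an assumed v2 request body with Key Phrases enabled."""
    # `auto_highlights` is the Key Phrases switch, same name as in the SDK config.
    return {"audio_url": audio_url, "auto_highlights": True}


print(json.dumps(build_transcript_request("https://example.com/meeting.mp3")))
```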

b) Missing Error Handling:

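# Add to all examples
import assemblyai as aai

def extract_key_phrases(audio_file, config):
    try:
        transcript = aai.Transcriber().transcribe(audio_file, config)
    except Exception as e:
        # Request-level failures raised by the SDK (network, auth, etc.)
        print(f"API error: {e}")
        return None
    if transcript.error:
        print(f"Transcription failed: {transcript.error}")
        return None
    # auto_highlights is None if not requested; results may be empty
    if not transcript.auto_highlights or not transcript.auto_highlights.results:
        print("Key phrases extraction failed or returned no results")
        return None
    return transcript.auto_highlights.results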

c) Result Interpretation:

## Interpreting Results
### Rank Scores
- **0.08 and above**: Highly relevant phrases, likely central themes
- **0.05 to below 0.08**: Moderately relevant, supporting concepts
- **Below 0.05**: Lower relevance, may be noise or peripheral topics
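The rank tiers above map directly onto a small helper. A minimal sketch: `rank_tier` is a hypothetical name, and the 0.08 and 0.05 boundaries are the suggested ones, not documented thresholds. The boundaries are treated as half-open so values such as 0.075 still land in a tier.

```python
def rank_tier(rank: float) -> str:
    """Map a rank score to the suggested relevance tiers."""
    if rank >= 0.08:
        return "high"      # likely central themes
    if rank >= 0.05:
        return "moderate"  # supporting concepts
    return "low"           # possibly noise or peripheral topics
```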
### When You Get Few/No Results
- Audio may be too short (try 2+ minutes)
- Content may be too conversational or unstructured
- Audio quality issues may affect transcription accuracy

d) Performance Expectations:

## What to Expect
### Processing Time
- Typically 15-30% of your audio duration
- Example: 10-minute audio = ~1.5-3 minutes processing
### Result Volume
- Usually 10-50 key phrases per 30 minutes of audio
- Highly dependent on content structure and topic diversity
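Both expectations scale linearly with audio length, so they can be sketched as range estimates. A minimal sketch: `expected_processing_minutes` and `expected_phrase_count` are hypothetical helpers built only from the 15-30% and 10-50-per-30-minutes figures above, which are rough guidance, not guarantees.

```python
from typing import Tuple


def expected_processing_minutes(audio_minutes: float) -> Tuple[float, float]:
    """Rough processing-time range: 15-30% of the audio duration."""
    return (audio_minutes * 0.15, audio_minutes * 0.30)


def expected_phrase_count(audio_minutes: float) -> Tuple[float, float]:
    """Rough key-phrase count range: 10-50 phrases per 30 minutes."""
    return (audio_minutes * 10 / 30, audio_minutes * 50 / 30)
```

For a 10-minute file, `expected_processing_minutes(10)` gives roughly 1.5 to 3 minutes.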

Implementation Priorities:

  1. Immediate (High Impact, Low Effort):

    • Add timestamp unit clarification
    • Include error handling in code examples
    • Explain rank score ranges
  2. Short-term (High Impact, Medium Effort):

    • Add “Understanding Results” section
    • Reorganize FAQ with better categories
    • Include multiple audio example types
  3. Long-term (High Impact, High Effort):

    • Restructure entire document flow
    • Add interactive examples
    • Create separate advanced usage guide

This analysis should help create more user-friendly documentation that reduces confusion and improves the developer experience.