Feedback: guides-input-text-chapters

Documentation Feedback

Original URL: https://www.assemblyai.com/docs/guides/input-text-chapters
Category: guides
Generated: 05/08/2025, 4:40:06 pm

Claude Sonnet 4 Feedback

Generated: 05/08/2025, 4:40:05 pm

Technical Documentation Analysis & Feedback

Critical Issues Requiring Immediate Attention

1. Missing Prerequisites & Setup Information

Issue: No clear explanation of what LeMUR is or its capabilities
Fix: Add a dedicated “What is LeMUR?” section explaining its purpose, limitations, and use cases
Add: System requirements, supported audio formats, and file size limits
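A short setup snippet would also help here. One minimal sketch reads the API key from an environment variable instead of hard-coding it. The variable name `ASSEMBLYAI_API_KEY` and the helper `load_api_key` are assumptions, not part of the guide:

```python
import os


def load_api_key(var_name: str = "ASSEMBLYAI_API_KEY") -> str:
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with an actionable message instead of a later authentication error
        raise RuntimeError(
            f"Set the {var_name} environment variable to your AssemblyAI API key."
        )
    return key


# Usage, with the SDK installed:
#   import assemblyai as aai
#   aai.settings.api_key = load_api_key()
```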

2. Incomplete Error Handling

The guide's code has no error handling, so when a transcription or LeMUR request fails, users get no clear indication of what went wrong.

Add this enhanced version:

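import sys

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"
audio_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

try:
    transcriber = aai.Transcriber()
    # transcribe() blocks until the transcript has completed or failed
    transcript = transcriber.transcribe(audio_url)
except Exception as e:
    # Unexpected exceptions raised while submitting or polling (e.g. network errors)
    print(f"Error during transcription: {e}")
    sys.exit(1)

# A failed transcription is reported through the status field, not an exception
if transcript.status == aai.TranscriptStatus.error:
    print(f"Transcription failed: {transcript.error}")
    sys.exit(1)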

3. Confusing Code Logic

The paragraph-combining logic is unnecessarily complex and poorly explained.

Current problematic code:

step = 2  # Adjust as needed if you want combined paragraphs to be shorter or longer in length.

Better approach with clear explanation:

# Configuration: Combine every 2 paragraphs into one chapter
# Increase for longer chapters, decrease for shorter ones
PARAGRAPHS_PER_CHAPTER = 2

def create_chapters(paragraphs, paragraphs_per_chapter=PARAGRAPHS_PER_CHAPTER):
    """
    Combine consecutive paragraphs into chapters for summarization.

    Args:
        paragraphs: List of transcript paragraphs (e.g. from transcript.get_paragraphs())
        paragraphs_per_chapter: Number of paragraphs to combine per chapter

    Returns:
        List of dicts with 'text', 'start' and 'end' (milliseconds), plus a
        'formatted' string ready to pass to LeMUR as input_text
    """
    chapters = []
    for i in range(0, len(paragraphs), paragraphs_per_chapter):
        chapter_paragraphs = paragraphs[i:i + paragraphs_per_chapter]
        # Extract timing information
        start_time = chapter_paragraphs[0].start
        end_time = chapter_paragraphs[-1].end

        # Combine text content
        combined_text = " ".join(p.text for p in chapter_paragraphs)

        chapters.append({
            'text': combined_text,
            'start': start_time,
            'end': end_time,
            'formatted': f"Content: {combined_text}\nStart: {start_time}ms\nEnd: {end_time}ms"
        })

    return chapters
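A fixed paragraph count can still produce chapters of very different lengths when paragraph sizes vary. One alternative, sketched here as an assumption rather than the guide's method, groups consecutive paragraphs until a chapter would exceed a target duration. It relies only on the `text`, `start`, and `end` attributes (in milliseconds) that `create_chapters` already uses; `create_chapters_by_duration` and `max_chapter_ms` are hypothetical names:

```python
from types import SimpleNamespace
from typing import Dict, List


def _build_chapter(paragraphs) -> Dict:
    """Merge a run of paragraphs into one chapter dict (same keys as create_chapters)."""
    text = " ".join(p.text for p in paragraphs)
    start, end = paragraphs[0].start, paragraphs[-1].end
    return {
        "text": text,
        "start": start,
        "end": end,
        "formatted": f"Content: {text}\nStart: {start}ms\nEnd: {end}ms",
    }


def create_chapters_by_duration(paragraphs, max_chapter_ms: int = 120_000) -> List[Dict]:
    """Hypothetical variant: start a new chapter once it would span more than max_chapter_ms."""
    chapters: List[Dict] = []
    current = []
    for p in paragraphs:
        # Close the current chapter if adding this paragraph would push it past the limit
        if current and p.end - current[0].start > max_chapter_ms:
            chapters.append(_build_chapter(current))
            current = []
        current.append(p)
    if current:
        chapters.append(_build_chapter(current))
    return chapters


# Stand-in paragraph objects; real ones come from transcript.get_paragraphs()
paras = [
    SimpleNamespace(text="Intro.", start=0, end=50_000),
    SimpleNamespace(text="Topic A.", start=50_000, end=110_000),
    SimpleNamespace(text="Topic B.", start=110_000, end=200_000),
]
chapters = create_chapters_by_duration(paras, max_chapter_ms=120_000)
```

A single paragraph longer than the limit still becomes its own chapter, so no text is dropped.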

Structural Improvements

4. Add Missing Sections

A. Parameters Reference Table:

## LeMUR Task Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `prompt` | string | Yes | Instructions for the AI model |
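| `input_text` | string | Yes* | Custom text input (*required when `transcript_ids` is not provided) |
| `transcript_ids` | list of strings | Yes* | IDs of completed transcripts to use as input (*required when `input_text` is not provided) |
| `final_model` | LemurModel | No | AI model to use (default: claude3_5_sonnet) |
| `max_output_size` | int | No | Maximum response length in tokens (default: 2000) |
| `temperature` | float | No | Sampling randomness; higher values give more varied output (0.0-1.0, default: 0.0) |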
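The `Yes*` rule for `input_text` (supply it when you are not using `transcript_ids`) is easy to get wrong. A sketch of a hypothetical helper, `build_task_params`, that checks arguments against the table before sending a request; the ranges mirror the table, and the helper itself is not part of the SDK:

```python
from typing import Any, Dict, List, Optional


def build_task_params(
    prompt: str,
    input_text: Optional[str] = None,
    transcript_ids: Optional[List[str]] = None,
    max_output_size: int = 2000,
    temperature: float = 0.0,
) -> Dict[str, Any]:
    """Hypothetical helper: validate LeMUR task parameters and return them as a dict."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    # Exactly one input source: custom text or existing transcripts
    if (input_text is None) == (transcript_ids is None):
        raise ValueError("provide exactly one of input_text or transcript_ids")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if max_output_size <= 0:
        raise ValueError("max_output_size must be positive")

    params: Dict[str, Any] = {
        "prompt": prompt,
        "max_output_size": max_output_size,
        "temperature": temperature,
    }
    if input_text is not None:
        params["input_text"] = input_text
    else:
        params["transcript_ids"] = transcript_ids
    return params
```

With the Python SDK, the `input_text` variant could then be passed as `aai.Lemur().task(**params)`. The `transcript_ids` variant matches the REST request body; in the SDK, transcripts are typically attached through `transcript.lemur` instead.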

B. When to Use This Approach:

## When to Use input_text vs transcript_ids

### Use `input_text` when:
- ✅ You need to process edited transcripts
- ✅ You are working with speaker-labeled content
- ✅ You are combining multiple transcript sources
- ✅ You want to add custom formatting or metadata (such as timestamps)

### Use `transcript_ids` when:
- ✅ You are processing unmodified AssemblyAI transcripts
- ✅ You want to pass one or more complete transcripts as-is
- ✅ You want the simplest implementation
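For the speaker-labeled case, the text passed as `input_text` can be assembled from the transcript's utterances. A minimal sketch, assuming the transcript was requested with `aai.TranscriptionConfig(speaker_labels=True)` so that each utterance has `speaker`, `text`, and `start` (ms) attributes; `format_utterances` and the line format are illustrative choices, not the guide's:

```python
from types import SimpleNamespace
from typing import Iterable


def format_utterances(utterances: Iterable) -> str:
    """Render utterances as 'Speaker X (mm:ss): text' lines for use as input_text."""
    lines = []
    for u in utterances:
        seconds = u.start // 1000  # start is in milliseconds
        stamp = f"{seconds // 60:02d}:{seconds % 60:02d}"
        lines.append(f"Speaker {u.speaker} ({stamp}): {u.text}")
    return "\n".join(lines)


# Stand-in utterances; real ones come from transcript.utterances
sample = [
    SimpleNamespace(speaker="A", start=0, text="Welcome to the show."),
    SimpleNamespace(speaker="B", start=65_500, text="Thanks for having me."),
]
input_text = format_utterances(sample)
```

The resulting string can then be passed as `aai.Lemur().task(prompt=..., input_text=input_text)`.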

5. Improve Examples

Add a realistic, complete example:

"""
Complete example: Create chapter summaries from a podcast transcript
"""
import assemblyai as aai
from typing import List, Dict

def create_chapter_summaries(audio_url: str, api_key: str) -> List[Dict]:
    """
    Process audio file and create chapter summaries using LeMUR.

    Returns:
        List of dictionaries containing chapter summaries and timestamps
    """
    aai.settings.api_key = api_key

    # Step 1: Transcribe audio
    print("Transcribing audio...")
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_url)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")

    # Step 2: Process paragraphs into chapters
    # (uses create_chapters() from section 3, which must be defined in the same file)
    print("Creating chapters...")
    paragraphs = transcript.get_paragraphs()
    chapters = create_chapters(paragraphs, paragraphs_per_chapter=3)

    # Step 3: Generate summaries
    print("Generating summaries...")
    summaries = []
    lemur = aai.Lemur()  # reuse one LeMUR client for every chapter

    for i, chapter in enumerate(chapters):
        try:
            result = lemur.task(
                prompt="""Summarize this chapter in 2-3 sentences.
                Focus on the main topics discussed.
                Format your response as:
                Title: [Brief chapter title]
                Summary: [Your summary]""",
                input_text=chapter['formatted'],
                final_model=aai.LemurModel.claude3_5_sonnet,
            )

            summaries.append({
                'chapter_number': i + 1,
                'start_time_ms': chapter['start'],
                'end_time_ms': chapter['end'],
                'summary': result.response
            })

        except Exception as e:
            print(f"Error processing chapter {i+1}: {e}")
            continue

    return summaries

# Usage example
if __name__ == "__main__":
    audio_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
    summaries = create_chapter_summaries(audio_url, "YOUR_API_KEY")

    for summary in summaries:
        print(f"\n--- Chapter {summary['chapter_number']} ---")
        print(f"Time: {summary['start_time_ms']}ms - {summary['end_time_ms']}ms")
        print(summary['summary'])
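Because the prompt asks for a `Title:` / `Summary:` layout, the response could be split into fields rather than printed raw. A sketch of a hypothetical parser, `parse_chapter_response`; model output is not guaranteed to follow the requested format, so it falls back to treating the whole response as the summary. The sample response below is made up for illustration:

```python
import re
from typing import Dict

_TITLE_RE = re.compile(r"^\s*Title:\s*(.+?)\s*$", re.MULTILINE)
_SUMMARY_RE = re.compile(r"^\s*Summary:\s*(.+)", re.MULTILINE | re.DOTALL)


def parse_chapter_response(response: str) -> Dict[str, str]:
    """Split a 'Title: ... / Summary: ...' response into a dict."""
    title_match = _TITLE_RE.search(response)
    summary_match = _SUMMARY_RE.search(response)
    if not (title_match and summary_match):
        # Format not followed: keep the raw text so nothing is lost
        return {"title": "", "summary": response.strip()}
    return {
        "title": title_match.group(1),
        "summary": summary_match.group(1).strip(),
    }


example = "Title: Wildfire smoke\nSummary: Smoke from Canadian wildfires\nreached the US East Coast."
parsed = parse_chapter_response(example)
```

In `create_chapter_summaries`, `parse_chapter_response(result.response)` could replace the raw `result.response` value stored under `'summary'`.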

User Experience Improvements

6. Add Troubleshooting Section

## Common Issues & Solutions

### "Authentication failed"
- Verify your API key is correct
- Ensure you have LeMUR access (paid plan required)

### "Transcription taking too long"
- Large files may take several minutes
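- `transcribe()` already waits for completion; to avoid blocking, submit the job with `transcriber.submit()` and check its status in a loop with delays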
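A minimal, library-agnostic sketch of such a polling loop. The `fetch_status` callable and the helper name `wait_for_status` are assumptions; with the Python SDK it could wrap something like `aai.Transcript.get_by_id(transcript.id).status.value` after `transcript = transcriber.submit(audio_url)`:

```python
import time
from typing import Callable, Tuple


def wait_for_status(
    fetch_status: Callable[[], str],
    done_states: Tuple[str, ...] = ("completed", "error"),
    interval_s: float = 3.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout_s} s")
        # Wait before asking again to avoid hammering the API
        sleep(interval_s)
```

The `sleep` parameter exists only so the loop can be tested without real delays.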

### "Empty or poor summaries"
- Try adjusting paragraphs_per_chapter (2-5 works well)
- Improve your prompt with more specific instructions
- Consider raising the `temperature` parameter for more varied responses

### "Rate limiting errors"
- Add delays between LeMUR calls: `time.sleep(1)`
- Process chapters in smaller batches
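A fixed `time.sleep(1)` works for small jobs; a common alternative is retrying a failed call with exponential backoff. A sketch with a hypothetical helper, `call_with_backoff`, that retries any `Exception` by default. That default is an assumption: in practice you would narrow `retry_on` to the SDK's rate-limit error type, which is not named here:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            # 1 s, 2 s, 4 s, ... plus up to 10% random jitter to spread out retries
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

Usage might look like `call_with_backoff(lambda: aai.Lemur().task(prompt=prompt, input_text=chapter['formatted']))`.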

7. Performance & Cost Optimization

## Performance Tips

### Optimize Costs:
- Combine several paragraphs per chapter (2-4 is a reasonable range) so fewer LeMUR requests are needed
- Use specific prompts to get concise responses
- Set `max_output_size` to limit response length

### Improve Processing Speed:
- Process chapters in parallel (with rate limiting)
- Use webhook notifications for long transcriptions
- Cache transcription results to avoid re-processing
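For the parallel-processing tip, a bounded thread pool keeps the number of simultaneous LeMUR requests small. A minimal sketch: `summarize_in_parallel` is a hypothetical helper, `summarize` is a placeholder for whatever function makes the LeMUR call, and `max_workers=3` is an arbitrary conservative value, not a documented limit:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def summarize_in_parallel(
    chapters: Sequence[T],
    summarize: Callable[[T], R],
    max_workers: int = 3,
) -> List[R]:
    """Run summarize() on each chapter with at most max_workers concurrent calls.

    Results are returned in the same order as the input chapters.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises a worker's exception when its result is read
        return list(pool.map(summarize, chapters))
```

Threads suit this workload because each call mostly waits on the network. Each `summarize` call should still handle rate-limit errors itself.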

8. Better Output Formatting

Replace the basic print statements with:

def format_chapter_output(summaries: List[Dict]) -> str:
    """Format chapter summaries for display

---