Feedback: guides-input-text-chapters
Documentation Feedback
Original URL: https://www.assemblyai.com/docs/guides/input-text-chapters
Category: guides
Generated: 05/08/2025, 4:40:06 pm
Claude Sonnet 4 Feedback
Generated: 05/08/2025, 4:40:05 pm
Technical Documentation Analysis & Feedback
Critical Issues Requiring Immediate Attention
1. Missing Prerequisites & Setup Information
- Issue: No clear explanation of what LeMUR is or its capabilities
- Fix: Add a dedicated “What is LeMUR?” section explaining its purpose, limitations, and use cases
- Add: System requirements, supported audio formats, and file size limits
2. Incomplete Error Handling
The code lacks any error handling, which will frustrate users when things go wrong.
Add this enhanced version:
    import sys

    import assemblyai as aai

    aai.settings.api_key = "YOUR_API_KEY"
    audio_url = "YOUR_AUDIO_URL"  # placeholder: URL or local path of the audio file

    try:
        transcriber = aai.Transcriber()
        # transcribe() blocks until the transcript has completed or failed
        transcript = transcriber.transcribe(audio_url)

        # A failed transcription is reported through the status field
        if transcript.status == aai.TranscriptStatus.error:
            print(f"Transcription failed: {transcript.error}")
            sys.exit(1)

    except Exception as e:
        print(f"Error during transcription: {e}")
        sys.exit(1)

3. Confusing Code Logic
The paragraph-combining logic is unnecessarily complex and poorly explained.
Current problematic code:
    step = 2  # Adjust as needed if you want combined paragraphs to be shorter or longer in length.
Better approach with clear explanation:
    # Configuration: Combine every 2 paragraphs into one chapter
    # Increase for longer chapters, decrease for shorter ones
    PARAGRAPHS_PER_CHAPTER = 2

    def create_chapters(paragraphs, paragraphs_per_chapter=PARAGRAPHS_PER_CHAPTER):
        """
        Combine paragraphs into logical chapters for better summarization.

        Args:
            paragraphs: List of transcript paragraphs
            paragraphs_per_chapter: Number of paragraphs to combine (default: 2)

        Returns:
            List of chapter dicts with 'text', 'start', 'end', and 'formatted'
            keys (start and end are in milliseconds)
        """
        chapters = []
        for i in range(0, len(paragraphs), paragraphs_per_chapter):
            chapter_paragraphs = paragraphs[i:i + paragraphs_per_chapter]
            # Extract timing information
            start_time = chapter_paragraphs[0].start
            end_time = chapter_paragraphs[-1].end

            # Combine text content
            combined_text = " ".join(p.text for p in chapter_paragraphs)

            chapters.append({
                'text': combined_text,
                'start': start_time,
                'end': end_time,
                'formatted': f"Content: {combined_text}\nStart: {start_time}ms\nEnd: {end_time}ms"
            })

        return chapters

Structural Improvements
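To see how the `create_chapters` helper from item 3 groups paragraphs, the following is a minimal sketch that runs it on stand-in data instead of a real transcript. `FakeParagraph` is a hypothetical class introduced only for this illustration; it assumes that SDK paragraphs expose `text`, `start`, and `end` (in milliseconds), as the helper itself does. The helper is repeated in compact form so the block runs on its own.

```python
from dataclasses import dataclass


@dataclass
class FakeParagraph:
    """Hypothetical stand-in for an SDK paragraph: text plus start/end in ms."""
    text: str
    start: int
    end: int


def create_chapters(paragraphs, paragraphs_per_chapter=2):
    """Group consecutive paragraphs into chapter dicts (same logic as item 3)."""
    chapters = []
    for i in range(0, len(paragraphs), paragraphs_per_chapter):
        group = paragraphs[i:i + paragraphs_per_chapter]
        text = " ".join(p.text for p in group)
        chapters.append({
            "text": text,
            "start": group[0].start,
            "end": group[-1].end,
            "formatted": f"Content: {text}\nStart: {group[0].start}ms\nEnd: {group[-1].end}ms",
        })
    return chapters


# Five one-second paragraphs: P0 covers 0-1000 ms, P1 covers 1000-2000 ms, ...
paragraphs = [FakeParagraph(f"P{n}.", n * 1000, (n + 1) * 1000) for n in range(5)]
chapters = create_chapters(paragraphs, paragraphs_per_chapter=2)

for chapter in chapters:
    print(chapter["start"], chapter["end"], chapter["text"])
```

With five paragraphs and a group size of two, the last chapter holds a single leftover paragraph. Callers who want evenly sized chapters would need to handle that remainder themselves.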
4. Add Missing Sections
A. Parameters Reference Table:
    ## LeMUR Task Parameters

    | Parameter | Type | Required | Description |
    |-----------|------|----------|-------------|
    | `prompt` | string | Yes | Instructions for the AI model |
    | `input_text` | string | Yes* | Custom text input (*when not using transcript_ids) |
    | `final_model` | LemurModel | No | AI model to use (default: claude3_5_sonnet) |
    | `max_output_size` | int | No | Maximum response length (default: 2000) |
    | `temperature` | float | No | Response creativity (0.0-1.0, default: 0.0) |

B. When to Use This Approach:
    ## When to Use input_text vs transcript_ids

    ### Use `input_text` when:
    - ✅ You need to process edited transcripts
    - ✅ Working with speaker-labeled content
    - ✅ Combining multiple transcript sources
    - ✅ Adding custom formatting or metadata

    ### Use `transcript_ids` when:
    - ✅ Processing unmodified AssemblyAI transcripts
    - ✅ Working with single audio files
    - ✅ Simpler implementation requirements

5. Improve Examples
Add a realistic, complete example:
    """Complete example: Create chapter summaries from a podcast transcript"""
    import assemblyai as aai
    from typing import List, Dict

    # Note: relies on the create_chapters() helper defined under item 3.

    def create_chapter_summaries(audio_url: str, api_key: str) -> List[Dict]:
        """
        Process audio file and create chapter summaries using LeMUR.

        Returns:
            List of dictionaries containing chapter summaries and timestamps
        """
        aai.settings.api_key = api_key

        # Step 1: Transcribe audio
        print("Transcribing audio...")
        transcriber = aai.Transcriber()
        transcript = transcriber.transcribe(audio_url)

        if transcript.status == aai.TranscriptStatus.error:
            raise Exception(f"Transcription failed: {transcript.error}")

        # Step 2: Process paragraphs into chapters
        print("Creating chapters...")
        paragraphs = transcript.get_paragraphs()
        chapters = create_chapters(paragraphs, paragraphs_per_chapter=3)

        # Step 3: Generate summaries
        print("Generating summaries...")
        summaries = []

        for i, chapter in enumerate(chapters):
            try:
                result = aai.Lemur().task(
                    prompt="""Summarize this chapter in 2-3 sentences.
                    Focus on the main topics discussed.
                    Format your response as:
                    Title: [Brief chapter title]
                    Summary: [Your summary]""",
                    input_text=chapter['formatted'],
                    final_model=aai.LemurModel.claude3_5_sonnet,
                )

                summaries.append({
                    'chapter_number': i + 1,
                    'start_time_ms': chapter['start'],
                    'end_time_ms': chapter['end'],
                    'summary': result.response
                })

            except Exception as e:
                # Skip a chapter that fails rather than aborting the whole run
                print(f"Error processing chapter {i+1}: {e}")
                continue

        return summaries

    # Usage example
    if __name__ == "__main__":
        audio_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
        summaries = create_chapter_summaries(audio_url, "YOUR_API_KEY")

        for summary in summaries:
            print(f"\n--- Chapter {summary['chapter_number']} ---")
            print(f"Time: {summary['start_time_ms']}ms - {summary['end_time_ms']}ms")
            print(summary['summary'])

User Experience Improvements
6. Add Troubleshooting Section
    ## Common Issues & Solutions

    ### "Authentication failed"
    - Verify your API key is correct
    - Ensure you have LeMUR access (paid plan required)

    ### "Transcription taking too long"
    - Large files may take several minutes
    - Check transcript.status in a loop with delays

    ### "Empty or poor summaries"
    - Try adjusting paragraphs_per_chapter (2-5 works well)
    - Improve your prompt with more specific instructions
    - Consider using the temperature parameter for more creative responses

    ### "Rate limiting errors"
    - Add delays between LeMUR calls: `time.sleep(1)`
    - Process chapters in smaller batches

7. Performance & Cost Optimization
    ## Performance Tips

    ### Optimize Costs:
    - Combine paragraphs appropriately (2-4 paragraphs per chapter)
    - Use specific prompts to get concise responses
    - Set `max_output_size` to limit response length

    ### Improve Processing Speed:
    - Process chapters in parallel (with rate limiting)
    - Use webhook notifications for long transcriptions
    - Cache transcription results to avoid re-processing

8. Better Output Formatting
Replace the basic print statements with:
    def format_chapter_output(summaries: List[Dict]) -> str:
        """Format chapter summaries for display
---
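The rate-limiting advice in item 6 (add delays between LeMUR calls) can be sketched as a small retry helper with exponential backoff. This is an illustration under stated assumptions, not part of the AssemblyAI SDK: `call_with_backoff` and `flaky_call` are hypothetical names, and the helper retries on any exception because this sketch does not assume the name of the SDK's rate-limit error. Real code would likely narrow the `except` clause so that authentication failures are not retried.

```python
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception, wait base_delay * 2**attempt and retry.

    Re-raises the last exception once max_attempts calls have failed.
    The sleep function is injectable so the delays can be observed in tests.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Demo with a hypothetical flaky call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "summary text"


# Record the requested delays instead of actually sleeping.
waits = []
result = call_with_backoff(flaky_call, base_delay=1.0, sleep=waits.append)
print(result, waits)
```

In the chapter loop from item 5, the LeMUR call could be wrapped in a zero-argument function (for example a `lambda`) and passed as `fn`, leaving the default `time.sleep` in place.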