Feedback: audio-intelligence-pii-redaction
Documentation Feedback
Original URL: https://assemblyai.com/docs/audio-intelligence/pii-redaction
Category: audio-intelligence
Generated: 05/08/2025, 4:32:59 pm
Claude Sonnet 4 Feedback
Generated: 05/08/2025, 4:32:58 pm
Technical Documentation Analysis: PII Redaction
Executive Summary
This documentation covers a complex feature well but has several areas for improvement in clarity, completeness, and user experience. The main issues include inconsistent terminology, missing error handling guidance, and potential user confusion around the dual polling approach for redacted audio.
Specific Issues & Recommendations
1. Critical Missing Information
Issue: No Error Handling Examples
- Missing HTTP status codes and error responses
- No guidance on handling API failures during polling
Recommendation:
## Error Handling
Common error responses you may encounter:
| Status Code | Error | Description |
|-------------|-------|-------------|
| 400 | Invalid PII policy | One or more PII policies in your request are invalid |
| 402 | Insufficient credits | Your account doesn't have enough credits |
| 413 | File too large | Audio file exceeds 1GB limit for redacted audio |

Example error response:

```json
{
  "error": "Invalid PII policy 'invalid_policy' provided"
}
```

**Issue: Missing Rate Limits and Performance Info**
- No information about API rate limits
- No guidance on expected processing times

**Recommendation:**
Add a "Performance & Limits" section with:
- Rate limit information
- Expected processing times for different file sizes
- Concurrent request limits
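As a rough illustration of how the proposed error table and rate-limit guidance might translate into client code, the sketch below sorts HTTP status codes into "ok", "retry", and "fatal". The 400/402/413 descriptions are copied from the table suggested above and have not been checked against the live API. Treating 429 and 5xx as retryable is an assumption based on general HTTP conventions, and `classify_status` is a hypothetical helper, not part of any AssemblyAI SDK.

```python
# Descriptions copied from the proposed error table (not verified against the API).
KNOWN_ERRORS = {
    400: "Invalid PII policy: one or more PII policies in your request are invalid",
    402: "Insufficient credits: your account doesn't have enough credits",
    413: "File too large: audio file exceeds 1GB limit for redacted audio",
}


def classify_status(status_code):
    """Return (action, message) for an HTTP status code.

    action is "ok", "retry", or "fatal". Retrying on 429 and 5xx is an
    assumption based on general HTTP conventions, not documented behavior.
    """
    if 200 <= status_code < 300:
        return "ok", ""
    if status_code == 429 or 500 <= status_code < 600:
        return "retry", f"Temporary failure (HTTP {status_code}); wait and try again"
    message = KNOWN_ERRORS.get(status_code, f"Unexpected error (HTTP {status_code})")
    return "fatal", message
```

A polling loop could call `classify_status(response.status_code)` on each response, sleeping and retrying on "retry" and surfacing the message to the user on "fatal".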
### 2. **Unclear Explanations**
**Issue: Inconsistent Terminology**
- Uses both "Topic Detection" and "PII Redaction" in code comments
- Mixes "redact_pii" and "PII Redaction" terminology

**Recommendation:**
Standardize terminology throughout:

```python
# Enable PII Redaction by setting `redact_pii` to `True` in the JSON payload.
```

Issue: Confusing Dual Polling Explanation
The redacted audio section doesn't clearly explain why two separate polling operations are needed.
Recommendation: Add a clear explanation:
### Understanding the Two-Step Process
PII Redaction with audio involves two separate operations:
1. **Transcription Processing**: Creates the redacted transcript
2. **Audio Processing**: Creates the redacted audio file

You'll need to poll two different endpoints:
- `/v2/transcript/{id}` - for transcript completion
- `/v2/transcript/{id}/redacted-audio` - for redacted audio completion
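The two-step process above can be sketched as a single helper that polls the transcript endpoint first and the redacted-audio endpoint second. This is a minimal sketch under stated assumptions: `wait_for_redaction` and the caller-supplied `get_json` function are hypothetical, the endpoint paths are the ones listed above, and the `"redacted_audio_ready"` status value is assumed rather than confirmed by the documentation under review.

```python
import time


def wait_for_redaction(get_json, transcript_id, interval=3.0, max_wait=1800.0,
                       sleep=time.sleep):
    """Poll the transcript endpoint, then the redacted-audio endpoint.

    get_json is a caller-supplied function (hypothetical) that takes a URL
    path and returns the decoded JSON body as a dict, for example a thin
    wrapper around an HTTP client. Returns (transcript, redacted_audio).
    """
    deadline = time.monotonic() + max_wait

    def poll(path, done_statuses):
        # Keep requesting `path` until it reports a finished status or fails.
        while time.monotonic() < deadline:
            body = get_json(path)
            status = body.get("status")
            if status == "error":
                raise RuntimeError(f"{path} failed: {body.get('error')}")
            if status in done_statuses:
                return body
            sleep(interval)
        raise TimeoutError(f"Timed out waiting for {path}")

    # Step 1: wait for the redacted transcript.
    transcript = poll(f"/v2/transcript/{transcript_id}", {"completed"})
    # Step 2: wait for the redacted audio ("redacted_audio_ready" is assumed).
    audio = poll(f"/v2/transcript/{transcript_id}/redacted-audio",
                 {"redacted_audio_ready"})
    return transcript, audio
```

Passing the HTTP call in as `get_json` keeps the ordering logic separate from authentication and transport, so the same loop works with any client and can be exercised without network access.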
3. Improved Examples Needed
Issue: Inconsistent Code Examples
- Some examples use placeholder audio URLs, others use real ones
- Missing practical examples of response handling
Recommendation:
```python
# Add realistic error handling to examples
import assemblyai as aai

# `audio_file` and `config` are defined as in the existing examples.
try:
    transcript = aai.Transcriber().transcribe(audio_file, config)
    print(f"Transcript ID: {transcript.id}")

    if transcript.status == aai.TranscriptStatus.error:
        print(f"Transcription failed: {transcript.error}")
    else:
        print(transcript.text)

except Exception as e:
    print(f"Request failed: {e}")
```

Issue: No Real-World Use Case Examples
The documentation lacks context about when to use different PII policies.
Recommendation: Add a “Common Use Cases” section:
## Common Use Cases
### Call Center Compliance
```python
# For call centers handling customer service
policies=[
    aai.PIIRedactionPolicy.phone_number,
    aai.PIIRedactionPolicy.email_address,
    aai.PIIRedactionPolicy.credit_card_number,
    aai.PIIRedactionPolicy.us_social_security_number
]
```

### Healthcare Applications
```python
# For medical transcriptions
policies=[
    aai.PIIRedactionPolicy.person_name,
    aai.PIIRedactionPolicy.date_of_birth,
    aai.PIIRedactionPolicy.medical_condition,
    aai.PIIRedactionPolicy.healthcare_number
]
```

### 4. **Structure Improvements**
**Issue: Information Architecture**
The PII policies table is buried at the end but is crucial for getting started.

**Recommendation:**
- Move PII policies section up, right after the Quickstart
- Add a "Quick Reference" section at the top with essential parameters
- Create a "Getting Started Checklist"
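A "Quick Reference" section could open with a minimal request payload. The sketch below assembles one as a Python dict. `redact_pii` and `redact_pii_audio` appear in the documentation under review; `redact_pii_policies` and `redact_pii_sub` are parameter names assumed from the API and should be verified against the API reference before publishing. `build_pii_request` is a hypothetical helper, not an SDK function.

```python
def build_pii_request(audio_url, policies, redact_audio=False, substitution="hash"):
    """Build a JSON-serializable payload with PII redaction enabled.

    The `redact_pii_policies` and `redact_pii_sub` keys are assumptions;
    confirm them against the API reference.
    """
    payload = {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": list(policies),
        "redact_pii_sub": substitution,
    }
    if redact_audio:
        # Only request redacted audio when the caller asks for it.
        payload["redact_pii_audio"] = True
    return payload
```

A table listing each key alongside its type and default would complement a snippet like this in the reference section.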
**Issue: Missing Navigation Aids**
No clear path for users to understand prerequisites or next steps.

**Recommendation:**
Add at the beginning:
```markdown
## Before You Start

✅ **Prerequisites:**
- AssemblyAI API key ([get one here](link))
- Audio file URL or local file
- Understanding of which [PII policies](#pii-policies) you need

⏱️ **Estimated time:** 5-10 minutes
💰 **Cost:** Standard transcription rates apply
```

5. User Pain Points
Issue: Webhook Confusion
The webhook section mentions two webhook calls but doesn't provide clear guidance.
Recommendation:
### Webhook Behavior with PII Audio Redaction
When `redact_pii_audio` is enabled, expect two webhook notifications:
1. **First webhook**: When redacted audio is ready
   ```json
   {
     "transcript_id": "abc123",
     "status": "redacted_audio_ready"
   }
   ```
2. **Second webhook**: When transcript is complete
   ```json
   {
     "transcript_id": "abc123",
     "status": "completed"
   }
   ```

**Important:** Wait for both webhooks before considering the job complete.
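To show what "wait for both webhooks" could look like on the receiving side, here is a minimal sketch of a tracker that marks a job complete only after both notifications have arrived. `RedactionJobTracker` is a hypothetical class, and the payload fields and status values are taken from the illustrative payloads above, which have not been verified against real webhook deliveries.

```python
class RedactionJobTracker:
    """Track which webhook notifications have arrived for each transcript."""

    # Status values taken from the illustrative payloads above (assumed).
    REQUIRED = frozenset({"redacted_audio_ready", "completed"})

    def __init__(self):
        self._seen = {}  # transcript_id -> set of statuses received so far

    def record(self, payload):
        """Store one webhook payload; return True once both notifications are in."""
        transcript_id = payload["transcript_id"]
        seen = self._seen.setdefault(transcript_id, set())
        seen.add(payload["status"])
        return self.REQUIRED <= seen
```

Because the tracker stores a set of statuses per transcript, it does not depend on the order in which the two notifications arrive.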
**Issue: Polling Logic Complexity**
The polling examples are repetitive and complex.

**Recommendation:**
Create a shared "Polling Best Practices" section:
````markdown
## Polling Best Practices

### Recommended Polling Strategy
- Start with 3-second intervals
- Implement exponential backoff for longer jobs
- Set a maximum timeout (e.g., 30 minutes)
- Always check for error status

### Sample Polling Function
```python
import time

import requests

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}


def poll_transcript(transcript_id, max_wait=1800):  # 30 min timeout
    start_time = time.time()
    interval = 3

    while time.time() - start_time < max_wait:
        response = requests.get(f"{base_url}/v2/transcript/{transcript_id}", headers=headers).json()

        if response['status'] in ['completed', 'error']:
            return response

        time.sleep(interval)
        interval = min(interval * 1.1, 10)  # Exponential backoff (x1.1 per attempt), max 10s

    raise TimeoutError("Transcript processing timed out")
```
````

### 6. **Additional Recommendations**
**Add Visual Aids:**
- Flowchart showing the PII redaction process
- Diagram explaining the two-polling-endpoint workflow
- Before/after audio waveform examples

**Improve Accessibility:**
- Add ARIA labels to code examples
- Ensure proper heading hierarchy
- Add alt text for any future diagrams
**Add Validation Guidance:**
```markdown
## Validating Your Results

### Checking Redaction Quality
- Review the redacted transcript for any missed PII
- Test with sample data containing known PII
- Verify audio redaction timing matches transcript

### Common Issues
- **Partial redaction**: Some PII types may require multiple policies
- **False positives**: Common words might be redacted (e.g., "May" as a date)
- **Context dependency**: Names used as common nouns might not be redacted
```

This documentation would benefit significantly from these improvements, making it more user-friendly and reducing support burden while improving the developer experience.