Feedback: audio-intelligence-pii-redaction

Documentation Feedback

Original URL: https://assemblyai.com/docs/audio-intelligence/pii-redaction
Category: audio-intelligence
Generated: 05/08/2025, 4:32:59 pm

Claude Sonnet 4 Feedback

Generated: 05/08/2025, 4:32:58 pm

Technical Documentation Analysis: PII Redaction

Executive Summary

This documentation covers a complex feature well but has several areas for improvement in clarity, completeness, and user experience. The main issues include inconsistent terminology, missing error handling guidance, and potential user confusion around the dual polling approach for redacted audio.

Specific Issues & Recommendations

### 1. **Critical Missing Information**

Issue: No Error Handling Examples

- Missing HTTP status codes and error responses
- No guidance on handling API failures during polling

Recommendation:

## Error Handling

Common error responses you may encounter:

| Status Code | Error | Description |
|-------------|-------|-------------|
| 400 | Invalid PII policy | One or more PII policies in your request are invalid |
| 402 | Insufficient credits | Your account doesn't have enough credits |
| 413 | File too large | Audio file exceeds 1GB limit for redacted audio |

Example error response:
```json
{
  "error": "Invalid PII policy 'invalid_policy' provided"
}
```
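A client could turn the proposed table into a small lookup so that failures surface with a readable message. The following is a minimal sketch, not confirmed API behavior: the status codes and labels are the ones suggested in this recommendation, and `describe_error` is a hypothetical helper.

```python
# Hypothetical mapping taken from the proposed table above; confirm the
# actual codes against the API reference before relying on them.
KNOWN_ERRORS = {
    400: "Invalid PII policy",
    402: "Insufficient credits",
    413: "File too large",
}


def describe_error(status_code, body):
    """Return a readable message for a failed request, or None on success."""
    if status_code < 400:
        return None
    label = KNOWN_ERRORS.get(status_code, "Unexpected error")
    # The body is expected to be decoded JSON with an "error" key,
    # but fall back gracefully if it is something else.
    if isinstance(body, dict):
        detail = body.get("error", "no detail provided")
    else:
        detail = str(body)
    return f"{status_code} {label}: {detail}"
```

Keeping the lookup separate from the HTTP call makes it easy to update once the real error codes are documented.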

**Issue: Missing Rate Limits and Performance Info**
- No information about API rate limits
- No guidance on expected processing times

**Recommendation:**
Add a "Performance & Limits" section with:
- Rate limit information
- Expected processing times for different file sizes
- Concurrent request limits

### 2. **Unclear Explanations**

**Issue: Inconsistent Terminology**
- Uses both "Topic Detection" and "PII Redaction" in code comments
- Mixes "redact_pii" and "PII Redaction" terminology

**Recommendation:**
Standardize terminology throughout:
```python
# Enable PII Redaction by setting `redact_pii` to `True` in the JSON payload.
```

Issue: Confusing Dual Polling Explanation

The redacted audio section doesn’t clearly explain why two separate polling operations are needed.

Recommendation: Add a clear explanation:

### Understanding the Two-Step Process

PII Redaction with audio involves two separate operations:
1. **Transcription Processing**: Creates the redacted transcript
2. **Audio Processing**: Creates the redacted audio file

You'll need to poll two different endpoints:
- `/v2/transcript/{id}` - for transcript completion
- `/v2/transcript/{id}/redacted-audio` - for redacted audio completion
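The two-step process could be illustrated with a short sketch along these lines. It assumes that both endpoints return JSON with a `status` field and that the redacted-audio endpoint reports `redacted_audio_ready` when finished. `get_json` is a hypothetical callable that performs the authenticated GET and returns the decoded body, so the sketch stays independent of any HTTP library.

```python
import time


def poll_until(fetch, done_statuses, interval=3.0, max_wait=1800.0, sleep=time.sleep):
    """Call fetch() until its "status" is in done_statuses or max_wait elapses."""
    waited = 0.0
    while True:
        result = fetch()
        if result.get("status") in done_statuses:
            return result
        if waited >= max_wait:
            raise TimeoutError("Polling timed out")
        sleep(interval)
        waited += interval


def wait_for_redaction(transcript_id, get_json, **poll_options):
    """Poll the transcript endpoint first, then the redacted-audio endpoint."""
    # Step 1: wait for the redacted transcript.
    transcript = poll_until(
        lambda: get_json(f"/v2/transcript/{transcript_id}"),
        {"completed", "error"},
        **poll_options,
    )
    if transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript.get('error')}")

    # Step 2: wait for the redacted audio file (assumed status value).
    audio = poll_until(
        lambda: get_json(f"/v2/transcript/{transcript_id}/redacted-audio"),
        {"redacted_audio_ready"},
        **poll_options,
    )
    return transcript, audio
```

Sharing one `poll_until` helper between both steps shows that the second poll is the same pattern pointed at a different endpoint.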

### 3. **Improved Examples Needed**

Issue: Inconsistent Code Examples

- Some examples use placeholder audio URLs, others use real ones
- Missing practical examples of response handling

Recommendation:

# Add realistic error handling to examples
try:
    transcript = aai.Transcriber().transcribe(audio_file, config)
    print(f"Transcript ID: {transcript.id}")

    if transcript.status == aai.TranscriptStatus.error:
        print(f"Transcription failed: {transcript.error}")
    else:
        print(transcript.text)

except Exception as e:
    print(f"Request failed: {e}")

Issue: No Real-World Use Case Examples

The documentation lacks context about when to use different PII policies.

Recommendation: Add a “Common Use Cases” section:

## Common Use Cases

### Call Center Compliance
```python
# For call centers handling customer service
policies=[
    aai.PIIRedactionPolicy.phone_number,
    aai.PIIRedactionPolicy.email_address,
    aai.PIIRedactionPolicy.credit_card_number,
    aai.PIIRedactionPolicy.us_social_security_number
]
```

### Healthcare Applications

# For medical transcriptions
policies=[
    aai.PIIRedactionPolicy.person_name,
    aai.PIIRedactionPolicy.date_of_birth,
    aai.PIIRedactionPolicy.medical_condition,
    aai.PIIRedactionPolicy.healthcare_number
]
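For readers calling the REST API directly, the same use cases could be expressed as named presets. This is a rough sketch under stated assumptions: `POLICY_PRESETS` and `build_payload` are hypothetical names, the policy strings mirror the SDK enum members listed above, and the `audio_url` and `redact_pii_policies` request fields should be checked against the API reference.

```python
# Hypothetical presets; the policy names mirror the use cases listed above.
POLICY_PRESETS = {
    "call_center": [
        "phone_number",
        "email_address",
        "credit_card_number",
        "us_social_security_number",
    ],
    "healthcare": [
        "person_name",
        "date_of_birth",
        "medical_condition",
        "healthcare_number",
    ],
}


def build_payload(audio_url, use_case):
    """Build a transcript request body that enables PII redaction for a use case."""
    if use_case not in POLICY_PRESETS:
        raise ValueError(f"Unknown use case: {use_case!r}")
    return {
        "audio_url": audio_url,
        "redact_pii": True,
        # Copy the list so callers can extend it without mutating the preset.
        "redact_pii_policies": list(POLICY_PRESETS[use_case]),
    }
```

A preset table like this gives readers a starting point they can extend with further policies for their own domain.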

### 4. **Structure Improvements**

**Issue: Information Architecture**
The PII policies table is buried at the end but is crucial for getting started.

**Recommendation:**
- Move PII policies section up, right after the Quickstart
- Add a "Quick Reference" section at the top with essential parameters
- Create a "Getting Started Checklist"

**Issue: Missing Navigation Aids**
No clear path for users to understand prerequisites or next steps.

**Recommendation:**
Add at the beginning:
```markdown
## Before You Start

✅ **Prerequisites:**
- AssemblyAI API key ([get one here](link))
- Audio file URL or local file
- Understanding of which [PII policies](#pii-policies) you need

⏱️ **Estimated time:** 5-10 minutes
💰 **Cost:** Standard transcription rates apply
```

### 5. **User Pain Points**

Issue: Webhook Confusion

The webhook section mentions two webhook calls but doesn’t provide clear guidance.

Recommendation:

### Webhook Behavior with PII Audio Redaction

When `redact_pii_audio` is enabled, expect two webhook notifications:

1. **First webhook**: When redacted audio is ready
   ```json
   {
     "transcript_id": "abc123",
     "status": "redacted_audio_ready"
   }
   ```

2. **Second webhook**: When transcript is complete
   ```json
   {
     "transcript_id": "abc123",
     "status": "completed"
   }
   ```

Important: Wait for both webhooks before considering the job complete.
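The "wait for both" rule could be made concrete with a small tracker on the receiving side. This is only a sketch: it assumes the payload shapes proposed above (a `transcript_id` and a `status` in each notification), which may differ from what the API actually sends, and `RedactionJobTracker` is a hypothetical name.

```python
class RedactionJobTracker:
    """Tracks which of the two expected webhook notifications have arrived."""

    EXPECTED = frozenset({"redacted_audio_ready", "completed"})

    def __init__(self):
        self._seen = {}  # transcript_id -> set of statuses received so far

    def handle(self, payload):
        """Record one webhook payload; return True once both have arrived."""
        transcript_id = payload["transcript_id"]
        status = payload["status"]
        if status == "error":
            raise RuntimeError(f"Job {transcript_id} failed")
        seen = self._seen.setdefault(transcript_id, set())
        if status in self.EXPECTED:
            seen.add(status)
        # The job is complete only when every expected status has been seen.
        return seen >= self.EXPECTED
```

Because the tracker stores a set of statuses per transcript, it does not depend on the order in which the two notifications arrive and tolerates duplicate deliveries.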

**Issue: Polling Logic Complexity**
The polling examples are repetitive and complex.

**Recommendation:**
Create a shared "Polling Best Practices" section:
````markdown
## Polling Best Practices

### Recommended Polling Strategy
- Start with 3-second intervals
- Implement exponential backoff for longer jobs
- Set a maximum timeout (e.g., 30 minutes)
- Always check for error status

### Sample Polling Function
```python
import time

import requests

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}


def poll_transcript(transcript_id, max_wait=1800):  # 30-minute timeout
    start_time = time.time()
    interval = 3  # seconds

    while time.time() - start_time < max_wait:
        response = requests.get(
            f"{base_url}/v2/transcript/{transcript_id}", headers=headers
        ).json()

        # Both statuses are terminal; the caller inspects response["status"].
        if response["status"] in ("completed", "error"):
            return response

        time.sleep(interval)
        interval = min(interval * 1.1, 10)  # Exponential backoff, capped at 10 s

    raise TimeoutError("Transcript processing timed out")
```
````

### 6. **Additional Recommendations**

**Add Visual Aids:**
- Flowchart showing the PII redaction process
- Diagram explaining the two-polling-endpoint workflow
- Before/after audio waveform examples

**Improve Accessibility:**
- Add ARIA labels to code examples
- Ensure proper heading hierarchy
- Add alt text for any future diagrams

**Add Validation Guidance:**
```markdown
## Validating Your Results

### Checking Redaction Quality
- Review the redacted transcript for any missed PII
- Test with sample data containing known PII
- Verify audio redaction timing matches transcript

### Common Issues
- **Partial redaction**: Some PII types may require multiple policies
- **False positives**: Common words might be redacted (e.g., "May" as a date)
- **Context dependency**: Names used as common nouns might not be redacted
```
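Testing with sample data containing known PII lends itself to a small automated check. The sketch below assumes you already have the redacted transcript text and a list of the PII strings planted in the test recording; `find_leaked_pii` is a hypothetical helper, and a plain substring match will not catch paraphrased or reformatted values.

```python
def find_leaked_pii(redacted_text, known_pii):
    """Return the known PII strings that still appear in the redacted text."""
    haystack = redacted_text.casefold()
    # Case-insensitive substring match against each planted PII value.
    return [item for item in known_pii if item.casefold() in haystack]


# Example with made-up sample data.
sample_text = "My name is #### and you can reach me at jane@example.com."
leaks = find_leaked_pii(sample_text, ["Jane Doe", "jane@example.com"])
print(leaks)
```

An empty result means none of the planted values survived verbatim. A manual review is still needed for partial matches.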

These improvements would make the documentation more user-friendly, reduce the support burden, and improve the developer experience.