
Feedback: audio-intelligence-pii-redaction

Original URL: https://assemblyai.com/docs/audio-intelligence/pii-redaction
Category: audio-intelligence
Generated: 05/08/2025, 4:32:59 pm



Technical Documentation Analysis: PII Redaction


This documentation covers a complex feature well but has several areas for improvement in clarity, completeness, and user experience. The main issues include inconsistent terminology, missing error handling guidance, and potential user confusion around the dual polling approach for redacted audio.

**Issue: No Error Handling Examples**

- Missing HTTP status codes and error responses
- No guidance on handling API failures during polling

**Recommendation:**

````markdown
## Error Handling

Common error responses you may encounter:

| Status Code | Error | Description |
|-------------|-------|-------------|
| 400 | Invalid PII policy | One or more PII policies in your request are invalid |
| 402 | Insufficient credits | Your account doesn't have enough credits |
| 413 | File too large | Audio file exceeds 1GB limit for redacted audio |

Example error response:

```json
{
  "error": "Invalid PII policy 'invalid_policy' provided"
}
```
````
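The table above could be paired with a short client-side helper that turns a failed response into a readable message. This is a minimal sketch. It assumes the API returns the status codes proposed in the table and a JSON body with an `error` field, as in the example response. `describe_api_error` and `KNOWN_ERRORS` are hypothetical names, and the mapping mirrors the proposal rather than an official list.

```python
import json

# Hypothetical mapping copied from the proposed error table;
# not an official list of AssemblyAI status codes.
KNOWN_ERRORS = {
    400: "Invalid PII policy",
    402: "Insufficient credits",
    413: "File too large",
}


def describe_api_error(status_code: int, body: str) -> str:
    """Build a readable message from an HTTP status code and a JSON error body."""
    label = KNOWN_ERRORS.get(status_code, "Unexpected error")
    try:
        detail = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object: fall back to the raw text.
        detail = body
    return f"{status_code} {label}: {detail}" if detail else f"{status_code} {label}"


body = '{"error": "Invalid PII policy \'invalid_policy\' provided"}'
print(describe_api_error(400, body))
# 400 Invalid PII policy: Invalid PII policy 'invalid_policy' provided
```

With `requests`, you would call this as `describe_api_error(response.status_code, response.text)` whenever `response.ok` is false.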
**Issue: Missing Rate Limits and Performance Info**

- No information about API rate limits
- No guidance on expected processing times

**Recommendation:**

Add a "Performance & Limits" section with:

- Rate limit information
- Expected processing times for different file sizes
- Concurrent request limits
### 2. **Unclear Explanations**

**Issue: Inconsistent Terminology**

- Uses both "Topic Detection" and "PII Redaction" in code comments
- Mixes the parameter name `redact_pii` and the feature name "PII Redaction"

**Recommendation:**

Standardize terminology throughout:

```python
# Enable PII Redaction by setting `redact_pii` to `True` in the JSON payload.
```
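A standardized example could also build the request body in one place, so the feature name and the parameter that enables it always appear together. This is a minimal sketch. It assumes the REST request body takes `redact_pii`, `redact_pii_policies`, and `redact_pii_sub` as top-level fields. `build_pii_request` is a hypothetical helper. The sketch also shows why the comment says `True`: Python's `True` is serialized as JSON `true`.

```python
import json


def build_pii_request(audio_url: str, policies: list[str], substitution: str = "hash") -> dict:
    """Return a transcription request body with PII Redaction enabled (hypothetical helper)."""
    return {
        "audio_url": audio_url,
        # PII Redaction is switched on by `redact_pii`...
        "redact_pii": True,
        # ...and scoped by the list of policies to redact.
        "redact_pii_policies": policies,
        # Replacement style for redacted text (assumed values: "hash" or "entity_name").
        "redact_pii_sub": substitution,
    }


payload = build_pii_request("https://example.com/call.mp3", ["person_name", "phone_number"])
print(json.dumps(payload))
# {"audio_url": "https://example.com/call.mp3", "redact_pii": true, "redact_pii_policies": ["person_name", "phone_number"], "redact_pii_sub": "hash"}
```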

**Issue: Confusing Dual Polling Explanation**

The redacted audio section doesn’t clearly explain why two separate polling operations are needed.

**Recommendation:** Add a clear explanation:

```markdown
### Understanding the Two-Step Process

PII Redaction with audio involves two separate operations:

1. **Transcription Processing**: Creates the redacted transcript
2. **Audio Processing**: Creates the redacted audio file

Each operation reports its own status, so a completed transcript does not mean the redacted audio file is ready yet. You'll need to poll two different endpoints:

- `/v2/transcript/{id}` - for transcript completion
- `/v2/transcript/{id}/redacted-audio` - for redacted audio completion
```
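A sketch that waits for each endpoint in turn could illustrate the two-step process. It assumes the redacted-audio endpoint reports `"status": "redacted_audio_ready"` (the value used in the webhook recommendation) and returns the file location in a `redacted_audio_url` field. `wait_for_redacted_results` and the injected `fetch_json` callable are hypothetical. Injecting `fetch_json` lets you test the sketch without network access.

```python
import time
from typing import Callable

BASE_URL = "https://api.assemblyai.com"


def wait_for_redacted_results(
    transcript_id: str,
    fetch_json: Callable[[str], dict],  # hypothetical: GETs a URL, returns parsed JSON
    interval: float = 3.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[dict, str]:
    """Wait for the redacted transcript, then for the redacted audio file."""
    transcript_url = f"{BASE_URL}/v2/transcript/{transcript_id}"

    # Step 1: transcription processing produces the redacted transcript.
    for _ in range(max_polls):
        transcript = fetch_json(transcript_url)
        if transcript["status"] == "completed":
            break
        if transcript["status"] == "error":
            raise RuntimeError(f"Transcription failed: {transcript.get('error')}")
        sleep(interval)
    else:
        raise TimeoutError("Transcript was not completed in time")

    # Step 2: audio processing produces the redacted audio file (assumed field names).
    for _ in range(max_polls):
        audio = fetch_json(f"{transcript_url}/redacted-audio")
        if audio.get("status") == "redacted_audio_ready":
            return transcript, audio["redacted_audio_url"]
        sleep(interval)
    raise TimeoutError("Redacted audio was not ready in time")
```

In a real client, `fetch_json` could be `lambda url: requests.get(url, headers=headers).json()`. Polling the transcript first means a transcription error surfaces before any time is spent waiting on audio.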

**Issue: Inconsistent Code Examples**

- Some examples use placeholder audio URLs, others use real ones
- Missing practical examples of response handling

**Recommendation:**

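```python
import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"
audio_file = "./my-audio.mp3"  # placeholder: local path or public URL
config = aai.TranscriptionConfig(
    redact_pii=True,
    redact_pii_policies=[aai.PIIRedactionPolicy.person_name],
)

# Add realistic error handling to examples
try:
    transcript = aai.Transcriber().transcribe(audio_file, config)
    print(f"Transcript ID: {transcript.id}")
    if transcript.status == aai.TranscriptStatus.error:
        print(f"Transcription failed: {transcript.error}")
    else:
        print(transcript.text)
except Exception as e:
    print(f"Request failed: {e}")
```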

**Issue: No Real-World Use Case Examples**

The documentation lacks context about when to use different PII policies.

**Recommendation:** Add a “Common Use Cases” section:

````markdown
## Common Use Cases

### Call Center Compliance

```python
import assemblyai as aai

# For call centers handling customer service
policies = [
    aai.PIIRedactionPolicy.phone_number,
    aai.PIIRedactionPolicy.email_address,
    aai.PIIRedactionPolicy.credit_card_number,
    aai.PIIRedactionPolicy.us_social_security_number,
]
```

### Medical Transcription

```python
import assemblyai as aai

# For medical transcriptions
policies = [
    aai.PIIRedactionPolicy.person_name,
    aai.PIIRedactionPolicy.date_of_birth,
    aai.PIIRedactionPolicy.medical_condition,
    aai.PIIRedactionPolicy.healthcare_number,
]
```
````
### 4. **Structure Improvements**

**Issue: Information Architecture**

The PII policies table is buried at the end but is crucial for getting started.

**Recommendation:**

- Move the PII policies section up, right after the Quickstart
- Add a "Quick Reference" section at the top with essential parameters
- Create a "Getting Started Checklist"

**Issue: Missing Navigation Aids**

There is no clear path for users to understand prerequisites or next steps.

**Recommendation:**

Add at the beginning:

```markdown
## Before You Start

✅ **Prerequisites:**

- AssemblyAI API key ([get one here](link))
- Audio file URL or local file
- Understanding of which [PII policies](#pii-policies) you need

⏱️ **Estimated time:** 5-10 minutes

💰 **Cost:** Standard transcription rates apply
```

**Issue: Webhook Confusion**

The webhook section mentions two webhook calls but doesn’t provide clear guidance.

**Recommendation:**

````markdown
### Webhook Behavior with PII Audio Redaction

When `redact_pii_audio` is enabled, expect two webhook notifications:

1. **First webhook**: When redacted audio is ready

   ```json
   {
     "transcript_id": "abc123",
     "status": "redacted_audio_ready"
   }
   ```

2. **Second webhook**: When transcript is complete

   ```json
   {
     "transcript_id": "abc123",
     "status": "completed"
   }
   ```

**Important:** Wait for both webhooks before considering the job complete.
````
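Because the two notifications arrive as separate requests, the receiving service has to remember which ones it has already seen for each transcript. This is a minimal in-memory sketch. It assumes the payload shapes shown above plus an `error` status for failed transcripts. `PiiJobTracker` is a hypothetical class. It treats the job as complete only when both statuses have been recorded, in either order. A production service would persist this state instead of holding it in memory.

```python
from collections import defaultdict

# Statuses taken from the proposed webhook payloads.
REQUIRED_EVENTS = {"redacted_audio_ready", "completed"}


class PiiJobTracker:
    """Track webhook notifications per transcript until both have arrived (hypothetical)."""

    def __init__(self) -> None:
        self._seen: dict[str, set[str]] = defaultdict(set)

    def handle(self, payload: dict) -> bool:
        """Record one webhook payload; return True once the job is complete."""
        transcript_id = payload["transcript_id"]
        status = payload["status"]
        if status == "error":
            raise RuntimeError(f"Transcript {transcript_id} failed")
        self._seen[transcript_id].add(status)
        # Order-independent: complete only when both required statuses were seen.
        return REQUIRED_EVENTS <= self._seen[transcript_id]
```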

**Issue: Polling Logic Complexity**

The polling examples are repetitive and complex.

**Recommendation:**

Create a shared "Polling Best Practices" section:
````markdown
## Polling Best Practices

### Recommended Polling Strategy

- Start with 3-second intervals
- Implement exponential backoff for longer jobs
- Set a maximum timeout (e.g., 30 minutes)
- Always check for error status

### Sample Polling Function

```python
import time

import requests

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

def poll_transcript(transcript_id, max_wait=1800):  # 30 min timeout
    start_time = time.time()
    interval = 3
    while time.time() - start_time < max_wait:
        response = requests.get(f"{base_url}/v2/transcript/{transcript_id}",
                                headers=headers)
        response.raise_for_status()  # surface HTTP errors before reading the body
        result = response.json()
        if result["status"] in ["completed", "error"]:
            return result
        time.sleep(interval)
        interval = min(interval * 1.1, 10)  # Exponential backoff, max 10s
    raise TimeoutError("Transcript processing timed out")
```
````
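The strategy can also be factored into a single helper that the transcript and redacted-audio examples could share, which addresses the repetition noted in this issue. This is a minimal sketch. `poll_until` is a hypothetical name, and `check` is any zero-argument callable that returns a finished result or `None`. The clock and sleep functions are injectable, so you can exercise the backoff without real waiting. The defaults mirror the sample function: a 3-second start, growth by a factor of 1.1, a 10-second cap, and a 30-minute budget.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    check: Callable[[], Optional[T]],
    start_interval: float = 3.0,
    factor: float = 1.1,
    max_interval: float = 10.0,
    max_wait: float = 1800.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `check` until it returns a non-None value, backing off between calls."""
    deadline = clock() + max_wait
    interval = start_interval
    while True:
        result = check()
        if result is not None:
            return result
        # Give up rather than sleep past the overall time budget.
        if clock() + interval > deadline:
            raise TimeoutError("Polling timed out")
        sleep(interval)
        # Exponential backoff: grow the wait by `factor`, capped at `max_interval`.
        interval = min(interval * factor, max_interval)
```

Unlike the sample function, this version checks the deadline before sleeping, so it never sleeps past `max_wait`.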
### 6. **Additional Recommendations**

**Add Visual Aids:**

- Flowchart showing the PII redaction process
- Diagram explaining the two-polling-endpoint workflow
- Before/after audio waveform examples

**Improve Accessibility:**

- Add ARIA labels to code examples
- Ensure proper heading hierarchy
- Add alt text for any future diagrams

**Add Validation Guidance:**

```markdown
## Validating Your Results

### Checking Redaction Quality

- Review the redacted transcript for any missed PII
- Test with sample data containing known PII
- Verify audio redaction timing matches transcript

### Common Issues

- **Partial redaction**: Some PII types may require multiple policies
- **False positives**: Common words might be redacted (e.g., "May" as a date)
- **Context dependency**: Names used as common nouns might not be redacted
```
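You can partly automate the "test with sample data containing known PII" step. This is a minimal sketch. It assumes you have the redacted transcript text and a list of the PII values planted in your test audio. `find_leaked_pii` is a hypothetical helper. It only flags exact, case-insensitive, whole-word matches, so variants such as digits spoken as words still need a manual review.

```python
import re


def find_leaked_pii(redacted_text: str, known_pii: list[str]) -> list[str]:
    """Return the known PII values that still appear in the redacted transcript."""
    leaked = []
    for value in known_pii:
        # Whole-word match, ignoring case, so "Ann" does not match "Annual".
        # Lookarounds are used instead of \b because values may start or end
        # with non-word characters.
        pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
        if re.search(pattern, redacted_text, flags=re.IGNORECASE):
            leaked.append(value)
    return leaked


sample = "Hi, this is ####### #####, my number is ###-###-####. Call Ann tomorrow."
print(find_leaked_pii(sample, ["John Smith", "555-123-4567", "ann"]))
# ['ann']
```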

This documentation would benefit significantly from these improvements, making it more user-friendly and reducing support burden while improving the developer experience.