Feedback: speech-to-text-pre-recorded-audio-profanity-filtering

Original URL: https://www.assemblyai.com/docs/speech-to-text/pre-recorded-audio/profanity-filtering
Category: speech-to-text
Generated: 05/08/2025, 4:24:52 pm


Technical Documentation Analysis: Profanity Filtering

This documentation covers the basic functionality but lacks depth and user guidance. While the code examples are comprehensive, the explanatory content is minimal and leaves users with many unanswered questions.

What’s Missing:

  • No definition of what constitutes “profanity” - Users need to understand what words/categories are filtered
  • No examples of filtered output - Show before/after examples
  • No performance impact information - Does filtering affect processing time or accuracy?
  • No pricing implications - Is this feature included in all plans?
  • No API response format details - How does the response differ when filtering is enabled?

Recommended Addition:

## What Gets Filtered
Profanity filtering detects and replaces offensive language including:
- Strong profanity and vulgar language
- Sexual content and explicit terms
- Religious profanity and blasphemy
- Discriminatory slurs and hate speech
### Example Output
**Original audio:** `This is so damn frustrating, what the hell!`
**Filtered result:** `This is so **** frustrating, what the ****!`

Current Issues:

  • “Any profanity in the returned text will be replaced with asterisks” - How many asterisks? Does it preserve word length?
  • The disclaimer about imperfection needs more context about accuracy rates and common edge cases

Improved Explanation:

## How Filtering Works
When `filter_profanity` is set to `true`:
1. Profane words are replaced with asterisks (`*`) matching the original word length
2. The filtering preserves sentence structure and timing information
3. Word boundaries and punctuation remain intact
**Accuracy:** The filter catches approximately 95% of common profanity but may miss:
- Creative spellings or deliberate misspellings
- Context-dependent offensive language
- Newly coined offensive terms
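
The length-preserving replacement described in step 1 can be illustrated locally. This is a minimal sketch under stated assumptions, not AssemblyAI's implementation: the word list `BLOCKLIST` and the helper `mask_text` are hypothetical, and the service's actual matching rules are not described on the reviewed page.

```python
import re

# Hypothetical blocklist for illustration only; the service's real list is not public.
BLOCKLIST = frozenset({"damn", "hell"})

# Match runs of letters and apostrophes so punctuation stays outside the match.
_WORD = re.compile(r"[A-Za-z']+")


def mask_text(text: str, blocklist: frozenset = BLOCKLIST) -> str:
    """Replace each blocklisted word with asterisks of the same length."""

    def _mask(match: re.Match) -> str:
        word = match.group(0)
        # Compare case-insensitively, but leave non-matching words untouched.
        if word.lower() in blocklist:
            return "*" * len(word)
        return word

    return _WORD.sub(_mask, text)


print(mask_text("This is so damn frustrating, what the hell!"))
# prints: This is so **** frustrating, what the ****!
```

Because only the matched word is replaced, punctuation and word boundaries are unchanged, which is the behavior steps 2 and 3 describe.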

Current Structure Issues:

  • Language support is buried in an accordion
  • No clear sections for different use cases
  • Code examples dominate without sufficient explanation

Recommended Structure:

# Profanity Filtering
## Overview
Brief explanation of the feature and its use cases
## Supported Languages
[Move out of accordion for better visibility]
## Quick Start
Simple example with explanation
## Configuration Options
Detailed parameter information
## Response Format
How filtered responses differ
## Code Examples
[Current comprehensive examples]
## Limitations and Best Practices
## Troubleshooting

Current Example Issues:

  • No output examples showing actual filtered results
  • No comparison between filtered and unfiltered responses
  • Missing real-world use case examples

Recommended Examples:

## Response Examples
### Unfiltered Response
```json
{
  "text": "This damn project is a complete shitshow",
  "filter_profanity": false
}
```
### Filtered Response
```json
{
  "text": "This **** project is a complete ********",
  "filter_profanity": true
}
```
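
To make the before/after comparison concrete, the following sketch lines up the two `text` values and reports which words were masked. The helper `masked_words` is hypothetical, and it assumes the filter replaces a word with asterisks without changing the number of words, which the reviewed page does not confirm.

```python
def masked_words(unfiltered: str, filtered: str) -> list:
    """Pair each original word with its masked replacement.

    Assumes both strings split into the same number of whitespace-separated
    tokens; raises ValueError otherwise.
    """
    original_tokens = unfiltered.split()
    filtered_tokens = filtered.split()
    if len(original_tokens) != len(filtered_tokens):
        raise ValueError("token counts differ; cannot align the transcripts")
    return [
        (orig, filt)
        for orig, filt in zip(original_tokens, filtered_tokens)
        # Keep only positions where the text changed and asterisks appeared.
        if orig != filt and "*" in filt
    ]


pairs = masked_words(
    "This damn project is a complete shitshow",
    "This **** project is a complete ********",
)
print(pairs)
# prints: [('damn', '****'), ('shitshow', '********')]
```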

Content Moderation for Family-Friendly Apps

import assemblyai as aai

# Perfect for educational content, children's apps
config = aai.TranscriptionConfig(
    filter_profanity=True,
    language_code="en_us",
)

# Ensure professional presentation of meeting notes
config = aai.TranscriptionConfig(filter_profanity=True)
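
For readers who call the REST API directly instead of using the SDK, the same option can be sent as a JSON field. The sketch below only builds the request and does not send it. The endpoint URL, the `audio_url` field, the header layout, and the placeholder key are assumptions not taken from the reviewed page, so check them against the current API reference.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the current API reference before use.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(
    audio_url: str, api_key: str, filter_profanity: bool = True
) -> urllib.request.Request:
    """Build (but do not send) a transcription request with profanity filtering."""
    payload = {"audio_url": audio_url, "filter_profanity": filter_profanity}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_transcript_request("https://example.com/audio.mp3", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect in a test before any audio is submitted.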

User Pain Points to Address

Identified Pain Points:

  1. No guidance on when to use this feature - Add use case scenarios
  2. No troubleshooting information - What if filtering is too aggressive or not aggressive enough?
  3. No information on how filtering interacts with other features - How does it work with speaker labels, timestamps, etc.?
  4. No validation guidance - How can users verify that the feature is working correctly?

Solutions:

## When to Use Profanity Filtering
### Recommended For:
- Educational content platforms
- Family-friendly applications
- Corporate environments
- Content requiring compliance with broadcasting standards
### Not Recommended For:
- Legal transcriptions requiring verbatim accuracy
- Creative content where original language is important
- Academic research on language patterns
## Troubleshooting
### Filter Too Aggressive?
The filter may occasionally flag non-profane words. This typically happens with:
- Proper nouns that resemble profanity
- Technical terms with similar phonetics
- Words in different languages
### Filter Missing Words?
- Check if the word is in a supported language
- Verify audio quality - unclear audio may not be filtered accurately
- Report persistent issues to support for filter improvements
## Integration Notes
Profanity filtering works alongside all other features:
- Timestamps remain accurate for filtered words
- Speaker labels are preserved
- Confidence scores remain based on the originally detected word
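
One way to address the validation pain point is a small check on the returned transcript. This is a sketch, assuming the response is a dictionary with a `filter_profanity` flag and a `words` list whose entries carry `text`, `start`, `end`, and `confidence`. The helper `summarize_filtering` is hypothetical and the sample response is made up for illustration.

```python
def summarize_filtering(response: dict) -> dict:
    """Report which words in a transcript response appear to be masked."""
    masked = [
        word
        for word in response.get("words", [])
        # A token counts as masked when it contains an asterisk.
        if "*" in word["text"]
    ]
    return {
        "filter_requested": bool(response.get("filter_profanity")),
        "masked_count": len(masked),
        # Timestamps let a reviewer spot-check the matching audio.
        "masked_spans": [(w["start"], w["end"]) for w in masked],
        "all_have_confidence": all("confidence" in w for w in masked),
    }


# Made-up response fragment for illustration; not real API output.
sample = {
    "filter_profanity": True,
    "text": "This **** project is late.",
    "words": [
        {"text": "This", "start": 0, "end": 200, "confidence": 0.99},
        {"text": "****", "start": 210, "end": 480, "confidence": 0.93},
        {"text": "project", "start": 490, "end": 900, "confidence": 0.98},
        {"text": "is", "start": 910, "end": 1000, "confidence": 0.99},
        {"text": "late.", "start": 1010, "end": 1400, "confidence": 0.97},
    ],
}
summary = summarize_filtering(sample)
print(summary)
```

A `masked_count` of zero on audio known to contain profanity, or a `filter_requested` value of `False`, would suggest the option was not applied.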

Missing Technical Details:

  • No mention of case sensitivity in filtering
  • No information about how filtering affects confidence scores
  • No details about how audio quality affects filtering

Recommended Additions:

## Technical Details
- **Case Handling:** Filtering preserves original capitalization patterns
- **Confidence Scores:** Remain based on the original detected word
- **Audio Quality Impact:** Lower quality audio may result in less accurate filtering
- **Processing Time:** Adds minimal overhead (<1% increase in processing time)

Summary of Recommendations:

  1. Add comprehensive explanation of what constitutes profanity and filtering accuracy
  2. Include before/after examples showing actual filtered output
  3. Restructure content to improve information hierarchy
  4. Add use case guidance and best practices
  5. Include troubleshooting section for common issues
  6. Expand technical details about integration and performance
  7. Move language support out of accordion for better visibility
  8. Add validation examples showing how to verify filtering is working

These improvements would transform this from a basic feature reference into a comprehensive guide that helps users understand when, why, and how to implement profanity filtering effectively.