How Sentiment Analysis is Calculated in Vanilla Forums
Overview
Vanilla Forums uses AI-powered sentiment analysis to automatically evaluate the emotional tone of user-generated content. The system helps community managers gauge the overall health of conversations and identify potentially problematic content before it affects the community.
Technical Architecture
Core Components
The sentiment analysis system is built around several key components:
- OpenAI Integration: Uses the GPT-4o Mini model for sentiment analysis
- Sentiment Score Scale: 0-100 numeric scale with predefined ranges
- Keyword Tracking: Identifies and scores specific terms within content
- Multi-document Processing: Handles long posts by splitting and aggregating results
Sentiment Score Ranges
The system categorizes sentiment into five distinct ranges:
- Strongly Negative (0-20): Highly negative content
- Negative (21-40): Generally negative sentiment
- Balanced (41-60): Neutral or mixed sentiment
- Positive (61-80): Generally positive sentiment
- Strongly Positive (81-100): Highly positive content
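As a minimal sketch, the ranges above can be expressed as a lookup function. The name sentimentLabel is hypothetical and is not part of Vanilla; only the boundaries come from the list above. It assumes the score has already been rounded to an integer.

```php
<?php

/**
 * Map a 0-100 sentiment score to its range label.
 * Hypothetical helper: only the range boundaries come from the documentation.
 */
function sentimentLabel(int $score): string
{
    if ($score < 0 || $score > 100) {
        throw new InvalidArgumentException("Score must be between 0 and 100, got {$score}.");
    }
    if ($score <= 20) {
        return "Strongly Negative";
    }
    if ($score <= 40) {
        return "Negative";
    }
    if ($score <= 60) {
        return "Balanced";
    }
    if ($score <= 80) {
        return "Positive";
    }
    return "Strongly Positive";
}
```

For example, sentimentLabel(35) returns "Negative", and a value outside the scale raises an exception instead of being silently bucketed.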
Post-Level Sentiment Calculation
Processing Flow
- User Consent Check: The system first verifies that the user has opted into sentiment analysis via cookie preferences
- Content Preparation: Post content (title + body for discussions, body for comments) is converted to plain text
- Document Normalization: Posts exceeding 5,000 characters are split into smaller chunks
- AI Analysis: Each chunk is sent to OpenAI's GPT-4o Mini with a specialized sentiment analysis prompt
- Result Aggregation: Per-chunk results are combined by simple (unweighted) averaging, with each term's score averaged over the chunks that mention it
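The splitting step in the flow above might look roughly like the following. This is a sketch under stated assumptions, not the Vanilla implementation: the function name splitIntoChunks is hypothetical, length is measured in bytes, and chunks break on whitespace. The actual splitting strategy is not described here.

```php
<?php

/**
 * Split plain text into chunks of at most $maxLength bytes, breaking on
 * whitespace where possible. Hypothetical helper, not the Vanilla source.
 */
function splitIntoChunks(string $text, int $maxLength = 5000): array
{
    if ($maxLength < 1) {
        throw new InvalidArgumentException("maxLength must be at least 1.");
    }
    $text = trim($text);
    if ($text === "") {
        return [];
    }
    if (strlen($text) <= $maxLength) {
        return [$text];
    }

    $chunks = [];
    $current = "";
    // Keep words intact; a single word longer than the limit is hard-split.
    foreach (preg_split('/\s+/', $text) as $word) {
        foreach (str_split($word, $maxLength) as $piece) {
            $candidate = $current === "" ? $piece : $current . " " . $piece;
            if (strlen($candidate) > $maxLength) {
                $chunks[] = $current;
                $current = $piece;
            } else {
                $current = $candidate;
            }
        }
    }
    if ($current !== "") {
        $chunks[] = $current;
    }
    return $chunks;
}
```

Note that this sketch collapses runs of whitespace into single spaces once a post is long enough to be split, and that byte-based splitting can cut a multibyte character in an over-long word. A production splitter would likely prefer sentence or paragraph boundaries.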
Code Example
Here's how multi-document sentiment aggregation works:
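protected function aggregateMultiDocumentsSentiment(array $documents): array
{
    // Nothing to aggregate: return no sentiment instead of dividing by zero.
    if (count($documents) === 0) {
        return ["globalSentiment" => null, "terms" => []];
    }

    $globalSentiment = 0;
    $terms = [];
    foreach ($documents as $document) {
        // Sum the per-chunk global scores; they are averaged below.
        $globalSentiment += $document["globalSentiment"];
        foreach ($document["terms"] as $term) {
            if (isset($terms[$term["term"]])) {
                // Term seen in an earlier chunk: accumulate score and occurrences.
                $terms[$term["term"]]["sentiment"] += $term["sentiment"];
                $terms[$term["term"]]["occurrences"] += $term["occurrences"];
                $terms[$term["term"]]["divisor"] += 1;
            } else {
                // First time this term appears; divisor counts the chunks containing it.
                $terms[$term["term"]] = $term;
                $terms[$term["term"]]["divisor"] = 1;
            }
        }
    }

    // Unweighted mean of the chunk scores.
    $globalSentiment = $globalSentiment / count($documents);
    foreach ($terms as $key => $term) {
        // Average each term over the chunks that mention it, rounded up.
        $terms[$key]["sentiment"] = ceil($term["sentiment"] / $term["divisor"]);
        unset($terms[$key]["divisor"]);
    }
    return ["globalSentiment" => $globalSentiment, "terms" => $terms];
}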
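To make the arithmetic concrete, here is a worked example. Because the method above is protected, the sketch uses a standalone copy of the same logic under the hypothetical name aggregateDocuments. The chunk scores are made up for illustration.

```php
<?php

// Standalone copy of the aggregation logic, so the example runs outside the class.
function aggregateDocuments(array $documents): array
{
    $globalSentiment = 0;
    $terms = [];
    foreach ($documents as $document) {
        $globalSentiment += $document["globalSentiment"];
        foreach ($document["terms"] as $term) {
            $name = $term["term"];
            if (isset($terms[$name])) {
                $terms[$name]["sentiment"] += $term["sentiment"];
                $terms[$name]["occurrences"] += $term["occurrences"];
                $terms[$name]["divisor"] += 1;
            } else {
                $terms[$name] = $term + ["divisor" => 1];
            }
        }
    }
    foreach ($terms as $name => $term) {
        $terms[$name]["sentiment"] = ceil($term["sentiment"] / $term["divisor"]);
        unset($terms[$name]["divisor"]);
    }
    return ["globalSentiment" => $globalSentiment / count($documents), "terms" => $terms];
}

// Made-up scores for two chunks of one long post.
$chunks = [
    ["globalSentiment" => 70, "terms" => [
        ["term" => "support", "sentiment" => 80, "occurrences" => 2],
        ["term" => "billing", "sentiment" => 30, "occurrences" => 1],
    ]],
    ["globalSentiment" => 40, "terms" => [
        ["term" => "support", "sentiment" => 55, "occurrences" => 1],
    ]],
];

$result = aggregateDocuments($chunks);
// globalSentiment: (70 + 40) / 2 = 55
// support: ceil((80 + 55) / 2) = 68 with 3 occurrences
// billing: appears in one chunk only, so it stays at 30 with 1 occurrence
```

Two properties are worth noting. Every chunk counts equally toward the global score regardless of its length, so a short trailing chunk has as much influence as a full one. Term scores are rounded up with ceil, which returns a float in PHP.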
Storage and Tracking
Sentiment data is stored in multiple locations:
- Post Records: Global sentiment score stored in the main Discussion/Comment Sentiment field
- Attributes Column: Serialized sentiment data in the post's Attributes column
- Keyword Sentiment Table: Detailed keyword-level sentiment tracking linked to users
Individual User Sentiment
Aggregation Method
Individual user sentiment is calculated by tracking all posts created by a specific user through the recordUserID field. The system maintains a historical record that includes:
- Post-level scores: Each discussion and comment sentiment linked to the user
- Keyword associations: Specific terms and their sentiment scores from user content
- Temporal patterns: Sentiment trends over time for behavioral analysis
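One simple way to summarize a user's post-level scores is to average them, sketched below. The function name, the row shape, and the choice to skip negative values (treated here as content-policy codes, not scores) are all assumptions for illustration. The aggregation Vanilla actually performs is not specified here.

```php
<?php

/**
 * Average the post-level scores recorded for one user.
 * Hypothetical helper: rows are assumed to carry recordUserID and sentiment.
 */
function averageUserSentiment(array $posts, int $userID): ?float
{
    $scores = [];
    foreach ($posts as $post) {
        // Assumption: negative values are policy codes, not scores, so skip them.
        if ($post["recordUserID"] === $userID && $post["sentiment"] >= 0) {
            $scores[] = $post["sentiment"];
        }
    }
    // No usable posts means no sentiment, not a score of zero.
    return $scores === [] ? null : array_sum($scores) / count($scores);
}
```

Returning null for a user with no scored posts keeps "no data" distinct from a strongly negative average.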
User Privacy and Consent
The system respects user privacy by:
- Opt-in Required: Only processes sentiment for users who have accepted sentiment analysis cookies
- Transparent Processing: Users are informed about sentiment analysis in privacy policies
- Data Control: Users can opt out, stopping future sentiment processing
Content Policy Integration
OpenAI Content Filtering
When content violates OpenAI's content policies, the system assigns a special negative code in place of a 0-100 sentiment score:
- Hate Speech (-1): Content flagged for hate speech
- Jailbreak Attempts (-2): Attempts to circumvent AI safety measures
- Self-harm Content (-3): Content promoting self-harm
- Sexual Content (-4): Inappropriate sexual material
- Violence (-5): Content promoting violence
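Because these codes share a field with ordinary scores, consuming code needs to tell them apart before doing any arithmetic. A minimal sketch follows. The constant and function names are hypothetical; the code values come from the list above.

```php
<?php

// Codes taken from the list above; the constant and function names are hypothetical.
const POLICY_CODES = [
    -1 => "hate speech",
    -2 => "jailbreak attempt",
    -3 => "self-harm",
    -4 => "sexual content",
    -5 => "violence",
];

/**
 * Describe a stored sentiment value as either a policy flag or a score.
 */
function describeSentiment(int $value): string
{
    if (array_key_exists($value, POLICY_CODES)) {
        return "flagged: " . POLICY_CODES[$value];
    }
    if ($value >= 0 && $value <= 100) {
        return "score: {$value}";
    }
    return "unknown value: {$value}";
}
```

For example, describeSentiment(-2) returns "flagged: jailbreak attempt", while describeSentiment(73) returns "score: 73".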
Error Handling
The system gracefully handles various error conditions:
- API Failures: Logs errors and continues operation without sentiment data
- Content Policy Violations: Assigns appropriate negative sentiment codes
- Processing Errors: Falls back to no sentiment rather than incorrect data
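The fallback behavior described above can be sketched as a wrapper around the analysis call. Here $analyze is a hypothetical callable standing in for the real API request, and the result shape (a globalSentiment key) is borrowed from the aggregation example earlier in this document.

```php
<?php

/**
 * Run an analyzer and fall back to null ("no sentiment") on any failure.
 * $analyze is a hypothetical stand-in for the real API call.
 */
function analyzeOrNull(callable $analyze, string $text): ?array
{
    try {
        $result = $analyze($text);
    } catch (Throwable $e) {
        // Log and carry on: the post is saved without sentiment data.
        error_log("Sentiment analysis failed: " . $e->getMessage());
        return null;
    }
    // Treat a malformed result as a processing error rather than storing bad data.
    if (!is_array($result) || !isset($result["globalSentiment"]) || !is_numeric($result["globalSentiment"])) {
        return null;
    }
    return $result;
}
```

Callers then only need to check for null before storing a score, which keeps a failed API call from blocking the post itself.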
Integration Points
Automation Rules
Sentiment scores can trigger automated community management actions:
- Escalation Creation: Automatically escalate highly negative posts
- Moderation Queues: Route content based on sentiment thresholds
- User Notifications: Alert moderators to sentiment pattern changes
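A threshold-based rule of this kind might be sketched as follows. The function name, the action names, and the default thresholds are illustrative and are not Vanilla defaults. Treating negative policy codes as an automatic escalation is also an assumption.

```php
<?php

/**
 * Pick a moderation action from a stored sentiment value.
 * Thresholds and action names are illustrative, not Vanilla defaults.
 */
function chooseAction(int $sentiment, int $escalateAtOrBelow = 20, int $queueAtOrBelow = 40): string
{
    if ($sentiment < 0) {
        // Assumption: negative values are content-policy codes and always escalate.
        return "escalate";
    }
    if ($sentiment <= $escalateAtOrBelow) {
        return "escalate";
    }
    if ($sentiment <= $queueAtOrBelow) {
        return "queue";
    }
    return "none";
}
```

With the defaults, a score of 12 escalates, 35 goes to a moderation queue, and 75 triggers nothing.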
Analytics and Reporting
Sentiment data feeds into various analytics systems:
- Community Health Dashboards: Overall sentiment trending
- User Behavior Analysis: Individual user sentiment patterns
- Content Performance: Correlation between sentiment and engagement
API Integration
Accessing Sentiment Data
Sentiment scores are accessible through Vanilla's API endpoints:
- Discussion API: /api/v2/discussions/{id} includes a sentiment field
- Comment API: /api/v2/comments/{id} includes a sentiment field
- Keyword Sentiment API: Access detailed keyword-level sentiment data
- User Sentiment Aggregation: Query historical user sentiment patterns
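A client reading the sentiment field from a discussion response might look like the sketch below. The helper names and the base URL are hypothetical, and the response is assumed to be a JSON object with a top-level sentiment key. Check the OpenAPI specification for the actual response shape and authentication requirements.

```php
<?php

/**
 * Build the discussion endpoint URL. $baseUrl is a placeholder for your site.
 */
function discussionUrl(string $baseUrl, int $id): string
{
    return rtrim($baseUrl, "/") . "/api/v2/discussions/" . $id;
}

/**
 * Pull the sentiment value out of a JSON response body.
 * Assumes a top-level "sentiment" key; returns null when it is absent.
 */
function extractSentiment(string $json): ?int
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data["sentiment"]) || !is_numeric($data["sentiment"])) {
        return null;
    }
    return (int) $data["sentiment"];
}
```

For example, discussionUrl("https://community.example.com/", 42) yields "https://community.example.com/api/v2/discussions/42". A post that was never analyzed has no sentiment value, so extractSentiment returns null for it, not 0.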
Webhook Integration
Sentiment events can trigger webhooks for external system integration:
- Post Sentiment Events: Fired when new content is analyzed
- Threshold Alerts: Triggered when sentiment crosses configured thresholds
- User Pattern Changes: Notifications for significant user sentiment shifts
Best Practices for Implementation
Configuration Recommendations
- Threshold Setting: Establish clear sentiment thresholds for different automation actions
- Keyword Tracking: Configure relevant keywords for your community's domain
- User Communication: Clearly explain sentiment analysis in privacy policies
- Moderation Training: Train moderators on interpreting sentiment data
Performance Considerations
- Batch Processing: Large content is automatically chunked for optimal API usage
- Rate Limiting: Built-in protections prevent API quota exhaustion
- Caching: Sentiment scores are cached to avoid reprocessing unchanged content
- Async Processing: Sentiment analysis runs asynchronously to avoid blocking user interactions
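The caching idea can be illustrated with a small in-memory cache keyed by a hash of the content. This is a sketch only: the class name is hypothetical, and where Vanilla stores cached scores and how it keys them are not described here.

```php
<?php

/**
 * In-memory cache that re-runs analysis only when the content changes.
 * Hypothetical class; keying by SHA-256 of the content is an assumption.
 */
final class SentimentCache
{
    /** @var array<string, int> scores keyed by content hash */
    private array $scores = [];

    public function get(string $content, callable $analyze): int
    {
        $key = hash("sha256", $content);
        if (!array_key_exists($key, $this->scores)) {
            // Cache miss: analyze once and remember the score.
            $this->scores[$key] = $analyze($content);
        }
        return $this->scores[$key];
    }
}
```

Keying on the content means an edited post is re-analyzed automatically, while repeated reads of an unchanged post never trigger a second API call.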
Development Guidelines
- Event Handlers: Implement custom event handlers for sentiment-based automation
- Database Schema: Understand the sentiment data storage structure for custom queries
- Plugin Architecture: Extend sentiment analysis through Vanilla's plugin system
- Testing: Use sentiment model test suites for validation during development
This technical documentation covers the implementation details of sentiment analysis in Vanilla Forums. For API reference documentation, consult the OpenAPI specifications. For implementation support, contact the development team.