Analytics Background Jobs

This document describes the scheduled jobs that maintain and optimize the analytics system.

📋 Overview

Three background jobs keep your analytics data clean, aggregated, and performant:

  1. Daily Aggregation - Summarizes yesterday's data
  2. Session Cleanup - Closes inactive sessions
  3. Data Retention - Deletes old raw events while keeping daily aggregates

🔄 Job 1: Daily Analytics Aggregation

File: src/jobs/aggregate-daily-analytics.ts

Purpose

Aggregates yesterday's raw analytics events into the analytics_daily_stats table for faster historical queries.

Schedule

0 1 * * *  (Every day at 1:00 AM)

What It Does

  1. Collects Yesterday's Events

    • Gets all analytics events from yesterday (00:00 - 23:59)
    • Groups by website_id
  2. Calculates Aggregated Stats

    • Total pageviews
    • Total custom events
    • Unique visitors
    • Unique sessions
    • Top 10 pages
    • Top 10 referrers
  3. Stores in Daily Stats Table

    • Creates one record per website per day
    • Enables fast historical queries
    • Reduces database load
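
The three steps above can be sketched in memory as follows. This is a minimal illustration, not the code in src/jobs/aggregate-daily-analytics.ts: the AnalyticsEvent shape (type, visitor_id, session_id, path, referrer), the helper names, and the use of UTC day boundaries are all assumptions made for the example.

```typescript
// Hypothetical event shape; the real analytics_event columns may differ.
interface AnalyticsEvent {
  website_id: string;
  type: "pageview" | "custom";
  visitor_id: string;
  session_id: string;
  path: string;
  referrer: string;
  timestamp: Date;
}

interface CountedItem {
  item: string;
  count: number;
}

interface DailyStats {
  website_id: string;
  date: string; // YYYY-MM-DD
  total_pageviews: number;
  total_custom_events: number;
  unique_visitors: number;
  unique_sessions: number;
  top_pages: CountedItem[];
  top_referrers: CountedItem[];
}

// Returns the [start, end) UTC window covering the day before `now`.
function yesterdayWindow(now: Date): { start: Date; end: Date } {
  const end = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const start = new Date(end.getTime() - 24 * 60 * 60 * 1000);
  return { start, end };
}

// Counts occurrences and keeps the `limit` most frequent values.
function topN(values: string[], limit = 10): CountedItem[] {
  const counts = new Map<string, number>();
  for (const value of values) {
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([item, count]) => ({ item, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

// Step 1: keep yesterday's events and group them by website_id.
// Step 2: compute the aggregated stats for each group.
// Step 3 (storing one row per website per day) is left to the caller.
function aggregateDaily(events: AnalyticsEvent[], now: Date): DailyStats[] {
  const { start, end } = yesterdayWindow(now);
  const byWebsite = new Map<string, AnalyticsEvent[]>();
  for (const event of events) {
    const ts = event.timestamp.getTime();
    if (ts < start.getTime() || ts >= end.getTime()) continue;
    const bucket = byWebsite.get(event.website_id) ?? [];
    bucket.push(event);
    byWebsite.set(event.website_id, bucket);
  }
  return Array.from(byWebsite.entries()).map(([website_id, group]) => {
    const pageviews = group.filter((e) => e.type === "pageview");
    return {
      website_id,
      date: start.toISOString().slice(0, 10),
      total_pageviews: pageviews.length,
      total_custom_events: group.length - pageviews.length,
      unique_visitors: new Set(group.map((e) => e.visitor_id)).size,
      unique_sessions: new Set(group.map((e) => e.session_id)).size,
      top_pages: topN(pageviews.map((e) => e.path)),
      top_referrers: topN(pageviews.map((e) => e.referrer)),
    };
  });
}
```

A real job would most likely push the grouping and counting into SQL rather than loading a full day of events into memory; the sketch only shows which numbers end up in each daily row.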

Benefits

  • ✅ Roughly 10x faster historical queries (actual gains depend on event volume)
  • ✅ Reduces load on main events table
  • ✅ Enables long-term trend analysis
  • ✅ Pre-calculated metrics ready to display

Example Output

{
  website_id: "01JM1PEW9H0ES7GGMD173GM2T9",
  date: "2024-01-15",
  total_pageviews: 1247,
  total_custom_events: 89,
  unique_visitors: 342,
  unique_sessions: 456,
  top_pages: [
    { item: "/", count: 450 },
    { item: "/products", count: 234 },
    { item: "/about", count: 123 }
  ],
  top_referrers: [
    { item: "google", count: 234 },
    { item: "direct", count: 189 },
    { item: "facebook", count: 45 }
  ]
}

Logs

[Analytics Job] Starting daily aggregation...
[Analytics Job] Aggregating data for 2024-01-15
[Analytics Job] ✅ Aggregated 1247 events for website 01JM1PEW9H0ES7GGMD173GM2T9
[Analytics Job] ✅ Daily aggregation completed for 1 website(s)

🧹 Job 2: Session Cleanup

File: src/jobs/cleanup-analytics-sessions.ts

Purpose

Closes sessions that have been inactive for 30+ minutes and calculates their final duration.

Schedule

*/10 * * * *  (Every 10 minutes)

What It Does

  1. Finds Stale Sessions

    • Queries sessions whose last_activity_at is more than 30 minutes in the past
    • Only sessions where ended_at is null
  2. Closes Each Session

    • Sets ended_at to last_activity_at
    • Calculates duration_seconds
    • Marks session as complete
  3. Updates Analytics

    • Accurate session duration metrics
    • Correct "current visitors" count
    • Clean data for reporting
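
The find-and-close logic above can be sketched as a pure function. The field names come from this document; the AnalyticsSession interface, the function name, and the in-memory array standing in for the database are assumptions made for illustration.

```typescript
// Hypothetical session shape mirroring the fields named in this document.
interface AnalyticsSession {
  session_id: string;
  started_at: Date;
  last_activity_at: Date;
  ended_at: Date | null;
  duration_seconds: number | null;
}

const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes

// Closes every open session whose last activity is at or before the cutoff
// (now minus the timeout) and returns the sessions that were closed.
function closeStaleSessions(
  sessions: AnalyticsSession[],
  now: Date,
  timeoutMs: number = SESSION_TIMEOUT_MS
): AnalyticsSession[] {
  const cutoff = now.getTime() - timeoutMs;
  const closed: AnalyticsSession[] = [];
  for (const session of sessions) {
    if (session.ended_at !== null) continue; // already closed
    if (session.last_activity_at.getTime() > cutoff) continue; // still active
    // The session ends at its last recorded activity, not at cleanup time.
    session.ended_at = new Date(session.last_activity_at.getTime());
    session.duration_seconds = Math.round(
      (session.ended_at.getTime() - session.started_at.getTime()) / 1000
    );
    closed.push(session);
  }
  return closed;
}
```

Using last_activity_at as the end time keeps the 30-minute idle window out of the duration, so a visit from 10:00 to 10:25 is recorded as 1500 seconds regardless of when the job runs.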

Benefits

  • ✅ Accurate active visitor counts
  • ✅ Proper session duration metrics
  • ✅ Clean data for analytics
  • ✅ Prevents stale session buildup

Session Lifecycle

User visits page
↓
Session created (started_at)
↓
User browses (last_activity_at updated)
↓
User leaves (no activity for 30 min)
↓
Job closes session (ended_at, duration_seconds)

Example

// Before cleanup
{
  session_id: "session_abc123",
  started_at: "2024-01-15T10:00:00Z",
  last_activity_at: "2024-01-15T10:25:00Z",
  ended_at: null,
  duration_seconds: null
}

// After cleanup (first run at or after 10:55, once 30 minutes of inactivity have passed)
{
  session_id: "session_abc123",
  started_at: "2024-01-15T10:00:00Z",
  last_activity_at: "2024-01-15T10:25:00Z",
  ended_at: "2024-01-15T10:25:00Z",
  duration_seconds: 1500 // 25 minutes
}

Logs

[Analytics Cleanup] Checking for inactive sessions before 2024-01-15T10:25:00Z
[Analytics Cleanup] Found 5 stale session(s) to close
[Analytics Cleanup] ✅ Closed session session_abc123 (duration: 1500s)
[Analytics Cleanup] ✅ Closed 5 inactive session(s)

🗄️ Job 3: Data Retention & Archival

File: src/jobs/archive-old-analytics.ts

Purpose

Deletes raw analytics events older than 90 days while keeping aggregated daily stats.

Schedule

0 2 * * 0  (Every Sunday at 2:00 AM)

What It Does

  1. Identifies Old Data

    • Finds events older than 90 days
    • Counts total events to delete
  2. Batch Deletion

    • Deletes in batches of 1000
    • Prevents database overload
    • Includes small delays between batches
  3. Cleans Up Sessions

    • Also deletes sessions older than 90 days
    • Keeps database lean
  4. Preserves Aggregated Stats

    • Daily stats are kept indefinitely
    • Historical trends remain available
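
The batch deletion step can be sketched as a loop that deletes a bounded number of rows, pauses, and repeats until a batch comes back short. The EventStore interface, its deleteOlderThan method, and the 100 ms default pause are hypothetical stand-ins for whatever database calls and delay the real job uses.

```typescript
// Minimal store interface; a real job would issue SQL or ORM calls instead.
interface EventStore {
  // Deletes up to `limit` events older than `cutoff`; resolves to the number removed.
  deleteOlderThan(cutoff: Date, limit: number): Promise<number>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Deletes old events in batches, pausing between batches to limit database load.
async function purgeOldEvents(
  store: EventStore,
  now: Date,
  retentionDays = 90,
  batchSize = 1000,
  pauseMs = 100 // assumed value; the document only says "small delays"
): Promise<number> {
  const cutoff = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  let total = 0;
  while (true) {
    const deleted = await store.deleteOlderThan(cutoff, batchSize);
    total += deleted;
    if (deleted < batchSize) break; // a short (or empty) batch means nothing is left
    await sleep(pauseMs);
  }
  return total;
}
```

The same loop could be pointed at the sessions table for the session cleanup step. Daily stats are never passed to it, which is how they survive indefinitely.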

Benefits

  • ✅ Reduces database size (about 74% after one year in the Storage Savings estimate, more over longer periods)
  • ✅ Improves query performance
  • ✅ Lowers storage costs
  • ✅ Maintains historical trends (via daily stats)
  • ✅ Supports GDPR storage-limitation requirements by enforcing a data retention policy

Data Retention Strategy

┌─────────────────────────────────────────────────────┐
│                   Data Retention                    │
├─────────────────────────────────────────────────────┤
│                                                     │
│  Raw Events:   90 days (then deleted)               │
│  Sessions:     90 days (then deleted)               │
│  Daily Stats:  Forever (kept)                       │
│                                                     │
│  Why?                                               │
│  - Raw events for recent detailed analysis          │
│  - Daily stats for long-term trends                 │
│  - Balance between detail and storage               │
│                                                     │
└─────────────────────────────────────────────────────┘

Example

Day 1-90:  Full detail available (raw events)
Day 91+:   Aggregated stats only (daily summaries)

Query "Show me traffic for last 30 days"
→ Uses raw events (fast, detailed)

Query "Show me traffic for last year"
→ Uses daily stats (fast, summarized)
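
The routing rule in this example can be written as a one-line decision. The function name and the DataSource type are illustrative assumptions; the document does not show how the query layer actually picks a table.

```typescript
type DataSource = "raw_events" | "daily_stats";

// Picks the table to query for a range of `rangeDays` ending today.
// Ranges that fit inside the retention window can use raw events;
// anything longer must fall back to the daily aggregates.
function chooseSource(rangeDays: number, retentionDays = 90): DataSource {
  return rangeDays <= retentionDays ? "raw_events" : "daily_stats";
}
```

Under these assumptions, chooseSource(30) selects raw events and chooseSource(365) selects daily stats, matching the two queries above.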

Logs

[Analytics Archive] Starting data retention job...
[Analytics Archive] Archiving events older than 2023-10-15 (90 days)
[Analytics Archive] Found 45,234 event(s) to archive
[Analytics Archive] Deleted batch of 1000 events (1000/45234)
[Analytics Archive] Deleted batch of 1000 events (2000/45234)
...
[Analytics Archive] ✅ Archived 45,234 old event(s) (kept daily aggregated stats)
[Analytics Archive] ✅ Archived 12,456 old session(s)
[Analytics Archive] ✅ Data retention job completed successfully

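🎯 Job Configuration Summary

| Job               | Frequency         | Duration   | Impact   |
|-------------------|-------------------|------------|----------|
| Daily Aggregation | Daily (1 AM)      | ~1-5 min   | Low      |
| Session Cleanup   | Every 10 min      | ~10-30 sec | Very Low |
| Data Retention    | Weekly (Sun 2 AM) | ~5-30 min  | Low      |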

📊 Performance Impact

Database Load

Daily Aggregation:
- Reads: ~10K-100K events/day
- Writes: 1-10 records/day
- Impact: Low (runs at 1 AM)

Session Cleanup:
- Reads: ~10-100 sessions
- Writes: ~10-100 updates
- Impact: Very Low (small batches)

Data Retention:
- Reads: ~1K-100K events
- Deletes: ~1K-100K events
- Impact: Low (batched, weekly)

Storage Savings

Without jobs:
- 1M events/month = ~500 MB/month
- 12 months = ~6 GB

With jobs:
- 3 months raw events = ~1.5 GB
- Daily stats = ~50 MB
- Total = ~1.55 GB (74% savings!)
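
The estimate above can be reproduced with a few lines of arithmetic. The 500 MB/month and 50 MB figures are taken from the estimate; the function names are hypothetical, and treating the daily stats table as a fixed 50 MB regardless of history length is a simplifying assumption.

```typescript
const MB_PER_MONTH = 500; // ~1M events/month, from the estimate above
const DAILY_STATS_MB = 50; // aggregated table size, from the estimate above

// Storage after `months` of traffic. `retentionMonths === null` means no cleanup jobs.
function storageMb(months: number, retentionMonths: number | null): number {
  if (retentionMonths === null) return months * MB_PER_MONTH;
  return Math.min(months, retentionMonths) * MB_PER_MONTH + DAILY_STATS_MB;
}

// Percentage saved by running the jobs, rounded to the nearest whole percent.
function savingsPercent(months: number, retentionMonths: number): number {
  const without = storageMb(months, null);
  const withJobs = storageMb(months, retentionMonths);
  return Math.round((1 - withJobs / without) * 100);
}

const oneYear = savingsPercent(12, 3); // 74
const twoYears = savingsPercent(24, 3); // 87
```

Because raw storage is capped at the retention window while the no-jobs total keeps growing, the savings percentage rises the longer the system runs.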

🔧 Customization

Adjust Retention Period

// src/jobs/archive-old-analytics.ts
const retentionDays = 90; // Change to 30, 60, 180, etc.

Adjust Session Timeout

// src/jobs/cleanup-analytics-sessions.ts
const sessionTimeout = 30 * 60 * 1000; // Milliseconds; change the 30 to 15, 45, or 60 for a different timeout in minutes

Adjust Aggregation Time

// src/jobs/aggregate-daily-analytics.ts
export const config = {
  schedule: "0 1 * * *", // Change to "0 3 * * *" for 3 AM
};

🐛 Troubleshooting

Job Not Running

# Check logs
tail -f logs/medusa.log | grep "Analytics"

# Verify job is registered
# Jobs should auto-load from src/jobs/

Job Failing

// Check error logs
[Analytics Job] ❌ Error during daily aggregation: ...

// Common issues:
// 1. Database connection
// 2. Missing permissions
// 3. Invalid date ranges

Performance Issues

// If aggregation is slow:
// 1. Add indexes on timestamp fields
// 2. Reduce batch sizes
// 3. Run at off-peak hours

// If cleanup is slow:
// 1. Run it more often (every 5 min) so each run closes fewer sessions
// 2. Add index on last_activity_at

📈 Monitoring

Key Metrics to Track

  1. Job Execution Time

    • Should be consistent
    • Spikes indicate issues
  2. Events Processed

    • Should match daily traffic
    • Drops indicate missing data
  3. Database Size

    • Should stabilize after 90 days
    • Growth indicates retention not working
  4. Error Rate

    • Should be near zero
    • Errors need investigation
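
// Alert rules (sketch): the metrics object and sendAlert are placeholders
// for your own monitoring hooks.
function checkAnalyticsJobs(
  metrics: { jobStatus: string; hoursSinceLastAggregation: number; databaseSizeGb: number },
  sendAlert: (message: string) => void
): void {
  // Alert if job fails
  if (metrics.jobStatus === "failed") {
    sendAlert("Analytics job failed!");
  }

  // Alert if aggregation is delayed (no successful run in the last 36 hours)
  if (metrics.hoursSinceLastAggregation > 36) {
    sendAlert("Daily aggregation is behind!");
  }

  // Alert if database grows too large (over 10 GB)
  if (metrics.databaseSizeGb > 10) {
    sendAlert("Analytics database is large!");
  }
}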

✅ Verification

Test Jobs Manually

# In MedusaJS admin or via API
# (assumes your project exposes a job-trigger endpoint like the one below)
POST /admin/jobs/run
{
  "job_name": "aggregate-daily-analytics"
}

Check Job History

-- View recent aggregations
SELECT * FROM analytics_daily_stats
ORDER BY date DESC
LIMIT 10;

-- Check closed sessions
SELECT COUNT(*) FROM analytics_session
WHERE ended_at IS NOT NULL;

-- Verify data retention
SELECT COUNT(*) FROM analytics_event
WHERE timestamp < NOW() - INTERVAL '90 days';
-- Should be 0 or very low

🎉 Summary

Your analytics system now has automated maintenance:

  • ✅ Daily aggregation for fast historical queries
  • ✅ Session cleanup for accurate metrics
  • ✅ Data retention for optimal performance

These jobs run automatically in the background, keeping your analytics system fast, accurate, and cost-effective! 🚀