Analytics Background Jobs
This document describes the scheduled jobs that maintain and optimize the analytics system.
Overview
Three background jobs keep your analytics data clean, aggregated, and performant:
- Daily Aggregation - Summarizes yesterday's data
- Session Cleanup - Closes inactive sessions
- Data Retention - Deletes raw events older than 90 days, keeping daily stats
Job 1: Daily Analytics Aggregation
File: src/jobs/aggregate-daily-analytics.ts
Purpose
Aggregates yesterday's raw analytics events into the analytics_daily_stats table for faster historical queries.
Schedule
0 1 * * * (Every day at 1:00 AM)
What It Does
- Collects Yesterday's Events
  - Gets all analytics events from yesterday (00:00 - 23:59)
  - Groups by website_id
- Calculates Aggregated Stats
  - Total pageviews
  - Total custom events
  - Unique visitors
  - Unique sessions
  - Top 10 pages
  - Top 10 referrers
- Stores in Daily Stats Table
  - Creates one record per website per day
  - Enables fast historical queries
  - Reduces database load
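The steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the contents of aggregate-daily-analytics.ts: the AnalyticsEvent shape, the type field values, and the aggregateDay and topN helpers are hypothetical names, and counting top pages and referrers over pageviews only is an assumption.

```typescript
// Hypothetical event shape; the real schema is not shown in this document.
interface AnalyticsEvent {
  website_id: string;
  type: "pageview" | "custom";
  visitor_id: string;
  session_id: string;
  path: string;
  referrer: string;
}

interface RankedItem {
  item: string;
  count: number;
}

interface DailyStats {
  website_id: string;
  date: string;
  total_pageviews: number;
  total_custom_events: number;
  unique_visitors: number;
  unique_sessions: number;
  top_pages: RankedItem[];
  top_referrers: RankedItem[];
}

// Rank a counter map by descending count (ties broken by name) and keep the first n.
function topN(counts: Map<string, number>, n: number = 10): RankedItem[] {
  return Array.from(counts.entries())
    .map(([item, count]) => ({ item, count }))
    .sort((a, b) => b.count - a.count || a.item.localeCompare(b.item))
    .slice(0, n);
}

// Build one stats record per website for the given day.
function aggregateDay(events: AnalyticsEvent[], date: string): DailyStats[] {
  // Group events by website_id.
  const byWebsite = new Map<string, AnalyticsEvent[]>();
  for (const e of events) {
    const bucket = byWebsite.get(e.website_id) ?? [];
    bucket.push(e);
    byWebsite.set(e.website_id, bucket);
  }

  return Array.from(byWebsite.entries()).map(([website_id, group]) => {
    const pages = new Map<string, number>();
    const referrers = new Map<string, number>();
    let pageviews = 0;
    for (const e of group) {
      if (e.type !== "pageview") continue; // assumption: rankings use pageviews only
      pageviews++;
      pages.set(e.path, (pages.get(e.path) ?? 0) + 1);
      referrers.set(e.referrer, (referrers.get(e.referrer) ?? 0) + 1);
    }
    return {
      website_id,
      date,
      total_pageviews: pageviews,
      total_custom_events: group.length - pageviews,
      unique_visitors: new Set(group.map((e) => e.visitor_id)).size,
      unique_sessions: new Set(group.map((e) => e.session_id)).size,
      top_pages: topN(pages),
      top_referrers: topN(referrers),
    };
  });
}
```

In a real job the grouping and counting would more likely be pushed into the database query; the in-memory version only makes the shape of the result explicit.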
Benefits
- ✅ 10x faster historical queries
- ✅ Reduces load on main events table
- ✅ Enables long-term trend analysis
- ✅ Pre-calculated metrics ready to display
Example Output
{
  website_id: "01JM1PEW9H0ES7GGMD173GM2T9",
  date: "2024-01-15",
  total_pageviews: 1247,
  total_custom_events: 89,
  unique_visitors: 342,
  unique_sessions: 456,
  top_pages: [
    { item: "/", count: 450 },
    { item: "/products", count: 234 },
    { item: "/about", count: 123 }
  ],
  top_referrers: [
    { item: "google", count: 234 },
    { item: "direct", count: 189 },
    { item: "facebook", count: 45 }
  ]
}
Logs
[Analytics Job] Starting daily aggregation...
[Analytics Job] Aggregating data for 2024-01-15
[Analytics Job] ✅ Aggregated 1247 events for website 01JM1PEW9H0ES7GGMD173GM2T9
[Analytics Job] ✅ Daily aggregation completed for 1 website(s)
Job 2: Session Cleanup
File: src/jobs/cleanup-analytics-sessions.ts
Purpose
Closes sessions that have been inactive for 30+ minutes and calculates their final duration.
Schedule
*/10 * * * * (Every 10 minutes)
What It Does
- Finds Stale Sessions
  - Queries sessions whose last_activity_at is more than 30 minutes ago
  - Only sessions where ended_at is null
- Closes Each Session
  - Sets ended_at to last_activity_at
  - Calculates duration_seconds
  - Marks session as complete
- Updates Analytics
  - Accurate session duration metrics
  - Correct "current visitors" count
  - Clean data for reporting
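A minimal sketch of the closing logic described above, assuming sessions are held as plain objects with ISO 8601 timestamps. The Session interface mirrors the fields named in this document, while closeStaleSessions and INACTIVITY_MS are hypothetical names, and computing duration_seconds as last_activity_at minus started_at is an assumption.

```typescript
interface Session {
  session_id: string;
  started_at: string; // ISO 8601 timestamp
  last_activity_at: string; // ISO 8601 timestamp
  ended_at: string | null;
  duration_seconds: number | null;
}

// 30 minutes of inactivity, per the Purpose section.
const INACTIVITY_MS = 30 * 60 * 1000;

// Return closed copies of every open session that has been idle for more than 30 minutes.
function closeStaleSessions(sessions: Session[], now: Date): Session[] {
  const cutoff = now.getTime() - INACTIVITY_MS;
  const closed: Session[] = [];
  for (const s of sessions) {
    if (s.ended_at !== null) continue; // already closed
    const lastActivity = Date.parse(s.last_activity_at);
    if (lastActivity >= cutoff) continue; // still within the 30-minute window
    closed.push({
      ...s,
      ended_at: s.last_activity_at,
      // Assumption: duration runs from started_at to the last recorded activity.
      duration_seconds: Math.round((lastActivity - Date.parse(s.started_at)) / 1000),
    });
  }
  return closed;
}
```

The function returns new objects instead of mutating its input, which keeps the sketch easy to test; a real job would presumably issue an UPDATE against the sessions table instead.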
Benefits
- ✅ Accurate active visitor counts
- ✅ Proper session duration metrics
- ✅ Clean data for analytics
- ✅ Prevents stale session buildup
Session Lifecycle
User visits page
  ↓
Session created (started_at)
  ↓
User browses (last_activity_at updated)
  ↓
User leaves (no activity for 30 min)
  ↓
Job closes session (ended_at, duration_seconds)
Example
// Before cleanup
{
  session_id: "session_abc123",
  started_at: "2024-01-15T10:00:00Z",
  last_activity_at: "2024-01-15T10:25:00Z",
  ended_at: null,
  duration_seconds: null
}
// After cleanup (the session becomes stale at 10:55 AM and is closed by the next run)
{
  session_id: "session_abc123",
  started_at: "2024-01-15T10:00:00Z",
  last_activity_at: "2024-01-15T10:25:00Z",
  ended_at: "2024-01-15T10:25:00Z",
  duration_seconds: 1500 // 25 minutes
}
Logs
[Analytics Cleanup] Checking for inactive sessions before 2024-01-15T10:25:00Z
[Analytics Cleanup] Found 5 stale session(s) to close
[Analytics Cleanup] ✅ Closed session session_abc123 (duration: 1500s)
[Analytics Cleanup] ✅ Closed 5 inactive session(s)
Job 3: Data Retention & Archival
File: src/jobs/archive-old-analytics.ts
Purpose
Deletes raw analytics events older than 90 days while keeping aggregated daily stats.
Schedule
0 2 * * 0 (Every Sunday at 2:00 AM)
What It Does
- Identifies Old Data
  - Finds events older than 90 days
  - Counts total events to delete
- Batch Deletion
  - Deletes in batches of 1000
  - Prevents database overload
  - Includes small delays between batches
- Cleans Up Sessions
  - Also deletes sessions older than 90 days
  - Keeps database lean
- Preserves Aggregated Stats
  - Daily stats are kept indefinitely
  - Historical trends remain available
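The batch deletion loop might look something like the sketch below. It is an illustration under assumptions, not the code in archive-old-analytics.ts: purgeOlderThan, the DeleteBatch callback, and the 100 ms default pause are all hypothetical (the document only says the delays are small), and the callback stands in for whatever database call actually removes the rows.

```typescript
// Hypothetical callback: deletes up to `limit` rows older than `cutoff`
// and resolves with the number of rows actually deleted.
type DeleteBatch = (cutoff: Date, limit: number) => Promise<number>;

interface PurgeOptions {
  retentionDays?: number; // 90 in this document
  batchSize?: number; // 1000 in this document
  pauseMs?: number; // assumption: the document does not give a value
}

const DAY_MS = 24 * 60 * 60 * 1000;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delete old rows batch by batch until a batch comes back short.
async function purgeOlderThan(
  deleteBatch: DeleteBatch,
  now: Date,
  options: PurgeOptions = {}
): Promise<number> {
  const { retentionDays = 90, batchSize = 1000, pauseMs = 100 } = options;
  const cutoff = new Date(now.getTime() - retentionDays * DAY_MS);

  let total = 0;
  let deleted = 0;
  do {
    deleted = await deleteBatch(cutoff, batchSize);
    total += deleted;
    // Pause only when another batch is expected, to give the database room to breathe.
    if (deleted === batchSize && pauseMs > 0) {
      await sleep(pauseMs);
    }
  } while (deleted === batchSize);

  return total;
}
```

Because the deletion call is injected, the same loop could be reused for both events and sessions by passing a different callback for each table.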
Benefits
- ✅ Reduces database size (can save 80%+ storage)
- ✅ Improves query performance
- ✅ Lowers storage costs
- ✅ Maintains historical trends (via daily stats)
- ✅ Supports GDPR compliance by enforcing a data retention policy
Data Retention Strategy
┌─────────────────────────────────────────────────────┐
│                   Data Retention                    │
├─────────────────────────────────────────────────────┤
│                                                     │
│  Raw Events:     90 days (then deleted)             │
│  Sessions:       90 days (then deleted)             │
│  Daily Stats:    Forever (kept)                     │
│                                                     │
│  Why?                                               │
│  - Raw events for recent detailed analysis          │
│  - Daily stats for long-term trends                 │
│  - Balance between detail and storage               │
│                                                     │
└─────────────────────────────────────────────────────┘
Example
Day 1-90: Full detail available (raw events)
Day 91+: Aggregated stats only (daily summaries)
Query "Show me traffic for last 30 days"
  → Uses raw events (fast, detailed)
Query "Show me traffic for last year"
  → Uses daily stats (fast, summarized)
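One way to express this routing in code is sketched below. The pickSource function and the StatsSource type are hypothetical, and using the 90-day retention window as the switch-over point is an assumption; the document only gives the 30-day and one-year cases.

```typescript
type StatsSource = "raw_events" | "daily_stats";

const RAW_RETENTION_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// Pick the table to query based on how far back the requested range starts.
function pickSource(rangeStart: Date, now: Date): StatsSource {
  const oldestRawEvent = now.getTime() - RAW_RETENTION_DAYS * DAY_MS;
  // Raw events only exist inside the retention window; anything older
  // has to come from the pre-aggregated daily stats.
  return rangeStart.getTime() >= oldestRawEvent ? "raw_events" : "daily_stats";
}
```

For a range that starts 30 days back this returns "raw_events", and for a range that starts a year back it returns "daily_stats", matching the two queries above.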