Home Blog Technical Blog Always in Sync: Realtime Calendar Updates and Semantic Event Search

Always in Sync: Realtime Calendar Updates and Semantic Event Search

How webhooks and embeddings power the Busy Family calendar pipeline.

Always in Sync: Realtime Calendar Updates and Semantic Event Search

How webhooks and embeddings power the Busy Family calendar pipeline.

This post describes two architectural pillars of the Busy Family calendar system:

  1. Webhook-driven realtime ingestion
  2. Semantic indexing and vector search

The first ensures the assistant’s view of the calendar is fresh. The second ensures it is meaningfully searchable.

Part I — Realtime Event Ingestion (Push, Not Poll)

Realtime Calendar Event Ingestion (Webhook-Based)
Realtime Calendar Event Ingestion (Webhook-Based)

Polling is simple. It is also slow, wasteful, and guarantees staleness.

Instead, Busy Family uses Google Calendar push notification channels. When a calendar changes, Google notifies our webhook immediately. That notification triggers a safe, incremental synchronization process.

The flow looks like this:

1️⃣ Channel Registration

During onboarding, when a user selects calendars to manage:

  • We call the Google Calendar Events Watch API
  • We provide:
  • A UUID channelID
  • Our HTTPS webhook endpoint
  • Google returns:
  • resourceID
  • expiration (≈ 7 days)

We persist this to DynamoDB:

Table: CalendarWebhookChannels

  • channelID (PK)
  • familyID
  • calendarID
  • connectionID
  • resourceID
  • nextSyncToken
  • expiration
  • createdAt

A GSI on familyID allows renewal and cleanup operations.

This establishes push-only semantics. No polling fallback required.

2️⃣ Multi-Account Isolation

A family may connect multiple Google accounts.

Each OAuth connection has a stable connectionID (Google UserInfo id).

Tokens are stored in:

Table: FamilyConnectionCalendarTokens

  • Partition key: familyID
  • Sort key: connectionID

Each webhook channel stores its owning connectionID.

When a webhook fires:

  • We resolve the channel
  • Then fetch the correct access token
  • Tokens are never mixed across accounts

Failure in one account does not hide calendars from others.

3️⃣ Webhook Handling + Sync Lock

When Google sends a notification:

  1. CalendarWebhookHandler reads X-Goog-Channel-ID
  2. Loads the channel from DynamoDB
  3. Immediately responds 200 OK

Processing happens asynchronously.

Before starting a sync, we attempt to acquire a distributed sync lock using a DynamoDB conditional update:

lockUntil < now

If another sync is running, we exit gracefully.

This prevents duplicate incremental work during rapid notification bursts.

4️⃣ Incremental Sync via nextSyncToken

The sync step calls:

ScheduleRefreshService.refreshCalendarIncrementally()

If nextSyncToken exists:

  • We pass it to Google’s Events List API
  • Google returns only changed events

Important behaviors:

  • Paging may occur
  • nextSyncToken is returned only on the final page
  • We persist the new token after full pagination

If Google returns 410 Gone:

  • The sync token expired
  • We perform a full resync
  • We do not persist a token
  • The next run starts fresh

The first notification after channel creation has resourceState = "sync":

  • We perform a full fetch
  • Then store the initial nextSyncToken

At this point we have:

A minimal, incremental set of changed events.

5️⃣ Idempotent OpenSearch Writes

Changed events are written to:

OpenSearch index: calendar-events-v3

Document IDs are:

SHA-256(calendarID + eventIdentity + familyID)

This guarantees deterministic, collision-resistant IDs.

We use a scripted upsert:

  • A Painless script compares event fingerprint
  • If content unchanged → skip write
  • Prevents write amplification

Now the event store is updated.

That handles freshness.

Part II — Semantic Indexing Pipeline

Semantic Event Indexing Pipeline
Semantic Event Indexing Pipeline

Fresh data is not enough. It must be searchable by meaning.

Each changed event goes through a structured embedding pipeline.

6️⃣ Description Cleaning (Signal Over Noise)

Calendar descriptions are noisy.

They include:

  • Zoom join links
  • Teams dial-ins
  • HTML artifacts
  • Tracking URLs

We apply a two-stage cleaner.

Stage 1 — Regex Cleaner

  • Strip URLs
  • Remove meeting boilerplate
  • Collapse whitespace
  • Truncate to 1,500 chars

Stage 2 — Optional Titan Summarization

If cleaned text > 800 characters:

We send it to:

Titan Text G1 Lite amazon.titan-text-lite-v1

Prompt constraints:

  • Single sentence
  • Max 40 words
  • ≤ 80 output tokens
  • Low temperature
  • Preserve purpose + people + context
  • Do not invent time or location

This preserves semantic density while removing noise.

7️⃣ Embedding Construction

We build a single embedding string:

<summary>. <cleanedDescription>.
Location: <location>.
Calendar: <calendarDisplayName>.
Organizer: <organizer>.
Attendees: <attendee1>, <attendee2>

All top-level sections are joined with ". ".

Capped at 1,800 characters.

8️⃣ Vector Generation (Bedrock)

We send this string to:

Titan Embed Text v2 amazon.titan-embed-text-v2:0

Output:

  • 512-dimensional
  • Normalized vector

This becomes the embedding field in OpenSearch.

9️⃣ OpenSearch Configuration

Index: calendar-events-v3

  • Field: embedding
  • Type: knn_vector
  • Dimensions: 512
  • Engine: HNSW (nmslib)
  • Similarity: cosine

Now the event is searchable semantically.

Part III — Semantic Retrieval

Semantic Calendar Retrieval
Semantic Calendar Retrieval

When a user asks:

“What’s on my schedule around the holidays?”

The system:

  1. Embeds the query using Titan Embed v2
  2. Performs kNN search against embedding
  3. Applies filters:
  • familyID
  • optional calendarID
  • date range

Results are returned based on vector similarity — not keyword match.

An event titled:

“Grandma’s House — Dec 24”

can match “holidays” even if that word never appears.

This is semantic retrieval.

Architecture Summary

The full system combines:

🔄 Push-Based Realtime Ingestion

  • Webhook channels
  • Incremental sync
  • Distributed locking
  • Idempotent upserts

🧠 Dense Semantic Indexing

  • Noise reduction
  • Optional summarization
  • 512-dim embeddings
  • HNSW kNN search

Together, the result is:

  • No polling lag
  • No stale calendar views
  • No keyword-only limitations
  • Meaning-based retrieval

The family calendar is:

Always fresh. Always meaningfully searchable.