Always in Sync: Realtime Calendar Updates and Semantic Event Search

February 26, 2026 • 10 min read • By Elene Strurua and Che Hammond

How webhooks and embeddings power the Busy Family calendar pipeline.

This post describes two architectural pillars of the Busy Family calendar system:

Webhook-driven realtime ingestion
Semantic indexing and vector search

The first ensures the assistant’s view of the calendar is fresh. The second ensures it is meaningfully searchable.

Part I — Realtime Event Ingestion (Push, Not Poll)

Realtime Calendar Event Ingestion (Webhook-Based)

Polling is simple. It is also slow and wasteful, and it guarantees staleness between polls.

Instead, Busy Family uses Google Calendar push notification channels. When a calendar changes, Google notifies our webhook immediately. That notification triggers a safe, incremental synchronization process.

The flow looks like this:

1️⃣ Channel Registration

During onboarding, when a user selects calendars to manage:

We call the Google Calendar Events Watch API
We provide:

A UUID channelID
Our HTTPS webhook endpoint
Google returns:

resourceID
expiration (≈ 7 days)
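As a rough sketch of the registration step, the request body for the Events Watch call could be built like this. The `id`, `type`, and `address` fields are what the Calendar API expects for a web hook channel; the function name and the example URL are illustrative assumptions, not names from our codebase.

```python
import uuid


def build_watch_request(webhook_url: str) -> dict:
    """Build the JSON body for a Calendar Events Watch call."""
    if not webhook_url.startswith("https://"):
        raise ValueError("the webhook endpoint must be served over HTTPS")
    return {
        "id": str(uuid.uuid4()),  # becomes our channelID
        "type": "web_hook",
        "address": webhook_url,
    }


# Hypothetical endpoint, for illustration only.
request_body = build_watch_request("https://example.com/calendar/webhook")
```

The response to this request is where the resourceID and expiration values come from.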

We persist this to DynamoDB:

Table: CalendarWebhookChannels

channelID (PK)
familyID
calendarID
connectionID
resourceID
nextSyncToken
expiration
createdAt

A GSI on familyID allows renewal and cleanup operations.

This establishes push-only semantics. No polling fallback is required.

2️⃣ Multi-Account Isolation

A family may connect multiple Google accounts.

Each OAuth connection has a stable connectionID (Google UserInfo id).

Tokens are stored in:

Table: FamilyConnectionCalendarTokens

Partition key: familyID
Sort key: connectionID

Each webhook channel stores its owning connectionID.

When a webhook fires:

We resolve the channel
Then fetch the correct access token
Tokens are never mixed across accounts

Failure in one account does not hide calendars from others.
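The lookup order above can be sketched with two in-memory dictionaries standing in for the DynamoDB tables. The sample IDs and the `token_for_channel` helper are hypothetical; only the key structure (familyID plus connectionID) comes from the tables described here.

```python
from typing import Dict, Tuple

# In-memory stand-ins for CalendarWebhookChannels and
# FamilyConnectionCalendarTokens (sample data is made up).
channels: Dict[str, dict] = {
    "chan-1": {"familyID": "fam-1", "connectionID": "conn-A", "calendarID": "cal-1"},
    "chan-2": {"familyID": "fam-1", "connectionID": "conn-B", "calendarID": "cal-2"},
}
tokens: Dict[Tuple[str, str], str] = {
    ("fam-1", "conn-A"): "token-for-A",
    ("fam-1", "conn-B"): "token-for-B",
}


def token_for_channel(channel_id: str) -> str:
    """Resolve the channel first, then fetch the token of its owning connection."""
    channel = channels[channel_id]
    return tokens[(channel["familyID"], channel["connectionID"])]
```

Because the token is always selected through the channel's own connectionID, a notification for one account cannot pick up another account's credentials.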

3️⃣ Webhook Handling + Sync Lock

When Google sends a notification:

CalendarWebhookHandler reads X-Goog-Channel-ID
Loads the channel from DynamoDB
Immediately responds 200 OK

Processing happens asynchronously.
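A minimal sketch of that handler shape, assuming the channel lookup and the async hand-off are injected as plain callables. The function and parameter names are illustrative, and acknowledging unknown channels with 200 is an assumption of this sketch.

```python
from typing import Callable, Mapping, Optional


def handle_notification(
    headers: Mapping[str, str],
    load_channel: Callable[[str], Optional[dict]],
    enqueue_sync: Callable[[dict], None],
) -> int:
    """Return the HTTP status code; the sync itself is deferred."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    channel_id = normalized.get("x-goog-channel-id")
    channel = load_channel(channel_id) if channel_id else None
    if channel is not None:
        enqueue_sync(channel)  # picked up asynchronously by the sync worker
    return 200  # acknowledge right away, even for unknown channels


# Usage with in-memory stand-ins for the channel table and the work queue.
queue: list = []
known = {"chan-1": {"channelID": "chan-1"}}
status = handle_notification({"X-Goog-Channel-ID": "chan-1"}, known.get, queue.append)
```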

Before starting a sync, we attempt to acquire a distributed sync lock using a DynamoDB conditional update:

lockUntil < now

If another sync is running, we exit gracefully.

This prevents duplicate incremental work during rapid notification bursts.
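The lock can be sketched in two parts: the parameters for a DynamoDB conditional update, and an in-memory model of the same condition. The 60-second lock duration, the function names, and the extra `attribute_not_exists` clause (so that a channel that has never been locked can still be acquired) are assumptions of this sketch.

```python
def lock_update_params(channel_id: str, now: int, ttl_seconds: int = 60) -> dict:
    """Keyword arguments shaped for a DynamoDB UpdateItem call that takes the lock."""
    return {
        "Key": {"channelID": channel_id},
        "UpdateExpression": "SET lockUntil = :until",
        # Succeeds only if no lock exists or the previous lock has expired.
        "ConditionExpression": "attribute_not_exists(lockUntil) OR lockUntil < :now",
        "ExpressionAttributeValues": {":now": now, ":until": now + ttl_seconds},
    }


def try_acquire(item: dict, now: int, ttl_seconds: int = 60) -> bool:
    """In-memory model of the same condition, for illustration."""
    if "lockUntil" in item and item["lockUntil"] >= now:
        return False  # another sync holds the lock; exit gracefully
    item["lockUntil"] = now + ttl_seconds
    return True
```

Because the check and the write happen in one conditional update, two workers racing on the same channel cannot both acquire the lock, and an abandoned lock frees itself once `lockUntil` passes.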

4️⃣ Incremental Sync via nextSyncToken

The sync step calls:

ScheduleRefreshService.refreshCalendarIncrementally()

If nextSyncToken exists:

We pass it to Google’s Events List API
Google returns only changed events

Important behaviors:

Paging may occur
nextSyncToken is returned only on the final page
We persist the new token after full pagination

If Google returns 410 Gone:

The sync token expired
We perform a full resync
We do not persist a token
The next run starts fresh

The first notification after channel creation has resourceState = "sync":

We perform a full fetch
Then store the initial nextSyncToken
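The pagination and 410 handling described above can be sketched against a fake page-listing function. `SyncTokenExpired`, `fetch_all`, `refresh_incrementally`, and `fake_list_page` are hypothetical names standing in for the real Google client calls; this is not the body of `refreshCalendarIncrementally()`.

```python
from typing import Callable, List, Optional, Tuple


class SyncTokenExpired(Exception):
    """Stands in for an HTTP 410 Gone response from the Events List API."""


# (sync_token, page_token) -> (events, next_page_token, next_sync_token)
Page = Tuple[List[dict], Optional[str], Optional[str]]
ListPage = Callable[[Optional[str], Optional[str]], Page]


def fetch_all(list_page: ListPage, sync_token: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Follow pagination to the end; the sync token arrives on the last page."""
    events: List[dict] = []
    page_token: Optional[str] = None
    while True:
        items, page_token, next_sync = list_page(sync_token, page_token)
        events.extend(items)
        if page_token is None:
            return events, next_sync


def refresh_incrementally(list_page: ListPage, stored: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Return (events, token_to_persist)."""
    try:
        return fetch_all(list_page, stored)
    except SyncTokenExpired:
        # Expired token: full resync, and no token is persisted,
        # so the next run starts fresh.
        events, _ = fetch_all(list_page, None)
        return events, None


def fake_list_page(sync_token: Optional[str], page_token: Optional[str]) -> Page:
    """Two-page fake API that rejects the token 'stale'."""
    if sync_token == "stale":
        raise SyncTokenExpired()
    if page_token is None:
        return [{"id": "1"}], "page-2", None
    return [{"id": "2"}], None, "tok-2"


changed, new_token = refresh_incrementally(fake_list_page, "tok-1")
```

Persisting the token only after the final page means an interrupted run simply repeats the same incremental window instead of skipping changes.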

At this point we have:

A minimal, incremental set of changed events.

5️⃣ Idempotent OpenSearch Writes

Changed events are written to:

OpenSearch index: calendar-events-v3

Document IDs are:

SHA-256(calendarID + eventIdentity + familyID)

This makes IDs deterministic, so the same event always maps to the same document, and SHA-256 makes accidental collisions practically impossible.
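A minimal sketch of the ID derivation. The NUL separator between fields is an assumption added here so that different field splits cannot produce the same concatenated string; the function name is illustrative.

```python
import hashlib


def document_id(calendar_id: str, event_identity: str, family_id: str) -> str:
    """Deterministic OpenSearch document ID for one event."""
    # Assumption of this sketch: a separator keeps ("ab", "c") and
    # ("a", "bc") from hashing the same concatenation.
    raw = "\x00".join([calendar_id, event_identity, family_id])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Re-delivering the same event therefore targets the same document, which is what makes the write idempotent.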

We use a scripted upsert:

A Painless script compares the stored event fingerprint with the incoming one
If the content is unchanged → the write is skipped
This prevents write amplification
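The skip-if-unchanged decision can be modeled in plain Python. This sketch mirrors the comparison logic only; it is not the Painless script itself, and the choice to fingerprint the whole event as canonical JSON is an assumption.

```python
import hashlib
import json
from typing import Dict


def fingerprint(event: dict) -> str:
    """Stable hash of the event content (key order does not matter)."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def upsert(index: Dict[str, dict], doc_id: str, event: dict) -> str:
    """Return 'created', 'updated', or 'noop' for this write."""
    new_fp = fingerprint(event)
    existing = index.get(doc_id)
    if existing is not None and existing["fingerprint"] == new_fp:
        return "noop"  # content unchanged: skip the write
    index[doc_id] = {**event, "fingerprint": new_fp}
    return "created" if existing is None else "updated"
```

Repeated notifications for an untouched event end in `noop`, so a burst of webhooks does not turn into a burst of index writes.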

Now the event store is updated.

That handles freshness.

Part II — Semantic Indexing Pipeline

Fresh data is not enough. It must be searchable by meaning.

Each changed event goes through a structured embedding pipeline.

6️⃣ Description Cleaning (Signal Over Noise)

Calendar descriptions are noisy.

They include:

Zoom join links
Teams dial-ins
HTML artifacts
Tracking URLs

We apply a two-stage cleaner.

Stage 1 — Regex Cleaner

Strip URLs
Remove meeting boilerplate
Collapse whitespace
Truncate to 1,500 chars
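A minimal sketch of such a regex cleaner. The specific patterns, especially the boilerplate phrases, are illustrative assumptions; a production list would be longer and tuned to real descriptions.

```python
import re

URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"<[^>]+>")
# Hypothetical boilerplate phrases, for illustration only.
BOILERPLATE_RE = re.compile(
    r"join zoom meeting|microsoft teams meeting|meeting id:\s*[\d ]+|passcode:\s*\S+",
    re.IGNORECASE,
)
MAX_CHARS = 1500


def clean_description(text: str) -> str:
    """Strip markup, URLs, and meeting boilerplate, then normalize and truncate."""
    text = TAG_RE.sub(" ", text)          # HTML artifacts
    text = URL_RE.sub(" ", text)          # join links and tracking URLs
    text = BOILERPLATE_RE.sub(" ", text)  # meeting boilerplate
    text = re.sub(r"\s+", " ", text).strip()
    return text[:MAX_CHARS]


cleaned = clean_description(
    "<p>Bring snacks</p> Join Zoom Meeting https://zoom.us/j/123 Meeting ID: 123 456 789"
)
```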

Stage 2 — Optional Titan Summarization

If cleaned text > 800 characters:

We send it to:

Titan Text G1 Lite (model ID: amazon.titan-text-lite-v1)

Prompt constraints:

Single sentence
Max 40 words
≤ 80 output tokens
Low temperature
Preserve purpose + people + context
Do not invent time or location

This preserves semantic density while removing noise.
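The threshold check and request body can be sketched as follows. The prompt wording and the temperature value of 0.1 are assumptions; `inputText` and `textGenerationConfig` follow the request shape Bedrock documents for Titan Text models, and the resulting JSON string would be passed as the body of a model invocation.

```python
import json

SUMMARIZE_THRESHOLD = 800
SUMMARY_MODEL_ID = "amazon.titan-text-lite-v1"

# Illustrative prompt that encodes the constraints listed above.
PROMPT_TEMPLATE = (
    "Summarize the calendar event description below in a single sentence "
    "of at most 40 words. Preserve the purpose, people, and context. "
    "Do not invent a time or location.\n\nDescription:\n{description}"
)


def needs_summary(cleaned: str) -> bool:
    """Only descriptions longer than the threshold are summarized."""
    return len(cleaned) > SUMMARIZE_THRESHOLD


def build_summary_request(cleaned: str) -> str:
    """JSON body for the summarization call."""
    body = {
        "inputText": PROMPT_TEMPLATE.format(description=cleaned),
        "textGenerationConfig": {
            "maxTokenCount": 80,
            "temperature": 0.1,  # "low temperature"; exact value is an assumption
        },
    }
    return json.dumps(body)
```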

7️⃣ Embedding Construction

We build a single embedding string:

<summary>. <cleanedDescription>.
Location: <location>.
Calendar: <calendarDisplayName>.
Organizer: <organizer>.
Attendees: <attendee1>, <attendee2>

All top-level sections are joined with ". ".

Capped at 1,800 characters.
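A minimal sketch of the string assembly. Skipping empty fields and trimming a trailing period before joining are assumptions of this sketch; the labels, the ". " separator, and the 1,800-character cap come from the format above.

```python
from typing import List, Optional

MAX_EMBED_CHARS = 1800


def build_embedding_text(
    summary: str,
    description: Optional[str],
    location: Optional[str],
    calendar_name: Optional[str],
    organizer: Optional[str],
    attendees: List[str],
) -> str:
    """Join the labeled sections with '. ' and cap the result."""
    parts = [summary, description]
    if location:
        parts.append(f"Location: {location}")
    if calendar_name:
        parts.append(f"Calendar: {calendar_name}")
    if organizer:
        parts.append(f"Organizer: {organizer}")
    if attendees:
        parts.append("Attendees: " + ", ".join(attendees))
    # Drop empty sections and avoid doubled periods at the joins.
    text = ". ".join(p.strip().rstrip(".") for p in parts if p and p.strip())
    return text[:MAX_EMBED_CHARS]


text = build_embedding_text(
    "Soccer practice", "Bring cleats.", "Field 3", "Kids",
    "coach@example.com", ["a@example.com", "b@example.com"],
)
```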

8️⃣ Vector Generation (Bedrock)

We send this string to:

Titan Embed Text v2 (model ID: amazon.titan-embed-text-v2:0)

Output:

512-dimensional
Normalized vector

This becomes the embedding field in OpenSearch.
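The request body for the embedding call can be sketched like this. `inputText`, `dimensions`, and `normalize` are the request fields Bedrock documents for Titan Embed Text v2; the helper names and the norm check are illustrative additions.

```python
import json
import math
from typing import Sequence

EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"
EMBED_DIMENSIONS = 512


def build_embedding_request(text: str) -> str:
    """JSON body asking for a normalized 512-dimensional vector."""
    return json.dumps(
        {"inputText": text, "dimensions": EMBED_DIMENSIONS, "normalize": True}
    )


def is_unit_length(vector: Sequence[float], tol: float = 1e-3) -> bool:
    """Sanity check that a returned vector is normalized."""
    norm = math.sqrt(sum(x * x for x in vector))
    return abs(norm - 1.0) <= tol
```

A cheap check like `is_unit_length` before indexing can catch a misconfigured request early, since cosine similarity on unit vectors reduces to a dot product.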

9️⃣ OpenSearch Configuration

Index: calendar-events-v3

Field: embedding
Type: knn_vector
Dimensions: 512
Method: HNSW
Engine: nmslib
Similarity: cosine
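Expressed as an index-creation body, that configuration could look like the sketch below. The structure follows the OpenSearch k-NN plugin's mapping format, where cosine similarity is spelled `cosinesimil`; the keyword mappings for `familyID` and `calendarID` are assumptions added so the filters have something to match on.

```python
INDEX_NAME = "calendar-events-v3"

index_body = {
    "settings": {"index": {"knn": True}},  # enable k-NN on this index
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "engine": "nmslib",
                    "space_type": "cosinesimil",
                },
            },
            # Assumed filter fields; not spelled out in the post.
            "familyID": {"type": "keyword"},
            "calendarID": {"type": "keyword"},
        }
    },
}
```

The `dimension` value has to match the embedding model's output size exactly, otherwise indexing a vector fails.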

Now the event is searchable semantically.

Part III — Semantic Retrieval

When a user asks:

“What’s on my schedule around the holidays?”

The system:

Embeds the query using Titan Embed v2
Performs kNN search against embedding
Applies filters:

familyID
optional calendarID
date range

Results are returned based on vector similarity — not keyword match.

An event titled:

“Grandma’s House — Dec 24”

can match “holidays” even if that word never appears.

This is semantic retrieval.
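The retrieval steps can be sketched as a query builder. The `startTime` field name and the default `k` of 10 are assumptions; the `knn` clause follows the OpenSearch k-NN query format, combined here with ordinary `bool` filters.

```python
from typing import List, Optional


def build_knn_query(
    query_vector: List[float],
    family_id: str,
    start: str,
    end: str,
    calendar_id: Optional[str] = None,
    k: int = 10,
) -> dict:
    """kNN search on `embedding`, restricted by family, date range, and calendar."""
    filters = [
        {"term": {"familyID": family_id}},
        {"range": {"startTime": {"gte": start, "lte": end}}},  # hypothetical field
    ]
    if calendar_id:
        filters.append({"term": {"calendarID": calendar_id}})
    return {
        "size": k,
        "query": {
            "bool": {
                "must": [{"knn": {"embedding": {"vector": query_vector, "k": k}}}],
                "filter": filters,
            }
        },
    }


query = build_knn_query([0.1] * 512, "fam-1", "2026-12-20", "2027-01-02")
```

With this shape the filters are applied to the nearest-neighbor candidates, so a narrow filter can return fewer than `k` hits; raising `k` is one way to compensate.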

Architecture Summary

The full system combines:

🔄 Push-Based Realtime Ingestion

Webhook channels
Incremental sync
Distributed locking
Idempotent upserts

🧠 Dense Semantic Indexing

Noise reduction
Optional summarization
512-dim embeddings
HNSW kNN search

Together, the result is:

No polling lag
No stale calendar views
No keyword-only limitations
Meaning-based retrieval

The family calendar is:

Always fresh. Always meaningfully searchable.