Zap Family Roadmap
Vision
Unify the three separate Zap projects (Zap, Zap-Mail, Zap-Cal) into a single integrated platform for personal communications, calendar, and writing assistance. The end state is a self-hosted "personal command centre" that handles email, calendar, document management, and AI-assisted writing from one interface.
Current Architecture (Feb 2026)
/var/www/zap/ Main Zap project (dashboard, shared infra)
/var/www/zap-mail/ Gmail client: focus inbox, sync, contacts, rules
/var/www/zap-cal/ Google Calendar client: event display, sync

All three share the same Google Cloud project and OAuth client ID but use separate ports (Zap-Mail: 8080, Zap-Cal: 8081) and separate token files. They are deployed on orcus.lan (Ubuntu server) behind Apache virtual hosts.
Pain Points
- Three separate codebases with overlapping dependencies (Google PHP client, SQLite, config patterns)
- OAuth confusion: Same client ID, different redirect URIs and scopes. Re-authorizing one can break the other if ports are mixed up. Documented extensively in `zap-mail/README.md`.
- No shared authentication layer: Each project manages its own tokens
- No unified UI: Switching between mail and calendar means different URLs
- Writing tools scattered: Google Docs accessed via browser, Word docs require conversion, no integrated drafting/revision support
- `zap-mail/bin/backfill-history.php` -- import older emails with custom Gmail queries
  - Supports `--query`, `--after`, `--before`, `--dry-run`, retry with backoff
  - Used to backfill all Charlie Hall emails (833) and Mexico/oil-related messages (4,000+)
- Added `drive.readonly` OAuth scope to Zap-Mail
- `zap-mail/bin/search-drive.php` -- search Drive by name/content, list revisions, export Google Docs as text
  - Supports `--query`, `--type`, `--shared`, `--revisions=ID`, `--export=ID`
- `zap-mail/bin/mexico-chapter-timeline.php` -- generates a chronological timeline combining emails, Drive revisions, and Word doc uploads
  - Produces both console output and a markdown file
  - Template can be adapted for other research projects
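The retry-with-backoff behaviour that the backfill tool relies on follows the standard exponential-backoff pattern for transient Gmail API errors. A minimal Python sketch of the idea (illustrative only, not the PHP implementation; all names here are hypothetical):

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter, the usual pattern
    for transient API errors (rate limits, 5xx responses)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Wait 1x, 2x, 4x ... the base delay, plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# Demo: a call that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_backoff(flaky, base_delay=0.01)  # retries transparently
```

The jitter term spreads retries out so parallel backfill runs don't hammer the API in lockstep.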
- 23 chat transcripts ingested with full metadata (apps, topics, files, message counts)
- 348 plan files cross-referenced to chats
- FTS5 full-text search across all transcripts
- LLM summaries generated for all chats via delphi.lan (qwen3:30b)
- Deep structured analysis (artifacts, decisions, unfinished work) for all chats
- Live file watcher (`zap-chat-watcher.service`) auto-ingests new/modified transcripts via inotify
- Cron job every 30 minutes as a safety net for deep analysis
- Web UI: chat history browser, transcript viewer, AI Q&A endpoint
- Energystats asset inventory (charts, scripts, data files)
- Link on orcus.lan home page
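As a side note on the FTS5 layer above: the same capability ships with stock SQLite and can be exercised in a few lines. The schema here is illustrative only, not the actual zap-projects schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# An FTS5 virtual table indexing transcript title and body text.
con.execute("CREATE VIRTUAL TABLE transcripts USING fts5(title, body)")
con.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("backup plan", "daily postgres dumps with weekly rotation"),
        ("mexico chapter", "timeline of drive revisions and emails"),
    ],
)
# MATCH uses FTS5 query syntax; bm25() gives a relevance ranking.
rows = con.execute(
    "SELECT title FROM transcripts WHERE transcripts MATCH ? "
    "ORDER BY bm25(transcripts)",
    ("drive",),
).fetchall()
```

The production system layers LLM summaries on top, but plain `MATCH` queries remain useful for fast exact-term lookups.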
- Database migration: SQLite to PostgreSQL (`zap` DB, `cursor_`-prefixed tables)
- Project registry: First-class projects (auto-discovered from chats, manually enrichable)
- Many-to-many: Chats and plans can belong to multiple projects
- Web pages: Project index (card/list toggle), Cursor hub, individual project detail pages
- REST JSON API: Full API with API key auth, ready for external access
- MCP server: Python MCP server for Cursor/AI agent integration (local + remote modes)
- Global Cursor rule: AI assistant proactively uses MCP tools across all workspaces
- Docs viewer: Rendered markdown docs at `/projects/docs/`
- Manual project creation: Create projects via UI modal with slug, aliases, match patterns
- Project-chat discovery: Configurable aliases and match_patterns per project; rescan button + automatic matching during ingestion
- Cursor title sync from paris.lan (DONE): `sync-cursor-titles.php` SSHes to paris.lan (where Cursor runs on Windows 11), reads `state.vscdb` per workspace, and syncs actual Cursor sidebar titles and metadata (lines added/removed, files changed, mode, archive status) to Postgres. Runs every 5 minutes via cron. Detects title renames and logs history to `cursor_chat_title_history`.
- 4-tier title priority: `display_title` (user override) > `cursor_title` (synced from paris.lan) > `llm_title` (from deep analysis) > `first_query_short` (fallback)
- Inline rename: Pencil icon on Cursor Hub and project detail pages to set display_title via PATCH API
- LLM title drift detection: `deep-analyse-chat.php` detects when the LLM-generated title changes and logs it
- MCP deployment verified (DONE): Project-level `.cursor/mcp.json` deployed to all 6 orcus workspaces. Global config on paris.lan at `%USERPROFILE%\.cursor\mcp.json`. Discovered that for SSH remote workspaces, the project-level config is the one that actually provides tools to the agent (the global config only shows in the UI). Full setup guide, troubleshooting, and architecture documented in `apps/zap-projects/README.md`.
- Cross-project documentation (DONE): All project READMEs (philoenic, philanthropy-planner, prospecta, prospecta.cc, quickstep) updated with a standardised cross-chat context block pointing to the zap-projects MCP tools and web UI.
- Plan auto-indexing (DONE): File watcher now monitors `/home/jd/.cursor/plans/` and auto-indexes `.plan.md` files via `--plans-only` mode. Plans have their own FTS vector and GIN index. Chat FTS expanded to include full transcript text.
- Cursor Hub UI polish (DONE): Plans list view working; datetime display with times (DD MMM HH:MM) for both chats and plans in card and list views.
- Chat/plan links fixed (DONE): Chat transcript viewer migrated from SQLite to PostgreSQL. Plan viewer page created with markdown rendering. Both chats and plans are now clickable from the hub.
- Lazy-load plans with pagination (DONE): Plans load 50 at a time with a "Load more" button, matching the chat pagination pattern. Tab selection (Chats/Plans) persists to localStorage. URL query param `?tab=plans` supports deep-linking from the plan-viewer back link.
- PostgreSQL backups (`cron/backup-pg.sh`): Daily automated dumps with 7 daily + 4 weekly + 3 monthly rotation
- Dual-channel notifications: Telegram + ntfy.sh on backup failure, credentials from `~/.credentials`
- Backup history tracking: `backup_history` table records all events (success/failure) with paths, sizes, errors
- SQLite backup integration: Philoenic and zap-mail scripts updated to use the shared notification library and log to the database
- Status pages: `orcus.lan/databases.php` (local inventory) and `zap.orcus.lan/projects/databases.php` (LAN-wide view with backup history)
- Clickable backup folders: `file://` protocol links to open NAS folders directly from the browser (Windows registry file provided)
- Central task list queryable by: "What are my most important outstanding tasks?"
- Tasks linked to apps (zap-mail, philoenic, etc.), chats, and plan files
- Priority levels, due dates, dependencies
- Initially CLI + simple web UI; grows into full project management
- Currently tasks are scattered across roadmaps, plan files, handover docs, and memory
- No single view of "what should I work on next?"
- Zap-Projects is the connective tissue between all other Zap apps
- AI agents need to interrogate project history, chats, and plans across all workspaces
- Google Docs ingestion: multi-tabbed doc fetching via Docs API v1, text extraction from paragraphs and tables, hierarchical path metadata
- File ingestion: PDF, DOCX, TXT, MD, HTML, RTF support with text extraction and configurable chunking
- pgvector embeddings: `nomic-embed-text` embeddings on titan (RTX 3060), HNSW index for cosine similarity search
- Multi-source RAG search: 4-part synthesised answers (project docs with citations, LLM knowledge, SearXNG web + LLM, external library placeholder)
- Editorial engines: edit harmonisation, fact-check, currency scanner, bibliography checker, chart manager (built for Mexico chapter)
- LLM gateway integration: auto-routes embedding and chat requests to the best available Ollama server (delphi/phoebe/titan)
- Projects: Hidden Money (philanthropy book, 424 chunks embedded), Mexico oil chapter (78 edits, 528 bibliography entries)
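For reference, the cosine similarity that the HNSW index approximates at scale is a one-liner; pgvector's `<=>` operator returns cosine *distance*, i.e. one minus this value. A minimal Python sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors. pgvector's
    <=> operator returns cosine *distance*, which is 1 minus this."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Parallel vectors score 1.0; orthogonal vectors score 0.0.
sim = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
orth = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

The HNSW graph exists purely so this comparison doesn't have to run against every chunk in the table at query time.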
- External document ingestion (HIGH PRIORITY): Ingest books, reports, and PDFs into the chunking + embedding pipeline. The CLI tool (`ingest-file.php`) already supports this; it just needs to be run against the user's document collection.
- Streaming responses: SSE-based streaming for long LLM synthesis calls (currently ~30-60s blocking).
- Document version tracking: Track versions across Google Docs, Word uploads, and email attachments.
- Writing assistant: LLM-powered drafting, summarisation, and revision suggestion within project context.
- Cross-format support: Unified revision history across Google Docs, Word, and PDF.
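The planned SSE streaming boils down to emitting `text/event-stream` frames instead of one blocking response. The wire format each chunk must follow is simple (sketch only; no endpoint or event names are implied by the plan):

```python
def sse_frame(data, event=None):
    """Format one Server-Sent Events frame: an optional 'event:' line,
    one 'data:' line per payload line, and a blank-line terminator."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.extend(f"data: {line}" for line in data.splitlines() or [""])
    return "\n".join(lines) + "\n\n"

# Each LLM token (or small token batch) goes out as its own frame.
frame = sse_frame("partial synthesis", event="token")
```

On the PHP side this means flushing output per frame rather than buffering the whole synthesis; the browser's `EventSource` API reassembles the stream.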
- Backend: PHP 8.3 + Apache 2.4, PostgreSQL 17 with pgvector, Ollama LLMs via LLM gateway
- Embedding: `nomic-embed-text` on titan (dedicated, avoids delphi model swapping)
- Search: Tiered -- SearXNG -> Brave -> Google via `SearchClient`
- LLM routing: Centralised gateway at `llm.orcus.lan` (delphi 2x3090, phoebe 3060, titan 3060)
- Frontend: Vanilla JS, dark theme, tabbed workspace UI
- Phase 0: Scaffold service (vhost, bootstrap, database with pgvectorscale, extract RAG from zap-writer)
- Phase 1: Ingestion quality -- late chunking (arxiv 2409.04701), RAPTOR semantic tree (arxiv 2401.18059), Chain-of-Density summaries, document tree/ToC, contextual retrieval, variable-granularity chunking, opt-in proposition/question indexing
- Phase 2: Privacy / LAN-only mode (per-collection toggle)
- Phase 3: Retrieval stack -- HyDE+Rocchio, query expansion, hybrid RRF, MMR, cross-encoder reranking (bge-reranker-v2-m3), CRAG with strip refinement, adaptive retrieval cutoff (CAR)
- Phase 4: Synthesis quality -- chain-of-thought, citation verification, confidence scoring, source highlighting
- Phase 5: SSE streaming
- Phase 6: Conversation memory + multi-hop retrieval
- Phase 7: Delphi data extraction integration (Marker PDF, web upload, ColPali visual retrieval)
- Phase 8: Evaluation framework (RAGAS, frontier model judging via Claude/GPT/Perplexity, external reviewer UI)
- Phase 9: Embedding model benchmarking (nomic-embed-text vs mxbai-embed-large vs arctic-embed vs BGE-M3)
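Phase 3's "hybrid RRF" refers to reciprocal rank fusion, which merges lexical and vector result lists by summed reciprocal ranks (k=60 is the conventional constant from the original RRF paper). A minimal sketch:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1/(k + rank).
    `rankings` is a list of doc-id lists, best result first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Doc "b" ranks well in both lists, so it fuses to the top.
fused = rrf([["a", "b", "c"], ["b", "c", "a"]])
```

RRF needs no score normalisation between BM25 and cosine similarity, which is why it is a popular first fusion step before cross-encoder reranking.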
- Develop on orcus, design for portability to dedicated container on phoebe
- PostgreSQL + pgvector + pgvectorscale (orcus now, pg01.lan later)
- Late chunking as primary embedding strategy (free contextual embeddings)
- Cross-encoder reranking over LLM-based scoring (faster, more accurate)
- Frontier models (Claude, GPT, Perplexity) for evaluation only, never production
- Per-collection privacy toggle excluding cloud models and web search
- Single token management service shared by all Zap apps
- One re-authorization flow covers all scopes (Gmail, Calendar, Drive, Contacts)
- Token stored in one location, read by all apps
- Eliminates the port-confusion problem entirely
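One possible shape for that single token service, sketched in Python with hypothetical class and file names (the real service would presumably be PHP like the rest of the platform):

```python
import json
import os
import tempfile
import threading
from pathlib import Path

class SharedTokenStore:
    """One on-disk token file read by every Zap app, so a single
    re-authorisation refreshes Gmail, Calendar, Drive and Contacts
    together. (Class and file names here are hypothetical.)"""

    def __init__(self, path):
        self.path = Path(path)
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            if not self.path.exists():
                return None
            return json.loads(self.path.read_text())

    def save(self, token):
        with self._lock:
            tmp = self.path.with_suffix(".tmp")
            tmp.write_text(json.dumps(token))
            tmp.replace(self.path)  # atomic rename: readers never see a partial file

# Demo: one save, visible to any app that loads from the same path.
store = SharedTokenStore(os.path.join(tempfile.mkdtemp(), "google-token.json"))
store.save({"scope": "gmail calendar drive contacts"})
token = store.load()
```

The write-to-temp-then-rename step matters: it prevents a sync job reading a half-written token file mid-refresh, which is exactly the class of failure the current per-app token files invite.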
- Move from per-app SQLite to a shared database (possibly PostgreSQL or a unified SQLite with FTS5)
- Cross-app queries: "show me emails about this calendar event" or "find the document attached to this email"
- Common navigation, authentication status, notification system
- Unified search across mail, calendar, and documents
- Database backup refactoring and standardization
- Integration with external servers (du1, du2)
- Shared notification systems
- Cross-project documentation standards
- Hot swap VM on titan.lan for disaster recovery
- Delphi.lan LLM redundancy with cloud failover
- System-wide monitoring and alerting
---
Phase 1: Immediate Tools (Q1 2026) -- IN PROGRESS
1a. Historical Email Backfill (DONE)
1b. Google Drive Integration (DONE)
1c. Project-Specific Timelines (DONE)
---
Phase 1d: Zap-Projects -- ACTIVE
A project management and AI chat history hub for the Zap platform. Lives at /var/www/zap/apps/zap-projects/.
Chat History & Deep Analysis (DONE)
Project Hub (IN PROGRESS -- Feb 2026)
Major expansion: project-centric organisation of all Cursor chats and plans.
Database Backup System (DONE -- Feb 2026)
Centralized backup infrastructure for all databases across the LAN.
See plans: zap_project_hub_a03d3fc4.plan.md, project_hub_enhancements_17edd6b1.plan.md, plans_lazy_load_sticky_dc372c8e.plan.md, fix_chat_and_plan_links_cc7a4a3d.plan.md
Task Management (forthcoming)
Why Zap-Projects Matters
---
Phase 2: Zap-Writer Module -- ACTIVE (Feb 2026)
AI-assisted writing tool with document ingestion, RAG search, and editorial engines. Lives at /var/www/zap/apps/zap-writer/.
Completed (Feb 2026)
Forthcoming
Architecture
---
Phase 2b: Zap-RAG Shared Service -- PLANNED (Feb 2026)
Standalone RAG service extracting and improving the retrieval-augmented generation from zap-writer. Any Zap app can consume it via REST API. All local LLMs, privacy-first with LAN-only mode. Design phase complete, informed by 15+ arxiv papers and three frontier model critiques.
Code: apps/zap-rag/
Full plan: apps/zap-rag/design/MASTER-PLAN.md
Web UI (planned): https://rag.orcus.lan/
Build Phases
Key Architectural Decisions
---
Phase 3: Shared Infrastructure (Q3-Q4 2026)
3a. Unified OAuth
3b. Shared Database Layer
3c. Shared UI Components
3d. Cross-Project Infrastructure
See Zap-Projects roadmap at /var/www/zap/apps/zap-projects/ROADMAP.md for cross-project initiatives:
See Orcus server roadmap at /var/www/orcus.lan/ROADMAP.md for infrastructure:
---
Phase 4: Project Unification (2027)
The Unified Zap Platform
Merge all three projects into a single codebase under /var/www/zap/:
/var/www/zap/
apps/
mail/ (from current zap-mail)
cal/ (from current zap-cal)
writer/ (new: document management + AI)
contacts/ (extracted from zap-mail)
shared/
auth/ (unified OAuth)
database/ (shared DB layer)
api/ (internal API for cross-module communication)
ui/ (common frontend components)
config/
web/ (single entry point with routing)
Migration Path
- Extract common code into the shared/ namespace
- Route each app through the single entry point (/mail, /cal, /writer)
- Retire the /var/www/zap-mail/ and /var/www/zap-cal/ directories (keep as git history)
Benefits
- One codebase, one composer.json, one deployment
---
ChatGPT / LLM Integration Details
API Approach
```php
// Example: Summarise an email thread for a writing project.
// Note: with openai-php/client, OpenAI::client() is the factory --
// the class has no public constructor.
$client = OpenAI::client(getenv('OPENAI_API_KEY'));
$response = $client->chat()->create([
    'model' => 'gpt-4o',
    'messages' => [
        ['role' => 'system', 'content' => 'Summarise this email thread...'],
        ['role' => 'user', 'content' => $threadContent],
    ],
]);
```
Use Cases
Cost Considerations
---
Documentation Access
All Zap platform documentation is now accessible via web viewer at:
/var/www/zap/docs/ (source files)

The web viewer provides enhanced navigation, search, and mobile-friendly access to all platform documentation including CHANGELOG, CODING_HISTORY, ROADMAP, TESTING, INFRASTRUCTURE, and topic guides.
---
Open Questions
---
Quick Reference: Current CLI Tools
Zap-Mail (/var/www/zap-mail/bin/)
| Script | Purpose |
| -------- | --------- |
| `backfill-history.php` | Import historical emails with custom Gmail queries |
| `search-drive.php` | Search Google Drive, list revisions, export docs |
| `mexico-chapter-timeline.php` | Generate timeline for Mexico chapter project |
| `detect-replies.php` | Detect which emails have been replied to |
| `sync-sent.php` | Sync sent messages for reply detection |
| `apply-categories.php` | Apply category rules to existing messages |
| `llm-categorize.php` | AI-powered email categorisation |
| `merge-labels.php` | Merge Gmail label data |
Cron Jobs
- `cron/sync.php` -- Main email sync (every minute)
- `cron/monitor-sync.php` -- Alert if sync stops working