📚 Zap Platform Documentation

Complete documentation for the Zap platform, including development guides, testing procedures, and infrastructure details.


Zap Family Roadmap

Vision

Unify the three separate Zap projects (Zap, Zap-Mail, Zap-Cal) into a single integrated platform for personal communications, calendar, and writing assistance. The end state is a self-hosted "personal command centre" that handles email, calendar, document management, and AI-assisted writing from one interface.

Current Architecture (Feb 2026)

/var/www/zap/          Main Zap project (dashboard, shared infra)
/var/www/zap-mail/     Gmail client: focus inbox, sync, contacts, rules
/var/www/zap-cal/      Google Calendar client: event display, sync

All three share the same Google Cloud project and OAuth client ID but use separate ports (Zap-Mail: 8080, Zap-Cal: 8081) and separate token files. They are deployed on orcus.lan (Ubuntu server) behind Apache virtual hosts.
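As a sketch, the port-per-app vhost split might look like the following. This is illustrative only: the ports are as documented above, but the ServerName and DocumentRoot values are assumptions.

```apache
# Illustrative sketch only -- not the live orcus.lan config.
# Ports are documented; document roots and ServerName are assumed.
Listen 8080
Listen 8081

<VirtualHost *:8080>
    ServerName orcus.lan
    DocumentRoot /var/www/zap-mail/web
</VirtualHost>

<VirtualHost *:8081>
    ServerName orcus.lan
    DocumentRoot /var/www/zap-cal/web
</VirtualHost>
```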

Pain Points

  • Three separate codebases with overlapping dependencies (Google PHP client, SQLite, config patterns)
  • OAuth confusion: same client ID, different redirect URIs and scopes. Re-authorising one app can break the others if ports are mixed up. Documented extensively in zap-mail/README.md.
  • No shared authentication layer: Each project manages its own tokens
  • No unified UI: Switching between mail and calendar means different URLs
  • Writing tools scattered: Google Docs accessed via browser, Word docs require conversion, no integrated drafting/revision support

---

    Phase 1: Immediate Tools (Q1 2026) -- IN PROGRESS

    1a. Historical Email Backfill (DONE)

  • zap-mail/bin/backfill-history.php -- import older emails with custom Gmail queries
  • Supports --query, --after, --before, --dry-run, retry with backoff
  • Used to backfill all Charlie Hall emails (833) and Mexico/oil-related messages (4000+)
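An illustrative invocation of the backfill tool, using the flags listed above (the query values here are examples, not the actual queries used):

```shell
# Dry-run first to preview what a query would import
php bin/backfill-history.php --query='from:charlie' --after=2020/01/01 --dry-run

# Then run for real; transient Gmail API errors are retried with backoff
php bin/backfill-history.php --query='mexico OR oil' --before=2024/01/01
```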

    1b. Google Drive Integration (DONE)

  • Added drive.readonly OAuth scope to Zap-Mail
  • zap-mail/bin/search-drive.php -- search Drive by name/content, list revisions, export Google Docs as text
  • Supports --query, --type, --shared, --revisions=ID, --export=ID

    1c. Project-Specific Timelines (DONE)

  • zap-mail/bin/mexico-chapter-timeline.php -- generates chronological timeline combining emails, Drive revisions, and Word doc uploads
  • Produces both console output and markdown file
  • Template can be adapted for other research projects

---

    Phase 1d: Zap-Projects -- ACTIVE

    A project management and AI chat history hub for the Zap platform. Lives at /var/www/zap/apps/zap-projects/.

    Chat History & Deep Analysis (DONE)

  • 23 chat transcripts ingested with full metadata (apps, topics, files, message counts)
  • 348 plan files cross-referenced to chats
  • FTS5 full-text search across all transcripts
  • LLM summaries generated for all chats via delphi.lan (qwen3:30b)
  • Deep structured analysis (artifacts, decisions, unfinished work) for all chats
  • Live file watcher (zap-chat-watcher.service) auto-ingests new/modified transcripts via inotify
  • Cron job every 30 minutes as safety net for deep analysis
  • Web UI: chat history browser, transcript viewer, AI Q&A endpoint
  • Energystats asset inventory (charts, scripts, data files)
  • Link on orcus.lan home page
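The FTS5 search can be exercised directly from sqlite3. A sketch of the kind of query involved; the table name and column layout are assumptions for illustration, but MATCH, rank, and snippet() are standard FTS5:

```sql
-- Hypothetical FTS5 table name; -1 lets snippet() auto-pick the column
SELECT chat_id, snippet(chats_fts, -1, '[', ']', '…', 12) AS hit
FROM chats_fts
WHERE chats_fts MATCH 'mexico AND timeline'
ORDER BY rank
LIMIT 10;
```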

    Project Hub (IN PROGRESS -- Feb 2026)

    Major expansion: project-centric organisation of all Cursor chats and plans.
  • Database migration: SQLite to PostgreSQL (zap DB, cursor_ prefixed tables)
  • Project registry: First-class projects (auto-discovered from chats, manually enrichable)
  • Many-to-many: Chats and plans can belong to multiple projects
  • Web pages: Project index (card/list toggle), Cursor hub, individual project detail pages
  • REST JSON API: Full API with API key auth, ready for external access
  • MCP server: Python MCP server for Cursor/AI agent integration (local + remote modes)
  • Global Cursor rule: AI assistant proactively uses MCP tools across all workspaces
  • Docs viewer: Rendered markdown docs at /projects/docs/
  • Manual project creation: Create projects via UI modal with slug, aliases, match patterns
  • Project-chat discovery: Configurable aliases and match_patterns per project; rescan button + automatic matching during ingestion
  • Cursor title sync from paris.lan (DONE): sync-cursor-titles.php SSHes to paris.lan (where Cursor runs on Windows 11), reads state.vscdb per workspace, syncs actual Cursor sidebar titles and metadata (lines added/removed, files changed, mode, archive status) to Postgres. Runs every 5 minutes via cron. Detects title renames and logs history to cursor_chat_title_history.
  • 4-tier title priority: display_title (user override) > cursor_title (synced from paris.lan) > llm_title (from deep analysis) > first_query_short (fallback)
  • Inline rename: Pencil icon on Cursor Hub and project detail pages to set display_title via PATCH API
  • LLM title drift detection: deep-analyse-chat.php detects when the LLM-generated title changes and logs it
  • MCP deployment verified (DONE): Project-level .cursor/mcp.json deployed to all 6 orcus workspaces. Global config on paris.lan at %USERPROFILE%\.cursor\mcp.json. Discovered that for SSH remote workspaces, the project-level config is the one that actually provides tools to the agent (global config only shows in UI). Full setup guide, troubleshooting, and architecture documented in apps/zap-projects/README.md.
  • Cross-project documentation (DONE): All project READMEs (philoenic, philanthropy-planner, prospecta, prospecta.cc, quickstep) updated with standardised cross-chat context block pointing to zap-projects MCP tools and web UI.
  • Plan auto-indexing (DONE): File watcher now monitors /home/jd/.cursor/plans/ and auto-indexes .plan.md files via --plans-only mode. Plans have their own FTS vector and GIN index. Chat FTS expanded to include full transcript text.
  • Cursor Hub UI polish (DONE): Plans list view working, datetime display with times (DD MMM HH:MM) for both chats and plans in card and list views.
  • Chat/plan links fixed (DONE): Chat transcript viewer migrated from SQLite to PostgreSQL. Plan viewer page created with markdown rendering. Both chats and plans are now clickable from the hub.
  • Lazy-load plans with pagination (DONE): Plans load 50 at a time with "Load more" button, matching chat pagination pattern. Tab selection (Chats/Plans) persists to localStorage. URL query param ?tab=plans supports deep-linking from plan-viewer back link.
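In SQL terms the 4-tier title priority above is a straightforward COALESCE. The table name cursor_chats is an assumption (the roadmap only specifies a cursor_ prefix); the column names are the tiers as listed:

```sql
-- Falls through in priority order; NULL means "not set at this tier"
SELECT id,
       COALESCE(display_title, cursor_title, llm_title, first_query_short) AS title
FROM cursor_chats;
```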

    Database Backup System (DONE -- Feb 2026)

    Centralized backup infrastructure for all databases across the LAN.

  • PostgreSQL backups (cron/backup-pg.sh): Daily automated dumps with 7 daily + 4 weekly + 3 monthly rotation
  • Dual-channel notifications: Telegram + ntfy.sh on backup failure, credentials from ~/.credentials
  • Backup history tracking: backup_history table records all events (success/failure) with paths, sizes, errors
  • SQLite backup integration: Philoenic and zap-mail scripts updated to use shared notification library and log to database
  • Status pages: orcus.lan/databases.php (local inventory) and zap.orcus.lan/projects/databases.php (LAN-wide view with backup history)
  • Clickable backup folders: file:// protocol links to open NAS folders directly from browser (Windows registry file provided)
  • See plans: zap_project_hub_a03d3fc4.plan.md, project_hub_enhancements_17edd6b1.plan.md, plans_lazy_load_sticky_dc372c8e.plan.md, fix_chat_and_plan_links_cc7a4a3d.plan.md
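A minimal sketch of the daily-dump-plus-rotation pattern described above. This is not the real cron/backup-pg.sh; paths, database name, and the retention mechanism are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Illustrative sketch only. Assumes passwordless local pg_dump access
# and a /backups/pg target; the real script also handles weekly/monthly
# tiers and failure notifications.
set -euo pipefail
ts=$(date +%F)
dest=/backups/pg
pg_dump -Fc zap > "$dest/daily/zap-$ts.dump"
# Keep 7 daily dumps; weekly and monthly tiers would prune similarly
find "$dest/daily" -name 'zap-*.dump' -mtime +7 -delete
```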

    Task Management (forthcoming)

  • Central task list queryable by: "What are my most important outstanding tasks?"
  • Tasks linked to apps (zap-mail, philoenic, etc.), chats, and plan files
  • Priority levels, due dates, dependencies
  • Initially CLI + simple web UI; grows into full project management

    Why Zap-Projects Matters

  • Currently tasks are scattered across roadmaps, plan files, handover docs, and memory
  • No single view of "what should I work on next?"
  • Zap-Projects is the connective tissue between all other Zap apps
  • AI agents need to interrogate project history, chats, and plans across all workspaces

---

    Phase 2: Zap-Writer Module -- ACTIVE (Feb 2026)

    AI-assisted writing tool with document ingestion, RAG search, and editorial engines. Lives at /var/www/zap/apps/zap-writer/.

    Completed (Feb 2026)

  • Google Docs ingestion: multi-tabbed doc fetching via Docs API v1, text extraction from paragraphs and tables, hierarchical path metadata
  • File ingestion: PDF, DOCX, TXT, MD, HTML, RTF support with text extraction and configurable chunking
  • pgvector embeddings: nomic-embed-text embeddings on titan (RTX 3060), HNSW index for cosine similarity search
  • Multi-source RAG search: 4-part synthesised answers (project docs with citations, LLM knowledge, SearXNG web + LLM, external library placeholder)
  • Editorial engines: edit harmonisation, fact-check, currency scanner, bibliography checker, chart manager (built for Mexico chapter)
  • LLM gateway integration: auto-routes embedding and chat requests to the best available Ollama server (delphi/phoebe/titan)
  • Projects: Hidden Money (philanthropy book, 424 chunks embedded), Mexico oil chapter (78 edits, 528 bibliography entries)

    Forthcoming

  • External document ingestion (HIGH PRIORITY): Ingest books, reports, PDFs into the chunking + embedding pipeline. The CLI tool (ingest-file.php) already supports this -- needs to be run against the user's document collection.
  • Streaming responses: SSE-based streaming for long LLM synthesis calls (currently ~30-60s blocking).
  • Document version tracking: Track versions across Google Docs, Word uploads, and email attachments.
  • Writing assistant: LLM-powered drafting, summarisation, and revision suggestion within project context.
  • Cross-format support: Unified revision history across Google Docs, Word, and PDF.

    Architecture

  • Backend: PHP 8.3 + Apache 2.4, PostgreSQL 17 with pgvector, Ollama LLMs via LLM gateway
  • Embedding: nomic-embed-text on titan (dedicated, avoids delphi model swapping)
  • Search: Tiered -- SearXNG -> Brave -> Google via SearchClient
  • LLM routing: Centralised gateway at llm.orcus.lan (delphi 2x3090, phoebe 3060, titan 3060)
  • Frontend: Vanilla JS, dark theme, tabbed workspace UI
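The pgvector retrieval step can be sketched as follows. The table and column names are assumptions, but the `<=>` cosine-distance operator and the HNSW operator class are standard pgvector:

```sql
-- HNSW index for cosine distance (pgvector)
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);

-- Top-5 nearest chunks to a query embedding ($1 is the query vector)
SELECT id, source, 1 - (embedding <=> $1) AS cosine_similarity
FROM chunks
ORDER BY embedding <=> $1
LIMIT 5;
```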

---

    Phase 2b: Zap-RAG Shared Service -- PLANNED (Feb 2026)

Standalone RAG service extracting and improving the retrieval-augmented generation from zap-writer. Any Zap app can consume it via REST API. All local LLMs, privacy-first with LAN-only mode. Design phase complete, informed by 15+ arXiv papers and three frontier-model critiques.

    Code: apps/zap-rag/ Full plan: apps/zap-rag/design/MASTER-PLAN.md Web UI (planned): https://rag.orcus.lan/

    Build Phases

  • Phase 0: Scaffold service (vhost, bootstrap, database with pgvectorscale, extract RAG from zap-writer)
  • Phase 1: Ingestion quality -- late chunking (arXiv:2409.04701), RAPTOR semantic tree (arXiv:2401.18059), Chain-of-Density summaries, document tree/ToC, contextual retrieval, variable-granularity chunking, opt-in proposition/question indexing
  • Phase 2: Privacy / LAN-only mode (per-collection toggle)
  • Phase 3: Retrieval stack -- HyDE+Rocchio, query expansion, hybrid RRF, MMR, cross-encoder reranking (bge-reranker-v2-m3), CRAG with strip refinement, adaptive retrieval cutoff (CAR)
  • Phase 4: Synthesis quality -- chain-of-thought, citation verification, confidence scoring, source highlighting
  • Phase 5: SSE streaming
  • Phase 6: Conversation memory + multi-hop retrieval
  • Phase 7: Delphi data extraction integration (Marker PDF, web upload, ColPali visual retrieval)
  • Phase 8: Evaluation framework (RAGAS, frontier model judging via Claude/GPT/Perplexity, external reviewer UI)
  • Phase 9: Embedding model benchmarking (nomic-embed-text vs mxbai-embed-large vs arctic-embed vs BGE-M3)
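Reciprocal-rank fusion (the "hybrid RRF" step in Phase 3) is simple enough to sketch. This is an illustrative Python version (the stack itself is PHP), using the conventional k=60 constant:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with reciprocal-rank fusion.

    Each document's score is the sum over lists of 1 / (k + rank),
    where rank is its 1-based position in that list.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" is ranked well by both the keyword list and the vector list,
# so it beats "a", which only one list put first.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]])  # -> ['b', 'a', 'c']
```

In practice the two input rankings would come from the full-text search and the pgvector search, with the cross-encoder reranker applied to the fused head of the list.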

    Key Architectural Decisions

  • Develop on orcus, design for portability to dedicated container on phoebe
  • PostgreSQL + pgvector + pgvectorscale (orcus now, pg01.lan later)
  • Late chunking as primary embedding strategy (free contextual embeddings)
  • Cross-encoder reranking over LLM-based scoring (faster, more accurate)
  • Frontier models (Claude, GPT, Perplexity) for evaluation only, never production
  • Per-collection privacy toggle excluding cloud models and web search

---

    Phase 3: Shared Infrastructure (Q3-Q4 2026)

    3a. Unified OAuth

  • Single token management service shared by all Zap apps
  • One re-authorization flow covers all scopes (Gmail, Calendar, Drive, Contacts)
  • Token stored in one location, read by all apps
  • Eliminates the port-confusion problem entirely

    3b. Shared Database Layer

  • Move from per-app SQLite to a shared database (now resolved in favour of PostgreSQL 17; see Open Questions)
  • Cross-app queries: "show me emails about this calendar event" or "find the document attached to this email"
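A cross-app query of the "emails about this calendar event" kind might look like the following; the schema here is entirely hypothetical and exists only to illustrate the shape of the join:

```sql
-- Hypothetical shared-schema sketch: mail and calendar linked by event
SELECT m.subject, m.received_at
FROM mail.messages AS m
JOIN cal.event_links AS l ON l.message_id = m.id
WHERE l.event_id = $1
ORDER BY m.received_at;
```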

    3c. Shared UI Components

  • Common navigation, authentication status, notification system
  • Unified search across mail, calendar, and documents
  • 3d. Cross-Project Infrastructure

    See Zap-Projects roadmap at /var/www/zap/apps/zap-projects/ROADMAP.md for cross-project initiatives:

  • Database backup refactoring and standardization
  • Integration with external servers (du1, du2)
  • Shared notification systems
  • Cross-project documentation standards

    See the Orcus server roadmap at /var/www/orcus.lan/ROADMAP.md for infrastructure:

  • Hot swap VM on titan.lan for disaster recovery
  • Delphi.lan LLM redundancy with cloud failover
  • System-wide monitoring and alerting

---

    Phase 4: Project Unification (2027)

    The Unified Zap Platform

    Merge all three projects into a single codebase under /var/www/zap/:

    /var/www/zap/
      apps/
        mail/           (from current zap-mail)
        cal/            (from current zap-cal)
        writer/         (new: document management + AI)
        contacts/       (extracted from zap-mail)
      shared/
        auth/           (unified OAuth)
        database/       (shared DB layer)
        api/            (internal API for cross-module communication)
        ui/             (common frontend components)
      config/
      web/              (single entry point with routing)

    Migration Path

  • Extract shared code (OAuth, DB, config) into shared/ namespace
  • Convert each app to use shared services via dependency injection
  • Build a unified router that mounts each app at a path prefix (/mail, /cal, /writer)
  • Merge databases with migration scripts
  • Retire the separate /var/www/zap-mail/ and /var/www/zap-cal/ directories (keep as git history)

    Benefits

  • One composer.json, one deployment
  • One OAuth flow for everything
  • Cross-module features (email-to-calendar, document-from-email, AI summaries of threads)
  • Single URL with navigation between modules
  • Easier maintenance and fewer things to break

---

    ChatGPT / LLM Integration Details

    API Approach

    // Example: Summarise an email thread for a writing project.
    // Assumes the openai-php/client library (composer require openai-php/client).
    $client = OpenAI::client(getenv('OPENAI_API_KEY'));
    $response = $client->chat()->create([
        'model' => 'gpt-4o',
        'messages' => [
            ['role' => 'system', 'content' => 'Summarise this email thread...'],
            ['role' => 'user', 'content' => $threadContent],
        ],
    ]);
    $summary = $response->choices[0]->message->content;

    Use Cases

  • Thread summarisation: "What does Charlie want me to do about the Mexico chapter?"
  • Version comparison: "What changed between v2.0 and v3.4 of Darley_MexicanOil?"
  • Draft generation: "Write a section on Mexico's petroleum product trade deficit based on these charts and emails"
  • Edit integration: "Apply Charlie's Word doc edits to my Google Doc version"
  • Research compilation: "Gather all data points about Mexico's oil production from these 50 emails"

    Cost Considerations

  • GPT-4o: ~$2.50/1M input tokens, ~$10/1M output tokens
  • Typical email thread summary: ~2K tokens input, ~500 output = ~$0.01
  • Full chapter revision assistance: ~50K tokens = ~$0.60
  • Monthly budget estimate: $5-20 depending on usage intensity
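The per-call arithmetic behind these estimates, as a quick sketch (rates as quoted above; Python used for brevity):

```python
# GPT-4o list prices quoted above, in dollars per million tokens
INPUT_PER_M, OUTPUT_PER_M = 2.50, 10.00

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the quoted per-million-token rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A typical thread summary: 2K tokens in, 500 out -> $0.01
thread_summary = call_cost(2_000, 500)
```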

---

    Documentation Access

    All Zap platform documentation is now accessible via web viewer at:

  • Web Interface: https://zap.orcus.lan/projects/zap-docs
  • Cross-Reference: See orcus.lan server documentation at https://orcus.lan/docs.php
  • File Location: /var/www/zap/docs/ (source files)

    The web viewer provides enhanced navigation, search, and mobile-friendly access to all platform documentation, including CHANGELOG, CODING_HISTORY, ROADMAP, TESTING, INFRASTRUCTURE, and topic guides.

    ---

    Open Questions

  • Database choice: RESOLVED -- migrated to PostgreSQL 17 for concurrent access, full-text search (tsvector), pgvector (forthcoming), and JSONB support.
  • Frontend framework: Current PHP templates with vanilla JS. Move to a modern framework (Alpine.js? htmx? Vue?) for the unified UI?
  • Hosting: Continue on orcus.lan or consider cloud deployment for reliability?
  • Multi-user: Currently single-user (Julian's Gmail). Will this ever need multi-user support?

---

    Quick Reference: Current CLI Tools

    Zap-Mail (/var/www/zap-mail/bin/)

    Script                        Purpose
    ---------------------------   --------------------------------------------------
    backfill-history.php          Import historical emails with custom Gmail queries
    search-drive.php              Search Google Drive, list revisions, export docs
    mexico-chapter-timeline.php   Generate timeline for the Mexico chapter project
    detect-replies.php            Detect which emails have been replied to
    sync-sent.php                 Sync sent messages for reply detection
    apply-categories.php          Apply category rules to existing messages
    llm-categorize.php            AI-powered email categorisation
    merge-labels.php              Merge Gmail label data

    Cron Jobs

  • cron/sync.php -- Main email sync (every minute)
  • cron/monitor-sync.php -- Alert if sync stops working
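As crontab entries, the two jobs might look like this. The every-minute sync cadence is documented above; the monitor cadence and log paths below are assumptions:

```
# Main email sync, every minute (documented cadence)
* * * * * php /var/www/zap-mail/cron/sync.php >> /var/log/zap-mail/sync.log 2>&1

# Alert if sync stops working (every 5 minutes is an assumed cadence)
*/5 * * * * php /var/www/zap-mail/cron/monitor-sync.php
```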