PrivyDoc — Local Document Intelligence Tool

2025 | Python, Foundry Local, Chainlit, PDF/DOCX Processing | GitHub

PrivyDoc is a secure, on-device document analysis solution powered by Microsoft Foundry Local, designed to handle sensitive documents without relying on the cloud. All AI-powered analysis happens locally on your device, ensuring complete data privacy.

Problem / Motivation

Organizations and individuals working with sensitive documents face critical challenges:

  • Data Privacy Concerns: Cloud-based document analysis tools expose confidential information to external services
  • Compliance Requirements: Regulatory restrictions prevent uploading sensitive documents to third-party platforms
  • Air-Gap Environments: Secure facilities require offline document processing capabilities
  • Limited Control: Cloud solutions lack transparency in how documents are processed and stored
  • Trust Issues: Users need verifiable guarantees that their data never leaves their device

PrivyDoc addresses these challenges by bringing AI-powered document analysis directly to your device using Microsoft Foundry Local, so documents are processed entirely on-device and never transmitted to external services.

Core Functionalities

Secure Document Processing

  • Multi-Format Support: Process PDF and DOCX files while preserving formatting
  • Structure Recognition: Automatically identify document sections and hierarchy
  • Text Extraction: Clean and normalize text while maintaining contextual cues
  • 100% Local Processing: All operations happen on-device with no external data transmission
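
As an illustration of the extraction and normalization steps listed above, here is a minimal sketch. It assumes pdfplumber and python-docx (the libraries named in the tech stack) are installed. The function names `normalize_text` and `extract_text` are hypothetical and are not taken from the PrivyDoc codebase.

```python
import re
from pathlib import Path


def normalize_text(raw: str) -> str:
    """Collapse redundant whitespace while keeping paragraph breaks."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)    # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip()


def extract_text(path: str) -> str:
    """Dispatch on file extension; third-party imports are deferred until needed."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        import pdfplumber  # third-party: pip install pdfplumber

        with pdfplumber.open(path) as pdf:
            # extract_text() may return None for pages without a text layer
            pages = [page.extract_text() or "" for page in pdf.pages]
        return normalize_text("\n\n".join(pages))
    if suffix == ".docx":
        import docx  # third-party: pip install python-docx

        document = docx.Document(path)
        return normalize_text("\n\n".join(p.text for p in document.paragraphs))
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Both branches read the file from local disk only, which fits the on-device processing goal. Preserving headings or tables would need more than this sketch shows.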

AI-Powered Analysis

  • Smart Summarization: Generate concise overviews of entire documents or specific sections
  • Entity Recognition: Detect and extract people, organizations, locations, dates, and custom entities
  • Sentiment Analysis: Analyze emotional tone at both document and section levels
  • Topic Classification: Auto-categorize documents by subject matter for organization
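
Small local models do not always return clean structured output, so entity recognition usually needs a tolerant parsing step. The sketch below shows one possible approach, assuming the model is asked for a JSON array. The prompt wording, the `ENTITY_TYPES` labels, and both function names are illustrative assumptions, not PrivyDoc's actual implementation.

```python
import json
import re

# Hypothetical label set mirroring the entity kinds listed above.
ENTITY_TYPES = ("PERSON", "ORG", "LOCATION", "DATE")


def build_entity_prompt(text: str) -> str:
    """Ask the model for a JSON array so the reply can be parsed mechanically."""
    labels = ", ".join(ENTITY_TYPES)
    return (
        "Extract named entities from the text below. Reply with a JSON array of "
        f'objects shaped like {{"type": <one of {labels}>, "text": <string>}}.\n\n'
        f"Text:\n{text}"
    )


def parse_entities(reply: str) -> list[dict]:
    """Pull the first JSON array out of a model reply and keep only valid items."""
    match = re.search(r"\[.*\]", reply, flags=re.DOTALL)
    if not match:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    return [
        {"type": item["type"], "text": item["text"]}
        for item in items
        if isinstance(item, dict)
        and item.get("type") in ENTITY_TYPES
        and isinstance(item.get("text"), str)
    ]
```

Returning an empty list on malformed output keeps a batch run going when a single reply is unusable. A stricter design might retry the prompt instead.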

Security & Compliance Features

  • Zero Data Transmission: No document data leaves the device; the only network activity is communication with the local Foundry Local service and the initial model download
  • Air-Gap Compatible: Works in completely offline environments after initial setup
  • Analysis Traceability: Comprehensive logging of all processing interactions
  • Document Fingerprinting: Verify document integrity and processing history
  • Local Storage: All results saved in local JSON database for audit purposes
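
Document fingerprinting and the local audit history could be sketched as follows, using only the Python standard library. The choice of SHA-256, the JSON entry layout, and the function names are assumptions made for illustration. The source does not say how PrivyDoc computes fingerprints or structures its history file.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def fingerprint(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_analysis(history_path: str, doc_path: str, result: dict) -> dict:
    """Append an audit entry (hypothetical layout) to a local JSON history file."""
    history_file = Path(history_path)
    if history_file.exists():
        history = json.loads(history_file.read_text(encoding="utf-8"))
    else:
        history = []
    entry = {
        "document": Path(doc_path).name,
        "sha256": fingerprint(doc_path),
        "analyzed_at": datetime.now(timezone.utc).isoformat(),
        "result": result,
    }
    history.append(entry)
    history_file.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return entry


def is_unmodified(doc_path: str, entry: dict) -> bool:
    """Check that the file on disk still matches the fingerprint stored at analysis time."""
    return fingerprint(doc_path) == entry["sha256"]
```

Storing the digest next to each result lets an auditor confirm later that a stored analysis refers to the exact bytes of a given file.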

Export & Integration

  • Multiple Formats: Export analysis results as Markdown, JSON, or CSV
  • Structured Data: Standardized output schema for downstream processing
  • Analysis History: Browse and retrieve previous analyses with metadata
  • Batch Processing: Handle multiple documents efficiently
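
To make the export formats concrete, the sketch below renders one analysis result as JSON, CSV, and Markdown. The result schema (`document`, `summary`, `entities`) is a hypothetical stand-in, since the source does not publish PrivyDoc's standardized output schema.

```python
import csv
import io
import json


def to_json(result: dict) -> str:
    """Serialize the full result; ensure_ascii=False keeps non-ASCII names readable."""
    return json.dumps(result, indent=2, ensure_ascii=False)


def to_csv(result: dict) -> str:
    """Flatten the entity list into rows; the csv module handles quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["type", "text"])
    for entity in result.get("entities", []):
        writer.writerow([entity["type"], entity["text"]])
    return buffer.getvalue()


def to_markdown(result: dict) -> str:
    """Render a short human-readable report."""
    lines = [
        f"# Analysis: {result['document']}",
        "",
        "## Summary",
        "",
        result["summary"],
        "",
        "## Entities",
        "",
    ]
    lines += [f"- **{e['type']}**: {e['text']}" for e in result.get("entities", [])]
    return "\n".join(lines) + "\n"
```

CSV suits the flat entity list, while JSON and Markdown can carry the nested result and the narrative summary.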

Description / How It Works

  1. Document Upload: Users upload PDF or DOCX files via web interface (Chainlit) or command line
  2. Text Extraction: System extracts and normalizes text while preserving structure
  3. Section Analysis: AI agent identifies logical document sections and hierarchy
  4. Entity Extraction: NER agent detects people, organizations, locations, dates, and custom entities
  5. Content Analysis: Analyzer agent generates summaries and performs sentiment analysis
  6. Results Compilation: All findings are structured and saved to local JSON database
  7. Export Options: Results available for download in Markdown, JSON, or CSV formats
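
The steps above can be sketched as a small pipeline in which every model-backed stage receives the same completion function. This is an assumption-laden outline, not the project's actual agent code: the `Analysis` dataclass, the prompts, and the blank-line section splitter are all hypothetical, and the model call is abstracted as a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical abstraction over the local model: prompt in, completion out.
LLM = Callable[[str], str]


@dataclass
class Analysis:
    text: str
    sections: list[str] = field(default_factory=list)
    entities: str = ""
    summary: str = ""
    sentiment: str = ""


def split_sections(text: str) -> list[str]:
    """Naive stand-in for the section agent: split on blank lines."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def run_pipeline(text: str, llm: LLM) -> Analysis:
    """Run the stages in order, mirroring steps 3 to 5 of the list above."""
    analysis = Analysis(text=text)
    analysis.sections = split_sections(text)
    analysis.entities = llm(
        f"List the people, organizations, locations, and dates in:\n{text}"
    )
    analysis.summary = llm(f"Summarize in two sentences:\n{text}")
    analysis.sentiment = llm(
        f"Classify the tone as positive, neutral, or negative:\n{text}"
    )
    return analysis
```

In a real run, `llm` would wrap a request to a locally served model. Foundry Local exposes an OpenAI-compatible endpoint, so a thin wrapper around a chat-completions call is one option. Passing the model in as a callable also makes the pipeline easy to exercise with a stub in tests.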

Challenges & Issues Addressed

  • Data Privacy: Foundry Local ensures all LLM processing happens on-device without cloud dependencies
  • Model Performance: Selected lightweight models (qwen2.5-0.5b, phi-3.5-mini, phi-4) optimized for local execution
  • Processing Speed: Implemented caching and batch processing for efficient analysis
  • Memory Management: Designed for systems with 8-16GB RAM using optimized model loading
  • Format Compatibility: Robust PDF and DOCX parsing handles various document structures
  • User Experience: Chainlit web interface provides intuitive progress tracking and result exploration
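
One way caching can cut repeated work is to key stored results on a hash of the model name and the document text, so an unchanged document is never sent through the model twice. The sketch below is a minimal file-based version of that idea. The `AnalysisCache` class and its on-disk layout are assumptions, not a description of how PrivyDoc implements caching.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class AnalysisCache:
    """Store analysis results as JSON files named by a content hash."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, text: str, model: str) -> Path:
        # Including the model name prevents reuse of results from a different model.
        key = hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.json"

    def get_or_compute(
        self, text: str, model: str, compute: Callable[[str], dict]
    ) -> dict:
        """Return the cached result if present; otherwise compute and store it."""
        path = self._path(text, model)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        result = compute(text)
        path.write_text(json.dumps(result), encoding="utf-8")
        return result
```

Because the cache lives in a local directory, it adds no network dependency. Cached results do contain document-derived content, so the directory needs the same protection as the documents themselves.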

Tech Stack & Frameworks

  • Languages / Frameworks: Python 3.10+, Chainlit (Web UI)
  • AI / ML: Microsoft Foundry Local, qwen2.5-0.5b (default), phi-3.5-mini, phi-4
  • Document Processing: pdfplumber (PDF extraction), python-docx (DOCX extraction)
  • Storage: JSON-based local analysis history
  • Deployment: Local web server (Chainlit) and CLI
  • Environment: Windows 10/11, macOS 12+, Linux (Ubuntu 20.04+)

Features / Capabilities

  • Multi-Format Document Support: PDF and DOCX with structure preservation
  • Local AI Analysis: Powered by Microsoft Foundry Local models
  • Comprehensive Entity Recognition: People, organizations, locations, dates, custom entities
  • Sentiment Analysis: Document-level and section-level emotional tone detection
  • Smart Summarization: Concise overviews for entire documents or sections
  • Topic Classification: Auto-categorize documents by subject
  • Web UI & CLI: Flexible interfaces for different workflows
  • Export Options: Markdown, JSON, CSV formats
  • Analysis History: Local JSON database for tracking previous analyses
  • Batch Processing: Handle multiple documents efficiently
  • Air-Gap Compatible: Works completely offline after initial model download

Potential Applications

  • Policy Teams: Analyze internal policy documents without exposing to cloud services
  • Legal Teams: Review confidential contracts and legal documents privately
  • Compliance Teams: Process regulatory documents with guaranteed data residency
  • Researchers: Handle sensitive research materials in secure environments
  • Healthcare: Analyze medical documents on-device, which can support HIPAA compliance efforts
  • Financial Services: Process confidential financial documents in air-gap environments
  • Government: Handle classified or sensitive documents with security guarantees

Future Enhancements

  • Add support for additional document formats (TXT, RTF, HTML)
  • Implement custom entity training for domain-specific extraction
  • Develop document comparison and diff analysis features
  • Add support for larger Foundry Local models on high-memory systems
  • Implement multi-document relationship analysis
  • Create browser extension for in-browser document analysis
  • Add OCR capabilities for scanned document processing

Learning Outcomes

  • Implemented local-first AI architecture using Microsoft Foundry Local
  • Built multi-agent document processing pipeline with specialized analysis agents
  • Developed secure document analysis workflows with guaranteed data privacy
  • Integrated Chainlit web interface for interactive document exploration
  • Designed air-gap compatible system for offline operation
  • Learned lightweight model optimization for resource-constrained environments
  • Implemented comprehensive entity recognition with sentiment analysis

Note

PrivyDoc is designed for maximum data privacy. All document processing happens locally using Microsoft Foundry Local. No document data is transmitted to external services; the only external network access is the initial model download. Ideal for handling sensitive, confidential, or regulated documents.