PrivyDoc — Local Document Intelligence Tool

Read the Tech Community article: PrivyDoc — Building a Zero Data Leak AI with Foundry Local & Microsoft Agent Framework

2025 | Python, Foundry Local, Chainlit, PDF/DOCX Processing | GitHub

PrivyDoc is a secure, on-device document analysis solution powered by Microsoft Foundry Local, designed to handle sensitive documents without relying on the cloud. All AI-powered analysis happens locally on your device, ensuring complete data privacy.

Problem / Motivation

Organizations and individuals working with sensitive documents face critical challenges:

Data Privacy Concerns: Cloud-based document analysis tools expose confidential information to external services
Compliance Requirements: Regulatory restrictions prevent uploading sensitive documents to third-party platforms
Air-Gapped Environments: Secure facilities require offline document processing capabilities
Limited Control: Cloud solutions lack transparency in how documents are processed and stored
Trust Issues: Users need verifiable guarantees that their data never leaves their device

PrivyDoc addresses these challenges by bringing AI-powered document analysis directly to your device using Microsoft Foundry Local. All processing runs locally and document contents are never transmitted; the only network access is the initial model download.

Core Functionalities

Secure Document Processing

Multi-Format Support: Process PDF and DOCX files while preserving formatting
Structure Recognition: Automatically identify document sections and hierarchy
Text Extraction: Clean and normalize text while maintaining contextual cues
100% Local Processing: All operations happen on-device with no external data transmission
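
As an illustration of the normalization and structure-recognition steps listed above, here is a minimal sketch using only the Python standard library. It assumes the text has already been extracted from the PDF or DOCX file. The numbered-heading heuristic and the names `Section`, `normalize`, and `split_sections` are hypothetical and not taken from the PrivyDoc code base, which uses an AI agent for section analysis.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    lines: list[str] = field(default_factory=list)


# Assumed heuristic: numbered headings such as "1. Scope" or "2.1 Definitions".
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def normalize(text: str) -> str:
    """Collapse runs of spaces and tabs and trim each line, keeping line breaks."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(lines)


def split_sections(text: str) -> list[Section]:
    """Group non-empty lines under the most recent numbered heading."""
    sections = [Section(title="Preamble", level=0)]
    for line in normalize(text).splitlines():
        if not line:
            continue
        match = HEADING_RE.match(line)
        if match:
            # "2.1" has one dot, so it is a level-2 heading.
            level = match.group(1).count(".") + 1
            sections.append(Section(title=match.group(2), level=level))
        else:
            sections[-1].lines.append(line)
    return sections
```

A rule like this will misread body lines that start with a number (for example "2024 was a strong year"), which is one reason a model-based section agent can be preferable to a fixed pattern.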

AI-Powered Analysis

Smart Summarization: Generate concise overviews of entire documents or specific sections
Entity Recognition: Detect and extract people, organizations, locations, dates, and custom entities
Sentiment Analysis: Analyze emotional tone at both document and section levels
Topic Classification: Auto-categorize documents by subject matter for organization

Security & Compliance Features

Zero Data Transmission: Document contents are never sent over the network; the only network access is the initial Foundry Local model download
Air-Gap Compatible: Works in completely offline environments after initial setup
Analysis Traceability: Comprehensive logging of all processing interactions
Document Fingerprinting: Verify document integrity and processing history
Local Storage: All results saved in local JSON database for audit purposes
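
The fingerprinting and local audit storage described above could look roughly like the following sketch. It assumes SHA-256 as the fingerprint and a single JSON file holding a list of entries; the entry schema and the function names are hypothetical, not PrivyDoc's actual format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw document bytes."""
    return hashlib.sha256(data).hexdigest()


def record_analysis(history_path: Path, doc_name: str, data: bytes, results: dict) -> dict:
    """Append one audit entry to a local JSON history file (assumed schema)."""
    entry = {
        "document": doc_name,
        "sha256": fingerprint(data),
        "analyzed_at": datetime.now(timezone.utc).isoformat(),
        "results": results,
    }
    if history_path.exists():
        history = json.loads(history_path.read_text(encoding="utf-8"))
    else:
        history = []
    history.append(entry)
    history_path.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return entry


def is_unmodified(entry: dict, data: bytes) -> bool:
    """Check that a document still matches the fingerprint stored at analysis time."""
    return entry["sha256"] == fingerprint(data)
```

Storing the digest next to each result lets an auditor confirm later that a given analysis refers to exactly the file on disk, without keeping a second copy of the document.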

Export & Integration

Multiple Formats: Export analysis results as Markdown, JSON, or CSV
Structured Data: Standardized output schema for downstream processing
Analysis History: Browse and retrieve previous analyses with metadata
Batch Processing: Handle multiple documents efficiently
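
To make the export formats concrete, the sketch below renders one analysis result as JSON, Markdown, and CSV with the standard library. The result dictionary layout (`document`, `summary`, `entities` with `type` and `text`) is an assumed schema for illustration only; the project's real output schema may differ.

```python
import csv
import io
import json


def to_json(result: dict) -> str:
    """Serialize the whole result, keeping non-ASCII characters readable."""
    return json.dumps(result, indent=2, ensure_ascii=False)


def to_markdown(result: dict) -> str:
    """Render a short report with a summary and a bulleted entity list."""
    lines = [f"# {result['document']}", "", "## Summary", "", result["summary"], "", "## Entities", ""]
    lines += [f"- {e['text']} ({e['type']})" for e in result["entities"]]
    return "\n".join(lines) + "\n"


def to_csv(result: dict) -> str:
    """Flatten the entities into one CSV row each."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["document", "type", "text"], lineterminator="\n")
    writer.writeheader()
    for e in result["entities"]:
        writer.writerow({"document": result["document"], "type": e["type"], "text": e["text"]})
    return buffer.getvalue()
```

JSON keeps the full nested result, while the CSV view deliberately flattens to entities only, since tabular tools handle one record type per file best.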

Description / How It Works

Document Upload: Users upload PDF or DOCX files via web interface (Chainlit) or command line
Text Extraction: System extracts and normalizes text while preserving structure
Section Analysis: AI agent identifies logical document sections and hierarchy
Entity Extraction: NER agent detects people, organizations, locations, dates, and custom entities
Content Analysis: Analyzer agent generates summaries and performs sentiment analysis
Results Compilation: All findings are structured and saved to local JSON database
Export Options: Results available for download in Markdown, JSON, or CSV formats
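
The flow above can be sketched as a chain of stages that each enrich a shared result object. This is only a structural illustration: the stage functions below are simple rule-based stand-ins (paragraph splitting, a year pattern, first-sentence summary), whereas the project's agents prompt a local model. All names here are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Analysis:
    text: str
    sections: list[str] = field(default_factory=list)
    entities: list[dict] = field(default_factory=list)
    summary: str = ""
    sentiment: str = ""


# Each stage receives the analysis so far and returns it with more fields filled in.
Stage = Callable[[Analysis], Analysis]


def run_pipeline(text: str, stages: list[Stage]) -> Analysis:
    analysis = Analysis(text=text)
    for stage in stages:
        analysis = stage(analysis)
    return analysis


def section_stage(a: Analysis) -> Analysis:
    # Stand-in for the section agent: treat blank-line-separated blocks as sections.
    a.sections = [p.strip() for p in a.text.split("\n\n") if p.strip()]
    return a


def entity_stage(a: Analysis) -> Analysis:
    # Stand-in for the NER agent: pick out four-digit years as DATE entities.
    years = re.findall(r"\b(?:19|20)\d{2}\b", a.text)
    a.entities = [{"type": "DATE", "text": y} for y in years]
    return a


def content_stage(a: Analysis) -> Analysis:
    # Stand-in for the analyzer agent: first sentence as summary, fixed sentiment.
    first = a.sections[0] if a.sections else a.text
    a.summary = first.split(". ")[0].rstrip(".") + "."
    a.sentiment = "neutral"  # placeholder value, not a real prediction
    return a
```

Calling `run_pipeline(text, [section_stage, entity_stage, content_stage])` runs the stages in order; swapping a stand-in for a model-backed agent only requires keeping the same `Analysis` in, `Analysis` out signature.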

Challenges & Issues Addressed

Data Privacy: Foundry Local ensures all LLM processing happens on-device without cloud dependencies
Model Performance: Selected lightweight models (qwen2.5-0.5b, phi-3.5-mini, phi-4) optimized for local execution
Processing Speed: Implemented caching and batch processing for efficient analysis
Memory Management: Designed for systems with 8-16GB RAM using optimized model loading
Format Compatibility: Robust PDF and DOCX parsing handles various document structures
User Experience: Chainlit web interface provides intuitive progress tracking and result exploration
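
One way the caching mentioned above might work is to key results on the model name plus a hash of the input text, so repeated sections are analyzed only once per model. The sketch below is an assumption about the approach, not the project's implementation; `AnalysisCache` and the `analyze` callable are hypothetical, and the callable stands in for a request to the local model.

```python
import hashlib
from typing import Callable


class AnalysisCache:
    """In-memory cache keyed by model name and a hash of the input text."""

    def __init__(self, analyze: Callable[[str, str], dict]):
        self._analyze = analyze  # called as analyze(model, text) on a cache miss
        self._store: dict[str, dict] = {}
        self.hits = 0

    def _key(self, model: str, text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{model}:{digest}"

    def get(self, model: str, text: str) -> dict:
        key = self._key(model, text)
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = self._analyze(model, text)
        return self._store[key]

    def get_batch(self, model: str, texts: list[str]) -> list[dict]:
        """Analyze several texts in order, reusing cached results for duplicates."""
        return [self.get(model, t) for t in texts]
```

Including the model name in the key matters because the same text analyzed by qwen2.5-0.5b and by phi-4 can produce different results, so the two must not share a cache entry.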

Tech Stack & Frameworks

Languages / Frameworks: Python 3.10+, Chainlit (Web UI)
AI / ML: Microsoft Foundry Local, qwen2.5-0.5b (default), phi-3.5-mini, phi-4
Document Processing: pdfplumber (PDF extraction), python-docx (DOCX extraction)
Storage: JSON-based local analysis history
Deployment: Local web server (Chainlit) and CLI
Environment: Windows 10/11, macOS 12+, Linux (Ubuntu 20.04+)

Features / Capabilities

Multi-Format Document Support: PDF and DOCX with structure preservation
Local AI Analysis: Powered by Microsoft Foundry Local models
Comprehensive Entity Recognition: People, organizations, locations, dates, custom entities
Sentiment Analysis: Document-level and section-level emotional tone detection
Smart Summarization: Concise overviews for entire documents or sections
Topic Classification: Auto-categorize documents by subject
Web UI & CLI: Flexible interfaces for different workflows
Export Options: Markdown, JSON, CSV formats
Analysis History: Local JSON database for tracking previous analyses
Batch Processing: Handle multiple documents efficiently
Air-Gap Compatible: Works completely offline after initial model download

Potential Applications

Policy Teams: Analyze internal policy documents without exposing to cloud services
Legal Teams: Review confidential contracts and legal documents privately
Compliance Teams: Process regulatory documents with guaranteed data residency
Researchers: Handle sensitive research materials in secure environments
Healthcare: Analyze medical documents on-device to support HIPAA compliance efforts
Financial Services: Process confidential financial documents in air-gapped environments
Government: Handle classified or sensitive documents with security guarantees

Future Enhancements

Add support for additional document formats (TXT, RTF, HTML)
Implement custom entity training for domain-specific extraction
Develop document comparison and diff analysis features
Add support for larger Foundry Local models on high-memory systems
Implement multi-document relationship analysis
Create browser extension for in-browser document analysis
Add OCR capabilities for scanned document processing

Learning Outcomes

Implemented local-first AI architecture using Microsoft Foundry Local
Built multi-agent document processing pipeline with specialized analysis agents
Developed secure document analysis workflows with guaranteed data privacy
Integrated Chainlit web interface for interactive document exploration
Designed air-gap compatible system for offline operation
Learned lightweight model optimization for resource-constrained environments
Implemented comprehensive entity recognition with sentiment analysis

Links

GitHub Repository: PrivyDoc
Tech Community Article: PrivyDoc: Building a Zero Data Leak AI with Foundry Local & Microsoft Agent Framework

Note

PrivyDoc is designed for maximum data privacy. All document processing happens locally using Microsoft Foundry Local. The only network access is the initial model download; document contents are never sent to external services. Ideal for handling sensitive, confidential, or regulated documents.