Paperless Stack

Overview

The Paperless stack provides a self-hosted document management and archive system. It digitizes paper documents, manages PDFs, and makes their text searchable through OCR. For document organization it can replace hosted services such as Evernote or Notion, while keeping all data on infrastructure you control.

Components

Paperless NGX

Image: ghcr.io/paperless-ngx/paperless-ngx:2.20.3
Purpose: Document scanning, OCR, and management
Container Name: paperless
Access: https://paperless.{{ main_domain }}
Storage: /mnt/storage/paperless/

Paperless Redis

Image: redis:8.4.0
Purpose: Caching and task queue
Container Name: paperless_redis
Memory Limit: 12MB

Paperless Gotenberg

Purpose: PDF generation and processing
Container Name: paperless_gotenberg

Paperless Tika

Purpose: Document format conversion and extraction
Container Name: paperless_tika
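
Paperless NGX reaches its companion containers through environment variables. The following is a minimal sketch in Python of the settings involved. It assumes the container names above resolve as hostnames on the internal network and that Redis, Tika, and Gotenberg listen on their default ports (6379, 9998, and 3000). The companion_env helper is hypothetical, and the actual role may template these values differently.

```python
def companion_env(
    redis_host: str = "paperless_redis",
    tika_host: str = "paperless_tika",
    gotenberg_host: str = "paperless_gotenberg",
) -> dict[str, str]:
    """Return the Paperless NGX settings that point at its companion containers.

    The ports are the upstream image defaults, which is an assumption here;
    adjust them if the stack overrides the defaults.
    """
    return {
        "PAPERLESS_REDIS": f"redis://{redis_host}:6379",
        "PAPERLESS_TIKA_ENABLED": "1",
        "PAPERLESS_TIKA_ENDPOINT": f"http://{tika_host}:9998",
        "PAPERLESS_TIKA_GOTENBERG_ENDPOINT": f"http://{gotenberg_host}:3000",
    }


if __name__ == "__main__":
    # Print the settings in KEY=value form, as they would appear in an env file.
    for key, value in companion_env().items():
        print(f"{key}={value}")
```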

Key Features

Document OCR: Convert scanned PDFs to searchable text
Auto-Tagging: Automatic classification using machine learning
Full-Text Search: Search across all documents and extracted text
Consumption: Monitor watch folder for new documents
Thumbnails: Generate previews for quick browsing
Export: Export documents and their metadata for archiving or migration
Multi-User: Support for multiple users with per-document ownership and permissions
Barcode Recognition: Separate documents using barcodes
Backup Integration: Daily automated backups
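
Full-text search is also available through the Paperless NGX REST API. The sketch below builds a search request and extracts document titles from the JSON response. The base URL and token are placeholders, the helper names are hypothetical, and the code assumes token authentication is enabled for the user.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url: str, token: str, query: str) -> urllib.request.Request:
    """Build a GET request for the document search endpoint."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def titles_from_response(payload: bytes) -> list[str]:
    """Extract document titles from a paginated search response."""
    return [doc["title"] for doc in json.loads(payload)["results"]]


# Usage against a running instance (placeholder host and token):
# req = build_search_request("https://paperless.example.com", "my-token", "invoice 2024")
# with urllib.request.urlopen(req) as resp:
#     print(titles_from_response(resp.read()))
```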

Dependencies

Required Stacks

Databases: PostgreSQL database for document metadata and user accounts
Backbone: Traefik for HTTPS termination
Monitoring (optional): Service health monitoring

Network Configuration

internal network: Internal communication between Paperless components
web network: Exposed via Traefik for web access
db network: Database connectivity

Storage

Documents Directory: /mnt/storage/paperless/ - stores all documents and thumbnails
Backups: Automated daily backups configured via role
Redis Data: {{ docker_mounts_directory }}/paperless/redis/

Document Processing Pipeline

Consumption: A watch folder is monitored for new documents
Processing: Tika extracts text and metadata from Office documents, Gotenberg converts them to PDF, and Paperless generates thumbnails
OCR: Extract searchable text from scanned documents
Indexing: Index documents for full-text search
Storage: Archive in organized structure
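
The consumption stage can be illustrated with a small polling function. This is only a sketch of the idea, not how Paperless NGX implements its consumer. The suffix list covers only the formats named in this document, and find_new_documents is a hypothetical helper.

```python
from pathlib import Path

# Formats named in this document; the real consumer accepts more.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_new_documents(consume_dir: Path, seen: set[Path]) -> list[Path]:
    """Return supported files in consume_dir that earlier calls have not returned.

    `seen` is updated in place so that repeated polling reports each file once.
    """
    new_files = sorted(
        path
        for path in consume_dir.iterdir()
        if path.is_file()
        and path.suffix.lower() in SUPPORTED_SUFFIXES
        and path not in seen
    )
    seen.update(new_files)
    return new_files
```

Calling the function in a loop with the same seen set approximates a watch folder: each poll returns only the files that arrived since the previous one.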

Deployment Notes

Paperless depends on Redis for its task queue and on Gotenberg and Tika for document processing
Health checks verify service availability every 30 seconds
A container is restarted if its health check fails 5 times in a row
Storage volumes are mounted for document persistence
Database must be pre-created with appropriate user credentials
Multiple document formats supported: PDF, PNG, JPEG, TIFF, etc.
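
The health check behaviour described above (a check every 30 seconds, failure after 5 retries) can be modelled as a counter of consecutive failures. The sketch below mirrors Docker-style health check semantics in simplified form. HealthTracker is a hypothetical class for illustration and is not part of the stack.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Track consecutive health check failures for one container."""

    retries: int = 5  # consecutive failures tolerated before "unhealthy"
    failures: int = 0
    status: str = "starting"

    def record(self, check_passed: bool) -> str:
        """Record one check result and return the resulting status."""
        if check_passed:
            # Any success resets the failure streak.
            self.failures = 0
            self.status = "healthy"
        else:
            self.failures += 1
            if self.failures >= self.retries:
                self.status = "unhealthy"
        return self.status
```

In a real loop, record would be called once per 30-second interval. A single passing check resets the counter, so only an uninterrupted run of 5 failures marks the container unhealthy.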

User-Facing Features

Web Interface: Browse and search documents
Document Upload: Drag-and-drop or folder watch import
Tagging: Manual and automatic tag assignment
Organization: Group documents by correspondent, document type, and storage path
Search: Full-text search across document content
Export: Download the original file or the archived, OCR-processed PDF
Sharing: Share documents with other users
Workflow: Automate processing with matching rules
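
Besides drag-and-drop and the watch folder, documents can be uploaded through the REST API. The sketch below builds a multipart upload request for the post_document endpoint using only the standard library. The host and token are placeholders, build_upload_request is a hypothetical helper, and token authentication is assumed to be enabled.

```python
import urllib.request
import uuid
from typing import Optional


def build_upload_request(
    base_url: str,
    token: str,
    filename: str,
    content: bytes,
    boundary: Optional[str] = None,
) -> urllib.request.Request:
    """Build a multipart/form-data POST that uploads one document."""
    boundary = boundary or uuid.uuid4().hex
    body = b"".join(
        [
            f"--{boundary}\r\n".encode(),
            # The API expects the file in a form field named "document".
            f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'.encode(),
            b"Content-Type: application/octet-stream\r\n\r\n",
            content,
            f"\r\n--{boundary}--\r\n".encode(),
        ]
    )
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/documents/post_document/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage against a running instance (placeholder host and token):
# with open("scan.pdf", "rb") as fh:
#     req = build_upload_request("https://paperless.example.com", "my-token", "scan.pdf", fh.read())
# urllib.request.urlopen(req)
```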

Paperless integrates with the Monitoring stack for health checks and uses the Databases stack for metadata storage.