Projects with this topic
Advanced enterprise Free Open Source DMS (document management system).
Updated -
Cloud-native intelligent document processing pipeline using AWS serverless services, OCR workflows, and event-driven automation for structured document extraction.
Updated -
Python script that extracts images from a IIIF manifest, processes them with Tesseract OCR, and generates output files in several formats. It is designed for libraries, archives, and digitization projects that need optical character recognition of raw printed text.
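As a rough illustration of the first step described above, the sketch below collects image URLs from a manifest that has already been parsed into a Python dict. It assumes the IIIF Presentation API 2.x layout (`sequences`, `canvases`, `images`, `resource`); the function name `image_urls` is hypothetical and is not taken from the project, and 3.0 manifests, which use `items` instead, are not handled.

```python
def image_urls(manifest):
    """Collect image resource URLs from a IIIF Presentation API 2.x manifest dict."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                # Each image annotation points at its resource via "@id".
                url = image.get("resource", {}).get("@id")
                if url:
                    urls.append(url)
    return urls


# Hypothetical two-page manifest with placeholder URLs.
sample_manifest = {
    "sequences": [{
        "canvases": [
            {"images": [{"resource": {"@id": "https://example.org/iiif/p1.jpg"}}]},
            {"images": [{"resource": {"@id": "https://example.org/iiif/p2.jpg"}}]},
        ]
    }]
}
page_urls = image_urls(sample_manifest)
```

Each URL could then be downloaded and handed to an OCR engine such as Tesseract; that step is left out here because it depends on how the project invokes it.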
Updated -
A modular Clinical NLP Pipeline built to process and analyze unstructured medical text using both traditional machine learning and transformer-based approaches.
The project combines multiple components including OCR, text preprocessing, feature engineering, classification, named entity recognition, and visualization into a single end-to-end pipeline. It supports extracting clinical insights from raw documents and predicting medical categories using both TF-IDF + SVM and BERT-based models.
The system was designed and implemented as a structured Python project, with each stage separated into independent modules for scalability and maintainability.
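To make the TF-IDF feature step mentioned above concrete, here is a minimal pure-Python sketch. The whitespace tokenizer, the function name `tfidf`, and the weighting variant (tf = count / document length, idf = ln(N / df)) are assumptions for illustration; the project's own feature engineering and its SVM classifier are not shown in this listing.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = ln(N / df), which is one of
    several common weighting variants.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus, not taken from the project's data.
vectors = tfidf([
    "patient reports chest pain",
    "patient denies fever",
    "patient has fever and cough",
])
```

With this variant, a term that occurs in every document (here, "patient") receives weight zero, while rarer terms receive larger weights. Vectors like these are the kind of input a linear classifier such as an SVM would consume.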
Key Highlights
Built an end-to-end NLP pipeline for clinical text processing. Implemented SVM (≈51% accuracy) and BERT (≈77% accuracy) models. Integrated OCR for extracting text from medical documents. Performed Named Entity Recognition (NER) on clinical data. Designed modular architecture (src/) for clean code organization. Exported outputs for visualization and dashboard integration.
Updated -
DocuMind is an automatic document organization system for the Linux desktop, powered by local AI (Ollama/Llama3 or HuggingFace). It processes PDFs, images, videos, audio, and code: it extracts text (including OCR), transcribes, analyzes content, and classifies and archives files according to ISO 15489 (invoices, legal, work, personal, multimedia). It detects duplicates, records an audit trail in SQLite, and prioritizes offline privacy.
Developed in Python 3.10+ with PyMuPDF, Tesseract, Vosk/Whisper, multiprocessing, and optimizations (xxHash, caching, GPU), it demonstrates expertise in integrating local and multimodal LLMs, parallel processing, and scalable modular architecture, and is evolving toward a PyQt6 GUI with drag-and-drop, full-text search, and RPM/Flatpak packaging.
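Duplicate detection with an SQLite audit trail, as described above, might look roughly like the sketch below. It is not DocuMind's code: the function names and the `audit` table schema are hypothetical, and `hashlib.sha256` stands in for xxHash, which is not part of the Python standard library.

```python
import hashlib
import sqlite3
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    h = hashlib.sha256()  # assumption: stand-in for xxHash
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths by content digest and keep groups with more than one file."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(Path(p))
    return {digest: ps for digest, ps in groups.items() if len(ps) > 1}


def log_duplicates(conn, duplicates):
    """Record every file after the first in each group; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit (digest TEXT, path TEXT, action TEXT)"
    )
    rows = [
        (digest, str(p), "duplicate")
        for digest, ps in duplicates.items()
        for p in ps[1:]
    ]
    conn.executemany("INSERT INTO audit VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Hashing file contents rather than comparing names catches copies that were renamed; a faster non-cryptographic hash such as xxHash would serve the same purpose when speed matters more than collision resistance against adversaries.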
Updated -
Solving 4chan captcha
Updated -
Event-driven system built on Kafka that transforms unstructured documents into complete software specifications. It extracts text with OCR, runs NER with transformers, classifies sentences, and generates SRS documents in multiple formats.
Updated -
Process UrT gameplay to gather distance stats for Game Life Balance: https://game-life-balance.com
Updated -
This project focuses on developing a prototype application for extracting headlines and content from digitized newspaper images stored in the SIDAK (Sistem Informasi Database Koleksi) system of the Monumen Pers Nasional, utilizing computer vision and deep learning techniques.
The prototype aims to overcome the limitations of standard OCR tools by integrating YOLOv8 object detection to precisely identify and separate newspaper headlines and article content before text extraction.
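The idea of separating headline and content regions before text extraction can be sketched as a small post-processing step over detector output. Everything below is an assumption for illustration: the `Detection` record, the class names `"headline"` and `"content"`, the 0.5 confidence threshold, and the top-to-bottom ordering are not taken from the prototype, and no actual YOLOv8 call is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str         # hypothetical class name: "headline" or "content"
    confidence: float  # detector score in [0, 1]
    box: tuple         # (x1, y1, x2, y2) in pixels


def split_regions(detections, min_confidence=0.5):
    """Drop weak detections, then group boxes by class, sorted top-to-bottom, left-to-right."""
    regions = {"headline": [], "content": []}
    for det in detections:
        if det.confidence >= min_confidence and det.label in regions:
            regions[det.label].append(det.box)
    for boxes in regions.values():
        boxes.sort(key=lambda b: (b[1], b[0]))  # by top edge, then left edge
    return regions


# Hypothetical detections for one newspaper page.
regions = split_regions([
    Detection("content", 0.9, (10, 200, 300, 600)),
    Detection("headline", 0.8, (10, 50, 300, 120)),
    Detection("headline", 0.3, (0, 0, 5, 5)),
    Detection("content", 0.7, (320, 200, 600, 600)),
])
```

Each resulting box could then be cropped from the page image and passed to OCR separately, so that headline text and body text are not mixed in a single recognition pass.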
Updated -
MiniAiLive Intelligent ID OCR for Reliable Identity Verification. From document verification to data entry, our MiniAiLive OCR solution can help transform your identity verification process.
Updated -
MiniAiLive's Complete Document Liveness Detection Solution for Digital Onboarding
Updated -
U-Net Adaptive Generalized Image Binarization for Documents
Mirrored from: https://github.com/venkatakolagotla/robin
Originally forked from: https://github.com/masyagin1998/robin
Updated -
Simple application to recognize custom text
Updated