ocr

Projects with this topic

Mayan EDMS / Mayan EDMS

Advanced enterprise Free Open Source DMS (document management system).

Django Python ocr document man... pdf indexing dms enterprise workflow business business-pro... antivirus

672 318 15 172

Updated Jun 08, 2026

penmark / penmark-python

Python port of penmark — OCR handwritten/printed paper into .txt/.md/.csv via Ollama vision models. Faithful port of the Rust penmark v0.5.6.

penmark Python ocr Markdown

0 0 0 0

Updated Jun 03, 2026

Sania Hashmi / aws-intelligent-document-processing

Cloud-native intelligent document processing pipeline using AWS serverless services, OCR workflows, and event-driven automation for structured document extraction.

Amazon Web S... AWS Textract Amazon Web S... AWS Lambda Terraform ocr document-pro... cloud-computing event-driven Python intelligent-... s3 AWS CloudWatch

0 0 0 0

Updated May 26, 2026

François Gandolfi / iiif2ocr

Python script that extracts images from an IIIF manifest, processes them with Tesseract OCR, and generates output files in several formats. It is designed for libraries, archives, and digitization projects that need raw optical character recognition of printed text.

iiif ocr

0 0 0 0

Updated May 15, 2026

Thippesh Mugalikatte Siddappa / clinical-nlp-pipeline

A modular Clinical NLP Pipeline built to process and analyze unstructured medical text using both traditional machine learning and transformer-based approaches.

The project combines multiple components including OCR, text preprocessing, feature engineering, classification, named entity recognition, and visualization into a single end-to-end pipeline. It supports extracting clinical insights from raw documents and predicting medical categories using both TF-IDF + SVM and BERT-based models.

The system was designed and implemented as a structured Python project, with each stage separated into independent modules for scalability and maintainability.

Key Highlights
Built an end-to-end NLP pipeline for clinical text processing. Implemented SVM (≈51% accuracy) and BERT (≈77% accuracy) models. Integrated OCR for extracting text from medical documents. Performed Named Entity Recognition (NER) on clinical data. Designed modular architecture (src/) for clean code organization. Exported outputs for visualization and dashboard integration.

Python machine lear... data science NLP(Natural ... BERT bart ocr SVM text classif... TFIDF named entity... deep learning

0 0 0 0

Updated Apr 26, 2026

ALEXENDROS.me / DocuMind

DocuMind is an automatic document organization system for the Linux desktop, powered by local AI (Ollama/Llama3 or HuggingFace). It processes PDFs, images, video, audio, and code: it extracts text/OCR, transcribes, analyzes content, and classifies and archives according to ISO 15489 (invoices, legal, work, personal, multimedia). It detects duplicates, records an audit trail in SQLite, and prioritizes offline privacy.

Developed in Python 3.10+ with PyMuPDF, Tesseract, Vosk/Whisper, multiprocessing, and optimizations (xxHash, caching, GPU), it demonstrates expertise in integrating local and multimodal LLMs, parallel processing, and scalable modular architecture, and is evolving toward a PyQt6 GUI with drag-and-drop, full-text search, and RPM/Flatpak packaging.

Linux Python local-ai Document-Man... ollama ocr multimedia-p... desktop-app SQLite offline-ai automation pyqt6

0 0 0 0

Updated Mar 23, 2026

Ha Duy Long / INT1341 License Plate Detection

An AI-based computer vision project for automatic vehicle license plate detection and recognition using deep learning and OCR.

AI ocr yolo Python

0 0 0 0

Updated Mar 16, 2026

Crimson Tomato / ChanCaptcha

Solving 4chan captcha

ocr captcha 4chan computer vision firefox-exte...

0 0 0 0

Updated Mar 15, 2026

Tamal Ahmed / EasyOCR with FastAPI

AnalyzeWithOCR is a FastAPI-based backend service that downloads a PDF from a public URL, performs layout-aware text extraction and OCR, and returns structured, page-wise text output via a REST API.

Python fastapi ocr easyocr pdf

0 0 0 0

Updated Feb 18, 2026

Jones Johnsson / cloudboys-portfolio

Cloud-native data engineering + ML POC: ingest Reddit images, run OCR, store results in BigQuery/Cloud Storage, and serve analytics via FastAPI + a React dashboard.

Data Enginee... gcp fastapi ocr Python React Docker Git devops Markdown

0 0 0 0

Updated Jan 25, 2026

Pulga / SRS Platform

Event-driven system built on Kafka that transforms unstructured documents into complete software specifications. It extracts text with OCR, runs NER with transformers, classifies sentences, and generates SRS documents in multiple formats.

ocr NLP kafka Python transformers

0 0 0 0

Updated Jan 05, 2026

Josh / UrT OCR

Process UrT gameplay to gather distance stats for Game Life Balance: https://game-life-balance.com

ocr opencv gaming data analysis

0 0 0 0

Updated Nov 15, 2025

hendarto kurniawan / ocr-layout-newspaper-yolov8

This project focuses on developing a prototype application for extracting headlines and content from digitized newspaper images stored in the SIDAK (Sistem Informasi Database Koleksi) system of the Monumen Pers Nasional, utilizing computer vision and deep learning techniques.

The prototype aims to overcome the limitations of standard OCR tools by integrating YOLOv8 object detection to precisely identify and separate newspaper headlines and article content before text extraction.

machine lear... yolov8 object detec... ocr

0 0 0 0

Updated Jul 15, 2025

MiniAiLive / ID-DocumentRecognition-Linux

MiniAiLive Intelligent ID OCR for Reliable Identity Verification. From document verification to data entry, our MiniAiLive OCR solution can help transform your identity verification process.

document rec... Identity Ver... document reader kyc eKYC ocr id card reco... id document ... user-onboarding On-Premises

1 0 0 0

Updated Oct 15, 2024

MiniAiLive / ID-DocumentRecognition-Windows

MiniAiLive Intelligent ID OCR for Reliable Identity Verification. From document verification to data entry, our MiniAiLive OCR solution can help transform your identity verification process.

document rec... id card reco... id document ... passport reader On-Premises user-onboarding kyc eKYC Identity Ver... ocr document ver...

1 0 0 0

Updated Oct 15, 2024

MiniAiLive / ID-DocumentLivenessDetection-Windows

MiniAiLive's Complete Document Liveness Detection Solution for Digital Onboarding

document rec... document liv... onboarding kyc eKYC ocr Driver License passport document reader id card veri... id card reco...

1 0 0 0

Updated Oct 15, 2024

outsource / exam-analysis

Python ocr

0 0 0 0

Updated Sep 27, 2024

vkolagotla / UNAGI

U-Net Adaptive Generalized Image Binarization for Documents

Mirrored from: https://github.com/venkatakolagotla/robin
Originally forked from: https://github.com/masyagin1998/robin

unet ocr binarization preprocessing semantic seg... unet segment... text binariz... python3 python package

2 0 0 0

Updated Dec 29, 2023

Arseny / Tesseract Text Recognition App

Simple application to recognize custom text

Python tesseract ocr text recogni...

0 0 0 0

Updated Sep 30, 2023

Rafael Almeida de Bem / SinergyRH Automatic Clock-In

automation Python ocr

2 0 0 0

Updated Mar 07, 2023