About Gutenberg Digital Publishing
We are building an AI‑native knowledge infrastructure that converts Arabic heritage publications from scanned archives into structured, machine‑readable datasets, semantic indexes, and knowledge‑graph outputs.
Business Description
We address a structural gap in Arabic data: most historic content exists only as unindexed scans. Our pipeline performs OCR, metadata extraction, entity/topic enrichment, and semantic indexing, producing API‑ready outputs for research and AI applications.
Target Users
- Universities & research centers
- NLP/LLM labs & developers
- Libraries & cultural institutions
Product
Traction
- Processed pages (production pipeline): 43,000+
- Indexed articles: 8,000+
- Validated OCR accuracy: 99%+
- Pages in expansion pipeline: 400,000+
Pipeline overview
- Imaging capture + QA
- Multi‑stage OCR + text cleanup
- Metadata extraction (issue/article/author/topics)
- Entity extraction & semantic linking
- Semantic index + knowledge graph + APIs
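The stages above can be read as a chain of functions applied to each page in order. The following is a minimal illustrative sketch, not the production implementation: the `PageRecord` type, the stage functions, and the `GAZETTEER` set are all hypothetical names, OCR output is supplied directly as text, and entity extraction is reduced to a simple lookup.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Dict, Iterable, List


@dataclass
class PageRecord:
    """Hypothetical container for one page moving through the stages."""
    page_id: str
    raw_text: str = ""    # stand-in for OCR output
    clean_text: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)
    entities: List[str] = field(default_factory=list)


Stage = Callable[[PageRecord], PageRecord]

# Assumption: a tiny known-entity list replaces a real entity extractor.
GAZETTEER = {"Cairo", "Al-Ahram"}


def cleanup(record: PageRecord) -> PageRecord:
    # Collapse runs of whitespace and line breaks left over from OCR.
    record.clean_text = " ".join(record.raw_text.split())
    return record


def extract_metadata(record: PageRecord) -> PageRecord:
    # Placeholder metadata: only a word count is recorded here.
    record.metadata["word_count"] = str(len(record.clean_text.split()))
    return record


def extract_entities(record: PageRecord) -> PageRecord:
    # Keep the tokens that appear in the gazetteer, in sorted order.
    record.entities = sorted(set(record.clean_text.split()) & GAZETTEER)
    return record


def run_pipeline(record: PageRecord, stages: Iterable[Stage]) -> PageRecord:
    # Apply each stage to the result of the previous one.
    return reduce(lambda rec, stage: stage(rec), stages, record)


def build_index(records: Iterable[PageRecord]) -> Dict[str, List[str]]:
    # Map each entity to the IDs of the pages that mention it.
    index: Dict[str, List[str]] = {}
    for rec in records:
        for entity in rec.entities:
            index.setdefault(entity, []).append(rec.page_id)
    return index
```

Keeping each stage as a separate function means a stage can be swapped or re-run on its own, for example repeating text cleanup without repeating imaging.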
Business Model (Revenue)
- API Access (SaaS): usage‑based plans for /articles, /search, and /graph endpoints.
- Institutional subscriptions: advanced analytics dashboards, exports/snapshots, and controlled access.
- Dataset licensing: licensing the structured corpus and knowledge‑graph snapshots under clear terms.
- Enterprise services: running the pipeline for private archives and delivering structured outputs.
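For developers, a request to the /articles, /search, and /graph endpoints named above might be assembled as in the sketch below. Only the three endpoint paths come from this document; the base URL is a placeholder, and the query parameters (`q`, `limit`) and the `build_url` helper are assumptions for illustration rather than the published API.

```python
from urllib.parse import urlencode, urljoin

# Placeholder host: the real API base URL is not stated in this document.
BASE_URL = "https://api.example.org/"

# Endpoint paths taken from the API Access item above.
ENDPOINTS = ("articles", "search", "graph")


def build_url(endpoint: str, **params: object) -> str:
    """Return a request URL for one of the known endpoints.

    Parameter names are hypothetical; they are sorted so the same
    request always produces the same URL.
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    url = urljoin(BASE_URL, endpoint)
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url
```

A call such as `build_url("search", q="Cairo 1925", limit=10)` produces a URL with an encoded query string, while `build_url("graph")` returns the bare endpoint URL.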
Team & Advisors
Ahmed Elwakil
Founder & Managing Director
25+ years building Arabic digital content and knowledge systems. Founder of Arabia for Research & Information Systems; leading the transformation of heritage archives into structured, searchable knowledge.
Ahmed El-Dakhakhny
Engineering Consultant · Technical Advisor
Senior software engineer & technical architect (9+ years). Expertise in distributed systems and performance optimization; contributed to scaling fintech products to 500K+ monthly active users.
Dr. Abdel-Razek Eissa
Research Advisor · Modern History
Supports scholarly verification, historical context, and research guidance for the archive outputs.
Adel Naggaar
Project Supervisor · Sources & Acquisition
Oversees source acquisition and supply coordination, and supports project execution and overall operations.
Operations team: Imaging (Alaa Mahmoud, Hager Morshed, Noura El-Qabbani) · Review (Rashid El-Khashab, Ali El-Helaly) · Indexing (Mohamed Badran, Abdelrahman Sherif) · Quality Control (Khadija Tamim, Youssef Elwakil) · Linguistics Advisor (Ghareeb Qassem)
Contact
Email: editor@gutenbergdigital.net
Phone: +201008000450