About Gutenberg Digital Publishing

We are building an AI‑native knowledge infrastructure that converts Arabic heritage publications from scanned archives into structured, machine‑readable datasets, semantic indexes, and knowledge‑graph outputs.


Business Description

We address a structural gap in Arabic data: historical content exists mostly as unindexed scans. Our pipeline performs OCR, metadata extraction, entity/topic enrichment, and semantic indexing, producing API‑ready outputs for research and AI applications.

Target Users

  • Universities & research centers
  • NLP/LLM labs & developers
  • Libraries & cultural institutions

Product

Live MVP demo: /1/index.html
Startup overview: /startup.html
Product screenshots: /screenshots.html

Traction

  • Pages processed in the production pipeline: 43,000+
  • Indexed articles: 8,000+
  • Validated OCR accuracy: 99%+
  • Pages in expansion pipeline: 400,000+

Pipeline overview

  1. Imaging capture + QA
  2. Multi‑stage OCR + text cleanup
  3. Metadata extraction (issue/article/author/topics)
  4. Entity extraction & semantic linking
  5. Semantic index + knowledge graph + APIs
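
The five stages above can be pictured as a chain of functions that each enrich a page record. The following is a minimal Python sketch under stated assumptions, not the production implementation: the `Page` record, the stage functions, and the small `GAZETTEER` lookup are all hypothetical placeholders, and the OCR stage only normalizes whitespace on text that is supplied directly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical lookup table standing in for real entity extraction.
GAZETTEER: Dict[str, str] = {"Cairo": "place", "Beirut": "place"}


@dataclass
class Page:
    """Hypothetical record carried through the pipeline stages."""
    page_id: str
    raw_text: str  # stand-in for what an OCR engine would return
    text: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)
    entities: List[str] = field(default_factory=list)
    stages_done: List[str] = field(default_factory=list)


def capture_and_qa(page: Page) -> Page:
    # Stage 1 placeholder: reject pages that carry no content at all.
    if not page.raw_text.strip():
        raise ValueError(f"page {page.page_id} failed imaging QA")
    page.stages_done.append("imaging")
    return page


def ocr_and_cleanup(page: Page) -> Page:
    # Stage 2 placeholder: collapse runs of whitespace into single spaces.
    page.text = " ".join(page.raw_text.split())
    page.stages_done.append("ocr")
    return page


def extract_metadata(page: Page) -> Page:
    # Stage 3 placeholder: treat the first line as the article title.
    first_line = page.raw_text.strip().splitlines()[0]
    page.metadata["title"] = " ".join(first_line.split())
    page.stages_done.append("metadata")
    return page


def link_entities(page: Page) -> Page:
    # Stage 4 placeholder: substring match against the gazetteer.
    page.entities = [name for name in GAZETTEER if name in page.text]
    page.stages_done.append("entities")
    return page


STAGES: List[Callable[[Page], Page]] = [
    capture_and_qa, ocr_and_cleanup, extract_metadata, link_entities,
]


def run_pipeline(pages: List[Page]) -> Dict[str, List[str]]:
    """Run every stage on every page, then build an entity -> page-id index."""
    index: Dict[str, List[str]] = {}
    for page in pages:
        for stage in STAGES:
            page = stage(page)
        # Stage 5 placeholder: a plain inverted index instead of a graph store.
        for entity in page.entities:
            index.setdefault(entity, []).append(page.page_id)
        page.stages_done.append("index")
    return index


if __name__ == "__main__":
    sample = [
        Page("p1", "Sample Journal  issue 3\nA report from Cairo"),
        Page("p2", "Editorial\nNews from Beirut and Cairo"),
    ]
    print(run_pipeline(sample))
```

Keeping each stage as a separate function with the same signature means a stage can be swapped or re-run on its own, for example when an improved OCR pass becomes available.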

Business Model (Revenue)

  • API Access (SaaS): usage‑based plans for /articles, /search, and /graph endpoints.
  • Institutional subscriptions: advanced analytics dashboards, exports/snapshots, and controlled access.
  • Dataset licensing: licensing the structured corpus and knowledge‑graph snapshots under clear terms.
  • Enterprise services: running the pipeline for private archives and delivering structured outputs.
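
As an illustration of the API access item above, here is a small Python sketch of how a client might build request URLs for the three endpoints. Only the endpoint names /articles, /search, and /graph come from the list above; the base URL and the query parameters `q` and `limit` are hypothetical placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://api.example.org/"  # hypothetical host, not the real service
ENDPOINTS = ("articles", "search", "graph")  # endpoint names from the text


def build_url(endpoint: str, **params: object) -> str:
    """Return a request URL for one endpoint; parameter names are illustrative."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    url = urljoin(BASE_URL, endpoint)
    # Sort parameters so the same call always yields the same URL.
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(build_url("search", q="press", limit=10))
    print(build_url("articles"))
```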

Team & Advisors

Ahmed Elwakil
Founder & Managing Director
25+ years building Arabic digital content and knowledge systems. Founder of Arabia for Research & Information Systems; leading the transformation of heritage archives into structured, searchable knowledge.

Ahmed El-Dakhakhny
Engineering Consultant · Technical Advisor
Senior software engineer & technical architect (9+ years). Expertise in distributed systems and performance optimization; contributed to scaling fintech products to 500K+ monthly active users.

Dr. Abdel-Razek Eissa
Research Advisor · Modern History
Supports scholarly verification, historical context, and research guidance for the archive outputs.

Adel Naggaar
Project Supervisor · Sources & Acquisition
Oversees source acquisition and supply coordination, and supports project execution and overall operations.

Operations team: Imaging (Alaa Mahmoud, Hager Morshed, Noura El-Qabbani) · Review (Rashid El-Khashab, Ali El-Helaly) · Indexing (Mohamed Badran, Abdelrahman Sherif) · Quality Control (Khadija Tamim, Youssef Elwakil) · Linguistics advisor (Ghareeb Qassem)

Contact