comic-translate: an AI-powered comic translator that leverages LLMs and inpainting to translate text within comic panels

What it solves

Comic Translate translates comics from a range of source languages (including English, Korean, Japanese, French, Simplified Chinese, Traditional Chinese, Russian, German, Dutch, Spanish, and Italian) into other languages. It targets a weakness of traditional machine translators, which often struggle with distant language pairs.

How it works

The project uses a multi-stage pipeline to process comic pages:

  1. Speech Bubble Detection: Uses a custom RT-DETR-v2 model trained on 11k comic images to detect bubbles and segment text.
  2. OCR: Employs various OCR engines depending on the language (manga-ocr for Japanese, Pororo for Korean, and PPOCRv5 for others), with optional support for Gemini 2.0 Flash and Microsoft Azure Vision.
  3. Inpainting: Removes existing text from the image using a Manga/Anime finetuned LaMa checkpoint or an AOT-GAN based model to clean the bubbles.
  4. Translation: Uses state-of-the-art LLMs (such as GPT-4, Claude, and Gemini) to translate the text. Each model receives the entire page's text for context and, optionally, the page image itself.
  5. Text Rendering: Renders the translated text back into the original bounding boxes.
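
The OCR and translation stages above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `pick_ocr_engine`, `TextBlock`, and `build_page_prompt` are hypothetical, the language-to-engine mapping is taken only from the OCR step described above, and the prompt wording is an assumption. It shows two ideas from the pipeline: choosing an OCR engine by source language, and sending all of a page's text to the LLM in a single request so that it has page-level context.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical lookup table built from the engines named in the OCR step.
OCR_ENGINE_BY_LANGUAGE: Dict[str, str] = {
    "Japanese": "manga-ocr",
    "Korean": "Pororo",
}
DEFAULT_OCR_ENGINE = "PPOCRv5"  # used for every other source language


def pick_ocr_engine(source_language: str) -> str:
    """Return the name of the OCR engine to use for a source language."""
    return OCR_ENGINE_BY_LANGUAGE.get(source_language, DEFAULT_OCR_ENGINE)


@dataclass
class TextBlock:
    """One detected text region: a bounding box plus the text OCR read from it."""
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in page pixels
    text: str


def build_page_prompt(blocks: List[TextBlock], source: str, target: str) -> str:
    """Combine every block on the page into one prompt (assumed wording)."""
    lines = [
        f"Translate this comic page from {source} to {target}.",
        "Return one translated line per numbered block.",
    ]
    # Number the blocks so translations can be matched back to their boxes.
    for index, block in enumerate(blocks, start=1):
        lines.append(f"{index}. {block.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    page = [
        TextBlock(box=(40, 30, 220, 110), text="こんにちは"),
        TextBlock(box=(300, 420, 480, 500), text="行こう!"),
    ]
    print(pick_ocr_engine("Japanese"))
    print(build_page_prompt(page, "Japanese", "English"))
```

Numbering the blocks is one simple way to map each translated line back to its bounding box for the rendering step. The real project may use a different scheme.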

Who it’s for

Comic and manga readers who want to translate comics from languages they cannot read, as well as those looking for professional-grade translation quality from LLMs.

Highlights

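  • LLM-Powered Translation: Uses state-of-the-art LLMs to produce higher-quality translations than traditional machine translators.
  • Comprehensive Language Support: Handles source languages across East Asian and European scripts, from Japanese, Korean, and Chinese to Russian, French, and German.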
  • Multi-modal Pipeline: Combines detection, OCR, and inpainting to provide a seamless translation experience.
  • Flexible Deployment: Available as a desktop app for Windows and macOS, and a browser extension for Chromium-based browsers.
