Overview

Meikipop includes several built-in OCR providers optimized for Japanese text recognition. Each provider offers different trade-offs between accuracy, speed, cost, and resource requirements.

Provider comparison

| Provider | Type | Speed | Accuracy | Requirements | Best for |
| --- | --- | --- | --- | --- | --- |
| Dummy | Local | Instant | N/A | None | Development and testing |
| meikiocr | Local | Fast | High | GPU recommended | Offline gaming, privacy |
| Google Lens v2 | Remote | Medium | Very High | Internet | Online use, best accuracy |
| owocr | Hybrid | Medium | High | owocr daemon | Flexible deployment |
| Chrome Screen AI | Local | Fast | Medium | Chrome components | Chrome browser integration |

Built-in providers

Dummy OCR

The dummy provider is designed as a template for creating custom providers. It returns fixed mock data for testing.
src/ocr/providers/dummy/provider.py
class DummyProvider(OcrProvider):
    """
    A template for creating new OCR providers.
    
    When this provider is selected, it returns a fixed set of Japanese text
    to allow for testing of the popup window without a real OCR backend.
    """
    NAME = "Dummy OCR (Developer Template)"
Implementation highlights:
  • Returns hardcoded Japanese text with both horizontal and vertical examples
  • Demonstrates proper coordinate normalization
  • Shows character-level and word-level Word objects
  • Fully commented for educational purposes
Use cases:
  • Developing and testing UI without a real OCR backend
  • Template for creating custom providers
  • Understanding the data transformation process
Example output:
# Returns two paragraphs:
# 1. Horizontal: "これは横書きテキストです"
# 2. Vertical: "縦書き"

meikiocr (local)

The meikiocr provider uses a high-performance local model specifically optimized for Japanese video game text.
src/ocr/providers/meikiocr/provider.py
class MeikiOcrProvider(OcrProvider):
    """
    An OCR provider that uses the high-performance meikiocr library.
    This provider is specifically optimized for recognizing Japanese text from video games.
    """
    NAME = "meikiocr (local)"
    
    def __init__(self):
        self.ocr_client = MeikiOCR()
        logger.info(f"Running on: {self.ocr_client.active_provider}")
Implementation highlights:
  • Uses the meikiocr Python library
  • Converts PIL images to NumPy RGB arrays
  • Returns character-level boxes for precise lookups
  • Groups individual lines into paragraphs using postprocessing
  • Filters out non-Japanese text
Configuration:
DET_CONFIDENCE_THRESHOLD = 0.5  # Detection confidence
REC_CONFIDENCE_THRESHOLD = 0.1  # Recognition confidence
Processing pipeline:
  1. Initialize: Creates a MeikiOCR client that handles model downloading and session management internally.
  2. Convert image: Converts the PIL Image to a NumPy RGB array for library compatibility.
  3. Run OCR: Calls run_ocr() with confidence thresholds to get character-level results.
  4. Transform results: Converts [x1, y1, x2, y2] pixel coordinates to normalized BoundingBox objects.
  5. Group paragraphs: Uses group_lines_into_paragraphs() to combine related lines.
Key methods:
def _to_normalized_bbox(self, bbox_pixels: list, img_width: int, img_height: int) -> BoundingBox:
    """Converts an [x1, y1, x2, y2] pixel bbox to a normalized meikipop BoundingBox."""
    x1, y1, x2, y2 = bbox_pixels
    box_w, box_h = x2 - x1, y2 - y1
    
    center_x = (x1 + box_w / 2) / img_width
    center_y = (y1 + box_h / 2) / img_height
    norm_w = box_w / img_width
    norm_h = box_h / img_height
    
    return BoundingBox(center_x, center_y, norm_w, norm_h)
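
The conversion above can be checked with a small worked example. This is a minimal sketch, not meikipop's code: the BoundingBox dataclass and its field names are hypothetical stand-ins (the excerpt only shows the constructor being called with four positional values), and the method is rewritten as a free function so the block runs on its own.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Hypothetical stand-in for meikipop's BoundingBox (center + size, all in 0..1)."""
    center_x: float
    center_y: float
    width: float
    height: float


def to_normalized_bbox(bbox_pixels, img_width, img_height):
    """Same arithmetic as _to_normalized_bbox, written as a free function."""
    x1, y1, x2, y2 = bbox_pixels
    box_w, box_h = x2 - x1, y2 - y1
    return BoundingBox(
        (x1 + box_w / 2) / img_width,   # horizontal center as a fraction of image width
        (y1 + box_h / 2) / img_height,  # vertical center as a fraction of image height
        box_w / img_width,
        box_h / img_height,
    )


# A 200x100 px box with its top-left corner at (100, 50) in a 1000x500 image.
box = to_normalized_bbox([100, 50, 300, 150], img_width=1000, img_height=500)
print(box)
```

The box is centered at pixel (200, 100), so every normalized value comes out as 0.2. Because the result is relative to the image size, it stays valid if the screenshot is later scaled.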
Requirements:
  • Install: pip install meikiocr
  • GPU recommended for best performance
  • Models downloaded automatically on first run

Google Lens v2 (remote)

This provider sends screenshots to Google’s servers. Do not use with sensitive or private information.
src/ocr/providers/glensv2/provider.py
class GoogleLensOcrV2(OcrProvider):
    NAME = "Google Lens (remote)"
    
    def __init__(self):
        self._session = requests.Session()
        self._session.headers.update({
            'Content-Type': 'application/x-protobuf',
            'X-Goog-Api-Key': 'AIzaSyDr2UxVnv_U85AbhhY8XSHSIavUW0DC-sY',
            'User-Agent': 'Mozilla/5.0 ...'
        })
Implementation highlights:
  • Uses Google Lens API via protobuf protocol
  • Maintains persistent HTTP session for performance
  • Supports low-bandwidth mode (roughly half the pixel count, grayscale with 16-color quantization)
  • Returns normalized coordinates directly (no conversion needed)
  • Filters for Japanese text using regex
Image processing:
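import io
import math
from PIL import Image

# Setup assumed here; the excerpt omits these lines
bio = io.BytesIO()
processed_image = image

if config.glens_low_bandwidth:
    # Halve the pixel count: scaling each side by sqrt(0.5) (~0.71) keeps ~50% of the area
    scale_factor = math.sqrt(0.5)
    new_width = int(image.width * scale_factor)
    new_height = int(image.height * scale_factor)
    processed_image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)
    # Convert to grayscale, then reduce to 16 colors
    processed_image = processed_image.convert('L').quantize(colors=16)
    processed_image.save(bio, format='PNG')
else:
    # Standard quality
    processed_image.save(bio, format='JPEG', quality=90)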
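
The scale factor of sqrt(0.5) is applied to each side, so the reduction is about 50% of the pixel count rather than 50% of the width. A minimal sketch of the arithmetic, using only the standard library (the helper name low_bandwidth_size is hypothetical and not part of the provider):

```python
import math


def low_bandwidth_size(width: int, height: int) -> tuple[int, int]:
    """Scale each side by sqrt(0.5), mirroring the resize step in low-bandwidth mode."""
    scale_factor = math.sqrt(0.5)
    return int(width * scale_factor), int(height * scale_factor)


new_w, new_h = low_bandwidth_size(1920, 1080)
pixel_ratio = (new_w * new_h) / (1920 * 1080)
print(new_w, new_h, round(pixel_ratio, 3))  # 1357 763 0.499
```

A 1920x1080 capture becomes 1357x763, which is about 71% of each side and just under half of the original pixels.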
Text direction detection:
for para in glens_response.objects_response.text.text_layout.paragraphs:
    is_vertical = para.writing_direction == WritingDirection.TOP_TO_BOTTOM
Requirements:
  • Active internet connection
  • Acceptance of Google’s data processing terms
Performance:
  • Network latency: ~200-500ms typical
  • Request timeout: 10 seconds
  • Logs detailed timing information

owocr (WebSocket)

The owocr provider connects to a running owocr daemon via WebSocket, allowing flexible deployment options.
src/ocr/providers/owocr/provider.py
class OwocrWebsocketProvider(OcrProvider):
    """
    An OCR provider that connects to a running owocr instance via websockets.
    This provider uses the synchronous websockets client to maintain a
    persistent connection.
    """
    NAME = "owocr (Websocket)"
Implementation highlights:
  • Maintains persistent WebSocket connection
  • Automatic reconnection on connection loss
  • Uses direct IP (127.0.0.1) to avoid localhost resolution delays
  • Two-part response protocol (acknowledgment + JSON results)
  • Returns normalized coordinates directly
Connection handling:
OWOCR_WEBSOCKET_URI = "ws://127.0.0.1:7331"

def _connect(self) -> bool:
    try:
        self.websocket = connect(
            OWOCR_WEBSOCKET_URI,
            open_timeout=3,
            ping_interval=20,
            ping_timeout=20
        )
        return True
    except Exception as e:
        logger.error(f"Could not connect to owocr: {e}")
        logger.info("Please ensure owocr is running with:")
        logger.info("owocr -r websocket -w websocket -of json -e glens")
        return False
Communication protocol:
  1. Send image: Converts the PIL Image to BMP format and sends it as binary.
  2. Receive acknowledgment: Waits for a “True” confirmation (5 second timeout).
  3. Receive results: Waits for a JSON response with OCR results (30 second timeout).
  4. Transform data: Converts owocr’s format to meikipop’s Paragraph objects.
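
The two-part exchange described in these steps can be sketched with an in-memory stand-in for the connection. Everything here is illustrative: FakeOwocrSocket, scan(), and the {"paragraphs": []} payload are hypothetical, and the real owocr JSON schema is not shown in this document. Only the ordering (send image, wait for the acknowledgment, then wait for JSON) and the two timeouts come from the steps.

```python
import json


class FakeOwocrSocket:
    """Hypothetical in-memory stand-in for a websocket connection to owocr."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.sent = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)

    def recv(self, timeout: float):
        # A real client would raise on timeout; the fake just returns the next reply.
        return self._replies.pop(0)


def scan(sock, image_bytes: bytes):
    """Send one image, wait for the acknowledgment, then parse the JSON results."""
    sock.send(image_bytes)
    ack = sock.recv(timeout=5)       # acknowledgment step
    if ack != "True":
        return None                  # owocr did not accept the image
    payload = sock.recv(timeout=30)  # OCR results as JSON text
    return json.loads(payload)


sock = FakeOwocrSocket(["True", '{"paragraphs": []}'])
result = scan(sock, b"fake-bmp-bytes")
print(result)
```

The acknowledgment gets a short timeout because it only confirms receipt, while the results get a longer one because they wait on the OCR engine itself.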
Retry logic:
for attempt in range(2):
    try:
        if self.websocket is None:
            if not self._connect():
                return None
        # ... perform scan
    except ConnectionClosed:
        logger.warning("Websocket connection lost. Will attempt to reconnect...")
        self.websocket = None
        if attempt == 0:
            continue  # Retry once
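
To see the retry-once behavior end to end, the loop can be exercised against a simulated connection drop. This is a sketch under stated assumptions: ConnectionClosed is redefined locally as a stand-in for the websocket library's exception, and FlakyScanner with its _perform_scan() helper is hypothetical scaffolding that fails exactly once.

```python
class ConnectionClosed(Exception):
    """Hypothetical stand-in for the websocket library's connection-closed error."""


class FlakyScanner:
    """Simulates a provider whose first scan hits a dropped connection."""

    def __init__(self):
        self.websocket = None
        self.connect_calls = 0
        self._fail_next = True

    def _connect(self) -> bool:
        self.connect_calls += 1
        self.websocket = object()  # placeholder for a live connection
        return True

    def _perform_scan(self):
        if self._fail_next:
            self._fail_next = False
            raise ConnectionClosed()
        return ["ok"]

    def scan(self):
        for attempt in range(2):
            try:
                if self.websocket is None and not self._connect():
                    return None
                return self._perform_scan()
            except ConnectionClosed:
                self.websocket = None  # force a reconnect on the next attempt
                if attempt == 0:
                    continue  # retry once
        return None  # second failure: give up for this scan


scanner = FlakyScanner()
result = scanner.scan()
print(result, scanner.connect_calls)
```

The first attempt connects and fails, the handler clears the connection, and the second attempt reconnects and succeeds, so two connections are opened for one successful scan.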
Requirements:
  • Running owocr daemon
  • Command: owocr -r websocket -w websocket -of json -e glens
  • WebSocket connection to 127.0.0.1:7331 (OWOCR_WEBSOCKET_URI)

Chrome Screen AI (local)

This provider uses Chrome’s Screen AI component for local, offline OCR processing.
src/ocr/providers/screenai/provider.py
class ScreenAiOcr(OcrProvider):
    NAME = "Chrome Screen AI (local)"
    
    # Class-level variables to ensure the native DLL is only initialized ONCE
    _is_initialized = False
    _lib = None
Implementation highlights:
  • Uses Chrome’s native Screen AI library via ctypes
  • Singleton pattern for library initialization (once per app lifetime)
  • Suppresses verbose native library output
  • Returns character-level (symbol) boxes
  • Automatically downsizes large images (>4MP)
Library initialization:
base_dir = Path.home() / ".config" / "screen_ai"
model_dir = base_dir / "resources"
dll_name = 'chrome_screen_ai.dll' if sys.platform == 'win32' else 'libchromescreenai.so'
Image preparation:
if image.width * image.height > 4000000:
    image.thumbnail((2000, 2000), Image.Resampling.LANCZOS)

img_rgba = image.convert('RGBA')
width, height = img_rgba.size
img_bytes = img_rgba.tobytes()

# Create Skia bitmap structure
bitmap.fPixmap.fPixels = ctypes.cast(ctypes.c_char_p(img_bytes), ctypes.c_void_p)
bitmap.fPixmap.fRowBytes = width * 4
bitmap.fPixmap.fInfo.fColorInfo.fColorType = 4  # kRGBA_8888
Output suppression:
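import os
from contextlib import contextmanager

@contextmanager
def suppress_output():
    """Redirects C/C++ level stdout and stderr to devnull."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    original_stdout = os.dup(1)
    original_stderr = os.dup(2)
    os.dup2(devnull, 1)
    os.dup2(devnull, 2)
    try:
        yield
    finally:
        # Restore original streams
        os.dup2(original_stdout, 1)
        os.dup2(original_stderr, 2)
        # Close the temporary descriptors so repeated calls do not leak them
        os.close(original_stdout)
        os.close(original_stderr)
        os.close(devnull)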
Text direction detection:
is_vertical = (line_box.direction == 3)  # DIRECTION_TOP_TO_BOTTOM
Requirements:
  • Download Screen AI components from: https://chrome-infra-packages.appspot.com/p/chromium/third_party/screen-ai
  • Extract to: ~/.config/screen_ai/resources/
  • Platform: Windows (DLL) or Linux (SO)

Common patterns

Postprocessing: Grouping lines into paragraphs

Most providers use the shared group_lines_into_paragraphs() utility:
from src.ocr.providers.postprocessing import group_lines_into_paragraphs

# After converting to line-level Paragraph objects
raw_lines: List[Paragraph] = [...]
final_paragraphs = group_lines_into_paragraphs(raw_lines)
This function:
  • Combines adjacent lines into logical paragraphs
  • Respects text direction (vertical vs. horizontal)
  • Improves text readability and context
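
The actual heuristic inside group_lines_into_paragraphs() is not shown here, so the following is only a minimal sketch of one way such grouping could work, not the library's implementation. The Line dataclass, the group_horizontal_lines() name, and the max_gap threshold are all hypothetical, and the sketch handles horizontal text only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    """Hypothetical line record; positions are normalized to 0..1."""
    text: str
    center_y: float
    height: float
    is_vertical: bool = False


def group_horizontal_lines(lines: List[Line], max_gap: float = 1.5) -> List[List[Line]]:
    """Group top-to-bottom lines whose spacing is at most max_gap line heights."""
    groups: List[List[Line]] = []
    for line in sorted(lines, key=lambda item: item.center_y):
        prev = groups[-1][-1] if groups else None
        close = (
            prev is not None
            and prev.is_vertical == line.is_vertical  # never mix text directions
            and line.center_y - prev.center_y <= max_gap * prev.height
        )
        if close:
            groups[-1].append(line)
        else:
            groups.append([line])
    return groups


lines = [
    Line("一行目", center_y=0.10, height=0.04),
    Line("二行目", center_y=0.15, height=0.04),
    Line("別の段落", center_y=0.60, height=0.04),
]
groups = group_horizontal_lines(lines)
print([[line.text for line in group] for group in groups])
```

The first two lines sit about one line height apart and are merged, while the third is far below and starts its own paragraph.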

Japanese text filtering

Several providers filter for Japanese text:
import re

JAPANESE_REGEX = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]')

line_has_japanese = any(JAPANESE_REGEX.search(w.plain_text) for w in line.words)
if not line_has_japanese:
    continue
Character ranges:
  • \u3040-\u309F: Hiragana
  • \u30A0-\u30FF: Katakana
  • \u4E00-\u9FAF: Kanji (CJK Unified Ideographs; the Unicode block itself extends to \u9FFF)
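
A short standalone check shows which strings this pattern keeps. The contains_japanese() helper and the sample strings are illustrative and not part of any provider; the regex is the one quoted in this section.

```python
import re

JAPANESE_REGEX = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]')


def contains_japanese(text: str) -> bool:
    """True if the text has at least one hiragana, katakana, or kanji character."""
    return JAPANESE_REGEX.search(text) is not None


samples = ["こんにちは", "カタカナ", "漢字", "Press START", "HP 100/100", "Lv.5 勇者"]
kept = [s for s in samples if contains_japanese(s)]
print(kept)  # ['こんにちは', 'カタカナ', '漢字', 'Lv.5 勇者']
```

Mixed strings such as "Lv.5 勇者" are kept because a single matching character is enough. A string made up only of Japanese punctuation such as "。" would be dropped, since the CJK Symbols and Punctuation block lies outside the three ranges.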

Selecting a provider

Choose based on your requirements:
  • For offline gaming: use meikiocr (local). Best balance of speed and accuracy without internet.
  • For maximum accuracy: use Google Lens v2 (remote). Highest quality, but requires internet.
  • For development: use Dummy OCR. Test the UI without actual OCR processing.
  • For custom deployment: use owocr (Websocket). Run the OCR service on a different machine.
  • For Chrome integration: use Chrome Screen AI (local). Leverage existing Chrome components.

Next steps

  • Create custom provider: Build your own OCR provider using these as examples
  • OCR provider interface: Understand the interface contract and data models