# Available OCR providers

> Reference documentation for all built-in OCR providers in meikipop with implementation details and usage examples

## Overview

Meikipop includes several built-in OCR providers optimized for Japanese text recognition. Each provider offers different trade-offs between accuracy, speed, cost, and resource requirements.

## Provider comparison

| Provider         | Type   | Speed   | Accuracy  | Requirements      | Best for                   |
| ---------------- | ------ | ------- | --------- | ----------------- | -------------------------- |
| Dummy            | Local  | Instant | N/A       | None              | Development and testing    |
| meikiocr         | Local  | Fast    | High      | GPU recommended   | Offline gaming, privacy    |
| Google Lens v2   | Remote | Medium  | Very High | Internet          | Online use, best accuracy  |
| owocr            | Hybrid | Medium  | High      | owocr daemon      | Flexible deployment        |
| Chrome Screen AI | Local  | Fast    | Medium    | Chrome components | Chrome browser integration |

## Built-in providers

### Dummy OCR

<Note>
  The dummy provider is designed as a template for creating custom providers. It returns fixed mock data for testing.
</Note>

```python src/ocr/providers/dummy/provider.py
class DummyProvider(OcrProvider):
    """
    A template for creating new OCR providers.
    
    When this provider is selected, it returns a fixed set of Japanese text
    to allow for testing of the popup window without a real OCR backend.
    """
    NAME = "Dummy OCR (Developer Template)"
```

**Implementation highlights:**

* Returns hardcoded Japanese text with both horizontal and vertical examples
* Demonstrates proper coordinate normalization
* Shows character-level and word-level `Word` objects
* Fully commented for educational purposes

**Use cases:**

* Developing and testing UI without a real OCR backend
* Template for creating custom providers
* Understanding the data transformation process

**Example output:**

```python
# Returns two paragraphs:
# 1. Horizontal: "これは横書きテキストです"
# 2. Vertical: "縦書き"
```

### meikiocr (local)

<Info>
  The meikiocr provider uses a high-performance local model specifically optimized for Japanese video game text.
</Info>

```python src/ocr/providers/meikiocr/provider.py
class MeikiOcrProvider(OcrProvider):
    """
    An OCR provider that uses the high-performance meikiocr library.
    This provider is specifically optimized for recognizing Japanese text from video games.
    """
    NAME = "meikiocr (local)"
    
    def __init__(self):
        self.ocr_client = MeikiOCR()
        logger.info(f"Running on: {self.ocr_client.active_provider}")
```

**Implementation highlights:**

* Uses the `meikiocr` Python library
* Converts PIL images to NumPy RGB arrays
* Returns character-level boxes for precise lookups
* Groups individual lines into paragraphs using postprocessing
* Filters out non-Japanese text

**Configuration:**

```python
DET_CONFIDENCE_THRESHOLD = 0.5  # Detection confidence
REC_CONFIDENCE_THRESHOLD = 0.1  # Recognition confidence
```

**Processing pipeline:**

<Steps>
  <Step title="Initialize">
    Creates a `MeikiOCR` client that handles model downloading and session management internally.
  </Step>

  <Step title="Convert image">
    Converts PIL Image to NumPy RGB array for library compatibility.
  </Step>

  <Step title="Run OCR">
    Calls `run_ocr()` with confidence thresholds to get character-level results.
  </Step>

  <Step title="Transform results">
    Converts `[x1, y1, x2, y2]` pixel coordinates to normalized `BoundingBox` objects.
  </Step>

  <Step title="Group paragraphs">
    Uses `group_lines_into_paragraphs()` to combine related lines.
  </Step>
</Steps>

**Key methods:**

```python
def _to_normalized_bbox(self, bbox_pixels: list, img_width: int, img_height: int) -> BoundingBox:
    """Converts an [x1, y1, x2, y2] pixel bbox to a normalized meikipop BoundingBox."""
    x1, y1, x2, y2 = bbox_pixels
    box_w, box_h = x2 - x1, y2 - y1
    
    center_x = (x1 + box_w / 2) / img_width
    center_y = (y1 + box_h / 2) / img_height
    norm_w = box_w / img_width
    norm_h = box_h / img_height
    
    return BoundingBox(center_x, center_y, norm_w, norm_h)
```
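
As a quick sanity check of the arithmetic above, the sketch below reimplements the conversion as a free function and runs it on one box. The `BoundingBox` dataclass here is a stand-in with assumed field names (`center_x`, `center_y`, `width`, `height`), and `to_pixel_bbox` is a hypothetical inverse helper added only for illustration; neither is taken from meikipop's source.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Stand-in for meikipop's BoundingBox; the field names are assumed."""
    center_x: float
    center_y: float
    width: float
    height: float


def to_normalized_bbox(bbox_pixels, img_width, img_height):
    """Same arithmetic as _to_normalized_bbox, written as a free function."""
    x1, y1, x2, y2 = bbox_pixels
    box_w, box_h = x2 - x1, y2 - y1
    return BoundingBox(
        (x1 + box_w / 2) / img_width,
        (y1 + box_h / 2) / img_height,
        box_w / img_width,
        box_h / img_height,
    )


def to_pixel_bbox(box, img_width, img_height):
    """Hypothetical inverse: normalized box back to [x1, y1, x2, y2] pixels."""
    half_w = box.width * img_width / 2
    half_h = box.height * img_height / 2
    cx, cy = box.center_x * img_width, box.center_y * img_height
    return [round(cx - half_w), round(cy - half_h),
            round(cx + half_w), round(cy + half_h)]


# A 200x100 px character box inside a 1000x500 px screenshot.
box = to_normalized_bbox([100, 50, 300, 150], img_width=1000, img_height=500)
print(box)
print(to_pixel_bbox(box, 1000, 500))
```

Because every value is divided by the image size, the same normalized box can be mapped onto a screenshot of any resolution, which is why the round trip recovers the original pixel corners.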

**Requirements:**

* Install: `pip install meikiocr`
* GPU recommended for best performance
* Models downloaded automatically on first run

### Google Lens v2 (remote)

<Warning>
  This provider sends screenshots to Google's servers. Do not use with sensitive or private information.
</Warning>

```python src/ocr/providers/glensv2/provider.py
class GoogleLensOcrV2(OcrProvider):
    NAME = "Google Lens (remote)"
    
    def __init__(self):
        self._session = requests.Session()
        self._session.headers.update({
            'Content-Type': 'application/x-protobuf',
            'X-Goog-Api-Key': 'AIzaSyDr2UxVnv_U85AbhhY8XSHSIavUW0DC-sY',
            'User-Agent': 'Mozilla/5.0 ...'
        })
```

**Implementation highlights:**

* Uses Google Lens API via protobuf protocol
* Maintains persistent HTTP session for performance
* Supports a low-bandwidth mode (about half the pixel count, 16-level grayscale quantization)
* Returns normalized coordinates directly (no conversion needed)
* Filters for Japanese text using regex

**Image processing:**

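```python
import io
import math

from PIL import Image

# Excerpt: `image` is the captured PIL image and `config` is meikipop's config object.
bio = io.BytesIO()  # in-memory buffer that receives the encoded image
processed_image = image
if config.glens_low_bandwidth:
    # Scale each side by sqrt(0.5) so the pixel count drops by ~50%
    scale_factor = math.sqrt(0.5)
    new_width = int(image.width * scale_factor)
    new_height = int(image.height * scale_factor)
    processed_image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)
    # Convert to grayscale, then reduce to a 16-color palette
    processed_image = processed_image.convert('L').quantize(colors=16)
    processed_image.save(bio, format='PNG')
else:
    # Standard quality
    processed_image.save(bio, format='JPEG', quality=90)
```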
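
To make the effect of the `math.sqrt(0.5)` scale factor concrete, here is a small worked example of the resize arithmetic on its own. The 1920x1080 input size and the helper name `low_bandwidth_size` are assumptions chosen for illustration.

```python
import math


def low_bandwidth_size(width, height):
    """Mirror the resize arithmetic: scale each side by sqrt(0.5)."""
    scale_factor = math.sqrt(0.5)
    return int(width * scale_factor), int(height * scale_factor)


width, height = 1920, 1080  # example screenshot size (an assumption)
new_width, new_height = low_bandwidth_size(width, height)
pixel_ratio = (new_width * new_height) / (width * height)
print(new_width, new_height, round(pixel_ratio, 3))
```

Each side shrinks to roughly 71% of its original length, so the total pixel count, rather than the width or height, is what ends up at about half.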

**Text direction detection:**

```python
for para in glens_response.objects_response.text.text_layout.paragraphs:
    is_vertical = para.writing_direction == WritingDirection.TOP_TO_BOTTOM
```

**Requirements:**

* Active internet connection
* Acceptance of Google's data processing terms

**Performance:**

* Network latency: \~200-500ms typical
* Request timeout: 10 seconds
* Logs detailed timing information

### owocr (WebSocket)

<Info>
  The owocr provider connects to a running owocr daemon via WebSocket, allowing flexible deployment options.
</Info>

```python src/ocr/providers/owocr/provider.py
class OwocrWebsocketProvider(OcrProvider):
    """
    An OCR provider that connects to a running owocr instance via websockets.
    This provider uses the synchronous websockets client to maintain a
    persistent connection.
    """
    NAME = "owocr (Websocket)"
```

**Implementation highlights:**

* Maintains persistent WebSocket connection
* Automatic reconnection on connection loss
* Uses direct IP (127.0.0.1) to avoid localhost resolution delays
* Two-part response protocol (acknowledgment + JSON results)
* Returns normalized coordinates directly

**Connection handling:**

```python
OWOCR_WEBSOCKET_URI = "ws://127.0.0.1:7331"

def _connect(self) -> bool:
    try:
        self.websocket = connect(
            OWOCR_WEBSOCKET_URI,
            open_timeout=3,
            ping_interval=20,
            ping_timeout=20
        )
        return True
    except Exception as e:
        logger.error(f"Could not connect to owocr: {e}")
        logger.info("Please ensure owocr is running with:")
        logger.info("owocr -r websocket -w websocket -of json -e glens")
        return False
```

**Communication protocol:**

<Steps>
  <Step title="Send image">
    Converts PIL Image to BMP format and sends as binary.
  </Step>

  <Step title="Receive acknowledgment">
    Waits for "True" confirmation (5 second timeout).
  </Step>

  <Step title="Receive results">
    Waits for JSON response with OCR results (30 second timeout).
  </Step>

  <Step title="Transform data">
    Converts owocr's format to meikipop's `Paragraph` objects.
  </Step>
</Steps>
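
The two-part exchange described in the steps above can be sketched without a running daemon. In this sketch `FakeOwocrSocket` and `scan_once` are hypothetical names, the fake socket only mimics the `send()` / `recv(timeout=...)` shape of a synchronous websocket connection, and the JSON payload is a placeholder because owocr's real result schema is not shown on this page.

```python
import json


class FakeOwocrSocket:
    """Stand-in for a connected websocket; replays a scripted reply queue."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.sent = []

    def send(self, message):
        self.sent.append(message)

    def recv(self, timeout=None):
        return self.replies.pop(0)


def scan_once(ws, image_bytes, ack_timeout=5, result_timeout=30):
    """Send one image, wait for the "True" acknowledgment, then parse the JSON reply."""
    ws.send(image_bytes)
    ack = ws.recv(timeout=ack_timeout)
    if ack != "True":
        return None  # the image was not acknowledged, so no result will follow
    return json.loads(ws.recv(timeout=result_timeout))


# Placeholder payload: the real owocr JSON schema is not reproduced here.
ws = FakeOwocrSocket(["True", '{"paragraphs": []}'])
result = scan_once(ws, b"BM-fake-bitmap-bytes")
print(result)
```

Using a short timeout for the acknowledgment and a longer one for the results lets a dead connection fail quickly while still leaving room for slow OCR runs.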

**Retry logic:**

```python
for attempt in range(2):
    try:
        if self.websocket is None:
            if not self._connect():
                return None
        # ... perform scan
    except ConnectionClosed:
        logger.warning("Websocket connection lost. Will attempt to reconnect...")
        self.websocket = None
        if attempt == 0:
            continue  # Retry once
```
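
The retry-once pattern in the excerpt above can be exercised in isolation. The following is a minimal, self-contained sketch: `ConnectionClosed` is a local stand-in for the websockets exception, and `scan_with_retry`, `connect`, and `perform_scan` are hypothetical names rather than meikipop's actual methods.

```python
class ConnectionClosed(Exception):
    """Stand-in for websockets.exceptions.ConnectionClosed."""


def scan_with_retry(connect, perform_scan, max_attempts=2):
    """Try the scan, reconnecting once if the connection drops mid-request."""
    connection = None
    for attempt in range(max_attempts):
        try:
            if connection is None:
                connection = connect()
                if connection is None:
                    return None  # could not reach the daemon at all
            return perform_scan(connection)
        except ConnectionClosed:
            connection = None  # force a fresh connection on the next attempt
    return None  # every attempt lost the connection


calls = {"scans": 0}


def flaky_scan(connection):
    # Fails on the first call to simulate a dropped connection, then succeeds.
    calls["scans"] += 1
    if calls["scans"] == 1:
        raise ConnectionClosed()
    return "ok"


print(scan_with_retry(lambda: object(), flaky_scan))
```

Capping the loop at two attempts keeps a single hover lookup from blocking for long when the daemon is genuinely down.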

**Requirements:**

* Running owocr daemon
* Command: `owocr -r websocket -w websocket -of json -e glens`
* WebSocket connection to `127.0.0.1:7331`

### Chrome Screen AI (local)

<Note>
  This provider uses Chrome's Screen AI component for local, offline OCR processing.
</Note>

```python src/ocr/providers/screenai/provider.py
class ScreenAiOcr(OcrProvider):
    NAME = "Chrome Screen AI (local)"
    
    # Class-level variables to ensure the native DLL is only initialized ONCE
    _is_initialized = False
    _lib = None
```

**Implementation highlights:**

* Uses Chrome's native Screen AI library via ctypes
* Singleton pattern for library initialization (once per app lifetime)
* Suppresses verbose native library output
* Returns character-level (symbol) boxes
* Automatically downsizes large images (>4MP)

**Library initialization:**

```python
base_dir = Path.home() / ".config" / "screen_ai"
model_dir = base_dir / "resources"
dll_name = 'chrome_screen_ai.dll' if sys.platform == 'win32' else 'libchromescreenai.so'
```
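
Before selecting this provider, it can help to check that the native library is where the paths above expect it. The sketch below is one way to do that. It assumes the library file sits inside the `resources` directory alongside the model files, and `screen_ai_library_path` is a hypothetical helper, not part of meikipop.

```python
import sys
from pathlib import Path


def screen_ai_library_path(platform=sys.platform, home=None):
    """Return where the native library is expected, following the layout above."""
    home = Path(home) if home is not None else Path.home()
    model_dir = home / ".config" / "screen_ai" / "resources"
    dll_name = "chrome_screen_ai.dll" if platform == "win32" else "libchromescreenai.so"
    # Assumption: the library sits next to the model files in resources/.
    return model_dir / dll_name


lib_path = screen_ai_library_path()
if lib_path.is_file():
    print(f"Found Screen AI library at {lib_path}")
else:
    print(f"Screen AI library not found; expected it at {lib_path}")
```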

**Image preparation:**

```python
if image.width * image.height > 4000000:
    image.thumbnail((2000, 2000), Image.Resampling.LANCZOS)

img_rgba = image.convert('RGBA')
width, height = img_rgba.size
img_bytes = img_rgba.tobytes()

# Create Skia bitmap structure
bitmap.fPixmap.fPixels = ctypes.cast(ctypes.c_char_p(img_bytes), ctypes.c_void_p)
bitmap.fPixmap.fRowBytes = width * 4
bitmap.fPixmap.fInfo.fColorInfo.fColorType = 4  # kRGBA_8888
```
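
The size check and the `fRowBytes` value above come down to simple arithmetic, which the following sketch reproduces without Pillow. `prepared_size` is a hypothetical helper that approximates what `thumbnail((2000, 2000))` does (aspect-preserving shrink to fit); Pillow's own rounding may differ by a pixel for some sizes.

```python
def prepared_size(width, height, max_pixels=4_000_000, max_side=2000):
    """Approximate the size after the >4MP check and thumbnail((2000, 2000))."""
    if width * height <= max_pixels:
        return width, height  # small enough, left untouched
    scale = min(max_side / width, max_side / height)
    # thumbnail() preserves aspect ratio; rounding here may differ from Pillow by a pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))


for size in [(1920, 1080), (3840, 2160)]:
    w, h = prepared_size(*size)
    # RGBA uses 4 bytes per pixel, so one row is w * 4 bytes.
    print(size, "->", (w, h), "row bytes:", w * 4, "buffer bytes:", w * h * 4)
```

A 1080p capture passes through unchanged, while a 4K capture is reduced to fit within 2000x2000 before its RGBA bytes are handed to the native library.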

**Output suppression:**

```python
import os
from contextlib import contextmanager


@contextmanager
def suppress_output():
    """Redirects C/C++ level stdout and stderr to devnull."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    original_stdout = os.dup(1)
    original_stderr = os.dup(2)
    os.dup2(devnull, 1)
    os.dup2(devnull, 2)
    try:
        yield
    finally:
        # Restore original streams
        os.dup2(original_stdout, 1)
        os.dup2(original_stderr, 2)
        # Close the temporary descriptors so repeated calls do not leak them
        for fd in (devnull, original_stdout, original_stderr):
            os.close(fd)
```

**Text direction detection:**

```python
is_vertical = (line_box.direction == 3)  # DIRECTION_TOP_TO_BOTTOM
```

**Requirements:**

* Download Screen AI components from:
  `https://chrome-infra-packages.appspot.com/p/chromium/third_party/screen-ai`
* Extract to: `~/.config/screen_ai/resources/`
* Platform: Windows (DLL) or Linux (SO)

## Common patterns

### Postprocessing: Grouping lines into paragraphs

Most providers use the shared `group_lines_into_paragraphs()` utility:

```python
from src.ocr.providers.postprocessing import group_lines_into_paragraphs

# After converting to line-level Paragraph objects
raw_lines: List[Paragraph] = [...]
final_paragraphs = group_lines_into_paragraphs(raw_lines)
```

This function:

* Combines adjacent lines into logical paragraphs
* Respects text direction (vertical vs. horizontal)
* Improves text readability and context
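
To give a feel for what such grouping involves, here is a deliberately simplified heuristic. It is not the algorithm used by `group_lines_into_paragraphs()`, whose rules are not documented on this page. The `Line` dataclass, its fields, and the `max_gap_ratio` threshold are all assumptions for illustration, and the sketch ignores details such as horizontal overlap between lines.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Stand-in for a line-level Paragraph; all fields are assumed for this sketch."""
    text: str
    center_x: float
    center_y: float
    width: float
    height: float
    is_vertical: bool = False


def group_lines(lines, max_gap_ratio=0.5):
    """Toy heuristic: merge same-direction lines whose gap is under half a line thickness."""
    groups = []
    # Horizontal lines stack top to bottom; vertical lines are read right to left.
    ordered = sorted(
        lines,
        key=lambda ln: (ln.is_vertical, -ln.center_x if ln.is_vertical else ln.center_y),
    )
    for line in ordered:
        if groups:
            prev = groups[-1][-1]
            if prev.is_vertical == line.is_vertical:
                if line.is_vertical:
                    thickness = max(prev.width, line.width)
                    gap = abs(prev.center_x - line.center_x) - (prev.width + line.width) / 2
                else:
                    thickness = max(prev.height, line.height)
                    gap = abs(line.center_y - prev.center_y) - (prev.height + line.height) / 2
                if gap <= thickness * max_gap_ratio:
                    groups[-1].append(line)
                    continue
        groups.append([line])
    return ["".join(ln.text for ln in group) for group in groups]


lines = [
    Line("これは", 0.5, 0.10, 0.4, 0.04),
    Line("テストです", 0.5, 0.15, 0.4, 0.04),
    Line("別の段落", 0.5, 0.60, 0.4, 0.04),
]
print(group_lines(lines))
```

The first two lines sit close together and are merged, while the third is far enough away to start its own paragraph.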

### Japanese text filtering

Several providers filter for Japanese text:

```python
import re

JAPANESE_REGEX = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]')

line_has_japanese = any(JAPANESE_REGEX.search(w.plain_text) for w in line.words)
if not line_has_japanese:
    continue
```

Character ranges:

* `\u3040-\u309F`: Hiragana
* `\u30A0-\u30FF`: Katakana
* `\u4E00-\u9FAF`: Kanji (most of the CJK Unified Ideographs block, which extends to `\u9FFF`)
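
The filter's behavior is easiest to see on a few sample strings. This sketch reuses the same pattern; the `has_japanese` helper and the sample strings are assumptions for illustration.

```python
import re

JAPANESE_REGEX = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF]')


def has_japanese(text):
    """True if the text contains at least one hiragana, katakana, or kanji character."""
    return JAPANESE_REGEX.search(text) is not None


samples = ["こんにちは", "スコア", "日本語", "HP 100/100", "Press START", "レベル5"]
kept = [s for s in samples if has_japanese(s)]
print(kept)
```

A single matching character is enough to keep a string, so mixed text such as `レベル5` survives. Text made up only of characters outside these ranges, such as the corner brackets `「」` or full-width Latin letters, is treated as non-Japanese.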

## Selecting a provider

Choose based on your requirements:

**For offline gaming:**

```
Use meikiocr (local) - Best balance of speed and accuracy without internet
```

**For maximum accuracy:**

```
Use Google Lens v2 (remote) - Highest quality but requires internet
```

**For development:**

```
Use Dummy OCR - Test UI without actual OCR processing
```

**For custom deployment:**

```
Use owocr (Websocket) - Run OCR in a separate owocr process (the built-in URI targets 127.0.0.1:7331)
```

**For Chrome integration:**

```
Use Chrome Screen AI (local) - Leverage existing Chrome components
```

## Next steps

<CardGroup cols={2}>
  <Card title="Create custom provider" icon="code" href="/development/creating-custom-provider">
    Build your own OCR provider using these as examples
  </Card>

  <Card title="OCR provider interface" icon="book" href="/development/ocr-provider-interface">
    Understand the interface contract and data models
  </Card>
</CardGroup>
