
# OCR provider interface

> Complete reference for the OcrProvider interface contract and data models used in meikipop

## Overview

The `OcrProvider` interface defines the contract that all OCR providers must implement to work with meikipop. This abstraction allows you to swap different OCR backends without modifying the core application logic.

All interface definitions are located in `src/ocr/interface.py`.

## OcrProvider abstract class

Your custom provider must inherit from `OcrProvider` and implement its abstract methods:

```python src/ocr/interface.py theme={null}
class OcrProvider(abc.ABC):
    """
    Abstract base class for an OCR provider.

    Any class that implements this interface can be used by the application's
    OcrProcessor. This allows for easily swapping out different OCR backends.
    """

    @property
    @abc.abstractmethod
    def NAME(self) -> str:
        """A user-friendly name for this provider."""
        raise NotImplementedError

    @abc.abstractmethod
    def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
        """
        Performs OCR on the given image.

        Args:
            image: A PIL Image object to perform OCR on.

        Returns:
            A list of Paragraph objects found in the image, or None if an
            error occurred. Returns an empty list if no text is found.
        """
        raise NotImplementedError
```

### Required properties

<ParamField path="NAME" type="str" required>
  A unique, user-friendly string for your provider (e.g., `"My Cool OCR"`). This name appears in the settings and tray icon menus.
</ParamField>

### Required methods

<ParamField path="scan" type="method" required>
  The core method where all OCR processing happens.

  **Parameters:**

  * `image` (PIL.Image.Image): The screen region to scan

  **Returns:**

  * `List[Paragraph]`: When OCR succeeds. Return an empty list `[]` if no text is found.
  * `None`: When a critical error occurred.
</ParamField>

<Note>
  The `scan` method receives a PIL Image object and must return data in meikipop's standard format. Your main task is converting your OCR engine's output into this format.
</Note>
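
As a minimal sketch of the contract, the provider below satisfies both requirements without calling any OCR engine. The interface is re-declared locally (with `image` left untyped) only so the snippet runs on its own; in a real provider you would import `OcrProvider` and `Paragraph` from `src/ocr/interface.py`. `EmptyProvider` and `IncompleteProvider` are hypothetical names, and defining `NAME` as a plain class attribute is one way to satisfy the abstract property, not necessarily the only one.

```python
import abc
from typing import Any, List, Optional


class Paragraph:
    """Stand-in for the real Paragraph dataclass in src/ocr/interface.py."""


class OcrProvider(abc.ABC):
    """Local copy of the interface so this sketch is self-contained."""

    @property
    @abc.abstractmethod
    def NAME(self) -> str:
        raise NotImplementedError

    @abc.abstractmethod
    def scan(self, image: Any) -> Optional[List[Paragraph]]:
        raise NotImplementedError


class EmptyProvider(OcrProvider):
    """Hypothetical provider that never finds any text."""

    # A plain class attribute overrides the abstract property.
    NAME = "Empty OCR (example)"

    def scan(self, image: Any) -> Optional[List[Paragraph]]:
        # No text found: return an empty list, not None.
        return []


class IncompleteProvider(OcrProvider):
    """Missing scan(), so Python refuses to instantiate it."""

    NAME = "Incomplete"


provider = EmptyProvider()
print(provider.NAME, provider.scan(image=None))

try:
    IncompleteProvider()
except TypeError as exc:
    print("Cannot instantiate:", exc)
```

Because `OcrProvider` derives from `abc.ABC`, forgetting either member surfaces as a `TypeError` at instantiation time rather than as a failure during the first scan.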

## Data models

Your `scan` method must return data using these three immutable dataclasses:

### BoundingBox

Represents the location and size of text with normalized coordinates.

```python src/ocr/interface.py theme={null}
@dataclass(frozen=True)
class BoundingBox:
    """A normalized bounding box. All coordinates are floats between 0.0 and 1.0."""
    center_x: float
    center_y: float
    width: float
    height: float
```

<ParamField path="center_x" type="float" required>
  Horizontal center position, normalized to 0.0-1.0 range (0.0 is left edge)
</ParamField>

<ParamField path="center_y" type="float" required>
  Vertical center position, normalized to 0.0-1.0 range (0.0 is top edge)
</ParamField>

<ParamField path="width" type="float" required>
  Width of the bounding box, normalized to 0.0-1.0 range
</ParamField>

<ParamField path="height" type="float" required>
  Height of the bounding box, normalized to 0.0-1.0 range
</ParamField>

<Warning>
  All coordinates and dimensions **must be normalized** to a 0.0-1.0 float range, relative to the input image's dimensions. `(0.0, 0.0)` represents the top-left corner.
</Warning>

#### Converting pixel coordinates to normalized format

If your OCR engine returns absolute pixel coordinates, you need to convert them:

```python theme={null}
# Raw data from your OCR engine
raw_box = {'x': 50, 'y': 100, 'w': 200, 'h': 40}
img_width, img_height = 1000, 800

# Conversion to normalized center_x, center_y, width, height
center_x = (raw_box['x'] + raw_box['w'] / 2) / img_width  # 0.15
center_y = (raw_box['y'] + raw_box['h'] / 2) / img_height  # 0.15
width = raw_box['w'] / img_width  # 0.2
height = raw_box['h'] / img_height  # 0.05

# Create the meikipop object
meiki_box = BoundingBox(center_x, center_y, width, height)
```
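
Some engines report corner coordinates instead of an origin plus size. The sketch below handles that case, assuming the engine returns pixel values `x_min`, `y_min`, `x_max`, and `y_max`. `box_from_corners` is a hypothetical helper name, and `BoundingBox` is re-declared only so the snippet runs on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Mirrors the BoundingBox dataclass in src/ocr/interface.py."""
    center_x: float
    center_y: float
    width: float
    height: float


def box_from_corners(x_min, y_min, x_max, y_max, img_width, img_height):
    """Convert pixel corner coordinates to a normalized, center-based box."""
    return BoundingBox(
        center_x=(x_min + x_max) / 2 / img_width,
        center_y=(y_min + y_max) / 2 / img_height,
        width=(x_max - x_min) / img_width,
        height=(y_max - y_min) / img_height,
    )


# Same region as the x/y/w/h example: x=50, y=100, w=200, h=40 on a 1000x800 image
box = box_from_corners(50, 100, 250, 140, img_width=1000, img_height=800)
print(box)
```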

### Word

Represents a single recognized text element.

```python src/ocr/interface.py theme={null}
@dataclass(frozen=True)
class Word:
    """Represents a single word recognized by the OCR."""
    text: str  # this can be either a word or a single character
    separator: str  # The separator that follows the word (e.g., a space) - optional
    box: BoundingBox
```

<ParamField path="text" type="str" required>
  The recognized text. Can be a full word (`"日本語"`) or a single character (`"日"`).
</ParamField>

<ParamField path="separator" type="str" required>
  The string that follows the word, such as a space. The dataclass defines no default value, so always pass it explicitly; for Japanese text this is usually an empty string `""`.
</ParamField>

<ParamField path="box" type="BoundingBox" required>
  The bounding box for this specific word or character.
</ParamField>

<Tip>
  Meikipop's hit-scanning works well with both word-level and character-level boxes. Providing single-character boxes often leads to more precise dictionary lookups.
</Tip>
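
If your engine only returns word-level boxes, one rough way to obtain character-level boxes is to slice each word box into equal parts. The sketch below assumes every character occupies the same amount of space, which is only an approximation (it tends to hold better for full-width Japanese characters than for mixed scripts). `split_into_characters` is a hypothetical helper, and the dataclasses are re-declared only so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    center_x: float
    center_y: float
    width: float
    height: float


@dataclass(frozen=True)
class Word:
    text: str
    separator: str
    box: BoundingBox


def split_into_characters(word: Word, is_vertical: bool = False) -> List[Word]:
    """Split a word box into equal slices, one per character (an approximation)."""
    n = len(word.text)
    if n <= 1:
        return [word]
    box = word.box
    chars: List[Word] = []
    for i, ch in enumerate(word.text):
        # Only the last character keeps the word's trailing separator.
        sep = word.separator if i == n - 1 else ""
        if is_vertical:
            step = box.height / n
            top = box.center_y - box.height / 2
            char_box = BoundingBox(box.center_x, top + step * (i + 0.5), box.width, step)
        else:
            step = box.width / n
            left = box.center_x - box.width / 2
            char_box = BoundingBox(left + step * (i + 0.5), box.center_y, step, box.height)
        chars.append(Word(text=ch, separator=sep, box=char_box))
    return chars


word = Word(text="日本語", separator="", box=BoundingBox(0.5, 0.5, 0.3, 0.1))
for w in split_into_characters(word):
    print(w.text, round(w.box.center_x, 3), round(w.box.width, 3))
```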

### Paragraph

Represents a block of text composed of words.

```python src/ocr/interface.py theme={null}
@dataclass(frozen=True)
class Paragraph:
    """Represents a block of text, composed of words."""
    full_text: str
    words: List[Word]
    box: BoundingBox
    is_vertical: bool  # True if text is top-to-bottom - optional
```

<ParamField path="full_text" type="str" required>
  The complete, reconstructed text of the paragraph.
</ParamField>

<ParamField path="words" type="List[Word]" required>
  A list of `Word` objects that form this paragraph.
</ParamField>

<ParamField path="box" type="BoundingBox" required>
  The bounding box encompassing the entire paragraph.
</ParamField>

<ParamField path="is_vertical" type="bool" required>
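  Must be `True` if the text is written top-to-bottom (vertical Japanese text). The dataclass defines no default value, so always pass it explicitly. If your OCR engine doesn't provide this information, you can approximate it from the paragraph's bounding box: `height > width`. Compare pixel dimensions rather than normalized values, because width and height are normalized against different image dimensions. Treat the result as a heuristic: very short text, such as a single character, can be misclassified.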
</ParamField>

<Info>
  For Japanese text, correctly setting `is_vertical` is crucial for proper text rendering and lookups.
</Info>
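
When the engine returns only words, the paragraph-level fields can be derived from them. The sketch below is one possible approach, not meikipop's own logic: it joins each word's text and separator into `full_text`, takes the union of the word boxes as the paragraph box, and applies the aspect-ratio heuristic in pixel space. `build_paragraph` is a hypothetical helper, and the dataclasses are re-declared only so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    center_x: float
    center_y: float
    width: float
    height: float


@dataclass(frozen=True)
class Word:
    text: str
    separator: str
    box: BoundingBox


@dataclass(frozen=True)
class Paragraph:
    full_text: str
    words: List[Word]
    box: BoundingBox
    is_vertical: bool


def build_paragraph(words: List[Word], img_width: int, img_height: int) -> Paragraph:
    """Assemble a Paragraph from a non-empty list of words."""
    if not words:
        raise ValueError("a paragraph needs at least one word")
    full_text = "".join(w.text + w.separator for w in words)
    # Union of all word boxes, computed from their normalized edges.
    left = min(w.box.center_x - w.box.width / 2 for w in words)
    right = max(w.box.center_x + w.box.width / 2 for w in words)
    top = min(w.box.center_y - w.box.height / 2 for w in words)
    bottom = max(w.box.center_y + w.box.height / 2 for w in words)
    box = BoundingBox((left + right) / 2, (top + bottom) / 2, right - left, bottom - top)
    # Scale back to pixels before comparing, since each axis is normalized differently.
    is_vertical = box.height * img_height > box.width * img_width
    return Paragraph(full_text, words, box, is_vertical)


words = [
    Word("日", "", BoundingBox(0.5, 0.2, 0.05, 0.1)),
    Word("本", "", BoundingBox(0.5, 0.3, 0.05, 0.1)),
]
paragraph = build_paragraph(words, img_width=1000, img_height=800)
print(paragraph.full_text, paragraph.is_vertical, paragraph.box)
```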

## Example implementation

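Here's how a typical `scan` method transforms OCR data. In this sketch, `your_ocr_engine` stands in for your engine's client, `convert_to_bounding_box` for your own pixel-to-normalized conversion helper, and `logger` for a standard `logging` logger; none of them is defined by the interface: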

```python src/ocr/providers/dummy/provider.py theme={null}
def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
    try:
        # 1. Get OCR data from your engine
        raw_ocr_results = your_ocr_engine.recognize(image)
        
        # 2. Transform to meikipop format
        paragraphs: List[Paragraph] = []
        img_width, img_height = image.size
        
        for ocr_line in raw_ocr_results:
            # Convert pixel bbox to normalized BoundingBox
            line_bbox_data = ocr_line.get("bbox")
            center_x = (line_bbox_data['x'] + line_bbox_data['w'] / 2) / img_width
            center_y = (line_bbox_data['y'] + line_bbox_data['h'] / 2) / img_height
            norm_w = line_bbox_data['w'] / img_width
            norm_h = line_bbox_data['h'] / img_height
            
            line_box = BoundingBox(center_x, center_y, norm_w, norm_h)
            
            # Determine text direction
            is_vertical = line_bbox_data['h'] > line_bbox_data['w']
            
            # Process words
            words_in_para: List[Word] = []
            for word_data in ocr_line.get("words", []):
                # Convert word coordinates
                word_box = convert_to_bounding_box(word_data['bbox'], img_width, img_height)
                words_in_para.append(Word(text=word_data['text'], separator="", box=word_box))
            
            # Create Paragraph object
            paragraph = Paragraph(
                full_text=ocr_line.get("text", ""),  # full_text is a str, so avoid passing None
                words=words_in_para,
                box=line_box,
                is_vertical=is_vertical
            )
            paragraphs.append(paragraph)
        
        # 3. Return the results
        return paragraphs
        
    except Exception as e:
        logger.error(f"Error in OCR: {e}", exc_info=True)
        return None  # Return None on critical errors
```
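
The example calls `convert_to_bounding_box` without defining it. A minimal sketch of such a helper, assuming the engine reports boxes as dicts with the pixel keys `x`, `y`, `w`, and `h` (the shape used in the example), might look like this. `BoundingBox` is re-declared only so the snippet runs on its own; in a provider you would import it from `src/ocr/interface.py`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BoundingBox:
    center_x: float
    center_y: float
    width: float
    height: float


def convert_to_bounding_box(bbox: Dict[str, float], img_width: int, img_height: int) -> BoundingBox:
    """Convert a pixel box (top-left x/y plus w/h) to a normalized BoundingBox."""
    return BoundingBox(
        center_x=(bbox["x"] + bbox["w"] / 2) / img_width,
        center_y=(bbox["y"] + bbox["h"] / 2) / img_height,
        width=bbox["w"] / img_width,
        height=bbox["h"] / img_height,
    )


word_box = convert_to_bounding_box({"x": 50, "y": 100, "w": 200, "h": 40}, 1000, 800)
print(word_box)
```

Using the same helper for both the line box and the word boxes keeps the normalization logic in one place.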

## Next steps

<CardGroup cols={2}>
  <Card title="Create a custom provider" icon="code" href="/development/creating-custom-provider">
    Step-by-step guide to building your own OCR provider
  </Card>

  <Card title="Available providers" icon="list" href="/development/available-providers">
    Explore the built-in OCR providers in meikipop
  </Card>
</CardGroup>
