# Create a custom OCR provider

> Step-by-step guide to building your own OCR provider for meikipop with automatic discovery and integration

## Overview

This guide explains how to create your own OCR provider to use with meikipop. This allows you to integrate any OCR engine you prefer, whether it's an offline model, a web service, or a commercial API.

<Info>
  The best way to start is to copy the entire `/src/ocr/providers/dummy/` directory, rename it, and modify its contents. The dummy provider is a fully commented template designed for this purpose.
</Info>

## Automatic discovery

Meikipop automatically discovers and loads any valid OCR provider. To be discovered, your provider must meet two conditions:

<Steps>
  <Step title="Create a subdirectory">
    Your provider must be in its own subdirectory inside `/src/ocr/providers/`. For example: `/src/ocr/providers/my_cool_ocr/`.
  </Step>

  <Step title="Create provider.py">
    Your subdirectory must have a `provider.py` file containing a class that inherits from `OcrProvider`.
  </Step>
</Steps>

Once these conditions are met, meikipop will automatically detect and load your provider on startup.
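Before starting meikipop, it can help to confirm that the directory layout is in place. The sketch below is not meikipop's loader; it is a hypothetical helper (`find_provider_dirs` is a name made up for this guide) that only checks the layout conditions, namely a subdirectory containing a `provider.py` file. It does not verify that the file defines an `OcrProvider` subclass.

```python
from pathlib import Path
from typing import List


def find_provider_dirs(providers_root: Path) -> List[str]:
    """Return names of subdirectories that contain a provider.py file."""
    if not providers_root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in providers_root.iterdir()
        if entry.is_dir() and (entry / "provider.py").is_file()
    )


if __name__ == "__main__":
    # Path taken from this guide; run from the repository root.
    print(find_provider_dirs(Path("src/ocr/providers")))
```

If your directory name is missing from the printed list, check for a misspelled file name or a `provider.py` placed one level too deep.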

## Implementation steps

### Step 1: Set up the directory structure

Create your provider directory:

```bash theme={null}
mkdir src/ocr/providers/my_cool_ocr
touch src/ocr/providers/my_cool_ocr/__init__.py
touch src/ocr/providers/my_cool_ocr/provider.py
```

### Step 2: Define your provider class

In `provider.py`, create a class that inherits from `OcrProvider`:

```python src/ocr/providers/my_cool_ocr/provider.py theme={null}
import logging
from typing import List, Optional

from PIL import Image

from src.ocr.interface import OcrProvider, Paragraph, Word, BoundingBox

logger = logging.getLogger(__name__)


class MyCoolOcrProvider(OcrProvider):
    """
    A custom OCR provider using My Cool OCR engine.
    """
    # The NAME is displayed in the settings and tray menu
    NAME = "My Cool OCR"
    
    def __init__(self):
        """Initialize your OCR client here."""
        # Import and initialize your OCR library
        # self.client = my_cool_ocr.Client(api_key="...")
        pass
    
    def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
        """
        Performs OCR on the given image.
        
        This method must:
        1. Get OCR data from your engine
        2. Convert it to meikipop's format
        3. Return a list of Paragraphs
        """
        try:
            # Your implementation here
            return self._process_image(image)
        except Exception as e:
            logger.error(f"Error in {self.NAME}: {e}", exc_info=True)
            return None
    
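    def _process_image(self, image: Image.Image) -> List[Paragraph]:
        # Your OCR processing logic: call the engine, then build Paragraph objects
        raise NotImplementedError("Replace with your OCR processing logic")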
```

<Note>
  The `NAME` class attribute is required and must be a unique, user-friendly string. This name appears in the settings and tray menu.
</Note>

### Step 3: Implement the scan method

Your `scan` method must perform three key tasks:

<Steps>
  <Step title="Obtain OCR data">
    Call your OCR engine to get raw results. This could be:

    * A Python library call
    * A REST API request
    * A command-line tool execution
    * A local model inference
  </Step>

  <Step title="Transform the data">
    Convert your OCR engine's proprietary format into meikipop's standard data model using `BoundingBox`, `Word`, and `Paragraph` objects.
  </Step>

  <Step title="Return the results">
    Return a `List[Paragraph]` on success, an empty list `[]` if no text was found, or `None` if a critical error occurred.
  </Step>
</Steps>
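The difference between `[]` and `None` is easy to blur, so here is a minimal sketch of how calling code could tell the three outcomes apart. `describe_scan_result` is a hypothetical helper written for this guide and is not part of meikipop.

```python
from typing import List, Optional


def describe_scan_result(result: Optional[List[object]]) -> str:
    """Map the three possible scan() outcomes to a short status string."""
    if result is None:
        return "error"      # critical failure inside the provider
    if not result:
        return "no text"    # OCR ran, but nothing was recognized
    return f"{len(result)} paragraph(s)"
```

Returning `[]` for an empty screen and reserving `None` for real failures keeps an ordinary "nothing to read" case from looking like a broken provider.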

## Complete example: Dummy provider

Here's the complete dummy provider that demonstrates all required transformations:

```python src/ocr/providers/dummy/provider.py theme={null}
import logging
from typing import List, Optional

from PIL import Image

from src.ocr.interface import OcrProvider, Paragraph, Word, BoundingBox

logger = logging.getLogger(__name__)


class DummyProvider(OcrProvider):
    NAME = "Dummy OCR (Developer Template)"

    def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
        logger.info(f"{self.NAME} received an image of size {image.size}. Returning mock data.")
        
        try:
            # --- 1. OBTAIN OCR DATA ---
            # Simulated output from a fictional OCR engine with pixel coordinates
            mock_ocr_result = [
                {
                    "text": "これは横書きテキストです",
                    "bbox": {"x": 100, "y": 150, "w": 400, "h": 40},
                    "words": [
                        {"text": "これは", "bbox": {"x": 100, "y": 150, "w": 90, "h": 40}},
                        {"text": "横書き", "bbox": {"x": 200, "y": 150, "w": 90, "h": 40}},
                        {"text": "テキストです", "bbox": {"x": 300, "y": 150, "w": 200, "h": 40}},
                    ]
                },
                {
                    "text": "縦書き",
                    "bbox": {"x": 600, "y": 200, "w": 50, "h": 300},
                    "words": [
                        {"text": "縦", "bbox": {"x": 600, "y": 200, "w": 50, "h": 95}},
                        {"text": "書", "bbox": {"x": 600, "y": 305, "w": 50, "h": 95}},
                        {"text": "き", "bbox": {"x": 600, "y": 405, "w": 50, "h": 95}},
                    ]
                }
            ]
            
            # --- 2. PROCESS AND TRANSFORM THE DATA ---
            paragraphs: List[Paragraph] = []
            img_width, img_height = image.size
            
            if img_width == 0 or img_height == 0:
                logger.error("Invalid image dimensions received.")
                return None
            
            for ocr_line in mock_ocr_result:
                line_text = ocr_line.get("text")
                line_bbox_data = ocr_line.get("bbox")
                
                # Convert pixel bbox to normalized coordinates (0.0 to 1.0)
                center_x = (line_bbox_data['x'] + line_bbox_data['w'] / 2) / img_width
                center_y = (line_bbox_data['y'] + line_bbox_data['h'] / 2) / img_height
                norm_w = line_bbox_data['w'] / img_width
                norm_h = line_bbox_data['h'] / img_height
                
                line_box = BoundingBox(center_x, center_y, norm_w, norm_h)
                
                # Infer text direction from aspect ratio
                is_vertical = line_bbox_data['h'] > line_bbox_data['w']
                
                # Process words within the line
                words_in_para: List[Word] = []
                for word_data in ocr_line.get("words", []):
                    word_bbox_data = word_data.get("bbox")
                    
                    # Convert word coordinates
                    word_center_x = (word_bbox_data['x'] + word_bbox_data['w'] / 2) / img_width
                    word_center_y = (word_bbox_data['y'] + word_bbox_data['h'] / 2) / img_height
                    word_norm_w = word_bbox_data['w'] / img_width
                    word_norm_h = word_bbox_data['h'] / img_height
                    
                    word_box = BoundingBox(word_center_x, word_center_y, word_norm_w, word_norm_h)
                    words_in_para.append(Word(text=word_data['text'], separator="", box=word_box))
                
                # Assemble the Paragraph object
                paragraph = Paragraph(
                    full_text=line_text,
                    words=words_in_para,
                    box=line_box,
                    is_vertical=is_vertical
                )
                paragraphs.append(paragraph)
            
            # --- 3. RETURN THE RESULT ---
            return paragraphs
            
        except Exception as e:
            logger.error(f"An error occurred in {self.NAME}: {e}", exc_info=True)
            return None
```

<Tip>
  You can provide the interface file, your provider template, and sample JSON output from your OCR engine to an AI assistant (like GPT-4 or Claude) and ask it to write the adapter code for you. This often produces a usable first draft, which you should still test against real OCR output.
</Tip>

## Data transformation patterns

### Converting bounding boxes

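Your OCR engine likely returns pixel coordinates, often as a top-left corner plus width and height. Meikipop expects center-based coordinates normalized to the range 0.0 to 1.0, so you must convert them: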

```python theme={null}
# From pixel coordinates (top-left corner + dimensions)
raw_box = {'x': 50, 'y': 100, 'w': 200, 'h': 40}
img_width, img_height = 1000, 800

# To normalized center-based coordinates
center_x = (raw_box['x'] + raw_box['w'] / 2) / img_width  # 0.15
center_y = (raw_box['y'] + raw_box['h'] / 2) / img_height  # 0.15
width = raw_box['w'] / img_width  # 0.2
height = raw_box['h'] / img_height  # 0.05

meiki_box = BoundingBox(center_x, center_y, width, height)
```
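The same arithmetic can be wrapped in a small reusable helper. This is a sketch under stated assumptions: the `BoundingBox` dataclass below is a stand-in so the snippet runs on its own (in a real provider, import the class from `src.ocr.interface`), and `convert_bbox` and `corners_to_xywh` are hypothetical helper names. The second function covers engines that report two corners instead of a corner plus size.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass
class BoundingBox:
    """Stand-in for meikipop's BoundingBox; field names follow this guide."""
    center_x: float
    center_y: float
    width: float
    height: float


def convert_bbox(raw: Mapping[str, float], img_width: int, img_height: int) -> BoundingBox:
    """Convert a top-left pixel box with keys x, y, w, h to normalized center-based form."""
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")
    return BoundingBox(
        center_x=(raw["x"] + raw["w"] / 2) / img_width,
        center_y=(raw["y"] + raw["h"] / 2) / img_height,
        width=raw["w"] / img_width,
        height=raw["h"] / img_height,
    )


def corners_to_xywh(x1: float, y1: float, x2: float, y2: float) -> Dict[str, float]:
    """Convert two opposite corners to a top-left corner plus size, in pixels."""
    return {
        "x": min(x1, x2),
        "y": min(y1, y2),
        "w": abs(x2 - x1),
        "h": abs(y2 - y1),
    }
```

For a corner-based engine, chain the two: `convert_bbox(corners_to_xywh(x1, y1, x2, y2), img_width, img_height)`.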

### Determining text direction

If your OCR engine doesn't provide text direction, infer it from the bounding box:

```python theme={null}
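# Vertical text typically has height > width, measured in pixels
is_vertical = raw_bbox['h'] > raw_bbox['w']

# From a normalized box, scale back to pixels first: width and height
# are normalized by different image dimensions
is_vertical = bounding_box.height * img_height > bounding_box.width * img_width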
```
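The comparison should be made in pixel units, because normalized width and height are divided by different image dimensions. The sketch below uses a hypothetical helper, `infer_is_vertical`, to show a case where comparing normalized values directly would give the wrong answer on a wide image.

```python
def infer_is_vertical(norm_width: float, norm_height: float,
                      img_width: int, img_height: int) -> bool:
    """Compare a normalized box's sides in pixels rather than in normalized units."""
    return norm_height * img_height > norm_width * img_width


# A 100 x 80 pixel box (horizontal text) in a 2000 x 500 image.
# Normalized, it is 0.05 wide and 0.16 tall, so a direct comparison
# of normalized values would misclassify it as vertical.
wide_image_box = infer_is_vertical(0.05, 0.16, img_width=2000, img_height=500)

# A 50 x 300 pixel box (vertical text) in a 1000 x 800 image.
tall_box = infer_is_vertical(0.05, 0.375, img_width=1000, img_height=800)
```

Aspect ratio is only a heuristic: a single character or a very short line can be nearly square, so prefer the engine's own direction flag when it provides one.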

### Handling word vs. character granularity

<Tip>
  Meikipop works well with both word-level and character-level boxes. Character-level boxes often provide more precise lookups.
</Tip>

```python theme={null}
# `convert_bbox` stands for your own pixel-to-normalized conversion helper
words_in_line = []

# Character-level (recommended for Japanese)
for char_info in line_chars:
    words_in_line.append(Word(
        text=char_info['char'],  # Single character
        separator="",
        box=convert_bbox(char_info['bbox'])
    ))

# Word-level (also works)
for word_info in line_words:
    words_in_line.append(Word(
        text=word_info['text'],  # Full word
        separator="",
        box=convert_bbox(word_info['bbox'])
    ))
```
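If your engine only reports word-level boxes, one possible approximation is to slice each word box into equal parts, one per character. This is an illustrative sketch, not something meikipop requires or does itself: it assumes roughly equal-sized glyphs (often reasonable for full-width Japanese text, less so for mixed scripts), and `split_into_characters` is a hypothetical helper. The dataclasses are stand-ins so the snippet runs on its own; in a provider, import the real classes from `src.ocr.interface`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundingBox:
    """Stand-in for meikipop's BoundingBox (normalized, center-based)."""
    center_x: float
    center_y: float
    width: float
    height: float


@dataclass
class Word:
    """Stand-in for meikipop's Word."""
    text: str
    separator: str
    box: BoundingBox


def split_into_characters(word: Word, is_vertical: bool) -> List[Word]:
    """Approximate per-character boxes by dividing a word box into equal slices."""
    n = len(word.text)
    if n <= 1:
        return [word]
    box = word.box
    chars: List[Word] = []
    for i, ch in enumerate(word.text):
        if is_vertical:
            # Slice along the vertical axis, top to bottom.
            slice_h = box.height / n
            top = box.center_y - box.height / 2
            char_box = BoundingBox(box.center_x, top + slice_h * (i + 0.5), box.width, slice_h)
        else:
            # Slice along the horizontal axis, left to right.
            slice_w = box.width / n
            left = box.center_x - box.width / 2
            char_box = BoundingBox(left + slice_w * (i + 0.5), box.center_y, slice_w, box.height)
        # Only the last character keeps the word's trailing separator.
        separator = word.separator if i == n - 1 else ""
        chars.append(Word(text=ch, separator=separator, box=char_box))
    return chars
```

Real character boxes from the engine are preferable whenever they are available, since equal slicing drifts on proportional fonts and punctuation.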

## Common OCR integration patterns

### Python library integration

```python theme={null}
import my_cool_ocr_library

class MyCoolOcrProvider(OcrProvider):
    def __init__(self):
        self.client = my_cool_ocr_library.Client(api_key="...")
    
    def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
        raw_results = self.client.recognize(image)
        return self._transform_results(raw_results)
```

### REST API integration

```python theme={null}
import requests
import io

class ApiOcrProvider(OcrProvider):
    def __init__(self):
        self.api_url = "https://api.myocr.com/v1/scan"
        self.api_key = "your-api-key"
    
    def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
        # Convert image to bytes
        buffer = io.BytesIO()
        image.save(buffer, format='PNG')
        
        # Make API request; the timeout keeps a stalled connection from blocking OCR
        try:
            response = requests.post(
                self.api_url,
                files={'image': buffer.getvalue()},
                headers={'Authorization': f'Bearer {self.api_key}'},
                timeout=30
            )
        except requests.RequestException:
            # Network failure or timeout: report a critical error
            return None
        
        if response.status_code != 200:
            return None
        
        return self._transform_results(response.json())
```

### Command-line tool integration

```python theme={null}
import json
import os
import subprocess
import tempfile

class CliOcrProvider(OcrProvider):
    def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
        # Reserve a temp file path, then close the handle so the CLI tool can read the file
        with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as tmp:
            tmp_path = tmp.name
        
        try:
            image.save(tmp_path)
            
            # Run CLI tool
            result = subprocess.run(
                ['ocr-tool', '--json', tmp_path],
                capture_output=True,
                text=True
            )
            
            if result.returncode != 0:
                return None
            
            raw_results = json.loads(result.stdout)
            return self._transform_results(raw_results)
        finally:
            # delete=False leaves the file on disk, so remove it explicitly
            os.remove(tmp_path)
```
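All three patterns above hand their raw output to a `_transform_results` helper that the snippets leave undefined. The sketch below shows one possible shape for it, written as a plain function so it runs on its own. It assumes the engine returns the same structure as the dummy provider's mock data (a list of lines with `text`, `bbox`, and `words`); your engine's format will differ, and the dataclasses here are stand-ins for the real classes in `src.ocr.interface`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class BoundingBox:
    """Stand-in for meikipop's BoundingBox (normalized, center-based)."""
    center_x: float
    center_y: float
    width: float
    height: float


@dataclass
class Word:
    """Stand-in for meikipop's Word."""
    text: str
    separator: str
    box: BoundingBox


@dataclass
class Paragraph:
    """Stand-in for meikipop's Paragraph."""
    full_text: str
    words: List[Word]
    box: BoundingBox
    is_vertical: bool


def _to_box(raw: Dict[str, float], img_w: int, img_h: int) -> BoundingBox:
    """Top-left pixel box (x, y, w, h) to normalized center-based box."""
    return BoundingBox(
        (raw["x"] + raw["w"] / 2) / img_w,
        (raw["y"] + raw["h"] / 2) / img_h,
        raw["w"] / img_w,
        raw["h"] / img_h,
    )


def transform_results(raw_results: List[Dict[str, Any]], img_w: int, img_h: int) -> List[Paragraph]:
    """Build Paragraph objects from an assumed line/word JSON structure."""
    paragraphs: List[Paragraph] = []
    for line in raw_results:
        bbox = line.get("bbox")
        if not bbox:
            continue  # skip entries that carry no geometry
        words = [
            Word(text=w["text"], separator="", box=_to_box(w["bbox"], img_w, img_h))
            for w in line.get("words", [])
        ]
        paragraphs.append(Paragraph(
            full_text=line.get("text", ""),
            words=words,
            box=_to_box(bbox, img_w, img_h),
            is_vertical=bbox["h"] > bbox["w"],  # pixel-based aspect ratio heuristic
        ))
    return paragraphs
```

Inside a provider you would typically make this a method and pass `image.size` along from `scan`, since the image dimensions are needed for normalization.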

## Activating your provider

Once your provider is implemented:

<Steps>
  <Step title="Run meikipop">
    Start the application. Your provider will be automatically discovered.
  </Step>

  <Step title="Open the tray menu">
    Right-click the meikipop tray icon.
  </Step>

  <Step title="Select OCR provider">
    Navigate to **OCR Provider** in the menu.
  </Step>

  <Step title="Choose your provider">
    Select your provider by its `NAME` from the list. Meikipop will now use your class for all OCR operations.
  </Step>
</Steps>

<Warning>
  Make sure your provider's `NAME` is unique to avoid conflicts with existing providers.
</Warning>

## Testing and debugging

### Enable debug logging

```python theme={null}
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

# Inside your provider class:
def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
    logger.debug(f"Received image of size {image.size}")
    paragraphs = self._process_image(image)  # Your code here
    logger.debug(f"Found {len(paragraphs)} paragraphs")
    return paragraphs
```

### Validate coordinates

```python theme={null}
def _validate_bbox(self, box: BoundingBox) -> bool:
    """Ensure all coordinates are in valid range."""
    if not (0.0 <= box.center_x <= 1.0):
        logger.warning(f"Invalid center_x: {box.center_x}")
        return False
    if not (0.0 <= box.center_y <= 1.0):
        logger.warning(f"Invalid center_y: {box.center_y}")
        return False
    if not (0.0 <= box.width <= 1.0):
        logger.warning(f"Invalid width: {box.width}")
        return False
    if not (0.0 <= box.height <= 1.0):
        logger.warning(f"Invalid height: {box.height}")
        return False
    return True
```
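The range checks above still accept a box whose center sits near an edge while its width or height spills past the image. A stricter variant, sketched here with a hypothetical `box_fits_image` helper and a stand-in `BoundingBox`, checks the box edges instead of the individual fields. The small tolerance is an assumption meant to absorb floating-point rounding.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Stand-in for meikipop's BoundingBox (normalized, center-based)."""
    center_x: float
    center_y: float
    width: float
    height: float


def box_fits_image(box: BoundingBox, tolerance: float = 1e-6) -> bool:
    """Return True if the box has positive size and all four edges lie within 0.0 to 1.0."""
    if box.width <= 0 or box.height <= 0:
        return False
    left = box.center_x - box.width / 2
    right = box.center_x + box.width / 2
    top = box.center_y - box.height / 2
    bottom = box.center_y + box.height / 2
    return (
        left >= -tolerance
        and top >= -tolerance
        and right <= 1.0 + tolerance
        and bottom <= 1.0 + tolerance
    )
```

A failure here usually points to a conversion mistake, such as treating a top-left corner as a center or dividing by the wrong image dimension.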

## Next steps

<CardGroup cols={2}>
  <Card title="OCR provider interface" icon="book" href="/development/ocr-provider-interface">
    Complete reference for the interface and data models
  </Card>

  <Card title="Available providers" icon="list" href="/development/available-providers">
    Explore the built-in OCR providers for more examples
  </Card>
</CardGroup>
