# OCR provider interface

> Complete reference for the `OcrProvider` interface contract and data models used in meikipop

## Overview

The `OcrProvider` interface defines the contract that every OCR provider must implement to work with meikipop. This abstraction lets you swap OCR backends without modifying the core application logic.

All interface definitions are located in `src/ocr/interface.py`.

## OcrProvider abstract class

Your custom provider must inherit from `OcrProvider` and implement its abstract members:

```python src/ocr/interface.py
class OcrProvider(abc.ABC):
    """
    Abstract base class for an OCR provider.
    Any class that implements this interface can be used by the application's OcrProcessor.
    This allows for easily swapping out different OCR backends.
    """

    @property
    @abc.abstractmethod
    def NAME(self) -> str:
        """A user-friendly name for this provider."""
        raise NotImplementedError

    @abc.abstractmethod
    def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
        """
        Performs OCR on the given image.

        Args:
            image: A PIL Image object to perform OCR on.

        Returns:
            A list of Paragraph objects found in the image, or None if an error occurred.
            Returns an empty list if no text is found.
        """
        raise NotImplementedError
```

### Required properties

* `NAME` (`str`): A unique, user-friendly name for your provider (e.g., `"My Cool OCR"`). This name appears in the settings and tray icon menus.

### Required methods

`scan(image)` is the core method where all OCR processing happens.

**Parameters:**

* `image` (`PIL.Image.Image`): The screen region to scan

**Returns:**

* `List[Paragraph]`: If OCR succeeds (return an empty list `[]` if no text is found)
* `None`: If a critical error occurred

The `scan` method receives a PIL Image object and must return data in meikipop's standard format.
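To make the `scan` contract concrete, here is a minimal sketch of a provider. It bundles stand-in copies of the interface classes so it runs on its own, types `image` loosely so it does not need Pillow, and wraps a hypothetical engine callable that is assumed to return `Paragraph` objects already. `CallableOcrProvider` and `broken_engine` are illustrative names, not part of meikipop.

```python
import abc
import logging
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

logger = logging.getLogger(__name__)


# Stand-ins mirroring src/ocr/interface.py so this sketch runs on its own.
# In meikipop, import the real classes instead. `image` is typed as Any here
# to avoid a Pillow dependency; meikipop passes a PIL Image.
@dataclass(frozen=True)
class BoundingBox:
    center_x: float
    center_y: float
    width: float
    height: float


@dataclass(frozen=True)
class Word:
    text: str
    separator: str
    box: BoundingBox


@dataclass(frozen=True)
class Paragraph:
    full_text: str
    words: List[Word]
    box: BoundingBox
    is_vertical: bool


class OcrProvider(abc.ABC):
    @property
    @abc.abstractmethod
    def NAME(self) -> str:
        raise NotImplementedError

    @abc.abstractmethod
    def scan(self, image: Any) -> Optional[List[Paragraph]]:
        raise NotImplementedError


class CallableOcrProvider(OcrProvider):
    """Hypothetical provider that delegates to an engine callable.

    The engine is assumed to return Paragraph objects already; a real
    provider would convert its engine's raw output at this point.
    """

    def __init__(self, engine: Callable[[Any], List[Paragraph]]):
        self._engine = engine

    @property
    def NAME(self) -> str:
        return "Callable OCR"

    def scan(self, image: Any) -> Optional[List[Paragraph]]:
        try:
            # An empty result is still a success: return [] rather than None.
            return list(self._engine(image))
        except Exception:
            # Critical failure: log the traceback and signal it with None.
            logger.exception("OCR engine failed")
            return None


def broken_engine(image: Any) -> List[Paragraph]:
    """Hypothetical engine that always fails, to exercise the None path."""
    raise RuntimeError("engine crashed")


if __name__ == "__main__":
    ok = CallableOcrProvider(lambda image: [])
    print(ok.NAME, ok.scan(object()))                         # Callable OCR []
    print(CallableOcrProvider(broken_engine).scan(object()))  # None (the error is logged)

    class Incomplete(OcrProvider):  # forgets to implement scan()
        NAME = "Incomplete"

    try:
        Incomplete()
    except TypeError as exc:
        print("rejected:", exc)
```

Keeping `[]` and `None` distinct lets the caller tell "no text on screen" apart from "the backend failed". Because `scan` is declared with `abc.abstractmethod`, Python refuses to instantiate a subclass that forgets to implement it.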
Your main task is converting your OCR engine's output into this format.

## Data models

Your `scan` method must return data using these three immutable dataclasses:

### BoundingBox

Represents the location and size of text in normalized coordinates.

```python src/ocr/interface.py
@dataclass(frozen=True)
class BoundingBox:
    """A normalized bounding box. All coordinates are floats between 0.0 and 1.0."""
    center_x: float
    center_y: float
    width: float
    height: float
```

* `center_x` (`float`): Horizontal center position, normalized to the 0.0-1.0 range (0.0 is the left edge)
* `center_y` (`float`): Vertical center position, normalized to the 0.0-1.0 range (0.0 is the top edge)
* `width` (`float`): Width of the bounding box, normalized to the 0.0-1.0 range
* `height` (`float`): Height of the bounding box, normalized to the 0.0-1.0 range

All coordinates and dimensions **must be normalized** to a 0.0-1.0 float range, relative to the input image's dimensions. `(0.0, 0.0)` is the top-left corner.

#### Converting pixel coordinates to normalized format

If your OCR engine returns absolute pixel coordinates (here, the box's top-left corner plus its width and height), convert them:

```python
# BoundingBox is the dataclass defined in src/ocr/interface.py

# Raw data from your OCR engine: top-left corner (x, y) plus size (w, h), in pixels
raw_box = {'x': 50, 'y': 100, 'w': 200, 'h': 40}
img_width, img_height = 1000, 800

# Conversion to normalized center_x, center_y, width, height
center_x = (raw_box['x'] + raw_box['w'] / 2) / img_width   # 0.15
center_y = (raw_box['y'] + raw_box['h'] / 2) / img_height  # 0.15
width = raw_box['w'] / img_width                           # 0.2
height = raw_box['h'] / img_height                         # 0.05

# Create the meikipop object
meiki_box = BoundingBox(center_x, center_y, width, height)
```

### Word

Represents a single recognized text element.

```python src/ocr/interface.py
@dataclass(frozen=True)
class Word:
    """Represents a single word recognized by the OCR."""
    text: str  # this can be either a word or a single character
    separator: str  # The separator that follows the word (e.g., a space) - optional
    box: BoundingBox
```

* `text` (`str`): The recognized text. Can be a full word (`"日本語"`) or a single character (`"日"`).
* `separator` (`str`): The character that follows the word. Usually an empty string `""` for Japanese text. The dataclass declares no default for this field, so always pass a value.
* `box` (`BoundingBox`): The bounding box for this specific word or character.

Meikipop's hit-scanning works well with both word-level and character-level boxes, but single-character boxes often lead to more precise dictionary lookups.

### Paragraph

Represents a block of text composed of words.

```python src/ocr/interface.py
@dataclass(frozen=True)
class Paragraph:
    """Represents a block of text, composed of words."""
    full_text: str
    words: List[Word]
    box: BoundingBox
    is_vertical: bool  # True if text is top-to-bottom - optional
```

* `full_text` (`str`): The complete, reconstructed text of the paragraph.
* `words` (`List[Word]`): The `Word` objects that form this paragraph.
* `box` (`BoundingBox`): The bounding box enclosing the entire paragraph.
* `is_vertical` (`bool`): Must be `True` if the text is written top-to-bottom (vertical Japanese text). The dataclass declares no default, so always pass a value. If your OCR engine doesn't report text direction, you can infer it from the bounding box aspect ratio (`height > width`). This is a heuristic: very short lines of one or two characters can be ambiguous.

For Japanese text, setting `is_vertical` correctly is crucial for proper text rendering and lookups.

## Example implementation

Here's how a typical `scan` method transforms OCR data. `your_ocr_engine` and the shape of its results (a list of lines, each a dict with `"bbox"`, `"text"`, and `"words"` keys, with pixel boxes given as `x`, `y`, `w`, `h`) are placeholders for your engine's actual API:

```python src/ocr/providers/dummy/provider.py
import logging
from typing import List, Optional

from PIL import Image

# Import path assumed from the file location src/ocr/interface.py
from src.ocr.interface import BoundingBox, Paragraph, Word

logger = logging.getLogger(__name__)


def convert_to_bounding_box(bbox: dict, img_width: int, img_height: int) -> BoundingBox:
    """Convert a pixel box {'x', 'y', 'w', 'h'} (top-left origin) to a normalized BoundingBox."""
    return BoundingBox(
        center_x=(bbox['x'] + bbox['w'] / 2) / img_width,
        center_y=(bbox['y'] + bbox['h'] / 2) / img_height,
        width=bbox['w'] / img_width,
        height=bbox['h'] / img_height,
    )


# This method belongs to your OcrProvider subclass.
def scan(self, image: Image.Image) -> Optional[List[Paragraph]]:
    try:
        # 1. Get OCR data from your engine (placeholder call)
        raw_ocr_results = your_ocr_engine.recognize(image)

        # 2. Transform to meikipop format
        paragraphs: List[Paragraph] = []
        img_width, img_height = image.size

        for ocr_line in raw_ocr_results:
            # Convert the line's pixel bbox to a normalized BoundingBox
            line_bbox_data = ocr_line["bbox"]
            line_box = convert_to_bounding_box(line_bbox_data, img_width, img_height)

            # Infer text direction from the aspect ratio
            is_vertical = line_bbox_data['h'] > line_bbox_data['w']

            # Convert each word (or character) together with its box
            words_in_para: List[Word] = []
            for word_data in ocr_line.get("words", []):
                word_box = convert_to_bounding_box(word_data['bbox'], img_width, img_height)
                words_in_para.append(Word(text=word_data['text'], separator="", box=word_box))

            # Create the Paragraph object
            paragraphs.append(Paragraph(
                full_text=ocr_line.get("text", ""),
                words=words_in_para,
                box=line_box,
                is_vertical=is_vertical,
            ))

        # 3. Return the results ([] if the engine found no text)
        return paragraphs

    except Exception as e:
        logger.error(f"Error in OCR: {e}", exc_info=True)
        return None  # Return None on critical errors
```

## Next steps

* Step-by-step guide to building your own OCR provider
* Explore the built-in OCR providers in meikipop