CodBi

    Provides the AI.functionality.

    Initial Author: Callari, Salvatore (Callari@WaXCode.net) Maintainer: Callari, Salvatore (Callari@WaXCode.net)

    Constructors

    Methods

    • This functionality scans the files selected in an HTMLInputElement with the Tesseract AI-OCR engine and then, depending on the mode, prints the scanned text, extracts substrings from it, or verifies that it matches a pattern.

      PDF Support: PDF files are detected automatically. PDFs containing more than 100 characters of text are processed client-side without using the AI backend. PDFs with less text (typically scanned documents) are rendered to images and sent to Tesseract for OCR.

      Automatic Orientation Detection: The Tesseract OCR engine will automatically detect and correct image orientation using its OSD (Orientation and Script Detection).
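
      The PDF routing rule described under PDF Support can be pictured with a minimal sketch. The names choosePdfRoute, PdfRoute and TEXT_THRESHOLD are hypothetical and not part of CodBi; only the 100-character threshold comes from the description, and whether surrounding whitespace counts towards it is an assumption.

```typescript
// Hypothetical illustration, not CodBi's actual implementation.
type PdfRoute = "client-text" | "tesseract-ocr";

// Threshold taken from the description: more than 100 characters of text.
const TEXT_THRESHOLD = 100;

function choosePdfRoute(embeddedText: string): PdfRoute {
  // Assumption: leading and trailing whitespace is not counted.
  const length = embeddedText.trim().length;
  return length > TEXT_THRESHOLD ? "client-text" : "tesseract-ocr";
}
```

      Under these assumptions, a PDF whose text layer holds exactly 100 characters would still be rendered to images and sent to the OCR engine.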

      • Mode: Either Print, Verify or Extract Fields.
      • Pattern: The RegEx used either to extract substrings from the scanned text or to verify that the scanned text matches the pattern. When the mode is Extract Fields, all fields that carry the CodBi CSS class CodBi_AI_OCR_Receiver and lie within the parent container of the element containing the HTMLInputElement toProcess receive the extracted values. For each such field, a corresponding parameter Pattern_... must be defined; it specifies the RegEx used to extract that field's substrings from the scanned text. The field name follows the Pattern_ prefix and is matched against the field's data-cb-Field attribute.
      • Separator: If Mode is set to Extract Fields, this parameter defines the separator for the results of multiple files. Default is a comma.
      • MaxPages: Maximum number of PDF pages to process (default: 5). Set to 0 for no limit. Only applies to PDFs.
      • RegExFlags: Optional regex flags to apply to all patterns (e.g., "i" for case-insensitive, "m" for multiline, "s" for dotall). Multiple flags can be combined (e.g., "im"). These flags are transmitted to the Tesseract servlet and applied to pattern matching.
      • Preprocess: Optional boolean flag to enable image preprocessing before OCR. When set to true, applies grayscale conversion, adaptive binarization (Otsu's method), and noise reduction to improve text recognition accuracy. Default is false.
      • InvalidImageText: The text to display in mode Verify if one or more of the images do not comply with the specified Pattern.
      • WrongFileMessage: The text to display for the manual verification checkbox label in mode Verify.
      • ProcessingImageText: The text to append to the label of the HTMLInputElement toProcess while the images are processed.
      • Maximum: The maximum number of files that may be uploaded. If more files are selected, processing is aborted and a warning is logged to the console.
      • QueueBadge: If set to "true", shows a badge with the current queue position while waiting for inference. Overrides the AI_QueueBadge plugin property for this instance. Default: determined by plugin property.
      • QueueText: Text appended after the queue position number in the badge (e.g. "in queue" → badge shows "3 in queue"). Default: empty.
      • CodBi_AI_OCR_Receiver: Elements with this class within the parent container of the one holding the HTMLInputElement toProcess receive the extracted fields when Mode is set to Extract Fields. Each such element should have its data-cb-Field attribute set to the name of the field whose extracted text it receives (see the Pattern_... config parameter). In Print mode, a single textarea with this class is expected to receive the full OCR text output.
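
      To illustrate how the Pattern_..., RegExFlags and Separator parameters might interact in Extract Fields mode, here is a minimal sketch. The function extractFields is hypothetical and not part of CodBi. According to the description the real matching happens in the Tesseract servlet, so this only mirrors the documented behaviour. Taking the first capture group when one exists is an assumption.

```typescript
// Hypothetical illustration, not CodBi's actual implementation.
type Config = { [key: string]: unknown };

const PATTERN_PREFIX = "Pattern_";

function extractFields(config: Config, scannedTexts: string[]): Record<string, string> {
  const rawFlags = config["RegExFlags"];
  const rawSeparator = config["Separator"];
  const flags = typeof rawFlags === "string" ? rawFlags : "";
  // Documented default separator is a comma.
  const separator = typeof rawSeparator === "string" ? rawSeparator : ",";
  const result: Record<string, string> = {};

  for (const [key, value] of Object.entries(config)) {
    if (!key.startsWith(PATTERN_PREFIX) || typeof value !== "string") continue;
    // The field name follows the prefix and corresponds to data-cb-Field.
    const fieldName = key.slice(PATTERN_PREFIX.length);
    const regex = new RegExp(value, flags);
    const hits: string[] = [];
    for (const text of scannedTexts) {
      regex.lastIndex = 0; // keep stateful flags from leaking between files
      const match = regex.exec(text);
      // Assumption: prefer the first capture group, else the whole match.
      if (match) hits.push(match[1] ?? match[0]);
    }
    // Results of multiple files are joined with the separator.
    result[fieldName] = hits.join(separator);
  }
  return result;
}

// Example configuration with two receiver fields, Invoice and Total.
const exampleConfig: Config = {
  Mode: "Extract Fields",
  RegExFlags: "i",
  Separator: "; ",
  Pattern_Invoice: "invoice\\s+no\\.?\\s*(\\d+)",
  Pattern_Total: "total:\\s*([0-9.]+)",
};
const exampleTexts = ["Invoice No. 1001\nTotal: 12.50", "INVOICE NO 1002\nTOTAL: 7.00"];
const extracted = extractFields(exampleConfig, exampleTexts);
// extracted.Invoice === "1001; 1002", extracted.Total === "12.50; 7.00"
```

      In this sketch, a receiver element with data-cb-Field set to Invoice would be filled with the value stored under the Invoice key.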

      Parameters

      • toLoad: { [key: string]: unknown }

        Provided by the CodBi.

      • toProcess: Element

        Provided by the CodBi.

      Returns void
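
      Because toLoad is typed as { [key: string]: unknown }, every configuration value has to be narrowed before it is used. The following sketch shows one way the documented defaults (Separator: comma, MaxPages: 5, Preprocess: false) could be applied. readOcrOptions and OcrOptions are hypothetical names, and accepting both the boolean true and the string "true" for Preprocess is an assumption.

```typescript
// Hypothetical illustration, not CodBi's actual implementation.
type ToLoad = { [key: string]: unknown };

interface OcrOptions {
  separator: string;
  maxPages: number; // 0 means no limit
  regExFlags: string;
  preprocess: boolean;
}

function readOcrOptions(toLoad: ToLoad): OcrOptions {
  // Return the value under `key` if it is a string, else the fallback.
  const str = (key: string, fallback: string): string => {
    const value = toLoad[key];
    return typeof value === "string" ? value : fallback;
  };

  // Assumption: anything that is not a non-negative integer falls back to 5.
  const rawPages = Number(toLoad["MaxPages"] ?? 5);
  const maxPages = Number.isInteger(rawPages) && rawPages >= 0 ? rawPages : 5;

  const rawPreprocess = toLoad["Preprocess"];

  return {
    separator: str("Separator", ","),
    maxPages,
    regExFlags: str("RegExFlags", ""),
    preprocess: rawPreprocess === true || rawPreprocess === "true",
  };
}
```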