Class AI_OCR

Provides the AI OCR functionality.

Remarks

Maintainer: Callari, Salvatore (Salvatore.Callari@Ansbach.de)

Index

Constructors

constructor

Methods

functionality

Constructors

constructor

new AI_OCR(): AI_OCR
Returns AI_OCR

Methods

`Static` functionality

functionality(toLoad: { [key: string]: unknown }, toProcess: Element): void
This functionality scans the selected files of an HTMLInputElement using the Tesseract AI-OCR engine and then either prints the scanned text, extracts substrings from it, or verifies that it matches the pattern.

PDF Support: PDF files are automatically detected. PDFs with text (>100 characters) are processed client-side without using the AI backend. PDFs with minimal text (scanned documents) are rendered to images and sent to Tesseract for OCR.
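
The routing rule described above can be sketched as a small pure function. This is only an illustration, not the code in ai.ocr.ts: the names `choosePdfRoute`, `PdfRoute` and `TEXT_THRESHOLD` are hypothetical, and trimming whitespace before counting characters is an assumption.

```typescript
// Hypothetical sketch of the PDF routing rule; all names are illustrative.
type PdfRoute = "client-text" | "ocr";

// Threshold taken from the description above (more than 100 characters).
const TEXT_THRESHOLD = 100;

/**
 * Decides how a PDF is handled, given the text extracted from it client-side.
 * Assumption: surrounding whitespace is trimmed before the characters are counted.
 */
function choosePdfRoute(extractedText: string): PdfRoute {
  return extractedText.trim().length > TEXT_THRESHOLD ? "client-text" : "ocr";
}
```

A PDF whose extracted text stays at or below the threshold would then be rendered to images and sent to Tesseract.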

Automatic Orientation Detection: The Tesseract OCR engine will automatically detect and correct image orientation using its OSD (Orientation and Script Detection).

Config Parameter:
- Mode: Either Print, Verify or Extract Fields.
- Pattern: The RegEx used either to extract substrings from the scanned text or to verify that the scanned text matches. When Mode is Extract Fields, all fields within the parent container of the one containing the HTMLInputElement toProcess that carry the CodBi-CSS-Class AI_TESSERACT_Name receive the extracted values. For each such field, a corresponding parameter Pattern_... must be defined that specifies the RegEx used to extract that field's substrings from the scanned text. The field name is given after the underscore and is matched against the field's data-cb-Field.
- Separator: If Mode is set to Extract Fields, this parameter defines the separator for the results of multiple files. Default is a comma.
- MaxPages: Maximum number of PDF pages to process (default: 5). Set to 0 for no limit. Only applies to PDFs.
- RegExFlags: Optional regex flags to apply to all patterns (e.g., "i" for case-insensitive, "m" for multiline, "s" for dotall). Multiple flags can be combined (e.g., "im"). These flags are transmitted to the Tesseract servlet and applied to pattern matching.
- Preprocess: Optional boolean flag to enable image preprocessing before OCR. When set to true, applies grayscale conversion, adaptive binarization (Otsu's method), and noise reduction to improve text recognition accuracy. Default is false.
- InvalidImageText: The text to display if one or more of the images do not match the specified Pattern in mode Verify.
- WrongFileMessage: The text to display for the manual verification checkbox label in mode Verify.
- ProcessingImageText: The text to append to the label of the HTMLInputElement toProcess while the images are being processed.
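
As a rough illustration of the defaults listed above, the following sketch reads the parameters from a toLoad-style object. It is not the code in ai.ocr.ts: `readOcrConfig`, `OcrConfig` and `asString` are hypothetical names, and both accepting the string "true" for Preprocess and falling back to 5 for an unparsable MaxPages are assumptions.

```typescript
// Hypothetical, simplified view of the parameters documented above.
interface OcrConfig {
  mode: string | undefined;              // Mode, as written in the configuration
  pattern: string | undefined;           // Pattern
  fieldPatterns: Record<string, string>; // Pattern_<field> entries, keyed by field name
  separator: string;                     // default: ","
  maxPages: number;                      // default: 5, 0 means no limit
  regExFlags: string;                    // e.g. "i" or "im"
  preprocess: boolean;                   // default: false
}

// Returns the value only if it is a string.
const asString = (value: unknown): string | undefined =>
  typeof value === "string" ? value : undefined;

function readOcrConfig(toLoad: { [key: string]: unknown }): OcrConfig {
  // Collect every Pattern_<field> parameter under its field name.
  const fieldPatterns: Record<string, string> = {};
  const prefix = "Pattern_";
  for (const key of Object.keys(toLoad)) {
    const value = toLoad[key];
    if (key.startsWith(prefix) && typeof value === "string") {
      fieldPatterns[key.slice(prefix.length)] = value;
    }
  }

  // Assumption: missing or invalid MaxPages falls back to the documented default.
  const rawMax = toLoad["MaxPages"];
  const parsedMax = rawMax === undefined || rawMax === null ? NaN : Number(rawMax);
  const rawPreprocess = toLoad["Preprocess"];

  return {
    mode: asString(toLoad["Mode"]),
    pattern: asString(toLoad["Pattern"]),
    fieldPatterns,
    separator: asString(toLoad["Separator"]) ?? ",",
    maxPages: Number.isFinite(parsedMax) && parsedMax >= 0 ? parsedMax : 5,
    regExFlags: asString(toLoad["RegExFlags"]) ?? "",
    preprocess: rawPreprocess === true || rawPreprocess === "true",
  };
}
```

For example, a configuration containing only Mode and a Pattern_IBAN entry (a made-up field name) would yield a comma separator, a page limit of 5 and preprocessing switched off.
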
CSS Classes:
- AI_TESSERACT_Name: Elements with this class within the parent container of the one holding the HTMLInputElement toProcess receive the extracted fields when Mode is set to Extract Fields. Each such element should have data-cb-Field set to the name of the field whose extracted text it receives (see the Pattern_... config parameter).
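
The interplay of Pattern_..., RegExFlags, Separator and data-cb-Field in mode Extract Fields can be sketched as follows. This is a simplified illustration under several assumptions: the documentation states that the flags are applied by the Tesseract servlet, whereas this sketch uses the JavaScript RegExp engine; using the first capture group (or the whole match if there is none) and taking one match per file are guesses; `extractFields`, `fillFields` and `FieldLike` are hypothetical names.

```typescript
// Minimal structural stand-in for an element carrying data-cb-Field (hypothetical).
interface FieldLike {
  value: string;
  getAttribute(name: string): string | null;
}

/**
 * Applies each field pattern to the scanned text of every file and joins
 * the per-file results with the separator.
 * Assumption: first capture group if present, otherwise the whole match.
 */
function extractFields(
  scannedTexts: string[],
  fieldPatterns: Record<string, string>,
  regExFlags: string = "",
  separator: string = ","
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const field of Object.keys(fieldPatterns)) {
    const pattern = fieldPatterns[field];
    if (pattern === undefined) continue;
    const regex = new RegExp(pattern, regExFlags);
    const hits: string[] = [];
    for (const text of scannedTexts) {
      regex.lastIndex = 0; // keep "g"/"y" flags from carrying state between files
      const match = regex.exec(text);
      if (match) hits.push(match[1] ?? match[0]);
    }
    result[field] = hits.join(separator);
  }
  return result;
}

/** Writes each extracted value into the field whose data-cb-Field names it. */
function fillFields(fields: FieldLike[], values: Record<string, string>): void {
  for (const field of fields) {
    const name = field.getAttribute("data-cb-Field");
    if (name !== null && name in values) {
      field.value = values[name];
    }
  }
}
```

In a browser, the `fields` argument could be the elements found with `querySelectorAll(".AI_TESSERACT_Name")` on the parent container, since input elements satisfy the `FieldLike` shape.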
Parameters
- toLoad: { [key: string]: unknown }
  Provided by the CodBi.
- toProcess: Element
  Provided by the CodBi.
Returns void
- Defined in Git/CodBi-Dev/src/main/web/packages/form/src/js/Functionalities/ai.ocr.ts:71
