Ocr

This module can be used for detecting text in images.

Extracting text from a PDF file

This functionality can be used to extract text from a PDF file where the regular Pdf module can’t be used. This is normally the case for scanned PDF files and files generated without embedded character information.

javascript
var ocr = Module.load("Ocr", {version: "vX.Y.Z"});
var result = ocr.readPdf("/Users/robot/Desktop/test.pdf", {lang: "Danish"});

The result is an OcrResult JSON object with the following properties:

  • text is the extracted text of the complete document
  • confidence is the confidence level of the OCR result
  • pages is an array of OcrPage objects, one for each page in the PDF; each object will have the following properties:
    • text is the extracted text
    • confidence is the confidence level of the OCR result for the page
    • pageNumber is the page number
    • textDirection is the text direction of the page
    • pdfText is the text extracted directly from the PDF, if direct extraction is possible
    • bounds is the bounding box of the text in the page
    • contentArea is the content area of the page
    • barcodes is an array of OcrBarcode objects, one for each barcode in the page, each object will have the following properties:
      • format is the barcode format
      • number is the barcode number if available
      • value is the value of the barcode
      • text is the text of the barcode
      • bounds is the bounding box of the barcode
    • lines is an array of OcrLine objects, one for each line in the page, each object will have the following properties:
      • text is the extracted text of the line
      • confidence is the confidence level of the OCR result for the line
      • bounds is the bounding box of the line
      • textDirection is the text direction of the line
    • blocks is an array of OcrBlock objects, one for each text-block in the page, each object will have the following properties:
      • text is the extracted text of the block
      • confidence is the confidence level of the OCR result for the block
      • bounds is the bounding box of the block
      • textDirection is the text direction of the block
      • blockType is the block type of the block
      • blockNumber is the block number of the block
      • lines is an array of OcrLine objects, one for each line in the block (same type as for lines of the page)
    • paragraphs is an array of OcrParagraph objects, one for each paragraph in the page, each object will have the following properties:
      • text is the extracted text of the paragraph
      • confidence is the confidence level of the OCR result for the paragraph
      • bounds is the bounding box of the paragraph
    • words is an array of OcrWord objects, one for each word in the page, each object will have the following properties:
      • text is the extracted text of the word
      • confidence is the confidence level of the OCR result for the word
      • bounds is the bounding box of the word
      • textDirection is the text direction of the word
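
Because the result is a nested structure, it is often useful to walk the pages and lines rather than read only the top-level text. The following is a minimal sketch under stated assumptions: findLines and lowConfidenceLines are hypothetical helpers, not part of the Ocr module, and sampleResult is hand-written illustration data rather than real OCR output. The confidence scale is not stated above, so the sample values (0 to 100) and the threshold are assumptions, as is the 1-based pageNumber.

```javascript
// Hypothetical sample shaped like the OcrResult described above.
// Only the properties used by the helpers are filled in; values are made up.
var sampleResult = {
  text: "Faktura 1234\nTotal 500,00",
  confidence: 91, // assumed scale, not confirmed by the documentation
  pages: [
    {
      pageNumber: 1, // assumed to be 1-based
      text: "Faktura 1234\nTotal 500,00",
      confidence: 91,
      lines: [
        { text: "Faktura 1234", confidence: 96 },
        { text: "Total 500,00", confidence: 62 }
      ]
    }
  ]
};

// Hypothetical helper: return every line whose text contains `needle`
// (case-insensitive), together with the page number it was found on.
function findLines(result, needle) {
  var matches = [];
  var wanted = String(needle).toLowerCase();
  (result.pages || []).forEach(function (page) {
    (page.lines || []).forEach(function (line) {
      if (String(line.text).toLowerCase().indexOf(wanted) !== -1) {
        matches.push({ pageNumber: page.pageNumber, line: line });
      }
    });
  });
  return matches;
}

// Hypothetical helper: return every line whose confidence is below `threshold`.
// The caller picks the threshold to match whatever scale the module reports.
function lowConfidenceLines(result, threshold) {
  var suspicious = [];
  (result.pages || []).forEach(function (page) {
    (page.lines || []).forEach(function (line) {
      if (line.confidence < threshold) {
        suspicious.push({ pageNumber: page.pageNumber, line: line });
      }
    });
  });
  return suspicious;
}
```

With a real result, the same helpers could be called as findLines(result, "Total") or lowConfidenceLines(result, 80), for example to flag lines that need manual review.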

Find the location of a text in an image

The bounds method finds the location of text in an image.

javascript
var ocr = Module.load("Ocr", {version: "vX.Y.Z"});
// Get the bounding box of the text
var bounds = ocr.bounds("<base64 encoded image>", {lang: "Danish"});

Construct a path for a field based on a text

If you have, for example, a screenshot and need a path to a specific field containing some word or text, you can use the fieldPath method.

javascript
var ocr = Module.load("Ocr", {version: "vX.Y.Z"});
// We're looking for an OK button
var p = ocr.fieldPath("<base64 encoded image, e.g. a screenshot>", "OK", {lang: "Danish"});
// Let's click on the button
new Field(p).click();
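
If the text is not present in the image, clicking unconditionally may fail. The sketch below wraps the two steps in a guard. It assumes that fieldPath returns an empty or otherwise falsy value when nothing is found, which is not stated above and should be checked against the actual behaviour. clickByText is a hypothetical helper; the ocr module and the Field constructor are passed in as arguments so the function does not depend on any globals.

```javascript
// Hypothetical helper: locate a field by its visible text and click it,
// but only if a path was actually returned.
// Assumption: fieldPath yields a falsy value (e.g. "" or null) when the
// text is not found; verify this before relying on it.
function clickByText(ocr, image, text, options, FieldCtor) {
  var p = ocr.fieldPath(image, text, options);
  if (!p) {
    return false; // nothing to click
  }
  new FieldCtor(p).click();
  return true;
}
```

A call could look like clickByText(ocr, "<base64 encoded image>", "OK", {lang: "Danish"}, Field), with the boolean return value deciding whether to retry or report an error.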