Pdf 
This module can be used to extract text from PDF files.
Note: It is not possible to extract text from scanned PDF files.
Manatee compatibility
This module is compatible with Manatee v2.0.
Usage 
Extract text from PDF file 
You can use the textBlocks function to extract text blocks from a PDF file.
js
// Load and instantiate the module, replace vX.Y.Z with an actual version
var pdf = Module.load("Pdf", {version: "vX.Y.Z"});
var result = pdf.textBlocks("/path/to/file.pdf", {page: 2});
// result is a `TextBlocks` object which has a blocks property containing the extracted text blocks
var blocks = result.blocks;

The options argument (the 2nd argument) can contain the following properties:
- page: the page number to extract text from. Defaults to all pages.
- password: the password to decrypt the PDF file.
- betweenLineMultiplier: the space allowed between blocks (average space multiplied with this variable). Defaults to 1.3.
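To illustrate how the options fit together, here is a minimal sketch of a wrapper around textBlocks. The helper name extractPageText and its parameters are hypothetical and not part of the module. It assumes that `pdf` is the object returned by Module.load and that result.blocks can be indexed like a JavaScript array.

```javascript
// Hypothetical helper (not part of the Pdf module): collects the text of every
// block on a page, passing a password only when one is given.
function extractPageText(pdf, path, page, password) {
  var options = {};
  // Leaving out `page` keeps the documented default (all pages).
  if (page !== undefined && page !== null) {
    options.page = page;
  }
  if (password) {
    options.password = password;
  }
  var result = pdf.textBlocks(path, options);
  var texts = [];
  // Assumption: result.blocks has a length and supports index access.
  for (var i = 0; i < result.blocks.length; i++) {
    texts.push(result.blocks[i].text);
  }
  return texts;
}
```

With the module loaded as shown above, a call such as extractPageText(pdf, "/path/to/file.pdf", 2) would return the text of each block found on page 2.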
The TextBlocks object has a find method which can be used to find text blocks in the PDF file. You can use it like:
js
// Find a block of text below the headline "Some headline"
result.find("below", "Some headline");
// We don't want rotated text, so we set the `allowRotatedText` option to false
var block1 = result.find("below", "Some headline", {allowRotatedText: false});
// You can also use a boundingBox (rectangle) as an argument to find another block relative to it
var block2 = result.find("below", block1.boundingBox);

You can use one of the following as the first argument and a regular expression as the second argument:
- pdf.Below (aka "Below")
- pdf.Above (aka "Above")
- pdf.LeftOf (aka "LeftOf")
- pdf.RightOf (aka "RightOf")
- pdf.Nearest (aka "Nearest"), to find the nearest block of text
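A common use of find is reading the value that sits next to a label. The sketch below wraps that pattern in a hypothetical helper, findValueRightOf, which is not part of the module. It assumes that find returns a falsy value (such as null or undefined) when nothing matches. The source does not say what find returns in that case, so the guard is purely defensive.

```javascript
// Hypothetical helper (not part of the Pdf module): returns the text of the
// block to the right of the block matching `labelPattern`, or null.
function findValueRightOf(textBlocks, labelPattern) {
  var block = textBlocks.find("RightOf", labelPattern, { allowRotatedText: false });
  // Assumption: a failed lookup yields a falsy value rather than throwing.
  if (!block) {
    return null;
  }
  return block.text;
}
```

Given the `result` object from the earlier example, findValueRightOf(result, "Invoice number") would return the text of the block to the right of a block matching "Invoice number", if such a block exists.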
TextBlocks also has a blocks property which contains the extracted text blocks. Each TextBlock object has the following properties:
- text (string): the combined text of all lines in the block
- lines (array of strings): the individual lines in the block
- separator (string): the separator used to separate the lines in the block
- boundingBox (object): the bounding box of the block
  - topLeft (number): the coordinate of the top left corner of the bounding box
  - topRight (number): the coordinate of the top right corner of the bounding box
  - bottomLeft (number): the coordinate of the bottom left corner of the bounding box
  - bottomRight (number): the coordinate of the bottom right corner of the bounding box
  - width (number): the width of the bounding box
  - height (number): the height of the bounding box
- readingOrder (number): the reading order of the block
- textOrientation (number): the text orientation of the block
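As an illustration of working with these properties, the following sketch joins the text of all blocks in reading order. The helper name textInReadingOrder is hypothetical. It assumes that the blocks collection is array-like and that a lower readingOrder value means the block is read earlier, which the source does not state explicitly.

```javascript
// Hypothetical helper (not part of the Pdf module): concatenates block texts,
// sorted by their readingOrder property.
function textInReadingOrder(blocks) {
  // Copy into a plain array so the original collection is left untouched.
  var copy = Array.prototype.slice.call(blocks);
  copy.sort(function (a, b) {
    // Assumption: smaller readingOrder means "read first".
    return a.readingOrder - b.readingOrder;
  });
  var texts = [];
  for (var i = 0; i < copy.length; i++) {
    texts.push(copy[i].text);
  }
  return texts.join("\n");
}
```

It could be called as textInReadingOrder(result.blocks) on the object returned by textBlocks.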
Releases 
v3.0.0 (2023-12-19) 
- Release for Manatee v2
 
v1.0.3 (2022-02-17) 
- Feature: Added support for password protected PDF files
 
v1.0.2 (2022-02-16) 
Initial release.
