
DynaPDF.Parser.ExtractText

Extracts text from parser.

Component | Version | macOS | Windows | Linux | Server | iOS SDK
DynaPDF | 14.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes
MBS( "DynaPDF.Parser.ExtractText"; PDF; Flags { ; Left; Bottom; Right; Top } )

Parameters

Parameter | Description | Example | Flags
PDF | The PDF reference. | $pdf |
Flags | The flags for text extraction. Can include Default, SortTextX, SortTextY, SortTextXY, DeleteOverlappingText and/or NoHeuristic. In most cases you will want to use SortTextX here. | "SortTextX" |
Left | The left coordinate of the rectangle. | 0 | Optional
Bottom | The bottom coordinate of the rectangle. | 0 | Optional
Right | The right coordinate of the rectangle. | 595 | Optional
Top | The top coordinate of the rectangle. | 842 | Optional

Result

Returns the extracted text or an error.

Description

The function extracts the text of a page with the same algorithm that DynaPDF.FindText uses to find text on a page. To get exactly the same result, the flag "SortTextX" must be set.

The function DynaPDF.ExtractText of the PDF instance in fact calls this function internally.

The optional area, defined by Left, Bottom, Right and Top, restricts text extraction to that rectangle. The rectangle must be defined as the page would appear in a PDF viewer, that is, in bottom-up coordinates and with the page orientation taken into account. The page coordinate system is de-rotated before text extraction starts, since this produces better results. The width and height must be calculated from the crop box if one is set, or from the media box otherwise. Note also that the width and height must be exchanged if the orientation is 90, -90, 270, or -270 degrees.
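The width and height rule above can be sketched in a few lines. This is only an illustration in Python, not part of the plugin: the function names `viewed_page_size` and `top_half_rect` are hypothetical, the box values are assumed to have been read from the page elsewhere, and placing the rectangle origin at (0, 0) is an assumption based on the example values 0, 0, 595, 842 in the parameter table.

```python
from typing import Optional, Tuple

# A box is (left, bottom, right, top) in PDF units, bottom-up coordinates.
Box = Tuple[float, float, float, float]


def viewed_page_size(media_box: Box,
                     crop_box: Optional[Box],
                     orientation: int) -> Tuple[float, float]:
    """Return (width, height) of the page as a viewer would show it.

    Hypothetical helper: uses the crop box if one is set, otherwise the
    media box, and exchanges width and height for quarter-turn rotations.
    """
    box = crop_box if crop_box is not None else media_box
    width = box[2] - box[0]
    height = box[3] - box[1]
    if orientation in (90, -90, 270, -270):
        width, height = height, width
    return width, height


def top_half_rect(width: float, height: float) -> Box:
    """Return (left, bottom, right, top) covering the upper half of the page.

    Assumes the viewed page starts at (0, 0); adjust if that does not hold.
    """
    return (0.0, height / 2.0, width, height)


if __name__ == "__main__":
    # A4 media box, no crop box, page rotated by 90 degrees.
    w, h = viewed_page_size((0, 0, 595, 842), None, 90)
    print(top_half_rect(w, h))
```

The four numbers produced this way would then be passed as the Left, Bottom, Right and Top parameters of the call.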

This function checks for a license.

Created 23rd November 2023, last changed 22nd August 2024

