DynaPDF.Parser.ExtractText
Extracts text from parser.
Component | Version | macOS | Windows | Linux | Server | iOS SDK |
DynaPDF | 14.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
MBS( "DynaPDF.Parser.ExtractText"; PDF; Flags { ; Left; Bottom; Right; Top } )
Parameters
Parameter | Description | Example | Flags |
---|---|---|---|
PDF | The PDF reference. | | |
Flags | The flags for text extraction. Can include Default, SortTextX, SortTextY, SortTextXY, DeleteOverlappingText and/or NoHeuristic. SortTextX is usually a good choice. | "SortTextX" | |
Left | The left coordinate of the rectangle. | 0 | Optional |
Bottom | The bottom coordinate of the rectangle. | 0 | Optional |
Right | The right coordinate of the rectangle. | 595 | Optional |
Top | The top coordinate of the rectangle. | 842 | Optional |
Result
Returns text or error.
Description
Extracts text from parser.
The function extracts the text of a page with the same algorithm that DynaPDF.FindText uses to find text on a page. To get exactly the same result, the flag "SortTextX" must be set.
The function DynaPDF.ExtractText of the PDF instance in fact calls this function internally.
The optional area, defined by the Left, Bottom, Right and Top parameters, restricts text extraction to that rectangle. The rectangle must be defined as the page would appear in a PDF viewer: in bottom-up coordinates, with the page orientation taken into account. The page coordinate system is de-rotated before text extraction starts, since this produces better results. The width and height must be calculated from the crop box if one is set, or from the media box otherwise. Note also that the width and height must be exchanged if the orientation is 90, -90, 270, or -270 degrees.
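The width and height rule above can be sketched outside FileMaker as a small calculation. This is a minimal illustration in Python, not part of the plugin: the function name full_page_rect and the tuple layout of the boxes are hypothetical, and it assumes the rectangle's origin is at 0/0, as the example values in the parameter table (0, 0, 595, 842) suggest.

```python
def full_page_rect(media_box, crop_box=None, orientation=0):
    """Return (left, bottom, right, top) covering the page as displayed.

    Hypothetical helper: boxes are (left, bottom, right, top) tuples in
    PDF points, orientation is the page rotation in degrees.
    """
    # Use the crop box if one is set, otherwise fall back to the media box.
    box = crop_box if crop_box is not None else media_box
    width = box[2] - box[0]
    height = box[3] - box[1]

    # Exchange width and height for 90, -90, 270 and -270 degrees.
    if orientation % 180 == 90:
        width, height = height, width

    # Assumption: the rectangle is measured from a 0/0 origin.
    return (0, 0, width, height)


# A4 portrait page, not rotated.
portrait = full_page_rect((0, 0, 595, 842))

# The same page with an orientation of 90 degrees.
rotated = full_page_rect((0, 0, 595, 842), orientation=90)
```

The four resulting values would then be passed as Left, Bottom, Right and Top; a smaller rectangle inside these bounds restricts extraction to part of the page.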
Release notes
- Version 14.0
- Added DynaPDF parser functions: DynaPDF.Parser.ChangeAltFont, DynaPDF.Parser.Create, DynaPDF.Parser.DeleteText, DynaPDF.Parser.ExtractText, DynaPDF.Parser.FindText, DynaPDF.Parser.Line, DynaPDF.Parser.ParsePage, DynaPDF.Parser.ReplaceSelText, DynaPDF.Parser.SetAltFont, DynaPDF.Parser.TextMatrix, DynaPDF.Parser.WriteToPage.
This function checks for a license.
Created 23rd November 2023, last changed 22nd August 2024