DynaPDF.Parser.ExtractText
Extracts text from parser.
Component | Version | macOS | Windows | Linux | Server | iOS SDK |
DynaPDF | 14.0 | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
MBS( "DynaPDF.Parser.ExtractText"; PDF; Flags { ; Left; Bottom; Right; Top } )
Parameters
Parameter | Description | Example | Flags |
---|---|---|---|
PDF | The PDF reference. | | |
Flags | The flags for text extraction. Can include Default, SortTextX, SortTextY, SortTextXY, DeleteOverlappingText and/or NoHeuristic. Usually you want to use SortTextX here. The flag MediaBox limits text extraction to the media box. The flag CropBox limits text extraction to the crop box, or to the media box if no crop box is set. | "SortTextX" | |
Left | The left coordinate of the rectangle. | 0 | Optional |
Bottom | The bottom coordinate of the rectangle. | 0 | Optional |
Right | The right coordinate of the rectangle. | 595 | Optional |
Top | The top coordinate of the rectangle. | 842 | Optional |
Result
Returns text or error.
Description
The function extracts the text of a page with the same algorithm that DynaPDF.FindText uses to find text on a page. To get exactly the same result, the flag "SortTextX" must be set.
The function DynaPDF.ExtractText of the PDF instance calls this function internally.
The optional rectangle, defined by Left, Bottom, Right and Top, restricts text extraction to that area. Define the rectangle as the page appears in a PDF viewer: in bottom-up coordinates and taking the page orientation into account. The page coordinate system is de-rotated before text extraction starts since this produces better results. Calculate the width and height from the crop box if one is set, or from the media box otherwise. Note also that the width and height must be exchanged if the orientation is 90, -90, 270, or -270 degrees.
Examples
Extract text of a page
Set Variable [ $page ; Value: 1 ]
Set Variable [ $r ; Value: MBS("DynaPDF.Parser.ParsePage"; $pdf; $page; "EnableTextSelection") ]
If [ MBS("IsError") ]
Show Custom Dialog [ "Failed to parse page" ; $r ]
Else
Set Variable [ $$text ; Value: MBS("DynaPDF.Parser.ExtractText"; $pdf; "SortTextX NoHeuristic CropBox") ]
End If
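Extract text from the upper half of a page
A minimal sketch of the optional rectangle parameters. It assumes that $pdf already holds a DynaPDF reference with a parser created for it, and that the page is an unrotated A4 page of 595 x 842 points whose page box starts at 0/0. These values are illustrative and not read from the document.
Set Variable [ $page ; Value: 1 ]
Set Variable [ $r ; Value: MBS("DynaPDF.Parser.ParsePage"; $pdf; $page; "EnableTextSelection") ]
If [ MBS("IsError") ]
Show Custom Dialog [ "Failed to parse page" ; $r ]
Else
# Assumed page size: 595 x 842 points. For an orientation of 90 or 270 degrees, exchange width and height.
# Rectangle in bottom-up coordinates: Left 0, Bottom 421, Right 595, Top 842.
Set Variable [ $text ; Value: MBS("DynaPDF.Parser.ExtractText"; $pdf; "SortTextX"; 0; 421; 595; 842) ]
If [ MBS("IsError") ]
Show Custom Dialog [ "Failed to extract text" ; $text ]
End If
End If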
See also
- DynaPDF.ExtractText
- DynaPDF.FindText
- DynaPDF.Parser.DeleteText
- DynaPDF.Parser.ParsePage
- DynaPDF.Parser.TextMatrix
- IsError
Release notes
- Version 15.1
- Added flags MediaBox and CropBox to DynaPDF.Parser.ExtractText, DynaPDF.ExtractText and DynaPDF.ExtractDocumentText.
- Version 14.0
- Added DynaPDF parser functions: DynaPDF.Parser.ChangeAltFont, DynaPDF.Parser.Create, DynaPDF.Parser.DeleteText, DynaPDF.Parser.ExtractText, DynaPDF.Parser.FindText, DynaPDF.Parser.Line, DynaPDF.Parser.ParsePage, DynaPDF.Parser.ReplaceSelText, DynaPDF.Parser.SetAltFont, DynaPDF.Parser.TextMatrix, DynaPDF.Parser.WriteToPage.
Blog Entries
- MBS FileMaker Plugin, version 15.1pr1
- New in MBS FileMaker Plugin 14.0
- MBS FileMaker Plugin, version 13.6pr1
This function checks for a license.
Created 23rd November 2023, last changed 31st January 2025
