        3-Heights™ PDF Extract API, User Manual
         Contents
1. (Screenshot residue: Visual Basic References dialog listing installed 3-Heights™ components, such as PDF Optimizer API 1.60, PDF Printer API 1.60, PDF Renderer API 1.60 and PDF Export API 1.60; Location: D:\Bin\bin\PDFParser.dll; Language: Standard.)

ASP Script
The PDF Extract component can be accessed in an ASP script using the call Server.CreateObject with a class name as parameter. For example, to create a PDF Extract Document object, use a command like this:

    set pdfDoc = Server.CreateObject("PDFParser.Document")

Here is a small ASP sample that creates a Document object and then retrieves the total number of pages in a PDF file. The path to the PDF (myfile.pdf) needs to be modified.

    <%@ Language=VBScript %>
    <%
    Option Explicit
    Dim pdfDoc
    Set pdfDoc = Server.CreateObject("PDFParser.Document")
    Response.Write "<p>"
    If Not pdfDoc.Open(Server.MapPath("myfile.pdf")) Then
        ' Open failed: report it instead of querying the page count
        Response.Write "Could not open file" & "<br>"
    Else
        Response.Write "Number of pages: " & pdfDoc.PageCount & "<br>"
    End If
    Response.Write "</p>"
    %>

PDF Tools AG, Premium PDF Technology. 3-Heights™ PDF Extract API Version 4.5, Page 23 of 80, July 9, 2015

3.3 .NET
There should be at least one .NET sample for MS Visual Studio 2005 available in the ZIP archive of the Windows version of the 3-Heights™ PDF
2. Table: Interfaces

    Interface   Programming Languages
    .NET        The MS software platform .NET can be used with any .NET-capable programming language, such as C#, VB.NET, J# and others.
    JNI         The Java native interface (JNI) is for use with Java.
    COM         The component object model (COM) interface can be used with any COM-capable programming language, such as MS Visual Basic, MS Office products such as Access or Excel (VBA), C++, VBScript and others.
    C           The native C interface is for use with C and C++.

Distributed Files
The software developer kit (SDK) contains all files that are used for developing the software. The roles of all files with respect to the four different interfaces are shown in Table: Files for Development. The files are split into four categories:
    Req: This file is required for this interface.
    Opt: This file is optional (e.g., pdcjk.dll is used to support Asian languages; it is not used for other languages). See also Table: File Description to identify which files are required for your application.
    Doc: This file is for documentation only.
    An empty field indicates that this file is not used at all for this particular interface.

Table: Files for Development

    Name               .NET   JNI   COM   C
    bin\PDFParser.dll  Req    Req   Req   Req
    bin\pdcjk.dll      Opt    Opt   Opt   Opt
    bin\*NET.dll       Req
    bin\*NET.xml       Doc
3. 1.8 Uninstall, Install a new version
   1.9 UNIX
       Installation on Unix Systems
       Installation on Mac OS X
   2 License Management
   2.1 Graphical License Manager Tool
       List all installed license keys
       Add and delete license keys
       Display the properties of a license
       Select between different license keys for a single product
   2.2 Command Line License Manager Tool
       List all installed license keys
       Add and delete license keys
       Select between different license keys for a single product
   2.3 License Key Storage
       Windows
       Mac OS X
4. ...
        BeginOCM   OCM 4   (note that OCM blocks can be nested, typically used for hierarchical OCGs)
          Path     Path: gray 64 square
        EndOCM
        BeginOCM   OCM 5   (OCG 5 is "Gray 128")
          Path     Path: gray 128 square
        EndOCM
    EndOCM
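The nesting in the listing above can be resolved with a simple stack walk: assuming the usual PDF rule that content inside nested optional content blocks is shown only when every enclosing block is visible, each object inherits the combined visibility of all open OCM blocks. The enum and struct below are hypothetical stand-ins for the content object types returned by GetNextObject(); how a single OCM string is evaluated is outside this sketch and is passed in as a precomputed flag.

```c
#include <stddef.h>

/* Hypothetical stand-ins for content object types; not the library's enum. */
typedef enum { OBJ_BEGIN_OCM, OBJ_END_OCM, OBJ_PATH } ObjType;

typedef struct {
    ObjType type;
    int ocm_visible; /* only for OBJ_BEGIN_OCM: 1 if this OCM evaluates as visible */
} ContentObj;

#define MAX_OCM_DEPTH 64

/* Writes 1 or 0 into visible_out[i] for every object: an object is visible only
   if all enclosing OCM blocks are visible. Returns 0 on success, -1 on bad nesting. */
int resolve_visibility(const ContentObj *objs, size_t n, int *visible_out)
{
    int stack[MAX_OCM_DEPTH];
    size_t depth = 0;
    int current = 1; /* combined visibility of all currently open blocks */

    for (size_t i = 0; i < n; i++) {
        switch (objs[i].type) {
        case OBJ_BEGIN_OCM:
            if (depth == MAX_OCM_DEPTH)
                return -1;
            stack[depth++] = current;            /* remember the outer visibility */
            current = current && objs[i].ocm_visible;
            visible_out[i] = current;
            break;
        case OBJ_END_OCM:
            if (depth == 0)
                return -1;                       /* EndOCM without matching BeginOCM */
            visible_out[i] = current;
            current = stack[--depth];            /* restore the outer visibility */
            break;
        default:
            visible_out[i] = current;            /* ordinary object inherits it */
            break;
        }
    }
    return depth == 0 ? 0 : -1;                  /* -1 if a BeginOCM was never closed */
}
```

With the outer block visible, OCM 4 hidden and OCM 5 visible, the gray 64 square comes out hidden and the gray 128 square visible.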
5. PDF Tools AG, Premium PDF Technology

3-Heights™ PDF Extract API
Version 4.5
User Manual

Contact: pdfsupport@pdf-tools.com
Owner: PDF Tools AG, Kasernenstrasse 1, 8184 Bachenbülach, Switzerland
http://www.pdf-tools.com
Copyright © 2003-2015

Table of Contents
1 Introduction
1.1 Description
1.2 Functions
    ...
    Formats
    ...
1.3 ...
1.4 Operating Systems
1.5 Installation: Software Developer Kit
    Interfaces
    Distributed Files
    Color Profiles
1.6 Deployment: Runtime Kit
    Distributed Files
    Deploying the Application
    Example
1.7 Interface-specific Installation Steps
    COM Interface
    Java Interface
    .NET Interface
    Native C Interface
6. ... "Hello" and "World" and otherwise as "Hello World".

eTECPosMergeSingleSpace: Merge text tokens that are a single space width apart (the displacement is replaced by one space). Do not set this option if you need the RawString property.
Example: If set, the text objects "Hello" and "World" are extracted as "Hello World" if they are approximately one space width apart.

eTECPosMergeMultiSpace: Merge text tokens that are one or more space widths apart (the displacement is replaced by multiple spaces). Do not set this option if you need the RawString property.
Example: If set, the text objects "Hello" and "World" are extracted as "Hello   World", where spaces are inserted to represent the distance between the objects.

5 Interface Changes
5.1 Changes from 1.4 to 1.4.1
This is a list of interface changes from version 1.4 (1.4.0.21) to version 1.4.1 (1.4.1.24):

    Annotation Interface     New: Property TextLabel
    ColorSpace Interface     New: Property Colorant
    Content Interface        New: Property Flags
    Destination Interface    New: Property Zoom
    Font Interface           Removed: Property FirstChar, Property LastChar
    Image Interface          New: Method StoreInMemory, Method GetImage
    Page Interface           New: Property BleedBox, Property TrimBox, Property ArtBox, Property Device
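The difference between the two merge options can be illustrated with a small heuristic that decides how many spaces to insert between two adjacent text tokens, given the horizontal gap between them and the width of a space. This is a sketch under stated assumptions, not the library's actual algorithm: the enum is hypothetical, and the half-space threshold and rounding rule are illustrative choices.

```c
/* Hypothetical modes mirroring the two merge options described above. */
typedef enum { MERGE_SINGLE_SPACE, MERGE_MULTI_SPACE } MergeMode;

/* Returns the number of spaces to insert between two tokens, or -1 if the
   tokens should stay separate. gap is the distance from the end of the first
   token to the start of the second; space_width uses the same units. */
int spaces_to_insert(double gap, double space_width, MergeMode mode)
{
    if (space_width <= 0.0)
        return -1;
    double spaces = gap / space_width;  /* gap measured in space widths */
    if (spaces < 0.5)
        return 0;                       /* assumption: nearly touching, join directly */
    int n = (int)(spaces + 0.5);        /* round to the nearest whole space */
    if (mode == MERGE_SINGLE_SPACE)
        return n == 1 ? 1 : -1;         /* merge only when about one space apart */
    return n;                           /* one space per space width of gap */
}
```

With MERGE_MULTI_SPACE, a gap of three space widths between "Hello" and "World" yields three spaces, while MERGE_SINGLE_SPACE leaves the two tokens separate.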
7. ...
Subject
Property String Subject
Accessors: Get
Return the subject from the document's info object.

Title
Property String Title
Accessors: Get
Return the title from the document's info object.

Page Interface

ArtBox
Property Variant ArtBox
Accessors: Get
This property returns the art box rectangle given by the coordinates left, bottom, right, top. The values are returned as an array of four single-precision real numbers. The art box is optional; it defines the region that contains the meaningful content intended by the creator. If there is no art box set, the crop box is returned.

BleedBox
Property Variant BleedBox
Accessors: Get
Return the bleed box rectangle given by the coordinates left, bottom, right, top. The values are returned as an array of four single-precision real numbers. The bleed box is optional; it defines the region to which the contents of the page should be clipped when output in a production environment. If there is no bleed box set, the crop box is returned.

Content
Property IPDFContent Content
Accessors: Get
Return an interface to the content stream of the page (see Content Interface).

CropBox
Property Variant CropBox
Accessors: Get
Return the crop box rectangle given by the coordinates left, bottom, right, top. The values are returned as an array of four single pr
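Each of these box properties returns four numbers (left, bottom, right, top) in PDF units of 1/72 inch, so the physical size of a box follows directly. A minimal sketch, assuming the four values have already been copied out of the Variant into a C struct; the struct and function names are hypothetical.

```c
/* Hypothetical C view of a page box: left, bottom, right, top in points (1/72 inch). */
typedef struct { double left, bottom, right, top; } PageBox;

#define MM_PER_POINT (25.4 / 72.0)

/* Width and height of a box in millimeters; works for either corner order. */
void box_size_mm(PageBox box, double *width_mm, double *height_mm)
{
    double w = box.right - box.left;
    double h = box.top - box.bottom;
    if (w < 0) w = -w;
    if (h < 0) h = -h;
    *width_mm = w * MM_PER_POINT;
    *height_mm = h * MM_PER_POINT;
}
```

For a US Letter crop box (0, 0, 612, 792), this gives approximately 215.9 mm by 279.4 mm.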
8. (Screenshot: Visual Studio project properties, References page. The project references libpdfNET and PdfExtractNET (.NET, version 1.7.0.13, Copy Local: True) from C:\Program Files\pdf-tools\bin, together with the .NET Framework 2.0 System assemblies, and imports the namespaces Pdftools and Pdftools.PdfExtract.)

The .NET interface can now be used as shown
9. ...
ComponentsPerPixel
Property Integer ComponentsPerPixel
Accessors: Get
Return the number of components per pixel.

HighIndex
Property Integer HighIndex
Accessors: Get
Return the highest value of the indexed colors. It is 0 when no indexed color space is used.

IsColor
Property Boolean IsColor
Accessors: Get
Return true when the color space is color.

IsIndexed
Property Boolean IsIndexed
Accessors: Get
Return true when the image uses indexed colors.

IsMonochrome
Property Boolean IsMonochrome
Accessors: Get
Return true when the color space is monochrome.

Lookup
Property Variant Lookup
Accessors: Get
Return the lookup table.

Name
Property String Name
Accessors: Get
Return the name of the color space as a string, for example "DeviceGray", "DeviceRGB" or "Indexed".

4.9 TransformMatrix Interface

a, b, c, d, e, f
Property Single a
Property Single b
Property Single c
Property Single d
Property Single e
Property Single f
Accessors: Get

The transformation matrix in PDF is specified by six numbers. All information about orientation, rotation, scaling, skewing and translation can be calculated from these six numbers. However, PDF Extract also provides properties which compute these values.

The values e and f represent the translation. In a matrix [1 0 0 1 e f], e is the distance on the x axis fr
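Following the PDF convention, the six numbers map a point (x, y) to (a·x + c·y + e, b·x + d·y + f). A minimal sketch, assuming the values of the properties a to f are copied into a plain C struct (the struct and function names are hypothetical), that applies a matrix to a point and concatenates two matrices:

```c
/* Hypothetical plain-C copy of the TransformMatrix properties a..f. */
typedef struct { double a, b, c, d, e, f; } Matrix;

/* Apply the matrix to a point: x' = a*x + c*y + e, y' = b*x + d*y + f. */
void matrix_apply(Matrix m, double x, double y, double *xo, double *yo)
{
    *xo = m.a * x + m.c * y + m.e;
    *yo = m.b * x + m.d * y + m.f;
}

/* Concatenate two matrices: the result applies m1 first, then m2
   (the product m1 x m2 in PDF's row-vector convention). */
Matrix matrix_concat(Matrix m1, Matrix m2)
{
    Matrix r;
    r.a = m1.a * m2.a + m1.b * m2.c;
    r.b = m1.a * m2.b + m1.b * m2.d;
    r.c = m1.c * m2.a + m1.d * m2.c;
    r.d = m1.c * m2.b + m1.d * m2.d;
    r.e = m1.e * m2.a + m1.f * m2.c + m2.e;
    r.f = m1.e * m2.b + m1.f * m2.d + m2.f;
    return r;
}
```

For the pure translation [1 0 0 1 e f], matrix_apply adds e to x and f to y. Concatenating a scaling by 2 with a translation by (10, 20) maps (1, 2) to (12, 24); swapping the order would give a different result, which is why the order of concatenation matters.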
10. ...
    ...         Start of a sequence of objects whose visibility is defined by an optional content membership string.
    ...         End of OCM sequence.
    ...         No content object.
    eText       Text object.
    eImage      Image object.
    ePath       Path object.
    eSave       Save the current graphics state.
    eRestore    Restore the current graphics state.

TPDFErrorCode
All TPDFErrorCode enumerations start with "PDF_" followed by a single letter, which is one of "S", "E", "W" or "I", an underscore and a descriptive text. The single letter gives an indication of the type of error: Success, Error, Warning or Information. With respect to corrupt PDF files, an error indicates a corruption in the PDF (the file may or may not be readable), whereas a warning indicates that the file is readable but not valid.

A full list of all PDF Tools error codes is available in the header file pdferror.h. The error codes that are related to file access are listed here:

    PDF_S_SUCCESS      The operation was completed successfully.
    PDF_E_EVAL         This software is an evaluation version. Please contact www.pdf-tools.com
    PDF_E_FILEOPEN
    PDF_E_FILECREATE
    PDF_E_PASSWORD

TPDFOrientation
    eOrientationUndef
    eOrientationTopLeft
    eOrientationTopRight
    eOrientationBottomRight
    eOrientationBottomLeft
    eOrientationLeftTop
    eOrientationRightTop
    eOrientationRightBottom
    eOrientationLeftBottom
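The naming convention makes it possible to classify a code from its symbolic name alone. A sketch, assuming the name is available as a string (for instance when logging); the enum and function are hypothetical helpers, not part of pdferror.h.

```c
#include <string.h>

/* Hypothetical classification derived from the PDF_<letter>_ naming convention. */
typedef enum { SEV_UNKNOWN, SEV_SUCCESS, SEV_ERROR, SEV_WARNING, SEV_INFO } Severity;

Severity severity_from_name(const char *name)
{
    /* Expected shape: "PDF_" + one of S/E/W/I + "_" + descriptive text.
       The name[4] check runs first so name[5] is never read past the end. */
    if (name == NULL || strncmp(name, "PDF_", 4) != 0 || name[4] == '\0' || name[5] != '_')
        return SEV_UNKNOWN;
    switch (name[4]) {
    case 'S': return SEV_SUCCESS;
    case 'E': return SEV_ERROR;
    case 'W': return SEV_WARNING;
    case 'I': return SEV_INFO;
    default:  return SEV_UNKNOWN;
    }
}
```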
11. ... 11i and later (ia64, Itanium 64 bit)
    - IBM AIX 5.1 and later (64 bit)
    - Linux (32 and 64 bit)
    - Mac OS X 10.4 and later (32 and 64 bit)
    - Sun Solaris 2.8 and later, SPARC and Intel
    - FreeBSD 4.7 and later (32 bit) or FreeBSD 9.3 and later (64 bit), on request

1.5 Installation: Software Developer Kit
The installation of the software requires the following steps:

    1. Download the software, which is provided as a ZIP archive, from your download account.
    2. Unzip the files using a tool like WinZip to a directory on your local hard disk where your program files reside. Check the appropriate option to preserve file paths (folder names). The files of the developer kit (SDK), including subdirectories, are listed in Table: Files for Development.
    3. Identify which interface (.NET, JNI, COM, C) you are using and perform the specific installation steps for that interface. These steps are described in the following chapters.

Interfaces
The 3-Heights™ PDF Extract API provides four different interfaces. The installation and deployment of the software depend on the interface you are using.

The table below shows the supported interfaces and the programming languages with which they can be used.
12. ...
Accessors: Get
Return the destination of a link annotation. This entry is not permitted if an A (action) entry is present.

Flags
Property Long Flags
Accessors: Get
Return the flags of the annotation as a 32-bit integer:

    Bit   Flag
    1     Invisible
    2     Hidden (PDF 1.2)
    3     Print (PDF 1.2)
    4     NoZoom (PDF 1.3)
    5     NoRotate (PDF 1.3)
    6     NoView (PDF 1.3)
    7     ReadOnly (PDF 1.3)
    8     Locked (PDF 1.4)
    9     ToggleNoView (PDF 1.5)

IsMarkup
Property Boolean IsMarkup
Accessors: Get
Return whether the annotation is a markup annotation. The following annotations are considered markup annotations:
    - Free Text annotations
    - Annotations that have a pop-up window that may display text
    - Sound annotations

Name
Property String Name
Accessors: Get
Return the name of the annotation as a string.

Rect
Property Variant Rect
Accessors: Get
Return the rectangle of the annotation as x1, y1, x2, y2, where (x1, y1) is the lower-left corner of the annotation and (x2, y2) the upper-right corner. The coordinates are raw PDF coordinates. In order to calculate where the rectangle is positioned on the page as displayed by a viewer, the rectangle must be cropped using the page's CropBox and rotated using the Rotate attribute.

Subj
Property String Subj
Accessors: Get
Return the text representing a short description of the subject. This property is only a
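The cropping and rotation step described for Rect can be sketched as follows. This is an illustration under stated assumptions: the struct and function names are hypothetical, Rotate is taken to be a multiple of 90 where positive values turn the page clockwise (as described for the page's Rotate property), and the result is in points relative to the lower-left corner of the page as displayed.

```c
/* Hypothetical rectangle: (x1, y1) lower-left, (x2, y2) upper-right, in points. */
typedef struct { double x1, y1, x2, y2; } Rect;

static double min_d(double p, double q) { return p < q ? p : q; }
static double max_d(double p, double q) { return p > q ? p : q; }

/* Map one point from raw PDF coordinates to displayed-page coordinates. */
static void map_point(double x, double y, Rect crop, int rot, double *xo, double *yo)
{
    double w = crop.x2 - crop.x1, h = crop.y2 - crop.y1;
    x -= crop.x1;                                   /* relative to the crop box */
    y -= crop.y1;
    switch (rot) {
    case 90:  *xo = y;     *yo = w - x; break;      /* quarter turn clockwise */
    case 180: *xo = w - x; *yo = h - y; break;
    case 270: *xo = h - y; *yo = x;     break;      /* quarter turn counterclockwise */
    default:  *xo = x;     *yo = y;     break;      /* 0 (other values treated as 0) */
    }
}

/* Clip an annotation rectangle to the crop box, then rotate it for display.
   Returns 0 if the annotation lies completely outside the crop box. */
int annot_rect_to_view(Rect annot, Rect crop, int rotate, Rect *out)
{
    int rot = ((rotate % 360) + 360) % 360;         /* e.g. -90 becomes 270 */
    Rect c = { max_d(annot.x1, crop.x1), max_d(annot.y1, crop.y1),
               min_d(annot.x2, crop.x2), min_d(annot.y2, crop.y2) };
    if (c.x1 >= c.x2 || c.y1 >= c.y2)
        return 0;
    double ax, ay, bx, by;
    map_point(c.x1, c.y1, crop, rot, &ax, &ay);
    map_point(c.x2, c.y2, crop, rot, &bx, &by);
    out->x1 = min_d(ax, bx); out->y1 = min_d(ay, by); /* re-normalize the corners */
    out->x2 = max_d(ax, bx); out->y2 = max_d(ay, by);
    return 1;
}
```

On a 612 x 792 point page rotated by 90, an annotation at (100, 700, 200, 750) ends up at (700, 412, 750, 512) in displayed coordinates.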
13. XPos, YPos
4.6 GraphicsState Interface
    AlphaIsShape
    BlendMode
    CharSpacing
    ...
    FillColorCMYK
    FillColorRGB
    FillColorSpace
    FillOverprintFlag
    FlatnessTolerance
    Font
    FontSize
    HorizontalScaling
    ...
    LineJoin
    LineWidth
    ...
    OverprintMode
    RenderingIntent
    SmoothnessTolerance
    SoftMask
    StrokeAdjustment
    SpaceWidth
    StrokeAlphaConstant
    StrokeColorCMYK
    StrokeColorRGB
    StrokeColorSpace
    StrokeOverprintFlag
    TextKnockout
    TextRenderingMode
    ...
    WordSpacing
4.7 ...
    ...
    CapHeight
    ...
    Descent
14. ... are italic
    17   AllCap      Font has no lowercase letters
    18   SmallCap    Lowercase letters are small uppercase letters
    19   ForceBold   If set, bold glyphs are painted bold even at very small text size

FontBBox
Property Variant FontBBox
Accessors: Get
Return the font bounding box. The font bounding box is the rectangle in which all glyphs would fit if they were placed on top of each other with their origins at the same point.

FontFile
Property Variant FontFile
Accessors: Get
Return a stream that contains a Type 1 font program.

FontFileType
Property Integer FontFileType
Accessors: Get
Return the type of the font file. A value of 1 corresponds to a Type 1 font program (FontFile); a FontFile2 contains a TrueType font program. In most cases a value of 1, 2 or 3 is returned.

ItalicAngle
Property Single ItalicAngle
Accessors: Get
Return the angle, expressed in degrees counterclockwise from the vertical, of the dominant vertical strokes of the font.

Leading
Property Single Leading
Accessors: Get
Return the desired spacing between baselines of consecutive lines of text.

MaxWidth
Property Single MaxWidth
Accessors: Get
Return the maximum width of the glyphs in the font.

MissingWidth
Property Single MissingWidth
Accessors: Get
Return the width that is used for character codes whose glyph width is missing in the font dictionary's Widths array.
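In PDF, the Widths array of a simple font covers the character codes from FirstChar onward, and MissingWidth is used for codes it does not cover; for all fonts except Type 3, the widths are given in thousandths of a unit of text space. A sketch of that lookup, assuming FirstChar and the widths are obtained from the font dictionary by other means and copied into a C array (the struct and function names are hypothetical):

```c
#include <stddef.h>

/* Hypothetical C copy of the width information of a simple (non-Type 3) font. */
typedef struct {
    int first_char;        /* character code of widths[0] */
    const double *widths;  /* glyph widths in 1/1000 text space units */
    size_t count;          /* number of entries in widths */
    double missing_width;  /* used for codes not covered by widths */
} FontWidths;

/* Glyph width for a character code, in 1/1000 text space units. */
double glyph_width(const FontWidths *fw, int code)
{
    if (code >= fw->first_char && (size_t)(code - fw->first_char) < fw->count)
        return fw->widths[code - fw->first_char];
    return fw->missing_width;
}

/* Advance in points for one glyph at the given font size,
   ignoring character spacing, word spacing and horizontal scaling. */
double glyph_advance_pt(const FontWidths *fw, int code, double font_size)
{
    return glyph_width(fw, code) / 1000.0 * font_size;
}
```

A glyph with width 500 at a font size of 12 advances by 6 points.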
15. ... are stored in the registry:
    - HKLM\Software\PDF Tools AG (for all users)
    - HKCU\Software\PDF Tools AG (for the current user)

Mac OS X: The license keys are stored in the file system:
    - /Library/Application Support/PDF Tools AG (for all users)
    - ~/Library/Application Support/PDF Tools AG (for the current user)

Unix/Linux: The license keys are stored in the file system:
    - /etc/opt/pdf-tools (for all users)
    - ~/.pdf-tools (for the current user)

Note: The user, group and permissions of those directories are set explicitly by the license manager tool.

It may be necessary to change permissions to make the licenses readable for all users. Example:

    chmod -R go+rx /etc/opt/pdf-tools

Getting Started

Visual Basic
In order to use the component in a Visual Basic 6 project, you have to add the component as a project reference, as shown in the screenshot below. The version which is registered will show up.

(Screenshot: Visual Basic "References - Project1" dialog with the list of available references, including Visual Basic For Applications, OLE Automation and several 3-Heights™ components such as Font To PDF Conversion API 1.60, Image to PDF Converter API 1.60 and PDF Annotation API 1.60.)
16. ... consists of .NET assemblies, which are added to the project, and a native DLL, which is called by the .NET assemblies. This has to be accounted for when installing and deploying the tool.

The .NET assemblies (*NET.dll) are to be added as references to the project. They are required at compilation time. See also chapter Getting Started.

PDFParser.dll is not a .NET assembly, but a native DLL. It is not to be added as a reference in the project.

The native DLL PDFParser.dll is called by the .NET assembly PdfExtractNET.dll. PDFParser.dll must be found at execution time by the Windows operating system.

The common way to achieve this is to add PDFParser.dll as an existing item to the project and set its property "Copy to output directory" to "Copy if newer".

Alternatively, the directory where PDFParser.dll resides can be added to the environment variable PATH, or the DLL can simply be copied manually to the output directory.

Native C Interface
    - The header file expa_c.h needs to be included in the C/C++ program.
    - The object file library lib\PDFParser.lib needs to be linked to the project.
    - PDFParser.dll should be on the environment variable PATH or, if using MS Visual Studio, in the directory for executable files.

1.8 Uninstall, Install a new version
In order to uninstall the product, undo all the steps
17. ... constant.

StrokeColorCMYK
Property Long StrokeColorCMYK
Accessors: Get
Return the CMYK color quad for stroking operations. The color value is obtained by converting the color values of the property StrokeColor by means of the StrokeColorSpace. The CMYK quads are encoded using the following formula:

    Quad = C + 256 * (M + 256 * (Y + 256 * K))

StrokeColorRGB
Property Long StrokeColorRGB
Accessors: Get
Return the RGB color triple for stroking operations. The color value is obtained by converting the color values of the property StrokeColor by means of the StrokeColorSpace. The RGB triples are encoded using the following formula:

    Triple = R + 256 * (G + 256 * B)

StrokeColorSpace
Property PDFColorSpace StrokeColorSpace
Accessors: Get
Return an interface to the current color space that is used for stroking operations (see ColorSpace Interface). The color space is used to interpret color values of the property StrokeColor.

StrokeOverprintFlag
Property Boolean StrokeOverprintFlag
Accessors: Get
This property returns the overprint flag for stroking operations.

TextKnockout
Property Boolean TextKnockout
Accessors: Get
Return the text knockout flag. This Boolean flag determines which text elements are considered elementary objects for purposes of color compositing in the transparent imaging mo
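Read with C in the lowest byte (consistent with the FillColorRGB decoding formulas, where R is the lowest byte), the quad formula packs four 8-bit components into one integer. A sketch of packing and unpacking, assuming each component is in the range 0 to 255; the function names are hypothetical, and an unsigned 32-bit type is used so that K values above 127 do not overflow.

```c
#include <stdint.h>

/* Pack C, M, Y, K (0..255 each): Quad = C + 256 * (M + 256 * (Y + 256 * K)). */
uint32_t cmyk_pack(uint8_t c, uint8_t m, uint8_t y, uint8_t k)
{
    return (uint32_t)c | ((uint32_t)m << 8) | ((uint32_t)y << 16) | ((uint32_t)k << 24);
}

/* Unpack a quad into its four components (C is the lowest byte). */
void cmyk_unpack(uint32_t quad, uint8_t *c, uint8_t *m, uint8_t *y, uint8_t *k)
{
    *c = (uint8_t)(quad & 0xFF);
    *m = (uint8_t)((quad >> 8) & 0xFF);
    *y = (uint8_t)((quad >> 16) & 0xFF);
    *k = (uint8_t)(quad >> 24);
}
```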
18. ... degrees while displaying. A positive number turns the page clockwise. The value must be a multiple of 90, i.e., valid values are -270, -180, -90, 0, 90, 180 and 270.

TrimBox
Property Variant TrimBox
Accessors: Get
Return the trim box rectangle given by the coordinates left, bottom, right, top. The values are returned as an array of four single-precision real numbers. The trim box is optional; it defines the intended dimensions of the finished page after trimming. If there is no trim box set, the crop box is returned.

Content Interface

BreakWords
Property Boolean BreakWords
Accessors: Get, Set
Default: True
This property is deprecated and superseded by the TextExtConfiguration property. In order to get the same behavior as with BreakWords, use the following options:
    BreakWords = True: Set the eTECBreakSpaceUnicode flag and clear the flags eTECPosMergeSingleSpace and eTECPosMergeMultiSpace.
    BreakWords = False: Clear the eTECBreakSpaceUnicode flag and set the flags eTECPosMergeSingleSpace and eTECPosMergeMultiSpace.

BoundingBox
Property Variant BoundingBox
Accessors: Get, Set
Default: CropBox of the page
The bounding box is a rectangle in user space units (1/72 inch). The rectangle is used when the Reset() method is called with AccountForRotate set to TRUE and has an effect on the coordinate transform. The bounding box must be set before calling Reset().
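The mapping from the deprecated BreakWords setting to TextExtConfiguration amounts to setting and clearing bits. A sketch, assuming the flags combine as a bit mask; the numeric values below are placeholders only, so the real values of eTECBreakSpaceUnicode, eTECPosMergeSingleSpace and eTECPosMergeMultiSpace must be taken from the interface definition.

```c
/* Placeholder flag values for illustration only; NOT the library's real values. */
enum {
    TEC_BREAK_SPACE_UNICODE    = 1u << 0, /* stands in for eTECBreakSpaceUnicode */
    TEC_POS_MERGE_SINGLE_SPACE = 1u << 1, /* stands in for eTECPosMergeSingleSpace */
    TEC_POS_MERGE_MULTI_SPACE  = 1u << 2  /* stands in for eTECPosMergeMultiSpace */
};

/* Returns a configuration reproducing the old BreakWords behavior,
   leaving all other bits of the current configuration unchanged. */
unsigned config_from_break_words(unsigned config, int break_words)
{
    const unsigned merge = TEC_POS_MERGE_SINGLE_SPACE | TEC_POS_MERGE_MULTI_SPACE;
    if (break_words)
        return (config | TEC_BREAK_SPACE_UNICODE) & ~merge; /* BreakWords = True */
    return (config & ~TEC_BREAK_SPACE_UNICODE) | merge;     /* BreakWords = False */
}
```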
19. ... embed Unicode mapping information for a symbolic font.

Width
Property Single Width
Accessors: Get
Return the width of the string in points.

XPos, YPos
Property Variant XPos
Property Variant YPos
Accessors: Get
Return the X and Y positions of the characters. The return value is a one-dimensional array holding the positions of all characters.

If the Text contains n characters, XPos(0) represents the first character and XPos(n-1) represents the last character.

XPos(n) is a calculated, virtual position of where the next character would start, so the array holds n + 1 values. This position and the actual position of the next character can be compared to decide whether they belong to the same word or not.

4.6 GraphicsState Interface
Entries which have a complex structure, such as a function, are not retrievable with the 3-Heights™ PDF Extract Tool. These are, for example, black-generation functions (BG), transfer functions (TR) and undercolor-removal functions (UCR).

The extract tool can return colors in RGB or CMYK. If the requested color space is different from the actual color space in the PDF, the color conversion is done using color profiles.

AlphaIsShape
Property Boolean AlphaIsShape
Accessors: Get
Return the "AlphaIsShape" flag. It is true if the soft mask contains shape values and false if it contains opacity values.
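Because XPos holds n + 1 entries for n characters, the per-character advances and the comparison with the following text object both fall out of simple array arithmetic. A sketch, assuming horizontal left-to-right text and positions copied into a C array of doubles; the function names and the tolerance parameter are hypothetical choices for illustration.

```c
#include <stddef.h>

/* Advance of character i, from an XPos array with n + 1 entries for n characters. */
double char_advance(const double *xpos, size_t n, size_t i)
{
    return (i < n) ? xpos[i + 1] - xpos[i] : 0.0;
}

/* Decide whether the next text object continues the same word: its first
   character starts within `tolerance` of the virtual position xpos[n]. */
int continues_word(const double *xpos, size_t n, double next_first_x, double tolerance)
{
    double d = next_first_x - xpos[n]; /* xpos[n]: where the next character would start */
    if (d < 0) d = -d;
    return d <= tolerance;
}
```

How large the tolerance should be (for example a fraction of the space width) is a judgment call that this sketch leaves to the caller.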
20. ... license is selected in the license list, its properties are displayed in the right pane of the window.

Select between different license keys for a single product
More than one license key can be installed for a specific product. The checkbox on the left side in the license list marks the currently active license key.

2.2 Command Line License Manager Tool
The command line license manager tool licmgr is available in the bin directory for all platforms except Windows.

A complete description of all commands and options can be obtained by running the program without parameters:

    licmgr

List all installed license keys

    licmgr list

The currently active license for a specific product is marked with a star (*) on the left side.

Add and delete license keys
Install a new license key:

    licmgr store X-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX

Delete an old license key:

    licmgr delete X-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX

Both commands have the optional argument -s that defines the scope of the action:
    - y: For all users
    - u: Current user

Select between different license keys for a single product

    licmgr select X-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX

2.3 License Key Storage
Depending on the platform, the license management system uses different stores for the license keys.

Windows: The license keys
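Before passing a key to licmgr store, a deployment script might check that it has the shape shown in the placeholder above: one character followed by six groups of five, separated by hyphens. A sketch, assuming keys consist only of ASCII letters and digits (the real key alphabet is not specified here); the function name is hypothetical, and licmgr itself remains the authority on whether a key is valid.

```c
#include <ctype.h>
#include <string.h>

/* Checks the layout X-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX-XXXXX (37 characters).
   Only the shape is checked; this says nothing about whether the key is valid. */
int has_license_key_shape(const char *key)
{
    if (key == NULL || strlen(key) != 37)
        return 0;
    for (size_t i = 0; i < 37; i++) {
        /* hyphens sit at indices 1, 7, 13, 19, 25 and 31 */
        int is_sep = (i >= 1) && ((i - 1) % 6 == 0);
        if (is_sep) {
            if (key[i] != '-')
                return 0;
        } else if (!isalnum((unsigned char)key[i])) {
            return 0;
        }
    }
    return 1;
}
```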
21. ... method opens a PDF memory block, i.e., it makes the objects contained in the PDF document accessible. If the document is already open, it is closed first.

Parameters:
    MemBlock: The memory block containing the PDF file, given as a one-dimensional byte array.
    Password (optional): The user or the owner password of the encrypted PDF document. If this parameter is left out, an empty string is used as a default.

Return value:
    True: The document was opened successfully from memory.
    False: The document in memory is not readable.

Page
Property PDFPage Page
Accessors: Get
This property retrieves an interface to the currently selected page of a document.

PageCount
Property Long PageCount
Accessors: Get
Return the number of pages of an open document. If the document is closed, zero is returned.

For collections (also known as PDF Portfolios) with no cover page, this property returns 0.

PageNo
Property Long PageNo
Accessors: Get, Set
This property sets and gets the currently selected page of an open document, given its page number. Pages are numbered from 1 for the first page to the value of the PageCount attribute for the last page. If the document is closed, zero is returned.

Producer
Property String Producer
Accessors: Get
Return the name of the producer from the document's info object.
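The method described above expects the whole PDF file as a one-dimensional byte array. In C, such a buffer can be prepared with standard I/O as sketched below; the function names are hypothetical, and the search for a "%PDF-" header in the first 1024 bytes is only a cheap plausibility test, not a replacement for checking the method's return value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads a whole file into a newly allocated buffer; the caller frees it.
   Returns NULL on failure. */
unsigned char *read_file(const char *path, size_t *size_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long len = ftell(fp);
    if (len < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return NULL; }
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(fp); return NULL; }
    if (fread(buf, 1, (size_t)len, fp) != (size_t)len) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    *size_out = (size_t)len;
    return buf;
}

/* Plausibility check: "%PDF-" appears within the first 1024 bytes. */
int looks_like_pdf(const unsigned char *buf, size_t size)
{
    size_t limit = size < 1024 ? size : 1024;
    for (size_t i = 0; i + 5 <= limit; i++)
        if (memcmp(buf + i, "%PDF-", 5) == 0)
            return 1;
    return 0;
}
```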
22. ... that is provided with the Windows operating system (located in C:\Windows\System32). The following screenshot shows the registration of PDFParser.dll.

(Screenshot: Command Prompt showing C:\Program Files\pdf-tools\bin>regsvr32 pdfparser.dll)

If the registration process succeeds, the following box is displayed:

(Screenshot: message box "DllRegisterServer in pdfparser.dll succeeded.")

The registration can also be done silently (e.g., for deployment) using the switch /s.

Other Files: The other DLLs do not need to be registered, but for simplicity it is suggested that they are in the same directory as PDFParser.dll.

Java Interface
For compilation and execution: The Java archive jar\EXPA.jar needs to be on the class search path. This can be done either by adding it to the environment variable CLASSPATH or by specifying it using the switch -classpath:

    javac -classpath "C:\pdf-tools\jar\EXPA.jar" TextExt.java

For execution: Additionally, the library bin\PDFParser.dll needs to be on the library path. This can be achieved either by adding its directory to the environment variable PATH or by specifying it using the switch -Djava.library.path. The current directory (.) is added to the class path so that the compiled TextExt class is found:

    java -classpath ".;C:\pdf-tools\jar\EXPA.jar" -Djava.library.path="C:\pdf-tools\bin" TextExt input.pdf

.NET Interface
The 3-Heights™ PDF Extract API does not provide a pure .NET solution. Instead, it
23. ... the current GraphicsState. The image space that is transformed by the CTM is the unit square (0, 0, 1, 1), i.e., the unit square is mapped to the rectangle or parallelogram in which the image is to be painted. For example, the coordinate on the page of the bottom-right corner of the untransformed image is the transformation of the coordinate (1, 0).

Image Resolution
Images are resources in a PDF document. Every image can be referenced multiple times in the document. The image itself doesn't have a resolution; it only has a resolution when referenced on a page. The resolution depends on the ratio of the pixel dimensions of the image to its size on the page, so it can be different every time the image is referenced.

Image Orientation
Images can be stored with an orientation other than TopLeft (default). In order to display them visually correctly, a transformation matrix is applied that inverts the orientation. In order to ensure the images are saved with the same orientation as they are displayed in the PDF, use the method ChangeOrientation as shown in the sample.

Optional Content (Layers)
In order to associate content objects with the Optional Content Groups (OCG) that define their visibility, the following steps have to be taken. First, the IgnoreOCM property must be set to true. Second, use the Content interface's GetNextObject() method to extract content objects. Whenever a BeginOCM operator is encountered, the OCM property contains the optional content membership string that
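The resolution described under Image Resolution follows from the pixel dimensions of the image and the CTM that maps the unit square onto the page. A sketch for the common case of an image placed without rotation or skew (b = c = 0), so that it is painted |a| by |d| points; for rotated placements the edge lengths would instead be sqrt(a² + b²) and sqrt(c² + d²). The function name is hypothetical.

```c
/* Effective resolution in dpi of an image of width_px x height_px pixels,
   painted with a CTM whose a and d entries scale the unit square.
   Assumes b = c = 0 (no rotation or skew). Returns 0 on degenerate input. */
int image_dpi(int width_px, int height_px, double a, double d,
              double *dpi_x, double *dpi_y)
{
    double w_pt = a < 0 ? -a : a;   /* painted width in points (1/72 inch) */
    double h_pt = d < 0 ? -d : d;   /* painted height in points */
    if (w_pt == 0.0 || h_pt == 0.0 || width_px <= 0 || height_px <= 0)
        return 0;
    *dpi_x = width_px / (w_pt / 72.0);
    *dpi_y = height_px / (h_pt / 72.0);
    return 1;
}
```

A 300 x 600 pixel image painted at 72 x 144 points (1 x 2 inches) has 300 dpi; the same image painted at half that size has 600 dpi, which is why the resolution can differ for each placement.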
Text Extraction of Text Marked as Symbolic .......... 79
5.13 Image Extraction .......... 79
Image Resolution .......... 79
Image Orientation .......... 79
5.14 Optional Content (Layers) .......... 79

1 Introduction

1.1 Description
The 3-Heights™ PDF Extract Tool is a solution for extracting and querying various attributes and page content from a PDF document. This includes text, images, graphic objects (including paths), metadata and embedded fonts. It is also possible to query the properties of objects. Intelligent mechanisms significantly increase extraction rates, for instance when extracting text.

[Figure: PDF Extract Tool overview (Texts, Fonts, Pages, Contents, Document, Metadata, Outlines, Parameters, TIFF, JPEG)]

1.2 Functions
The PDF Extract Tool is used to extract text, images and graphic objects (including paths) from PDF documents. Text can be extracted as lines and as individual words. It is also possible to query information such as position, color, font and font size. Intelligent functions such as heuristics, word formation support and character set
00 to 0xFF, GG is green and RR is red.

To retrieve the decimal values for blue, green and red, apply the following formulas (\ denotes integer division and And denotes the bitwise AND):

    Triple = PDFPARSERLib.GraphicsState.FillColorRGB
    B = Triple \ 65536
    G = (Triple \ 256) And 255
    R = Triple And 255

Example: Triple = 8388736 (purple)

    B = 8388736 \ 65536 = 128
    G = (8388736 \ 256) And 255 = 0
    R = 8388736 And 255 = 128

There are also other ways to retrieve these values than using the above formulas.

FillColorSpace
Property PDFColorSpace FillColorSpace
Accessors: Get
Return an interface to the current color space that is used for filling operations (see ColorSpace Interface). The color space is used to interpret the color values of the property FillColor.

FillOverprintFlag
Property Boolean FillOverprintFlag
Accessors: Get
Return the overprint flag for painting operations other than stroking.

FlatnessTolerance
Property Single FlatnessTolerance
Accessors: Get
Return the flatness tolerance. It must be a positive number; a small number means higher precision.

Font
Property IPDFFont Font
Accessors: Get
Return an interface to the text's font object, which describes the character encoding as well as the shape of the character glyphs.

FontSize
Property Single FontSize
Accessors: Get
Return the current font size for text s
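The same formulas can be checked quickly in Python, for example against the purple value in the example. This is only an illustration outside the API; `split_rgb_triple` is a hypothetical helper name.

```python
def split_rgb_triple(triple):
    """Split a 0xBBGGRR color value into (red, green, blue).

    Mirrors the formulas above: integer division by 65536 or 256 followed by
    a bitwise AND with 255 (equivalently: shift right by 16 or 8 bits).
    """
    blue = (triple // 65536) & 255
    green = (triple // 256) & 255
    red = triple & 255
    return red, green, blue


print(split_rgb_triple(8388736))  # (128, 0, 128): purple
```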
A one bit signifies a transparent pixel, and a zero bit signifies a pixel painted with the current fill color (see GraphicsState Interface).

SMask
Property Variant SMask
With this property the soft mask of an image can be extracted.

Store
Method Boolean Store(String FileName, TPDFCompression Compression)
Store the image as a file.
• Parameters:
FileName: The name of the disk file including path, drive or server string, according to the operating system's naming rules. The type of the image is defined by its extension (.jpg or .tif).
Compression (optional): The compression type (for TIFF images). The default value is eComprDefault.
• Return values:
True: The file has been written successfully.
False: An error has occurred and the disk file is unusable.

StoreInMemory
Method Boolean StoreInMemory(String Extension, TPDFCompression Compression)
Store the image in memory. The saved image can be retrieved using the method GetImage.
• Parameters:
Extension: The type of the image is defined by its extension (.jpg or .tif).
Compression (optional): The compression type (for TIFF images). The default value is eComprDefault.
• Return values:
True: The image has been saved successfully.
False: Otherwise.

Width
Property Long Width
Accessors: Get
Return the width of the image in pixels, also called samp
Name .......... 62
Subj .......... 63
Subtype .......... 63
TextLabel .......... 63
URI .......... 63
Vertices .......... 63
4.12 OutlineItem Interface .......... 64
4.13 Destination Interface .......... 64
Bottom .......... 64
PageNo .......... 64
Right .......... 65
Type .......... 65
Zoom .......... 65
4.14 Ocg Interface .......... 65
Level .......... 66
Example 1 .......... 67
Example 2 .......... 67
4.15 PDFObject Interface .......... 67
Begin, GetNext, End .......... 67
Dispose, DestroyObject .......... 68
GetElement .......... 68
GetStream .......... 68
Colorant
Text Interface: New: Property BoundingBox, Property FontSize, Property Length, Property Rotation, Property Width, Property XPos, Property YPos
Text Interface: Removed: Property RawString
The properties TextMatrix, NextXPos and NextYPos are marked as deprecated.
No changes in the following interfaces: AlternateImage, Document, GraphicsState, OutlineItem, TransformationMatrix

5.2 Changes from 1.4.1 to 1.5
This is a list of interface changes from version 1.4.1 (1.4.1.24) to version 1.5 (1.5.0.40):
Annotation Interface: New: Property Subj, Property Dest, Property URI
ColorSpace Interface: New: Property Colorant
Content Interface: New: Property SpaceFactor
Document Interface: New: Method GetDestination, Property IsLinearized
Font Interface: Removed: Property FirstChar, Property LastChar
Text Interface: Removed: Property TextMatrix, Property NextXPos, Property NextYPos

5.3 Changes from 1.5 to 1.6
This is a list of interface changes from version 1.5 (1.5.0.40) to version 1.6 (1.6.0.41):
Annotation Interface: New: Property Vertices
ColorSpace Interface: New: Properties ColorantName, IsColor, IsMonochrome
Content Interface: New: Property BreakWords
Document Interface: New: Properties Creator, Producer
GraphicsState Interface: New: Properties AlphaIsShape, BlendMode, FillAlphaConstant, Fi
IntegerValue .......... 68
Name .......... 69
ObjectNumber .......... 69
RealValue .......... 69
Size .......... 69
StringValue .......... 69
Type .......... 69
4.16 EmbeddedFile Interface .......... 69
CheckSum .......... 69
CreationDate .......... 70
FileName .......... 70
ModDate .......... 70
StoreInMemory .......... 70
4.17 Enumerations .......... 71
TPDFCompression .......... 71
TPDFContentObject .......... 71
TPDFTextExtractConfiguration .......... 72
5 Interface Changes .......... 74
5.1 Changes from 1.4 to 1.4.1 .......... 74
5.2 Changes from 1.4.1 to 1.5 .......... 74
5.3 Changes from 1.5 to 1.6 .......... 74
5.4 Changes from 1.6 to 1.7 .......... 75
5.5 Changes from 1.7 to 1.8 .......... 75
5.6 Changes from 1.8 to 1.9 .......... 75
5.7 Changes from 1.9 to 1.91 .......... 76
5.8 Changes from 1.91 to 2.0 .......... 76
5.9 Changes from 2.0 to 2.1 .......... 76
5.10 Changes from 4.3 to 4.4 .......... 76
5.11 Samples & Background Information .......... 77
5.12 Text Extraction .......... 77
Undesired Missing Blanks .......... 77
Extracted Text is Unreadable .......... 78
Handling of Symbolic and Non-Symbolic Fonts .......... 78
Extract API. The easiest way to get started quickly is to refer to this sample.

To create a new project from scratch, take the following steps:
1. Start Visual Studio and create a new C# or VB project.
2. Add a reference to the .NET assemblies. To do so, in the "Solution Explorer" right-click your project and select "Add Reference...". The "Add Reference" dialog appears. In the tab "Browse", browse for the .NET assemblies libpdfNET.dll, RendererNET.dll and PdfExtractNET.dll and add them to the project as shown below.

[Screenshot: "Add Reference" dialog, tab "Browse", listing libpdfNET.dll, PdfExtractNET.dll and PDFParser.dll]

3. Import namespaces (Note: This step is optional, but useful).
4. Write code.

Steps 3 and 4 are shown separately for C# and Visual Basic.

Visual Basic
3. Double-click "My Project" to view its properties. On the left-hand side, select the menu "References". The .NET assemblies you added before should show up in the upper window.
In the lower window, import the namespaces Pdftools.Pdf and Pdftools.PdfExtract.
You should now have settings similar to those in the screenshot below.

[Screenshot: "My Project" properties, page "References"]
Accessors: Get
Return true when the image is bi-tonal.

IsColor
Property Boolean IsColor
Accessors: Get
Return true when the image is color.

IsMonochrome
Property Boolean IsMonochrome
Accessors: Get
Return true when the image is monochrome.

ObjNumber
Property Long ObjNumber
Accessors: Get
Return a unique number of this image resource. If the number is 0, the image resource occurs only once in the document (i.e. it is an inline image). If the number is larger than 0, the image resource might be used multiple times.

Samples
Property Variant Samples
Accessors: Get
Return the image's data samples in a byte array. The sample data is ordered by line from top to bottom and, within a line, from left to right. The lines are byte-aligned. If the number of bits per component is less than one byte, the samples are ordered beginning with the most significant bit.

If the property ImageMask of the image is set to False, the sample data must be interpreted according to the properties of the image's color space.

If the property ImageMask of the image is set to True, the sample data represents a stencil mask. In this case the color space is not meaningful, and the data is organized as one bit per pixel.
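The ordering rules above (rows top to bottom, byte-aligned rows, most significant bit first) can be decoded with a short Python function. This is a sketch under assumptions: the Samples variant is taken to arrive as a Python bytes object, and width, height, bits per component and components per pixel are supplied by the caller from the image and color space properties; `unpack_samples` is a hypothetical name. For a stencil mask, this manual states that a one bit marks a transparent pixel.

```python
def unpack_samples(data, width, height, bits_per_component, components=1):
    """Split a byte-aligned, MSB-first sample buffer into rows of integers.

    Every row holds width * components samples and starts on a byte boundary,
    so any padding bits at the end of a row are skipped.
    """
    row_bytes = (width * components * bits_per_component + 7) // 8
    mask = (1 << bits_per_component) - 1
    rows = []
    for r in range(height):
        chunk = data[r * row_bytes:(r + 1) * row_bytes]
        value = int.from_bytes(chunk, "big")  # whole row as one integer, MSB first
        total_bits = row_bytes * 8
        rows.append([
            (value >> (total_bits - (i + 1) * bits_per_component)) & mask
            for i in range(width * components)
        ])
    return rows


# A 3 x 2 stencil mask with 1 bit per pixel: each row is padded to a full byte.
stencil = unpack_samples(bytes([0b10100000, 0b01000000]), 3, 2, 1)
print(stencil)  # [[1, 0, 1], [0, 1, 0]]
# For an image mask, 0 = painted with the fill color, 1 = transparent.
painted = [[bit == 0 for bit in row] for row in stencil]
print(painted)  # [[False, True, False], [True, False, True]]
```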
Root, Info, Encrypt
• n: Page n
Path operators:
• name: Entry name of the dictionary
• i: Index i in the array
Examples:
• Root Pages Kids 0 Contents
• 1 Resources Font TT2 FontDescriptor FontFamily

GetOcg
Method Ocg GetOcg(Integer Count)
Return an interface to an optional content group item.
• Parameters:
Count: The number of the optional content group. Optional content groups are numbered from 0 to OcgCount - 1.
• Return value:
An interface to an optional content group item.

GetPageLabel
Method String GetPageLabel(Long PageNo)
Return the label text associated with a specific page, given its number. Examples of page labels are "7" or "vii".
• Parameters:
PageNo: The page number.
• Return value:
A string holding the page label, if a page label exists. If no page label exists, the page number is converted to a string and returned.

GetXMPMetadata
Method Boolean GetXMPMetadata(String FileName)
Extract the document's XMP metadata stream and write it to the specified file.
• Parameters:
FileName: The name of the output file.
• Return value:
True if the document contains XMP metadata and the stream was successfully written to the output file.

GetXMPMetadataMem
Method Variant GetXMPMetadataMem()
Extract the document's XMP metadata stream as a byte array. If the
GetNextText
Method PDFText GetNextText()
This method reads the content stream objects until a text object can be returned or the end of the content stream is reached. If a text object is found, an interface to the next read text object (see Text Interface) is returned. In contrast to the methods GetNextImage and GetNextPath, this method reads text objects and merges them until a major text property (font, line coordinate, etc.) changes or, if word breaking is enabled, a word break occurs (see property BreakWords). The current graphics state can be retrieved through the current content object's interface.
• Return value:
An interface to the next text object if there is one on this page; Nothing otherwise.

GraphicsState
Property TPDFGraphicsState GraphicsState
Accessors: Get
Return an interface to the content's graphics state (see GraphicsState Interface). The graphics state is updated each time one of the methods GetNextText, GetNextImage, GetNextPath or GetNextObject is called.

IgnoreOCM
Property Boolean IgnoreOCM
Accessors: Get, Set
Option to ignore optional content membership and make all content visible. BeginOCM and EndOCM objects are extracted, but they have no effect on the extracted content. For example, when the property is true, hidden text is extracted as well. Set this property to true in order to extract all content.

Image
Pro
am is decompressed.

IntegerValue
Property Long IntegerValue
Accessors: Get
Return the integer value of a numeric object.

Name
Property String Name
Accessors: Get
Return the character sequence of a name object. The string is null-terminated.

ObjectNumber (applies to indirect objects)
Method Long ObjectNumber
Return the object number.

RealValue
Property Double RealValue
Accessors: Get
Return the real value of a numeric object.

Size (applies to arrays)
Property Long Size
Accessors: Get
Return the size of the array.

StringValue
Property Variant StringValue
Accessors: Get
Return the content of a string object as a byte array.

Type
Property Type Type
Accessors: Get
Return the type of the object. Possible return values: eTypeBoolean, eTypeInteger, eTypeReal, eTypeString, eTypeName, eTypeArray, eTypeDictionary, eTypeIndirect.

4.16 EmbeddedFile Interface

CheckSum
Property Variant CheckSum
Accessors: Get
Get the 16-byte MD5 check sum.

CreationDate
Property String CreationDate
Accessors: Get
Get the creation date.

FileName
Property String FileName
Accessors: Get
Get the embedded file's path. If the embedded file has no associated file stream, the func
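A 16-byte MD5 check sum can be used to verify embedded file data you have obtained. The following Python sketch assumes you already hold the embedded file's bytes (how you obtain them is outside this sketch) and that the CheckSum variant arrives as a 16-byte bytes object; `checksum_matches` is a hypothetical helper.

```python
import hashlib


def checksum_matches(file_bytes, checksum):
    """Return True if file_bytes hashes to the given 16-byte MD5 check sum."""
    expected = bytes(checksum)
    if len(expected) != 16:       # an MD5 digest is always 16 bytes long
        return False
    return hashlib.md5(file_bytes).digest() == expected


# Hypothetical data: the MD5 digest of b"hello" is well known.
reference = bytes.fromhex("5d41402abc4b2a76b9719d911017c592")
print(checksum_matches(b"hello", reference))   # True
print(checksum_matches(b"hello!", reference))  # False
```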
StemH, StemV
Property Single StemH
Property Single StemV
Accessors: Get
These properties return the thickness of the dominant horizontal stems (StemH) and of the dominant vertical stems (StemV) of the glyphs in the font.

Type
Property String Type
Accessors: Get
Return the font type as a string.

Widths
Property Variant Widths
Accessors: Get
Return an array that contains the widths of the glyphs.

XHeight
Property Single XHeight
Accessors: Get
Return the maximum height of flat, non-ascending lowercase letters (such as the letter x), measured from the baseline.

For further information about font descriptors, see PDF Reference, chapter 5.7.

4.8 ColorSpace Interface

BaseColorSpace
Property IPDFColorSpace BaseColorSpace
Accessors: Get
Return an IPDFColorSpace interface to the base color space, if one exists.

ColorantName
Property Variant ColorantName
Accessors: Get
Return the name of the colorant.
Interface Note:
COM: A variant containing an array of strings is returned. These strings represent the names of the colorants of the color space. In an RGB color space these are "Red", "Green" and "Blue".
C/.NET: An additional parameter is passed that defines the index of the colorant. Instead of an array containing all strings, a single string is returned, e.g. "Red".

ComponentsPerPixel
below:

    Dim document As New Pdftools.PdfExtract.Document()
    document.Open(...)
    Dim content = document.Page.Content

C#
Add the following namespaces:

    using Pdftools.Pdf;
    using Pdftools.PdfExtract;

The .NET interface can now be used as shown below:

    document = new Document();
    document.Open(...);
    content = document.Page.Content;

Trouble Shooting
The most common issue when using the .NET interface is that the native DLL is not found at execution time. This normally manifests itself when the constructor is called for the first time and an exception is thrown, normally of type System.TypeInitializationException.

To resolve this, ensure that the native DLL is found at execution time. For details, see the sub-chapter ".NET Interface" in the chapter "Installation".

4 Reference Manual
Note: This manual describes the COM interface only. The other interfaces (C, Java, .NET) work similarly, i.e. they have calls with similar names, and the call sequence to be used is the same as with COM.

4.1 Document Interface

Author
Property String Author
Accessors: Get
Return the author from the document's info object.

Close
Method Void Close()
This method closes an open document. If the document is already closed, the met
When an encoding is missing or incorrect, the text may not be extractable. Even if the text is visually readable, it cannot be extracted (except by means of OCR) if the meaning of the glyphs is not encoded.

If text is not extractable using the text extraction of Adobe Acrobat 7 Professional, it is most likely not extractable with the 3-Heights™ PDF Extract Tool either, and vice versa.

Handling of Symbolic and Non-Symbolic Fonts
Fonts in PDF documents have so-called font descriptor flags (see PDF Reference Manual, chapter 5.7.1). These flags describe font characteristics such as fixed pitch, serif, symbolic, italic, etc. If a font is flagged symbolic, its glyphs are not part of the standard Latin character set. Typical symbolic glyphs are squares, stars or other small icons such as cars or animals. Often there is no Unicode for these glyphs. The 3-Heights™ PDF Extract Tool handles text extraction of symbolic (as well as non-symbolic) fonts as described below.

If no encoding is provided with the font, the intrinsic encoding is applied, which works as follows:
• In case the font file is embedded:
  o If there is a Unicode for the glyph, the corresponding Unicode is returned.
  o If there is no Unicode and
    - the font is flagged symbolic and part of the glyph names consist of a numerical value, such as G1, G2, ..., G100, the corresponding glyph number (and, for TrueType fonts, the Unicode Private Section prefix 0xF000) is returned. Oth
d by the PDF specification and our set of heuristics. These Unicodes might not be accurate. In some cases you might have prior knowledge about a specific font and know the mapping of character codes to Unicodes yourself, e.g. you know that the creator used the EBCDIC encoding. For this reason, the property RawString returns the string of character codes and allows you to apply your own mapping.

With RawString, do not use the TextExtConfiguration options eTECBreakSpaceUnicode, eTECPosMergeSingleSpace and eTECPosMergeMultiSpace, because the Unicodes these options work with might not be accurate.

Rotation
Property Single Rotation
Accessors: Get
Return the rotation of the string in radians (2π rad = 360°).

StringLength
Property Integer StringLength
Accessors: Get
Return the number of characters in the string.

UnicodeString
Property String UnicodeString
Accessors: Get
Return the text as a Unicode UTF-16 encoded string. The number of bytes per character is a multiple of two. For most languages, such as English, a character can be mapped to a single 16-bit Unicode value. Complex languages such as Chinese can return multiple 16-bit values per character. Some text strings, however, cannot be mapped correctly or cannot be mapped at all. The former is the case if, e.g., the PDF creator program did not use correct names for the characters in the font encoding (see Font Interface). The latter is the case if, e.g., the PDF creator program didn't
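Applying your own mapping to RawString and counting 16-bit values in a UTF-16 string can both be illustrated in Python. This sketch assumes RawString's character codes are available as a bytes object, uses Python's built-in cp500 codec as one example of an EBCDIC code page, and the helper names are hypothetical.

```python
def decode_raw_string(raw, codec="cp500"):
    """Apply your own mapping from raw character codes to text.

    cp500 is Python's built-in EBCDIC (International) codec; pass whichever
    codec matches what you know about the font's encoding.
    """
    return bytes(raw).decode(codec)


def utf16_code_units(text):
    """Number of 16-bit values needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


print(decode_raw_string(bytes([0xC8, 0x85, 0x93, 0x93, 0x96])))  # Hello
print(utf16_code_units("A"))           # 1
print(utf16_code_units("\U00020000"))  # 2: a CJK character outside the BMP
```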
BlendMode
Property String BlendMode
Accessors: Get
Return the name of the blend mode. A blend mode can be "Normal", "Multiply", "Screen", "Overlay", etc.

CharSpacing
Property Single CharSpacing
Accessors: Get
Return the current space between two characters of a text string as a single-precision real number in text units.

CTM
Property PDFTransformMatrix CTM
Accessors: Get
Return an interface to the current transform matrix. The transform describes the transformation of the graphic object's coordinates from user units to page units, including the effect of the page's Rotate attribute if requested (see method Reset of the Content Interface).

DashArray
Property Variant DashArray
Accessors: Get
Return the dash array of a line dash pattern. The line dash pattern controls the pattern of dashes and gaps used to stroke paths.

DashPhase
Property Single DashPhase
Accessors: Get
Return the dash phase of a line dash pattern. The dash phase is the offset into the pattern and can be larger than the pattern itself.

FillAlphaConstant
Property Single FillAlphaConstant
Accessors: Get
Return the alpha constant for filling.

FillColorCMYK
Property Long FillColorCMYK
Accessors: Get
Return the CMYK color quad for filling operations. The color value is obtained by converting the color values of the property FillColor by mea
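How DashArray and DashPhase combine can be illustrated with a small Python function that decides whether a point at a given distance along a path is inked. It is a sketch following the PDF line dash pattern rules, assuming both values are available as plain numbers; the function name is hypothetical, and treating an all-zero pattern as solid is a choice made here.

```python
def is_dash_on(dash_array, dash_phase, distance):
    """Return True if the point at `distance` along a stroked path is inked.

    The dash array lists alternating dash and gap lengths and repeats; the
    dash phase is the distance into the pattern at which stroking starts,
    so it may be larger than the pattern itself. An empty array is solid.
    """
    if not dash_array:
        return True
    pattern = list(dash_array)
    if len(pattern) % 2:           # odd length: repeat so dashes and gaps alternate
        pattern = pattern * 2
    period = sum(pattern)
    if period == 0:                # degenerate pattern, treated as solid here
        return True
    pos = (distance + dash_phase) % period
    for index, length in enumerate(pattern):
        if pos < length:
            return index % 2 == 0  # even index = dash, odd index = gap
        pos -= length
    return True


# Pattern [3, 2]: 3 units of dash, then 2 units of gap.
print([is_dash_on([3, 2], 0, d) for d in range(6)])
# [True, True, True, False, False, True]
```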
default is 0.3. This means that two characters that are further apart than 0.3 times the width of the space character glyph in this font are interpreted as belonging to separate words. For text that is written very narrowly, this property should be decreased in order to avoid concatenation of words.

Text
Property PDFText Text
Accessors: Get
Return an interface to the last read text object (see Text Interface). The text object is updated each time the method GetNextText or GetNextObject is called.

TextExtConfiguration
Property Long TextExtConfiguration
Accessors: Get, Set
Default: 7 (eTECBreakTextState | eTECBreakGraphicsState | eTECBreakSpaceUnicode)
This property controls how the text extraction algorithm works. Text extraction collects all text objects and merges them into a single text; this property controls which text objects are merged. See the enumeration TPDFTextExtractConfiguration for a list of all possible options.

Recommended settings for different use cases:
• Text search or indexing, i.e. text formatting is not important:
  o Extract words individually: eTECBreakSpaceUnicode
  o Extract phrases: eTECPosMergeSingleSpace | eTECPosMergeMultiSpace
• Conversion of PDF content to another format, i.e. text formatting and exact positioning are crucial:
  o Usage of RawString or
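The word-break rule described for SpaceFactor can be modelled in Python as follows. This is a simplified sketch, not the library's actual algorithm: it assumes each character of one text line comes with its horizontal start and end position and that the width of the font's space glyph is known; the function name and sample positions are hypothetical.

```python
def split_words(glyphs, space_width, space_factor=0.3):
    """Group the characters of one text line into words.

    glyphs: (char, x_start, x_end) tuples in reading order. A gap wider than
    space_factor * space_width between two characters starts a new word.
    """
    words = []
    current = ""
    prev_end = None
    for char, x_start, x_end in glyphs:
        gap_too_wide = (
            prev_end is not None
            and x_start - prev_end > space_factor * space_width
        )
        if gap_too_wide and current:
            words.append(current)
            current = ""
        current += char
        prev_end = x_end
    if current:
        words.append(current)
    return words


# Hypothetical line "ab cd": a 3-unit gap between "b" and "c", space glyph 5 units wide.
line = [("a", 0, 5), ("b", 5, 10), ("c", 13, 18), ("d", 18, 23)]
print(split_words(line, space_width=5))                    # ['ab', 'cd']
print(split_words(line, space_width=5, space_factor=1.0))  # ['abcd']
```

The second call shows why narrowly set text needs a smaller factor: the larger the factor, the wider a gap must be before it counts as a word break.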
defines the visibility of subsequent content objects until the matching EndOCM operator is encountered. The respective OCG can be retrieved using the Document's GetOcg method.

As an example, look at the file www.pdf-tools.com/public/downloads/samples/layers.pdf. It contains six colored squares and six optional content groups. The visibility of the red, green and blue squares is controlled by the respective OCGs. The yellow square is only visible if both OCGs Green and Blue are ON. The OCGs "Gray 64" and "Gray 128" are child elements of the OCG "Gray" and control the visibility of the respective gray squares. These are only visible if both the child and the parent OCG are ON.

Extracting OCGs from Layers.pdf

id  name      level
0   Red       0
1   Green     0
2   Blue      0
3   Gray      0
4   Gray 64   1
5   Gray 128  1

Extracting objects from Layers.pdf

type      property  value   comment
BeginOCM  OCM       0       the visibility of subsequent objects is defined by the state of OCG 0 ("Red")
Path      Path              red square
EndOCM                      end of OCM segment
BeginOCM  OCM       1       OCG 1 is "Green"
Path      Path              green square
EndOCM
BeginOCM  OCM       2       OCG 2 is "Blue"
Path      Path              blue square
EndOCM
BeginOCM  OCM       1 && 2  subsequent objects are visible if OCG 1 and OCG 2 are ON
Path      Path              yellow square
EndOCM
BeginOCM  OCM       3       OCG 3 is "Gray", parent OCG of 4 and 5
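The visibility rules in this example can be modelled in a few lines of Python. This is a sketch under assumptions: the ON/OFF state of each OCG and the parent relationships (Gray 64 and Gray 128 under Gray) are supplied by you, and only membership strings of the form "n" or "n && m", as seen in the sample output, are handled; other operators are not covered. The function names are hypothetical.

```python
def ocg_is_on(ocg_id, state, parent):
    """An OCG counts as ON only if it and all of its ancestors are ON."""
    while ocg_id is not None:
        if not state[ocg_id]:
            return False
        ocg_id = parent.get(ocg_id)
    return True


def ocm_is_visible(ocm, state, parent):
    """Evaluate a membership string of the form "1" or "1 && 2".

    Only the "&&" combination from the sample output is handled here.
    """
    ocg_ids = [int(part) for part in ocm.split("&&")]
    return all(ocg_is_on(i, state, parent) for i in ocg_ids)


# OCGs of layers.pdf: 0 Red, 1 Green, 2 Blue, 3 Gray, 4 Gray 64, 5 Gray 128.
parent = {4: 3, 5: 3}  # Gray 64 and Gray 128 are children of Gray
state = {0: True, 1: True, 2: False, 3: True, 4: True, 5: True}  # Blue is OFF
print(ocm_is_visible("1", state, parent))       # True: green square
print(ocm_is_visible("1 && 2", state, parent))  # False: yellow square needs Blue
```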
TextRenderingMode
Property Short TextRenderingMode
Accessors: Get
Return a value that indicates whether the text should be stroked, filled, used as a clipping path or some combination of the three. The values have the following meaning:

0  Fill text
1  Stroke text
2  Fill, then stroke text
3  Neither fill nor stroke text (invisible)
4  Fill text and add path for clipping
5  Stroke text and add path for clipping
6  Fill, then stroke text and add path for clipping
7  Add path for clipping

TextRise
Property Single TextRise
Accessors: Get
Return a single-precision real number in unscaled text units that indicates by how much the baseline of the text is moved up or down. It is most commonly used to display subscripts and superscripts.

WordSpacing
Property Single WordSpacing
Accessors: Get
Return the current space between two words of a text string as a single-precision real number in text units.

For further information about the graphics state, see PDF Reference, chapter 4.3.

4.7 Font Interface

Ascent
Property Single Ascent
Accessors: Get
Return the Ascent value. This value represents the maximum height above the baseline reached by the glyphs in the font, excluding the height of glyphs for accented characters.

AvgWidth
Property Single AvgWidth
Accessors: Get
Return the average w
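The mode table can be turned into flags, for example to skip invisible text (mode 3) during extraction. A minimal Python sketch; the function name is hypothetical, and the mode value is assumed to be read from TextRenderingMode.

```python
def rendering_mode_flags(mode):
    """Decode a TextRenderingMode value (0-7) into (fill, stroke, clip)."""
    if not 0 <= mode <= 7:
        raise ValueError("text rendering mode must be between 0 and 7")
    fill = mode in (0, 2, 4, 6)
    stroke = mode in (1, 2, 5, 6)
    clip = mode >= 4
    return fill, stroke, clip


for mode in range(8):
    print(mode, rendering_mode_flags(mode))
```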
document does not contain XMP metadata, NULL is returned.

IsCollection
Property Boolean IsCollection
Accessors: Get
Return true if the PDF document is a collection (also known as a PDF Portfolio).

IsEncrypted
Property Boolean IsEncrypted
Accessors: Get
Return true if the PDF document has an encryption entry.

IsLinearized
Property Boolean IsLinearized
Accessors: Get
Return true if the linearization flag is set in the PDF document. This property does not validate whether the linearization is actually correct.
Linearization refers to optimizing the PDF for fast web access, i.e. support for random page access.

Keywords
Property String Keywords
Accessors: Get
Return a string with the keywords of the document's info object.

LastError
Property TPDFErrorCode LastError
Accessors: Get
This property can be accessed to receive the latest error code. Any return value other than PDF_S_SUCCESS (0) indicates that an error occurred. See enumeration TPDFErrorCode.

LastErrorMessage
Property String LastErrorMessage
Accessors: Get
Return the error message text associated with the last error (see property LastError). Note that the property is NULL if no message is available.

MajorVersion
Property Inte
ecision real numbers. The crop box is optional; it defines the visible region of the page. If no crop box is set, the media box is returned.

DeviceColorant
Property String DeviceColorant
Accessors: Get
Return the device colorant.

Document
Property PDFDocument Document
Accessors: Get
Return the interface to the page's document (see Document Interface).

GetFirstAnnotation
Method Annotation GetFirstAnnotation()
Return an interface to the first annotation (see Annotation Interface).
• Return value:
An interface to the first annotation if any annotations exist; Nothing otherwise.

GetNextAnnotation
Method Annotation GetNextAnnotation()
Return an interface to the next annotation.
• Return value:
An interface to the next annotation if any further annotations exist; Nothing otherwise.

MediaBox
Property Variant MediaBox
Accessors: Get
Return the media box rectangle given by the coordinates left, bottom, right, top. The values are returned as an array of four single-precision real numbers. The media box is required; it defines the physical boundaries of the medium on which the page is intended to be displayed or printed.

Rotate
Property Integer Rotate
Accessors: Get
Return the rotation value of the page. This value is used by viewer programs to turn the page by the given number of
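The box and rotation values combine as follows when you need the visible page size. This Python sketch assumes that the MediaBox and CropBox arrays arrive as four numbers [left, bottom, right, top] in the PDF default unit of 1/72 inch and that Rotate is given in degrees; the function name is hypothetical.

```python
def visible_page_size_mm(media_box, crop_box=None, rotate=0):
    """Visible page size (width, height) in millimetres.

    Boxes are [left, bottom, right, top] in PDF default units of 1/72 inch.
    The crop box falls back to the media box, and a Rotate value of 90 or
    270 (or -90) swaps width and height.
    """
    left, bottom, right, top = crop_box if crop_box else media_box
    width_mm = (right - left) * 25.4 / 72
    height_mm = (top - bottom) * 25.4 / 72
    if rotate % 180 == 90:
        width_mm, height_mm = height_mm, width_mm
    return round(width_mm, 1), round(height_mm, 1)


a4 = [0, 0, 595, 842]
print(visible_page_size_mm(a4))             # (209.9, 297.0)
print(visible_page_size_mm(a4, rotate=90))  # (297.0, 209.9)
```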
3 Getting started .......... 21
3.1 Visual Basic .......... 21
3.2 ASP Script .......... 22
3.3 .NET .......... 23
Visual Basic .......... 23
C# .......... 24
Trouble Shooting .......... 25
4 Reference Manual .......... 26
4.1 Document Interface .......... 26
Author .......... 26
Close .......... 26
CreationDate .......... 26
Creator .......... 26
GetCurrentOutlineLevel .......... 26
GetDestination .......... 27
GetFirstColorSpaceResource .......... 27
GetFirstEmbeddedFile .......... 27
GetFirstFontResource .......... 27
GetFirstImageResource .......... 28
GetFirstOutlineItem .......... 28
GetInfoEntry .......... 28
GetNextColorSpaceResource .......... 28
GetNextEmbeddedFile .......... 28
GetNextFontResource .......... 29
GetNextImageResource .......... 29
GetNextOutlineItem .......... 29
GetOcg .......... 30
GetPageLabel .......... 30
46. Descent .... 55
    Encoding .... 55
    Flags .... 55
    FontBBox .... 56
    FontFileType .... 56
    ItalicAngle .... 56
    MaxWidth .... 57
    MissingWidth .... 57
    StemH, StemV .... 57
    Type .... 57
    Widths .... 57
    XHeight .... 57
4.8 ColorSpace Interface .... 58
    BaseColorSpace .... 58
    ComponentsPerPixel .... 58
    IsColor .... 58
    IsIndexed .... 58
    IsMonochrome .... 59
    Lookup .... 59
    Name .... 59
4.9 TransformMatrix Interface .... 59
    Orientation .... 60
    Rotation .... 60
    XScaling, YScaling .... 60
    XSkew, YSkew .... 60
    XTranslation, YTranslation .... 60
4.10 Alternate Image Interface .... 61
    DefaultForPrinting .... 61
    Image .... 61
4.11 Annotation Interface .... 61
    AttachedFile .... 61
47. erwise the glyph index is returned.
• If the font is non-symbolic, the standard encoding is used.
• In case the font file is not embedded, the standard encoding is applied.

Notes about the above algorithm:
• When the standard encoding is applied, all control characters (< 31) are mapped to character 32 (blank).
• The glyph names G1, G2, ..., G100 are often created by Ghostscript-related PDF creators. In these cases the number in the glyph name corresponds to the encoding of the used code page, e.g. G65 is the character A in WinAnsi encoding.

5.13 Text Extraction of Text Marked as Symbolic

Sometimes text is marked as symbolic but actually is not. In certain cases PDF creators do this to prevent text extraction. Assume a PDF contains a TrueType font that is mistakenly marked as symbolic. As a result, the returned characters carry the Unicode Private Use Area prefix, i.e. they lie in the range 0xF000 to 0xF0FF. In this case the prefix needs to be removed again. This can be achieved by setting the property TranslateSymbolic to true.

5.14 Image Extraction

Image extraction samples in different programming languages are available online at http://www.pdf-tools.com/pdf/pdf-extract-content-metadata-text.aspx

An image is placed on the output page in any position, orientation, and size as specified by the current transformation matrix (property CTM of
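The G-number rule can be sketched as follows: strip the G, read the number, and interpret it as a WinAnsi code. This sketch assumes that Python's cp1252 codec is an adequate stand-in for WinAnsi encoding (the two are close but not identical for a few undefined codes), and `glyph_name_to_char` is a hypothetical helper, not part of the API.

```python
import re
from typing import Optional

# Ghostscript-style glyph names such as "G65"
_GNAME = re.compile(r"G(\d+)")


def glyph_name_to_char(glyph_name: str) -> Optional[str]:
    """Map a glyph name like 'G65' to a character, reading the number as a WinAnsi code.

    Assumption: WinAnsi is approximated by Python's cp1252 codec. Returns None
    when the name does not follow the pattern or the code is undefined in cp1252.
    """
    match = _GNAME.fullmatch(glyph_name)
    if match is None:
        return None
    code = int(match.group(1))
    if not 0 <= code <= 255:
        return None
    try:
        return bytes([code]).decode("cp1252")
    except UnicodeDecodeError:  # e.g. 0x81 is undefined in cp1252
        return None


print(glyph_name_to_char("G65"))    # A
print(glyph_name_to_char("G97"))    # a
print(glyph_name_to_char("space"))  # None
```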
48. ext property.
• eImage: An image object could be found and its interface can be retrieved through the content's Image property. The graphics state can be retrieved through the content's GraphicsState property.
• ePath: A path object could be found and its string representation can be retrieved through the content's Path property. The graphics state can be retrieved through the content's GraphicsState property.
• eSave: Save the current graphics state on the graphics state stack.
• eRestore: Restore the graphics state by removing the most recently saved state from the stack and making it the current graphics state.
• eBeginOCM: Start of a sequence of objects whose visibility is defined by an optional content membership string (property OCM). Sets the property OCM. OCM sequences can be nested.
• eEndOCM: Marks the end of an OCM sequence.

GetNextPath
Method String GetNextPath()
This method reads the content stream objects until a path object can be returned or the end of the content stream is reached. If a path object is found, its string representation is returned. It can also be retrieved through the content's Path property. The graphics state can be retrieved through the content's GraphicsState property.
• Return value:
  The next path on this page, if there is any.
  Nothing otherwise.

GetNext
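The effect of eSave/eRestore and eBeginOCM/eEndOCM can be shown without the library by replaying a list of object types. In this sketch the event list stands in for successive GetNextObject results; eImage, ePath, eSave, eRestore, eBeginOCM and eEndOCM are taken from the text above, while eText, the setColor event, the dictionary-based graphics state and the function `walk` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, Optional[object]]  # (object type name, payload)


def walk(events: List[Event]) -> List[Tuple[str, Dict[str, str], List[str]]]:
    """Replay content objects, keeping a graphics state stack and an OCM stack.

    For every text/image/path object, returns its type, a copy of the graphics
    state in effect and the enclosing OCM strings (outermost first).
    """
    state: Dict[str, str] = {"color": "black"}
    saved: List[Dict[str, str]] = []  # graphics state stack (eSave/eRestore)
    ocm_stack: List[str] = []         # nested OCM sequences (eBeginOCM/eEndOCM)
    result = []
    for kind, payload in events:
        if kind == "eSave":
            saved.append(dict(state))
        elif kind == "eRestore":
            state = saved.pop()       # most recently saved state becomes current
        elif kind == "eBeginOCM":
            ocm_stack.append(str(payload))
        elif kind == "eEndOCM":
            ocm_stack.pop()
        elif kind == "setColor":      # hypothetical stand-in for a state change
            state["color"] = str(payload)
        elif kind in ("eText", "eImage", "ePath"):
            result.append((kind, dict(state), list(ocm_stack)))
    return result


events = [
    ("eSave", None),
    ("setColor", "red"),
    ("eBeginOCM", "layer1"),
    ("ePath", None),
    ("eEndOCM", None),
    ("eRestore", None),
    ("eText", None),
]
for item in walk(events):
    print(item)
# ('ePath', {'color': 'red'}, ['layer1'])
# ('eText', {'color': 'black'}, [])
```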
49. extracted fonts: eTECBreakTextState, eTECBreakGraphicsState
o Other: eTECBreakTextState, eTECBreakGraphicsState, eTECPosMergeSingleSpace

TranslateSymbolic
Property Boolean TranslateSymbolic
Accessors: Get, Set
Default: False
Replace symbolic characters from the Unicode custom range (0xF000 to 0xF0FF) with WinAnsi codes (0x00 to 0xFF).

4.4 Image Interface

Alternates
Property Variant Alternates
Accessors: Get
Return an array of alternate images (see AlternateImage Interface). An image can have none, one or multiple alternate images.

BitsPerComponent
Property Integer BitsPerComponent
Accessors: Get
Return the number of bits that are used to represent a single color component of an image sample. The number of color components per image data sample can be retrieved through the image's color space interface.

ChangeOrientation
Method Boolean ChangeOrientation(TPDFOrientation Orientation)
Set the orientation of the image. This value has to be set prior to using the method Store. The orientation of the image can be retrieved from the property GraphicsState.CTM.Orientation.

ColorSpace
Property IPDFColorSpace ColorSpace
Accessors: Get
Return an interface to the color space of the image (see ColorSpace Interface).

Compression
Property IPDFCompression Compression
Accessors: Get
Return the compressio
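The translation performed by TranslateSymbolic amounts to removing the 0xF0 prefix from each affected code point. A minimal sketch of that mapping, assuming the original one-byte code sits in the low byte; the function name is hypothetical, and interpreting the result directly as a Unicode code point (rather than converting WinAnsi codes 0x80 to 0x9F) is a simplification.

```python
def translate_symbolic(text: str) -> str:
    """Map characters in U+F000..U+F0FF to U+0000..U+00FF; leave all others unchanged."""
    return "".join(
        chr(ord(ch) - 0xF000) if 0xF000 <= ord(ch) <= 0xF0FF else ch
        for ch in text
    )


print(translate_symbolic("\uF048\uF069"))  # Hi
```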
50. file couldn't be opened.
The file couldn't be created.
The authentication failed due to a wrong password.

Undefined
Pages appear in columns, from bottom to top and right to left, relative to page orientation.
Pages appear in columns, from bottom to top and left to right, relative to page orientation.
Pages appear in columns, from top to bottom and left to right, relative to page orientation.
Pages appear in columns, from top to bottom and right to left, relative to page orientation.
Pages appear in rows, from right to left and bottom to top, relative to page orientation.
Pages appear in rows, from left to right and bottom to top, relative to page orientation.
Pages appear in rows, from left to right and top to bottom, relative to page orientation.
Pages appear in rows, from right to left and top to bottom, relative to page orientation.

TPDFTextExtractConfiguration
Values: eTECBreakTextState, eTECBreakGraphicsState, eTECBreakSpaceUnicode, eTECPosMergeSingleSpace

eTECBreakTextState: Start a new text object if the text state changes (font, font size, horizontal scaling). Set this option if the text state is important to you.

eTECBreakGraphicsState: Start a new text object if the graphics state changes (color). Set this option if the color is important to you.

eTECBreakSpaceUnicode: Start a new text object if the extracted text contains a blank Unicode character (e.g. a space or &nbsp;). Do not set this option if you need the RawString property.
Example: If set, the text "Hello World" will be extracted as "Hello
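The configuration values are flags that are combined into one setting, as in the recommendations listed above. A sketch using Python's enum.Flag as a stand-in: only the member names come from this manual; the class name and the numeric values produced by auto() are assumptions, and the real values have to be taken from the product's own definitions.

```python
from enum import Flag, auto


class TextExtractConfiguration(Flag):
    """Hypothetical stand-in for TPDFTextExtractConfiguration.

    Member names follow the manual; the auto() values are NOT the library's values.
    """
    eTECBreakTextState = auto()
    eTECBreakGraphicsState = auto()
    eTECBreakSpaceUnicode = auto()
    eTECPosMergeSingleSpace = auto()


# The "Other" recommendation, expressed as a combination of flags
other = (TextExtractConfiguration.eTECBreakTextState
         | TextExtractConfiguration.eTECBreakGraphicsState
         | TextExtractConfiguration.eTECPosMergeSingleSpace)

# If RawString is needed, eTECBreakSpaceUnicode must not be part of the setting
print(TextExtractConfiguration.eTECBreakSpaceUnicode in other)  # False
print(TextExtractConfiguration.eTECBreakTextState in other)     # True
```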
51. file of both the evaluation and the release version of the 3-Heights™ PDF Extract Tool API.

Samples are also available at the website of PDF Tools for the 3-Heights™ PDF Extract Tool. Please find the latest samples online at:
http://www.pdf-tools.com/asp/products.asp?name=EXPA

Note: Code samples in this manual are not constantly updated and might not be 100% compatible with the latest version of the Extract API.

Text Extraction

For text extraction a page number must be set. The method GetNextText returns the text tokens in Z order. This means that the text token which is on top, i.e. which is rendered last when the document is displayed, is retrieved last. Some PDF creators save the text in the order from the upper left to the lower right corner. As a result, extracting such documents yields a readable text sequence. This, however, is not true for all creators. It is also possible to save every single character separately and in random order. Extracting text from such a document results in a random and therefore unreadable sequence of text tokens. The text tokens first need to be sorted by coordinate to make them readable.

Undesired Missing Blanks

Using the property TextExtConfiguration, the text extraction algorithm can be configured. It is best to start with one of the settings recommended for your use case.

Sometimes this can lead to undesired blanks within what visually looks like one word, for example if:
• Text is
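Sorting text tokens by coordinate can be as simple as grouping them into lines by baseline and ordering each line from left to right. This sketch assumes each token's text, left x coordinate and baseline y (PDF coordinates, origin bottom left) have already been collected; the `Token` record, the `reading_order` function and the 2-point tolerance are hypothetical choices that will not handle multi-column or rotated text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    x: float  # left edge of the token, in points
    y: float  # baseline, in points (origin bottom left, y grows upwards)


def reading_order(tokens: List[Token], line_tolerance: float = 2.0) -> List[List[Token]]:
    """Group tokens into lines (top to bottom), each line sorted left to right.

    A token joins the current line if its baseline is within line_tolerance
    points of the baseline of the line's first token.
    """
    lines: List[List[Token]] = []
    for tok in sorted(tokens, key=lambda t: -t.y):  # highest baseline first
        if lines and abs(lines[-1][0].y - tok.y) <= line_tolerance:
            lines[-1].append(tok)
        else:
            lines.append([tok])
    return [sorted(line, key=lambda t: t.x) for line in lines]


tokens = [Token("World", 60, 700), Token("second", 10, 680), Token("Hello", 10, 700.5)]
for line in reading_order(tokens):
    print(" ".join(t.text for t in line))
# Hello World
# second
```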
52. from all layers, the IgnoreOCM property can be set to true.

For more background information including a sample, see the section Optional Content (Layers).

Label
Property Boolean Label
Accessors: Get
Flag that indicates whether this is an OCG or a label. Labels are used to label groups of OCGs in the hierarchy. Setting their visibility has no effect.

Level
Property Long Level
Accessors: Get
In user interfaces OCGs can be shown in a tree. The property Level indicates the hierarchy level of the OCG in that tree. An OCG with Level 0 is a top-level OCG. Level -1 means that the OCG is not part of the hierarchy; it should not be presented to the user. Parent elements in the OCG hierarchy can be labels or OCGs. If the level of a label b is higher than that of its predecessor a, b is the parent element of the following objects of the same level as b. If the level of an OCG b is higher than that of its predecessor OCG a, a is the parent of the following objects of the same level as b. Note that the hierarchy reflects the actual nesting of OCGs in the content. Setting the visibility of an OCG to true only has an effect if the visibilities of all its parents are set to true.

Name
Property String Name
Accessors: Get
Return the name of the OCG.

Visible
Property Boolean Visible
Accessors: Get, Set
Get or set whether the OCG is visible. This property c
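The rule that an OCG only takes effect if all its parents are visible can be evaluated from the Level values alone. A sketch, assuming the Name, Level and Visible values have been read into a list in document order and that the list contains OCGs only (labels, whose parent rule differs, are left out); `effective_visibility` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Ocg = Tuple[str, int, bool]  # (Name, Level, Visible) in document order


def effective_visibility(ocgs: List[Ocg]) -> Dict[str, bool]:
    """Return {name: effectively visible} for a flat, ordered list of OCGs (no labels)."""
    result: Dict[str, bool] = {}
    chain: List[bool] = []  # chain[k]: effective visibility of the latest OCG at level k
    for name, level, visible in ocgs:
        if level < 0:  # Level -1: not part of the hierarchy, only its own flag counts
            result[name] = visible
            continue
        del chain[level:]  # forget earlier siblings and deeper levels
        if len(chain) != level:
            raise ValueError(f"{name}: level {level} has no parent at level {level - 1}")
        parent_ok = chain[-1] if chain else True
        effective = visible and parent_ok
        chain.append(effective)
        result[name] = effective
    return result


layers = [("Base", 0, True), ("Roads", 1, False), ("Labels", 2, True),
          ("Water", 1, True), ("Notes", 0, False), ("Draft", -1, True)]
print(effective_visibility(layers))
# {'Base': True, 'Roads': False, 'Labels': False, 'Water': True, 'Notes': False, 'Draft': True}
```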
53. g path: W

The exact details of the path construction operators can be found in Adobe's PDF Reference Manual. The path object is updated each time the method GetNextPath or GetNextObject is called. This property cannot be set.

Reset
Method Void Reset(Boolean AccountForRotate)
This method resets the content extraction process and sets the point of extraction to the beginning of the content stream.
• Parameters:
  AccountForRotate (optional, default false): This parameter defines the origin and orientation of the coordinate system of the coordinates of extracted content elements. The unit of the coordinate system is 1/72 inch.
  - eFalse: The coordinates are extracted as raw coordinates, as used in the PDF document.
  - eTrue: Extracted coordinates are relative to the bottom left corner of the visible page as displayed by a viewer, i.e. the page is rotated by the page's Rotate attribute and cropped using a bounding box. For example, the coordinate (0, 0) denotes the bottom left corner of the page. The default bounding box used is the CropBox. This can be changed by setting the BoundingBox property before calling the Reset method.

SpaceFactor
Property Single SpaceFactor
Accessors: Get, Set
This property can be used to get or set the distance between two characters that is required to insert a blank for text extraction. The
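The coordinate mapping behind AccountForRotate = true can be sketched as a translation to the bounding box origin followed by the page rotation. This illustrates the geometry under the standard PDF convention that Rotate turns the page clockwise in multiples of 90 degrees; it is not the library's implementation, and `to_viewer_coords` is a hypothetical name.

```python
from typing import Sequence, Tuple


def to_viewer_coords(x: float, y: float, box: Sequence[float], rotate: int) -> Tuple[float, float]:
    """Map raw PDF coordinates to coordinates relative to the bottom left corner
    of the visible page, after a clockwise rotation by `rotate` degrees.

    `box` is the bounding box [left, bottom, right, top], e.g. the CropBox.
    """
    left, bottom, right, top = box
    w, h = right - left, top - bottom
    u, v = x - left, y - bottom  # relative to the unrotated box
    r = rotate % 360
    if r == 0:
        return (u, v)
    if r == 90:   # the original bottom right corner becomes the new origin
        return (v, w - u)
    if r == 180:
        return (w - u, h - v)
    if r == 270:  # the original top left corner becomes the new origin
        return (h - v, u)
    raise ValueError("Rotate must be a multiple of 90")


crop = [0, 0, 612, 792]
print(to_viewer_coords(612, 0, crop, 90))  # (0, 0)
print(to_viewer_coords(0, 0, crop, 90))    # (0, 612)
```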
54. ger MajorVersion
Accessors: Get
Return the major version of the document. Example: PDF version 1.5 corresponds to Adobe Acrobat 6; the major version is 1, the minor version is 5.

MinorVersion
Property Integer MinorVersion
Accessors: Get
Return the minor version of the document.

ModDate
Property Date ModDate
Accessors: Get
Return the modification date of the info object of the document.

OcgCount
Property Long OcgCount
Accessors: Get
Get the number of optional content groups (also known as "layers") of the document.
• Return value:
  The number of optional content groups in this document.

Open
Method Boolean Open(String FileName, String Password)
This method opens a PDF random access disk file, i.e. makes the objects contained in the PDF document accessible. If the document is already open, it is closed first.
• Parameters:
  FileName: The file name and optionally the file path, drive or server string, according to the operating system's file name specification rules.
  Password (optional): The user or the owner password of the encrypted PDF document. If this parameter is left out, an empty string is used as a default.
• Return value:
  True: The file was opened successfully.
  False: The file does not exist, it is corrupt, or the password is invalid.

OpenMem
Method Boolean OpenMem(Variant MemBlock, String Password)
This
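A version such as 1.5 is split into MajorVersion 1 and MinorVersion 5. As an illustration of where that number is commonly found, the sketch reads it from the %PDF-x.y file header. This is an assumption about the source of the values: from PDF 1.4 on, a Version entry in the document catalog can override the header, which this sketch ignores. `header_version` is a hypothetical helper.

```python
import re
from typing import Tuple

_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def header_version(data: bytes) -> Tuple[int, int]:
    """Return (major, minor) from the '%PDF-x.y' header near the start of a PDF file."""
    match = _HEADER.search(data[:1024])
    if match is None:
        raise ValueError("no %PDF-x.y header found")
    return int(match.group(1)), int(match.group(2))


print(header_version(b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n"))  # (1, 5)
```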
55. hod does nothing.

Compliance
Property TPDFCompliance Compliance
Get the claimed compliance of the document. For instance, this property can be used to detect whether the document claims to be PDF/A.

CreationDate
Property Date CreationDate
Accessors: Get
Return the creation date of the document's info object.

Creator
Property String Creator
Accessors: Get
Return the name of the creator of the document's info object.

GetCurrentOutlineLevel
Method Long GetCurrentOutlineLevel()
Return the level of the current outline (bookmark).
• Return value:
  The level of the current outline. 0 is equal to root level.

GetDestination
Method PDFDestination GetDestination(String Destination)
Return an interface to the destination specified in the parameter.
• Parameters:
  Destination: The named destination.
• Return value:
  An interface to the specified destination if it exists.
  Nothing otherwise.

GetFirstColorSpaceResource
Method PDFColorSpace GetFirstColorSpaceResource()
Return an interface to the first color space resource (see ColorSpace Interface).
• Return value:
  An interface to the first color space resource if there is any.
  Nothing otherwise.

GetFirstEmbeddedFile
Method PDFEmbeddedFile GetFirstEmbeddedFile()
Return an interface to the first embedded file (see EmbeddedFile Interface). Embedded files
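The GetFirst.../GetNext... methods above share one pattern: call the first method once, then the next method until it returns Nothing. A sketch of that loop as a Python generator, assuming a binding in which Nothing arrives as None; `first_next` and the `FakeDocument` stand-in are hypothetical and only imitate the documented method names.

```python
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def first_next(get_first: Callable[[], Optional[T]],
               get_next: Callable[[], Optional[T]]) -> Iterator[T]:
    """Turn a GetFirstX/GetNextX pair into an iterator; None ends the iteration."""
    item = get_first()
    while item is not None:
        yield item
        item = get_next()


class FakeDocument:
    """Hypothetical stand-in exposing the documented GetFirst/GetNext pattern."""

    def __init__(self, resources):
        self._resources = list(resources)
        self._pos = 0

    def GetFirstColorSpaceResource(self):
        self._pos = 0
        return self.GetNextColorSpaceResource()

    def GetNextColorSpaceResource(self):
        if self._pos >= len(self._resources):
            return None
        item = self._resources[self._pos]
        self._pos += 1
        return item


doc = FakeDocument(["DeviceRGB", "DeviceCMYK"])
print(list(first_next(doc.GetFirstColorSpaceResource, doc.GetNextColorSpaceResource)))
# ['DeviceRGB', 'DeviceCMYK']
```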
56. idth.

MiterLimit
Property Single MiterLimit
Accessors: Get
Return the miter limit. The miter limit imposes a maximum on the ratio of the miter length to the line width, which can be fairly large when two line segments meet at a sharp angle. When the limit is exceeded, the join is converted from a miter to a bevel.

OverprintMode
Property Integer OverprintMode
Return the overprint mode.

RenderingIntent
Property String RenderingIntent
Return the name of the rendering intent.

SmoothnessTolerance
Property Single SmoothnessTolerance
Accessors: Get
Return the smoothness tolerance. The values are in the range [0.0, 1.0], where 1.0 corresponds to 100%.

SoftMask
Property IPDFImage SoftMask
Accessors: Get
Return the soft mask as an image.

StrokeAdjustment
Property Boolean StrokeAdjustment
Accessors: Get
Return the flag for automatic stroke adjustment.

SpaceWidth
Property Float SpaceWidth
Accessors: Get
Get the width of the space character in text space. To get page user units, transform the value using the text's matrix. The SpaceWidth property can be used to implement your own word breaking algorithm. For more information about this, read the descriptions of the properties BreakWords and SpaceFactor.

StrokeAlphaConstant
Property Single StrokeAlphaConstant
Accessors: Get
Return the current alpha stroke
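A word-breaking rule built on SpaceWidth might insert a blank whenever the gap between two consecutive glyphs exceeds a fraction of the space width. The sketch assumes the glyphs' start and end positions are already known in text space (the same space as SpaceWidth); the 0.5 threshold and the function `join_with_blanks` are hypothetical and not the library's BreakWords or SpaceFactor logic.

```python
from typing import List, Tuple

Glyph = Tuple[str, float, float]  # (character, x start, x end) in text space


def join_with_blanks(glyphs: List[Glyph], space_width: float, factor: float = 0.5) -> str:
    """Insert a blank wherever the gap between two glyphs exceeds factor * space_width."""
    if not glyphs:
        return ""
    out = [glyphs[0][0]]
    for (_, _, prev_end), (ch, start, _) in zip(glyphs, glyphs[1:]):
        if start - prev_end > factor * space_width:
            out.append(" ")
        out.append(ch)
    return "".join(out)


glyphs = [("H", 0, 6), ("i", 6, 8), ("y", 11, 16), ("o", 16, 21), ("u", 21, 26)]
print(join_with_blanks(glyphs, space_width=2.5))  # Hi you
```

A larger factor makes the rule more tolerant of loosely spaced letters, at the cost of merging words that are set tightly.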
57. idth of the glyphs in the font.

BaseName
Property String BaseName
Accessors: Get
Return the font name.

CapHeight
Property Single CapHeight
Accessors: Get
Return the height of the top of flat capital letters, measured from the baseline.

Charset
Property String Charset
Accessors: Get
Return a string listing the character names defined in a font subset. This property is only useful for Type 1 fonts.

Descent
Property Single Descent
Accessors: Get
Return the Descent value. This negative number represents the maximum depth below the baseline reached by the glyphs in the font.

Encoding
Property Variant Encoding
Accessors: Get
Return the glyph name of each character.

Flags
Property Long Flags
Accessors: Get
Return the flags of the font. The flags are listed in the following table. Bit positions within the flag word are numbered from 1 (low-order) to 32 (high-order).

Bit Position  Name         Meaning
1             FixedPitch   All glyphs have the same width.
2             Serif        Glyphs have serifs.
3             Symbolic     The font contains characters outside the standard Latin character set.
4             Script       Glyphs resemble cursive handwriting.
6             NonSymbolic  Font uses the standard Latin character set or a subset of it.
7             Italic       Glyphs
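Since bit positions are counted from 1, bit n corresponds to the mask 1 << (n - 1). A small decoder for the flags named in the table above; `FONT_FLAGS` and `decode_font_flags` are hypothetical names, and bits not listed in the table are ignored.

```python
from typing import List

# Bit positions (1 = low-order bit) and names from the Flags table
FONT_FLAGS = {1: "FixedPitch", 2: "Serif", 3: "Symbolic", 4: "Script", 6: "NonSymbolic", 7: "Italic"}


def decode_font_flags(flags: int) -> List[str]:
    """Return the names of the set flags, in bit-position order."""
    return [name for pos, name in sorted(FONT_FLAGS.items()) if flags & (1 << (pos - 1))]


print(decode_font_flags(34))  # ['Serif', 'NonSymbolic']  (34 = 2 + 32: bits 2 and 6)
```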
58. in\Icc\*              Opt   Opt   Opt   Opt
doc\*.pdf                 Doc   Doc   Doc   Doc
doc\PDFParser.idl                     Doc
doc\javadoc\*                   Doc
include\expa_c.h                            Req
include\*                                   Opt
jar\EXPA.jar                    Req
lib\PDFParser.lib                           Req
samples\*                 Doc   Doc   Doc   Doc

The purpose of the most important distributed files is described in Table: File Description.

Table: File Description
Name                 Description
bin\PDFParser.dll    This is the DLL that contains the main functionality.
bin\pdcjk.dll        This DLL contains support for Asian languages. It is loaded from the module path.
bin\*NET.dll         The .NET assemblies are required when using the .NET interface. The files bin\*NET.xml contain the corresponding XML documentation for MS Visual Studio.
bin\Icc\*            The two color profiles "USWebCoatedSWOP.icc" and "sRGB Color Space Profile.icm" are required to transform RGB to CMYK values and vice versa when extracting colors. The color profiles must not be renamed, or they will not be found.
                     Compatibility Note: In versions prior to 2.1.7, the color profiles had different names ("CMYK.icc" and "sRGB.icm"). These old names are no longer supported.
doc\*                Various documentation.
include\*            Contains files to include in your C/C++ project.
jar\EXPA.jar         The Java wrapper.
lib\PDFParser.lib    The Object File Library needs to be linked t
59. ing libPDFPARSER.dylib to the DYLD_LIBRARY_PATH.

For Java:
• Rename the file libPDFPARSER.dylib to libPDFPARSER.jnilib, or create a file link for this purpose using the following command:
  ln libPDFPARSER.dylib libPDFPARSER.jnilib
• Add the jar/EXPA.jar file to the CLASSPATH.

1.10 Samples

Samples for various programming languages are included in the Windows kits. They can also be downloaded from the PDF Tools AG web site:
http://www.pdf-tools.com/asp/products.asp?name=EXPA

2 License Management

There are three possibilities to pass the license key to the application:
1. The license key is installed using the GUI tool (graphical user interface). This is the easiest way if the licenses are managed manually. It is only available on Windows.
2. The license key is installed using the shell tool. This is the preferred solution for all non-Windows systems and for automated license management.
3. The license key is passed to the application at runtime via the "LicenseKey" property. This is the preferred solution for OEM scenarios.

2.1 Graphical License Manager Tool

The GUI tool LicenseManager.exe is located in the bin directory of the product kit.
60. interpretation make it possible to restore text that is lacking essential information. The tool can also collect significant data such as position, color space and size when extracting images such as TIFF or JPEG. Querying document attributes such as PDF version, creator, author, title, subject and creation date is also possible. The tool also supports reading encrypted PDF files.

1.3 Features

• Extract text contained on a PDF page (line-wise and word-wise)
• Retrieve text attributes such as position and font
• Extract graphics objects (paths)
• Extract images
• Retrieve PDF image attributes such as format, position and transparency masks
• Retrieve PDF document attributes such as page count, version number, and title
• Retrieve PDF page attributes such as the crop box and page rotation
• Retrieve detailed font information from PDF text
• Retrieve detailed graphics state information
• Retrieve detailed color space information
• Specify a password to decrypt PDF files

Formats
Input Formats:
• PDF 1.x (e.g. PDF 1.4, PDF 1.5)

Compliance
• Standards: ISO 32000-1 (PDF 1.7)

Interfaces
The following interfaces are available:
• C
• Java
• .NET
• COM

1.4 Operating Systems

• Windows XP, Vista, 7, 8, 8.1 (32 and 64 bit)
• Windows Server 2003, 2008, 2008 R2, 2012, 2012 R2 (32 and 64 bit)
• HP-UX 11 and later PA-RISC2.0 32 bit or HP-UX
61. GraphicsState .... 38
    TextExtConfiguration .... 40
    TranslateSymbolic .... 41
4.4 Image Interface .... 41
    Alternates .... 41
    BitsPerComponent .... 41
    ChangeOrientation .... 42
    ColorSpace .... 42
    Compression .... 42
    ConvertToRGB .... 42
    GetImage .... 42
    GetResolution .... 42
    Height .... 43
    IsBitonal .... 43
    IsColor .... 43
    IsMonochrome .... 43
    Width .... 45
4.5 Text Interface .... 45
    BoundingBox .... 45
    FontSize .... 45
    Length .... 46
    RawString .... 46
    UnicodeString .... 46
62. (Screenshot: the PDF Tools License Manager window, listing the installed license keys in the left pane and the properties of the selected license, such as key, product, intended use, platform, volume and expiration, in the right pane.)

List all installed license keys

The license manager always shows a list of all installed license keys in the left pane of the window. This includes licenses of other PDF Tools products.

The user can choose between:
• Licenses available for all users. Administrator rights are needed for modifications.
• Licenses available for the current user only.

Add and delete license keys

License keys can be added or deleted with the "Add Key" and "Delete" buttons in the toolbar.
• The "Add Key" button installs the license key into the currently selected list.
• The "Delete" button deletes the currently selected license keys.

Display the properties of a license

If a
63. java available that shows how to use this interface.

Begin, GetNext, End (applies to Dictionaries)
Property Long Begin
Property Long End
Method Long GetNext(Long i)
Iterator: The property Begin, the method GetNext and the property End can be used to traverse a dictionary object. GetKey and GetValue return the key and value of an element.

C# Example:

    for (int i = dict.Begin; i != dict.End; i = dict.GetNext(i))
    {
        // do something
    }

BooleanValue
Property Boolean BooleanValue
Accessors: Get
Return the Boolean value of a Boolean object.

Dispose, DestroyObject
.NET API:
All objects retrieved from the API are destroyed when the document is closed. However, it is recommended to call Dispose as soon as possible in order to save memory.
Java and C/C++ API:
The TPdfExpaPDFObject objects must always be deleted using ExpaPDFObjectDestroyObject.

GetElement (applies to Arrays)
Method PDFObject GetElement(Long i)
Return the element at the index.

GetEntry (applies to Dictionaries)
Method PDFObject GetEntry(String Name)
Return the entry of the dictionary.

GetStream (applies to Indirect Objects)
Method PDFObject GetStream(String FileName)
Property Variant StreamMem
Return the indirect object's stream, if present. If the object is an image, the compressed stream is returned; otherwise the stre
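The Begin/GetNext/End loop can be wrapped into an iterator in other languages as well. A Python sketch with a hypothetical `FakeDictionary` stand-in; it assumes (the text does not say so) that GetKey and GetValue take the current iterator value i, and that the loop ends when i equals End, as in the C# loop above.

```python
from typing import Iterator, Tuple


class FakeDictionary:
    """Hypothetical stand-in mimicking Begin, End, GetNext, GetKey and GetValue.

    Assumption: GetKey and GetValue take the iterator position i.
    """

    def __init__(self, entries):
        self._items = list(entries.items())

    @property
    def Begin(self) -> int:
        return 0

    @property
    def End(self) -> int:
        return len(self._items)

    def GetNext(self, i: int) -> int:
        return i + 1

    def GetKey(self, i: int) -> str:
        return self._items[i][0]

    def GetValue(self, i: int) -> object:
        return self._items[i][1]


def entries(d) -> Iterator[Tuple[str, object]]:
    """Yield (key, value) pairs by walking from Begin to End with GetNext."""
    i = d.Begin
    while i != d.End:
        yield d.GetKey(i), d.GetValue(i)
        i = d.GetNext(i)


d = FakeDictionary({"Type": "/Page", "Rotate": 90})
print(dict(entries(d)))  # {'Type': '/Page', 'Rotate': 90}
```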
64. les. The unit of pixels can be converted to a distance unit such as inch, millimeter etc. using a resolution value (e.g. 72 dpi (dots per inch)).

4.5 Text Interface

BoundingBox
Property Variant BoundingBox
Accessors: Get
Return the smallest rectangle that encloses the text as shown below.

(Figure: Text Bounding Box, showing the text height and the points Q1 to Q4.)

The text bounding box is a rectangle which encloses the four points Q1, Q2, Q3, Q4. The points Q1 and Q2 are 1/3 of the height below the baseline.

The text bounding box is defined by four values which represent the coordinates of the lower left and the upper right corner.

FontSize
Property Single FontSize
Accessors: Get
Return the size of the font in points. The size can also be interpreted as the height of the text.

Length
Deprecated, use StringLength instead.

RawString
Property Variant RawString
Accessors: Get
For simple fonts this property returns the raw character codes from the PDF as a byte array. For CID fonts this property is NULL. If the ExpandLigatures property is not set, the length of the RawString is the same as the length of the UnicodeString, and the character position vector applies to the RawString character codes as well.

The property UnicodeString always returns a string of Unicodes. These Unicodes are the result of the mapping of character codes to Unicodes define
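Because the text bounding box is the smallest rectangle enclosing Q1 to Q4, it follows from the minimum and maximum of the points' coordinates. A sketch assuming the four points are given as (x, y) pairs in page coordinates; the function name and the sample points are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def bounding_box(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (left, bottom, right, top) of the smallest axis-aligned rectangle
    enclosing the given points (e.g. Q1 to Q4). `points` must not be empty."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


# Four corner points of a slightly rotated text (hypothetical values)
q = [(100.0, 96.0), (160.0, 101.0), (159.0, 113.0), (99.0, 108.0)]
print(bounding_box(q))  # (99.0, 96.0, 160.0, 113.0)
```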
65. llOverprintFlag, FlatnessTolerance, OverprintMode, RenderingIntent, SmoothnessTolerance, StrokeAdjustment, StrokeAlphaConstant, StrokeOverprintFlag
Font Interface        Changed: Type of Flags from Long to int
Image Interface       New: Properties IsBitonal, IsMonochrome, IsColor
Page Interface        New: Property DeviceColorant
Text Interface        New: Property TextMatrix

5.4 Changes from 1.6 to 1.7

This is a list of interface changes from version 1.6 (1.6.0.41) to version 1.7 (1.7.4.1):
Annotation Interface  New: Property IsMarkup
Document Interface    New: Method GetPageLabel

5.5 Changes from 1.7 to 1.8

This is a list of interface changes from version 1.7 (1.7.4.1) to version 1.8 (1.8.35.1):
Image Interface       New: Property SMask

5.6 Changes from 1.8 to 1.9

This is a list of interface changes from version 1.8 (1.8.35.1) to version 1.9 (1.9.24.1):
Document Interface    Deprecated: Property ErrorCode
                      New: Property LastError
Content Interface     New: Property ConvertPathToImage
Colorspace Interface  Deprecated: Property Colorant
                      New: Property ColorantName
                      Deprecated: Property High
                      New: Property HighIndex
Text Interface        Deprecated: Property Length
                      New: Property StringLength

5.7 Changes from 1.9 to 1.91

This is a list of interface changes from version 1.9 (1.9.24.1) to version 1.91 (1.91.28.0):
Content Interface     New: Properties PathImageAn
66. GetXMPMetadata .... 30
    GetXMPMetadataMem .... 30
    IsCollection .... 31
    IsEncrypted .... 31
    IsLinearized .... 31
    Keywords .... 31
    LastError .... 31
    LastErrorMessage .... 31
    MajorVersion .... 32
    MinorVersion .... 32
    ModDate .... 32
    OcgCount .... 32
    Open .... 32
    OpenMem .... 33
    Page .... 33
    PageCount .... 33
    Subject .... 34
    Title .... 34
4.2 Page Interface .... 34
    ArtBox .... 34
    BleedBox .... 34
    Content .... 34
    CropBox .... 34
    DeviceColorant .... 35
    Document .... 35
    GetFirstAnnotation .... 35
    GetNextAnnotation .... 35
    MediaBox .... 35
    Rotate .... 36
4.3 Content Interface .... 36
    BreakWords .... 36
    ExpandLigatures .... 37
    GetNextImage .... 37
    GetNextObject .... 37
    GetNextPath
67. m a regular font to italic.

XTranslation, YTranslation
Property Single XTranslation
Property Single YTranslation
Accessors: Get
Return the X and Y translation. These are the same values as returned by the properties e and f.

4.10 Alternate Image Interface

DefaultForPrinting
Property Boolean DefaultForPrinting
Accessors: Get
Return true if the alternate image is set as default for printing.

Image
Property IPDFImage Image
Accessors: Get
Return an interface to the alternate image (see Image Interface).

4.11 Annotation Interface

AttachedFile
Property IPDFEmbeddedFile AttachedFile
Accessors: Get
Return the embedded file attached to this annotation. This property is meaningful for FileAttachment annotations only. Note that the AttachedFile might not have an embedded file stream, but only reference an external file via the FileName property.

Color
Property Long Color
Accessors: Get
Return the color of the annotation.

Contents
Property String Contents
Accessors: Get
Return the content of the annotation.

Date
Property Date Date
Accessors: Get
Return the date of the annotation. The used format is: dd.mm.yyyy hh:mm:ss

Dest
Property IPDFDestination Dest
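XTranslation and YTranslation are simply the e and f entries of the matrix [a b c d e f]. For orientation, the sketch also derives a scale and an angle from a and b, which is one common decomposition; it is an assumption for illustration and may differ from how the XScaling and Rotation properties are defined, particularly for skewed or mirrored matrices. The `Matrix` tuple and both functions are hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class Matrix(NamedTuple):
    """PDF transformation matrix [a b c d e f]."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float


def translation(m: Matrix) -> Tuple[float, float]:
    """XTranslation and YTranslation correspond to the e and f entries."""
    return m.e, m.f


def x_scale_and_rotation(m: Matrix) -> Tuple[float, float]:
    """Length and angle (degrees, counterclockwise) of the transformed x unit vector (a, b)."""
    return math.hypot(m.a, m.b), math.degrees(math.atan2(m.b, m.a))


m = Matrix(0.0, 2.0, -2.0, 0.0, 100.0, 50.0)  # scale by 2, rotate by 90 degrees, move to (100, 50)
print(translation(m))  # (100.0, 50.0)
scale, angle = x_scale_and_rotation(m)
print(scale, round(angle, 6))  # 2.0 90.0
```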
…used for the image in the PDF.

ConvertToRGB
Method Boolean ConvertToRGB()
Convert the image to an RGB image. The conversion uses the image's color space to interpret the sample data. Calibrated color spaces are converted to RGB values according to the sRGB color standard. Device color spaces are converted using predefined color profiles.
• Return value:
  True if the conversion was successful, False otherwise.

GetImage
Method Variant GetImage()
Return the image from memory which was previously saved using the method StoreInMemory.
• Return value:
  The image as a one-dimensional byte array.

GetResolution
Method Single GetResolution(IPDFTransformMatrix Matrix)
Return the resolution of an image on the page in dpi (dots per inch).
• Parameters:
  Matrix: The transformation matrix of the image. This parameter is required since the image itself has no resolution. The resolution is the ratio between the size of the image and the size it uses on the page.
• Return value:
  The calculated resolution in dpi.

Height
Property Long Height
Accessors: Get
Return the height of the image in pixels (also called samples). Pixels can be converted to a distance unit such as inches or millimeters using a resolution value, e.g. 72 dpi (dots per inch).

IsBitonal
Property Boolean IsBitonal
Accessors…
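The following is a rough illustration of the ratio that GetResolution describes. This Python sketch computes horizontal and vertical dpi from an image's pixel size and its placement matrix. It is not the component's implementation: the function name image_resolution is hypothetical, and it returns both axes where the API returns a single value. It assumes the PDF convention that an image fills the unit square of image space. Under that convention, the matrix entries (a, b) and (c, d) are the rendered width and height vectors in points of 1/72 inch.

```python
import math


def image_resolution(width_px, height_px, a, b, c, d):
    """Return (horizontal_dpi, vertical_dpi) for an image placed with matrix [a b c d e f].

    Hypothetical helper. Assumes the image fills the unit square of image space,
    so (a, b) and (c, d) are its rendered width and height vectors in points
    (1/72 inch). The translation entries e and f do not affect resolution.
    """
    width_in = math.hypot(a, b) / 72.0   # rendered width in inches
    height_in = math.hypot(c, d) / 72.0  # rendered height in inches
    if width_in == 0 or height_in == 0:
        raise ValueError("degenerate matrix: image has zero size on the page")
    return width_px / width_in, height_px / height_in
```

For example, a 600 x 300 pixel image drawn at 4 x 2 inches (matrix 288 0 0 144 e f) comes out at 150 dpi in both directions.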
…by means of the FillColorSpace. The CMYK quads are encoded using the following formula: Quad = ((C * 256 + M) * 256 + Y) * 256 + K.
If a color doesn't exist (e.g. with an uncolored pattern), then -1 is returned.

Hexadecimal: Quad = 0xCCMMYYKK, where CC is the byte for the cyan value in the range from 0x00 to 0xFF, MM is magenta, YY is yellow and KK is key (black).

Decimal: To retrieve the values for cyan, magenta, yellow and key, apply the following formulas (VB code that takes negative values into account, using integer division "\" and bitwise and "And"):

    Quad = PDFPARSERLib.GraphicsState.FillColorCMYK
    t = Quad And &H7FFFFFFF
    C = t \ 16777216
    M = (t \ 65536) And 255
    Y = (t \ 256) And 255
    K = t And 255
    If Quad < 0 Then C = C Or &H80

There are also other ways to retrieve these values than using the above formulas.

FillColorRGB
Property Long FillColorRGB
Accessors: Get
Return the RGB color triple for filling operations. The color value is obtained by converting the color values of the property FillColor by means of the FillColorSpace. The RGB triples are encoded using the following formula: Triple = (B * 256 + G) * 256 + R.
If a color does not exist (e.g. with an uncolored pattern), then -1 is returned.
Hexadecimal: Triple = 0xBBGGRR, where BB is the byte for the blue value in the range from 0x…
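The same decoding can be written in Python. This is a sketch, not part of the API, and decode_cmyk and decode_rgb are hypothetical helper names. It assumes the values arrive as signed 32-bit integers, as a Long does in VB. It reinterprets them as unsigned before shifting, which replaces the "Or &H80" correction in the VB code.

```python
def decode_cmyk(quad):
    """Split a FillColorCMYK-style value 0xCCMMYYKK into (C, M, Y, K), each 0..255.

    Hypothetical helper. The value may arrive as a signed 32-bit integer
    (negative when C >= 0x80); masking with 0xFFFFFFFF reinterprets it as
    unsigned. Returns None for -1, the documented "no color" value.
    """
    if quad == -1:
        return None
    u = quad & 0xFFFFFFFF  # signed 32-bit -> unsigned
    return (u >> 24) & 0xFF, (u >> 16) & 0xFF, (u >> 8) & 0xFF, u & 0xFF


def decode_rgb(triple):
    """Split a FillColorRGB-style value 0xBBGGRR into (R, G, B); None for -1."""
    if triple == -1:
        return None
    return triple & 0xFF, (triple >> 8) & 0xFF, (triple >> 16) & 0xFF
```

Note that, as in the manual, a genuine CMYK value of 0xFFFFFFFF cannot be told apart from the -1 sentinel.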
…to the C/C++ project.
… Contains sample programs in different programming languages.

Color Profiles
The 3-Heights™ PDF Extract API uses color profiles to convert sRGB to CMYK colors and vice versa. If no color profiles are available, the conversion is done algorithmically.

To convert using color profiles, two files are required: Icc\CMYK.icc and Icc\sRGB.icm. The directory Icc must be a direct subdirectory of the directory where PdfParser.dll resides.

Color profiles can be downloaded from the links provided in the directory Icc. Download at least one CMYK color profile and one sRGB profile, or copy them from your local system. Most systems have pre-installed color profiles available at %systemroot%\system32\spool\drivers\color. Rename them to sRGB.icm and CMYK.icc.

Deployment (Runtime Kit)

Distributed Files
The runtime kit (RTK) contains all files that are used for deploying the software. This is a subset of the files contained in the SDK. The table below shows which files are required (Req), optional (Opt) or not used (empty field) for each of the four interfaces.

Table: Files for Deployment

Name               .NET  JNI  COM  C
bin\PDFParser.dll  Req   Req  Req  Req
bin\pdcjk.dll      Opt   Opt  Opt  Opt
bin\…NET.dll       Req
bin\Icc\…          Opt   Opt  Opt  Opt
jar\EXPA.jar             Req

Deploying…
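A small Python sketch such as the following can check the profile layout described above before deploying. It verifies that both profile files sit in an Icc subdirectory next to the library. The function name color_profiles_present is hypothetical. The sketch only checks that the files exist; it does not validate them as ICC profiles.

```python
from pathlib import Path


def color_profiles_present(library_dir):
    """Return True if Icc/CMYK.icc and Icc/sRGB.icm exist directly below library_dir.

    Hypothetical helper. library_dir is the directory containing PdfParser.dll.
    Only the existence of the files is checked, not their content.
    """
    icc_dir = Path(library_dir) / "Icc"
    return all((icc_dir / name).is_file() for name in ("CMYK.icc", "sRGB.icm"))
```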
…of both the document's collection (PDF Portfolio) and of FileAttachment annotations are returned.
• Return value:
  An interface to the first embedded file if there is any, Nothing otherwise.

GetFirstFontResource
Method PDFFont GetFirstFontResource()
Return an interface to the first font resource (see Font Interface).
• Return value:
  An interface to the first font resource if there is any, Nothing otherwise.

GetFirstImageResource
Method PDFImage GetFirstImageResource()
Return an interface to the first image resource (see Image Interface).
• Return value:
  An interface to the first image resource if there is any, Nothing otherwise.

GetFirstOutlineItem
Method PDFOutlineItem GetFirstOutlineItem()
Return an interface to the first outline item (see OutlineItem Interface).
• Return value:
  An interface to the first outline item if there is any, Nothing otherwise.

GetInfoEntry
Method String GetInfoEntry(String szKey)
Return the value of a custom entry in the info object.
• Parameters:
  szKey: The key of the info entry, such as "Author" or "Subject".
• Return value:
  The value of the entry if it exists, Nothing otherwise.

GetNextColorSpaceResource
Method PDFColorSpace GetNextColorSpaceResource()
Return an interface to the next color space resource.
• Return value:
  An interface to the next color space resource if there is any, Nothing otherwise.
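All GetFirst…/GetNext… method pairs of the Document interface follow the same pattern. You call the GetFirst method once, then call the matching GetNext method until Nothing is returned. The Python sketch below expresses that loop as a generator. The names iterate_first_next, first and nxt are hypothetical, and the sketch assumes that Nothing maps to None. In a binding that exposes the methods as callables, the two arguments would be a pair such as GetFirstFontResource and GetNextFontResource.

```python
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def iterate_first_next(get_first: Callable[[], Optional[T]],
                       get_next: Callable[[], Optional[T]]) -> Iterator[T]:
    """Yield items from a GetFirstX/GetNextX method pair (hypothetical helper).

    get_first is called once; get_next is called repeatedly until it
    returns None, which stands in for Nothing.
    """
    item = get_first()
    while item is not None:
        yield item
        item = get_next()
```

For example, driving it with two closures over a three-element list yields the three elements in order. An empty first result yields nothing and never calls get_next.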
…from the left side page border, and f is the distance on the y axis from the bottom. (0, 0) is in the lower left corner; on a page with a size of A4 portrait, (595, 842) is in the upper right corner.

The scale factors of a matrix [a 0 0 d 0 0] can be obtained from the values a and d for x and y scaling respectively. With respect to fonts, d represents the font size of horizontal text.

A counterclockwise rotation of the axes by an angle α is produced by the matrix [cos α  sin α  -sin α  cos α  0  0].

More detailed information can be found in the PDF Reference manual, chapter 4.2.2.

Orientation
Property TPDFOrientation Orientation
Accessors: Get
Return the orientation rounded to the nearest 90 degrees. The orientation is an enumeration with eight different values (rotation times flipping). See enumeration TPDFOrientation.

Rotation
Property Single Rotation
Accessors: Get
Return the counterclockwise rotation angle of the matrix. This is equal to the minimum of XSkew and -YSkew.

XScaling, YScaling
Property Single XScaling
Property Single YScaling
Accessors: Get
Return the x and y scaling factors.

XSkew, YSkew
Property Single XSkew
Property Single YSkew
Accessors: Get
Return the x and y axis skewing. The transformation matrix [1  tan α  tan β  1  0  0] skews the x axis by α and the y axis by β. Skewing is sometimes used to transform a regular font to italic.
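The matrix formulas above can be checked numerically. The Python sketch below uses hypothetical helper names and represents matrices as 6-tuples (a, b, c, d, e, f), as in the PDF Reference. It maps a point, builds the counterclockwise rotation matrix given above, and recovers the rotation and scaling from a matrix. It derives the angle with atan2 from (a, b). That matches the Rotation property only for unskewed matrices and is not necessarily how the component computes Rotation.

```python
import math


def transform_point(m, x, y):
    """Map (x, y) through m = (a, b, c, d, e, f): x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    return a * x + c * y + e, b * x + d * y + f


def rotation_matrix(degrees):
    """Counterclockwise rotation [cos α, sin α, -sin α, cos α, 0, 0]."""
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r), -math.sin(r), math.cos(r), 0.0, 0.0)


def rotation_degrees(m):
    """Counterclockwise angle of the transformed x axis (a, b), in degrees."""
    return math.degrees(math.atan2(m[1], m[0]))


def scaling(m):
    """x and y scale factors as the lengths of the transformed unit vectors."""
    a, b, c, d = m[:4]
    return math.hypot(a, b), math.hypot(c, d)
```

For instance, transforming (0, 0) by a pure translation returns (e, f), the values that XTranslation and YTranslation report.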
…done during installation, e.g. unregister using regsvr32 /u, delete all files, etc.

Note that an expired evaluation DLL cannot be unregistered. If you would like to unregister an expired evaluation DLL, download a new (non-expired) evaluation version, overwrite the old version and unregister it.

Installing a new version does not require uninstalling the old version first. The files of the old version can be overwritten directly with the new version. If you use the COM interface, the new DLL must be registered; unregistering the old version is not required.

Unix
Unpack the archive in an installation directory, e.g. /User/lib/pdf-tools.
• bin/libPDFPARSER.so: This is the library that contains the main functionality (required).
• doc: Contains documentation files.
• include: Contains files to include in your C/C++ project.
• jar/EXPA.jar: Contains the Java wrapper.

Installation on Unix Systems
1. Unpack the archive in an installation directory, e.g. /usr/pdftools.com.
2. Copy or link the shared object into one of the standard library directories, e.g.:
     ln -s /usr/pdftools.com/bin/libPDFPARSER.so /usr/lib
3. If you have not yet installed the GNU shared libraries, get a copy of them from http://www.pdf-tools.com, extract the shared images and copy or link them into /usr/lib or /usr/local/lib.

Installation on Mac OS X
1. Unpack the archive in an installation directory, e.g. /User/lib/pdf-tools.
2. Add the directory containing…
…controls the extraction of content objects. The default value is the one configured in the PDF document.

Note that although invisible paths generate no marks on the page, they still affect the graphics state. For example, their effect on the current drawing position and on the clipping region does not change. Therefore, all paths are "active" and are extracted regardless of their visibility. Invisible paths simply use the end path operator (n) instead of a filling or stroking operator.

Example 1
id  OCGs     Level       Hierarchy
0   OCG A    0           OCG A
1   OCG B    0           OCG B
2   OCG B1   1             OCG B1
3   OCG B2   1             OCG B2
4   OCG C    1 (hidden)  OCG C

Example 2
id  OCGs/Labels  Level  Hierarchy
0   OCG A        0      OCG A
1   Label B      1      Label B
2   OCG B1       1        OCG B1
3   OCG B2       1        OCG B2
4   Label C      1      Label C
5   OCG C1       1        OCG C1
6   OCG D        0      OCG D

4.15 PDFObject Interface
This interface represents a basic PDF object. More information on these types of objects can be found in chapter 3.2 of the PDF Reference. The PDFObject interface represents an object that can be one of eight types. Depending on its type, different methods and properties should be used.

Note: If PDF objects are traversed recursively, the program must ensure that it does not end up in an endless loop for cyclical structures.

There is a Java sample "PdfObjExt…
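The note about cyclical structures matters because PDF object graphs are cyclic by design. For example, a page's /Parent entry points back to the page tree node whose /Kids array lists that page. The Python sketch below shows one way to guard a traversal with a set of already-visited object numbers. It models objects with plain dicts and lists, plus a hypothetical Ref class for indirect references. It does not use the PDFObject interface itself. The traversal is iterative, so deep object trees cannot exhaust the call stack either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical indirect reference ("num 0 R") into an object table."""
    num: int


def reachable_objects(root, table):
    """Return the numbers of all indirect objects reachable from root, each once.

    table maps object numbers to objects built from dicts, lists and scalars.
    The visited set ensures every indirect object is expanded at most once,
    so cycles such as Page -> /Parent -> /Kids -> Page terminate.
    """
    order = []
    visited = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if isinstance(obj, Ref):
            if obj.num in visited:
                continue  # already expanded: this check breaks cycles
            visited.add(obj.num)
            order.append(obj.num)
            stack.append(table[obj.num])
        elif isinstance(obj, dict):
            stack.extend(reversed(list(obj.values())))  # keep entry order
        elif isinstance(obj, list):
            stack.extend(reversed(obj))
        # scalars (names, numbers, strings) have no children
    return order
```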
…Property PDFImage Image
Accessors: Get
Return an interface to the last read image object (see Image Interface). The image object is updated each time the method GetNextImage or GetNextObject is called.

OCM
Property String OCM
Accessors: Get
Return the current optional content membership (OCM) string, which defines visibility as a Boolean function of OCGs in C syntax. OCGs are represented by IDs; retrieve the respective OCG using the Document interface's GetOcg method.
Supported operators: &&, …
Example: "1 && 2" means that the following objects are visible only if OCG 1 and OCG 2 are visible.
Note: This property is valid only immediately after extraction of a BeginOCM object.

Path
Property String Path
Accessors: Get
Return the last read path object in its string form. The path object describes a graphic drawing consisting of stroked lines and curves as well as filled shapes. The string contains the PDF path construction tokens: real-value operands (in angle brackets) followed by operator mnemonics.
• Move current point to: <x> <y> m
• Line from current point to: <x> <y> l
• Rectangle: <x> <y> <w> <h> re
• Cubic Bézier curve from current point to: <x1> <y1> <x2> <y2> <x3> <y3> c
• Close figure (move to start of last sub-path): h
• Fill path: f
• Stroke path: s
• End path (without filling and stroking): n
• Modify current clipping…
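Because the Path property is a flat token string, a client usually has to split it back into operators and operands. The following Python sketch does that for the operators listed above. The name parse_path is hypothetical. The exact string format is an assumption, including whether operands literally carry angle brackets, so the sketch accepts both forms. Operators not in its table, such as those for clipping, are passed through with whatever operands precede them.

```python
# Operand count of each listed path operator; operators not listed are passed through.
OPERAND_COUNTS = {"m": 2, "l": 2, "re": 4, "c": 6, "h": 0, "f": 0, "s": 0, "n": 0}


def parse_path(path):
    """Split a Path-style string into a list of (operator, [operands]) pairs.

    Hypothetical helper. Angle brackets around operands are stripped if
    present, since whether the string contains them literally is assumed.
    """
    result = []
    operands = []
    for token in path.replace("<", " ").replace(">", " ").split():
        try:
            operands.append(float(token))
            continue
        except ValueError:
            pass  # not a number, so it is an operator
        expected = OPERAND_COUNTS.get(token)
        if expected is not None and expected != len(operands):
            raise ValueError(f"{token!r} expects {expected} operands, got {len(operands)}")
        result.append((token, operands))
        operands = []
    if operands:
        raise ValueError("operands without a following operator")
    return result
```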
…color space resource if there is any, Nothing otherwise.

GetNextEmbeddedFile
Method PDFEmbeddedFile GetNextEmbeddedFile()
Return an interface to the next embedded file.
• Return value:
  An interface to the next embedded file if there is any, Nothing otherwise.

GetNextFontResource
Method PDFFont GetNextFontResource()
Return an interface to the next font resource.
• Return value:
  An interface to the next font resource if there is any, Nothing otherwise.

GetNextImageResource
Method PDFImage GetNextImageResource()
Return an interface to the next image resource.
• Return value:
  An interface to the next image resource if there is any, Nothing otherwise.

GetNextOutlineItem
Method PDFOutlineItem GetNextOutlineItem(Long MaxLevel, Boolean ReturnOpenOnly)
Return an interface to the next outline item.
• Parameters:
  MaxLevel (optional, default 20): The maximum depth level of the outlines.
  ReturnOpenOnly (optional, default false): Return only outlines that are open.
• Return value:
  An interface to the next outline item if there is any, Nothing otherwise.

GetObject
Method PDFObject GetObject(String Path)
This method returns a PDF object specified by the path string. The path consists of a prefix and operators.
Prefix:
• … Trailer dictionary (see chapter 3.4.4 of the PDF Reference); valid entries are…
…the Application
Deploy an application as follows:
1. Identify the required files from your developed application.
2. Identify all files from the RTK that are required by your developed application.
3. Include all these files in an installation routine, such as an MSI file or a simple batch script.
4. Perform any interface-specific actions, e.g. registering when using the COM interface.

Example
This is a very simple example of how a COM application written in Visual Basic 6 could be deployed:
1. The developed and compiled application consists of the file TextExt.exe.
2. The application uses the COM interface and is distributed on Windows XP only.
   • The main DLL PDFParser.dll must be distributed.
   • Asian text should be supported, so pdcjk.dll is distributed.
3. All files are copied to the target location using a batch script. This script contains the following commands:
     COPY TextExt.exe %targetlocation%
     COPY PDFParser.dll %targetlocation%
     COPY pdcjk.dll %targetlocation%
4. For COM, the main DLL needs to be registered in silent mode (/s) on the target system. This step requires Power User privileges and is added to the batch script:
     REGSVR32 /s %targetlocation%\PDFParser.dll

1.7 Interface-specific Installation Steps

COM Interface
Registration: Before you can use the 3-Heights™ PDF Extract API component in your COM application program, you have to register the component using the regsvr32.exe program…
…PathImageAntiAlias, PathImageBGColor, PathImageResolution, ConvertPathToImage

5.8 Changes from 1.91 to 2.0
There are no interface changes from version 1.91 final to 2.0 final.

5.9 Changes from 2.0 to 2.1
The color profiles in the directory bin\icc, which are used to transform RGB to CMYK values and vice versa when extracting colors, have been renamed. The old names "CMYK.icc" and "sRGB.icm" became "USWebCoatedSWOP.icc" and "sRGB Color Space Profile.icm" to reflect their real names. The abbreviated names are no longer supported.

Document Interface
  New Methods: OcgCount, GetOcg, GetFirstEmbeddedFile, GetNextEmbeddedFile
  New Property: LastErrorMessage
New Interface Ocg
  New Properties: Label, Level, Name, Visible
Content Interface
  New Properties: OCG, IgnoreOCG
TPDFContentObject Enum
  New Enumerations: eBeginOCM, eEndOCM
New Interface PDFObject
  New Methods: GetElement, GetEntry, GetNext, GetStream
  New Properties: BooleanValue, IntegerValue, RealValue, StringValue, Name, Size, Begin, End, ObjectNumber, Type
New Interface EmbeddedFile
  New Methods: Store, StoreInMemory
  New Properties: CheckSum, CreationDate, FileName, ModDate

5.10 Changes from 4.3 to 4.4
Content Interface
  Removed Properties: PathImageBGColor, PathImageAntiAlias, PathImageResolution, ConvertPathToImage

5.11 Samples & Background Information
There are various code samples in the ZIP…
…functions Store() and StoreInMemory() return false; the FileName property references an external file.

ModDate
Property String ModDate
Accessors: Get
Get the modification date.

Store
Method Boolean Store(String Path)
Store the embedded file to disk.
• Parameters:
  Path: The file name and path where the document shall be stored.
• Return value:
  True if the operation completed successfully, False otherwise.

StoreInMemory
Method Variant StoreInMemory()
Store the embedded file in memory.
• Return value:
  The embedded file as a byte array.

4.17 Enumerations

Note: Depending on the interface, enumerations may have "TPDF" as prefix (COM, C), "PDF" as prefix (.NET), or no prefix at all (Java).

TPDFCompression
eComprRaw          No compression
eComprJPEG         Joint Photographic Experts Group
eComprFlate        Flate compression
eComprLZW          Lempel-Ziv-Welch
eComprGroup3       CCITT Fax Group 3
eComprGroup3_2D    CCITT Fax Group 3 2D
eComprGroup4       CCITT Fax Group 4
eComprJBIG2        Joint Bi-level Image Experts Group
eComprJPEG2000     JPEG2000
eComprUnknown      Unknown compression
eComprDefault      Apply a default compression that suits the color space of the image
Note that not all image formats and color depths support all compression types.

TPDFContentObject
See also function Content.GetNextObject.
eBeginOCM
eEndOCM
eNone…
…strings as a single-precision real number in text units. It doesn't include any scaling factors from coordinate transforms, such as from the current transformation matrix or the text matrix. To obtain the font size in page units, the values of the current text matrix have to be examined.

HorizontalScaling
Property Single HorizontalScaling
Accessors: Get
Return the current horizontal scaling factor, which describes the amount of horizontal stretching of a text string. A value greater than 1.0 stretches the string, whereas a value less than 1.0 makes the string appear condensed.

Leading
Property Single Leading
Accessors: Get
Return the current leading (line spacing) of a text string as a single-precision number in text units.

LineCap
Property Integer LineCap
Accessors: Get
Return the line cap style. The line cap style specifies the shape to be used at the ends of open sub-paths and dashes when they are stroked.
0  Butt cap
1  Round cap
2  Projecting square cap

LineJoin
Property Integer LineJoin
Accessors: Get
Return the line join style. The line join style specifies the shape to be used at the corners of paths that are stroked.
0  Miter join
1  Round join
2  Bevel join

LineWidth
Property Single LineWidth
Accessors: Get
Return a single-precision real number in user units of the line width…
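FontSize is given in text units, so the size on the page follows from the text matrix and the current transformation matrix (CTM). The Python sketch below concatenates the two matrices. It then scales FontSize by the length of the transformed text-space y unit vector. For an unrotated, unskewed result this reduces to FontSize times d. The function names are hypothetical, and how the matrix values are obtained from the API is not shown here.

```python
import math


def multiply(m1, m2):
    """Concatenate PDF matrices (a, b, c, d, e, f); the result applies m1 first, then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def font_size_on_page(font_size, text_matrix, ctm):
    """Font size in page units (hypothetical helper).

    Scales font_size by the length of the text-space y unit vector (0, 1)
    after the text matrix and then the CTM are applied.
    """
    _, _, c, d, _, _ = multiply(text_matrix, ctm)
    return font_size * math.hypot(c, d)
```

Using the vector length rather than d alone keeps the result correct for rotated text, where d can be zero.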
…Return the Left value.

PageNo
Property Long PageNo
Accessors: Get
Return the target page number.

Right
Property Single Right
Accessors: Get
Return the Right value.

Top
Property Single Top
Accessors: Get
Return the Top value.

Type
Property Single Type
Accessors: Get
Return the type of the destination, such as "XYZ", "Fit", "FitH", "FitR", etc.

Zoom
Property Single Zoom
Accessors: Get
Return the zoom value of the destination. A value of 0 means the zoom level is left as is. This has the same meaning as a null value; the returned value is 0 in both cases. A value of 1 means 100% magnification.

4.14 Ocg Interface

The optional content group (OCG) interface allows listing optional content groups, also known as "layers", and their properties.

Optional content groups (OCGs) in PDF differ substantially from the simple layer paradigm found, e.g., in graphics editing programs. Graphics objects in PDF do not belong to an OCG. Instead, their visibility is calculated by a Boolean function that depends on the state of any number of OCGs. For example, a path could be visible only if OCG "A" is ON and OCG "B" is OFF.

The functionality of OCGs is described in depth in ISO 32000-1, chapter 8.11.4, or in the PDF Reference, chapter 4.10. OCGs are supported in PDF 1.5 or later.

In order to extract content…
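In the Content interface, these Boolean visibility functions appear as OCM strings such as "1 && 2". The Python sketch below evaluates such a string against the set of OCG IDs that are currently visible. It is an illustration only. It assumes the C operators &&, || and ! plus parentheses, which is more than the manual confirms, and ocm_visible is a hypothetical name.

```python
import re


def ocm_visible(expr, visible_ids):
    """Evaluate an OCM-style string such as "1 && !2" for the given visible OCG ids.

    Hypothetical helper. Assumes C-style operators && (and), || (or),
    ! (not) and parentheses, with C precedence: ! binds tightest, then &&, then ||.
    """
    tokens = re.findall(r"\d+|&&|\|\||[!()]|\S", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"unexpected token {tok!r} in {expr!r}")
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the operand so the tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_unary()
        while peek() == "&&":
            take()
            rhs = parse_unary()
            value = value and rhs
        return value

    def parse_unary():
        tok = take()
        if tok == "!":
            return not parse_unary()
        if tok == "(":
            value = parse_or()
            take(")")
            return value
        if tok.isdigit():
            return int(tok) in visible_ids  # an OCG id is true when that OCG is visible
        raise ValueError(f"unexpected token {tok!r} in {expr!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"trailing input in {expr!r}")
    return result
```

The example from this section, "visible only if A is ON and B is OFF", would read "A && !B" in this notation, with A and B replaced by the OCG IDs.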
…available for markup annotations (requires PDF 1.5 or later).

Subtype
Property String Subtype
Accessors: Get
Return the type of the annotation as a string, such as "Widget", "Square", "PopUp", "FreeText", "Ink", etc.

TextLabel
Property String TextLabel
Accessors: Get
Return the text label of the annotation as a string. This label is usually used for the name of the author.

URI
Property String URI
Accessors: Get
Return the URI entry of the annotation as a string, if present.

Vertices
Property Variant Vertices
Accessors: Get
Return the vertices of a polygon annotation.

4.12 OutlineItem Interface

Count
Property Long Count
Accessors: Get
Return the number of children of the current outline item. A negative number means the child tree is not open.

Dest
Property IPDFDestination Dest
Accessors: Get
Return an interface to the destination (see Destination Interface).

Title
Property String Title
Accessors: Get
Return the title of the outline item.

4.13 Destination Interface

Note that the properties Bottom, Left, Right and Top of the destination interface have different meanings depending on the Type of the destination. The coordinates are raw PDF user space coordinates.

Bottom
Property Single Bottom
Accessors: Get
Return the Bottom value.

Left
Property Single Left
Accessors: Get
Return the Left value.
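A small tree walk illustrates the Count sign convention and the MaxLevel and ReturnOpenOnly parameters of GetNextOutlineItem. The Python sketch below flattens a hypothetical in-memory outline tree depth-first. It assumes that top-level items are level 1 and that ReturnOpenOnly means "do not descend into closed items". Both are assumptions about the component's behavior, not documented facts.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Outline:
    """Hypothetical stand-in for an OutlineItem (Title, Count, children)."""
    title: str
    count: int = 0  # negative: the child tree is closed
    children: List["Outline"] = field(default_factory=list)


def flatten(items, max_level=20, return_open_only=False, level=1):
    """Depth-first list of (level, title) pairs; top-level items are level 1 (assumed).

    Children are visited only while level < max_level and, if
    return_open_only is set, only below items whose count is not negative.
    """
    result: List[Tuple[int, str]] = []
    for item in items:
        result.append((level, item.title))
        closed = item.count < 0
        if item.children and level < max_level and not (return_open_only and closed):
            result.extend(flatten(item.children, max_level, return_open_only, level + 1))
    return result
```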
• …written with different subsets of the same font. Different subsets of a font are considered different fonts. Therefore, if the font changes within what visually looks like one word, the word is split.
• Text is not written on the same horizontal line. This can occur in some OCRed documents. There is a built-in tolerance to account for this; however, if the Y offsets are too large, a new word starts.
• Various possible errors in the font, such as incorrect or missing width values of the glyphs (in particular of the blank), incorrect encoding, etc.

In all of the above cases, the coordinates need to be considered. Instead of inserting blanks after each word (as in the sample), compare the coordinate and width of the previous text token with the position of the next text token.

If text is concatenated, i.e. blanks are missing, decrease the property SpaceFactor, for example to the value 0.2. See also property SpaceFactor in the Content interface.

Extracted Text is Unreadable
Fonts contain a particular set of glyphs. A glyph is a specific graphical rendering of a character; for example, the letter P set in three different typefaces gives three different glyphs of the character "P".
Fonts have an encoding, such as WinAnsi, MacRoman, or a custom encoding. The encoding maps the glyphs to characters. If the encoding in a font is missing, it is assumed to be WinAnsi encoded…
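The coordinate comparison recommended above can be sketched as follows. A blank is inserted when the gap between the end of one token and the start of the next exceeds a fraction of the font size. A line break is inserted when the baseline moves. The Token class and the thresholds space_factor and y_tolerance are hypothetical. space_factor plays a role similar to, but not the same as, the component's SpaceFactor property.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical text token: string, baseline origin and advance width in page units."""
    text: str
    x: float
    y: float
    width: float
    font_size: float


def join_tokens(tokens, space_factor=0.3, y_tolerance=0.5):
    """Join tokens (in reading order) into text using their geometry.

    A blank is inserted when the horizontal gap after the previous token
    exceeds space_factor * font_size; a newline is inserted when the baseline
    moves by more than y_tolerance * font_size.
    """
    parts = []
    prev = None
    for tok in tokens:
        if prev is not None:
            if abs(tok.y - prev.y) > y_tolerance * prev.font_size:
                parts.append("\n")
            elif tok.x - (prev.x + prev.width) > space_factor * prev.font_size:
                parts.append(" ")
        parts.append(tok.text)
        prev = tok
    return "".join(parts)
```

With this approach, a word split into two tokens by a font subset change is rejoined because the tokens touch, while a real word gap still yields a blank.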
ExpandLigatures
Property Boolean ExpandLigatures
Accessors: Get, Set
Default: False
When ExpandLigatures is set to true, ligatures such as fi, ff, fl, etc. found during text extraction are converted to individual characters.

Flags
Property Long Flags
Accessors: Get
Return -1 while content is parsed, and the annotation flags when annotations are parsed (see also property Flags in the Annotation interface).

GetNextImage
Method PDFImage GetNextImage()
This method reads the content stream objects until an image object can be returned or the end of the content stream is reached. If an image object is found, an interface to the image object (see Image Interface) is returned. This interface can also be retrieved through the content's Image property. The graphics state can be retrieved through the content's GraphicsState property.
• Return value:
  An interface to the next image object on the current page if there is any, Nothing otherwise.

GetNextObject
Method TPDFContentObject GetNextObject()
This method reads the content stream objects until a text, image, or path object can be returned or the end of the content stream is reached.
• Return values:
  eNone: The end of the content stream has been reached and the content's Path property doesn't return a valid value.
  eText: A text object could be composed and its interface can be retrieved through the content's T…
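If ligatures reach the extracted text as Unicode presentation forms (U+FB00 to U+FB06), the expansion that ExpandLigatures performs can be approximated in post-processing. The Python sketch below applies compatibility normalization (NFKC) to those code points only. It is not the component's implementation. It also does not help with ligatures that a custom font encoding maps to other code points. expand_ligatures is a hypothetical name.

```python
import unicodedata

# Latin ligatures in the Alphabetic Presentation Forms block:
# U+FB00 ff, U+FB01 fi, U+FB02 fl, U+FB03 ffi, U+FB04 ffl, U+FB05 long s + t, U+FB06 st
LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def expand_ligatures(text):
    """Replace Latin ligature code points with their component letters via NFKC.

    Hypothetical helper. Only the ligature characters are normalized, so other
    characters, such as combining accents, are left exactly as they were.
    """
    return "".join(unicodedata.normalize("NFKC", ch) if ch in LIGATURES else ch
                   for ch in text)
```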
    