Introduction

The fastest way to start with pdf2Data is to create a data field template in the online editor. Upload your template PDF file in the first step, use the data field editor to mark the entities to be recognized, then download the template and use it in your automated environment to extract data from a series of PDF files.

Refer to the videos section to quickly get familiar with the user interface.

License key

If you want to use pdf2Data in your environment, you need a license key. The license key is an XML file that you must load into the license key library before using any API.

If you also use other iText add-ons, your license keys might be stored in multiple files, especially if you purchased the add-ons separately. In this case, you can load the licenses into the license key library one by one, or pass an array of license keys to the library.

To get a free trial license, please fill in this form. For pricing information, please use the request a quote form or contact us directly.

Using pdf2Data in code

Java

The preferred way to set up pdf2Data in Java is to use a build system such as Maven or Gradle and download the pdf2Data artifacts from iText Artifactory (https://repo.itextsupport.com/pdf2data/).

The groupId is com.duallab.pdf2data and the artifactId is pdf2data.

In Maven, declare the repository and the dependency in your pom.xml. The configuration looks similar to the example below:

<repositories>
	<repository>
		<id>pdf2Data</id>
		<name>pdf2Data Maven Repository</name>
		<url>https://repo.itextsupport.com/pdf2data</url>
	</repository>
</repositories>

<dependencies>
	<dependency>
		<groupId>com.duallab.pdf2data</groupId>
		<artifactId>pdf2data</artifactId>
		<version>2.1.3-SNAPSHOT</version>
	</dependency>
</dependencies>

Example of how pdf2Data can be used in code:

// Make sure to load license file before invoking any code
LicenseKey.loadLicenseFile(pathToLicenseFile);

// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.parseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.recognize(pathToFileToParse);

// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.saveToXML(pathToOutXmlFile);

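The introduction mentions extracting data from a series of PDF files, and the comment in the example above notes that recognize can be called multiple times with the same extractor. A batch run therefore mostly needs a list of input files and an output path for each result. The sketch below covers only that bookkeeping with the standard java.nio.file API; BatchPlanner and planBatch are hypothetical names, and naming each XML result after its input PDF is an assumption, not a pdf2Data convention.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of pdf2Data): pairs each PDF in an input
// directory with an XML output path in an output directory.
final class BatchPlanner {

    private BatchPlanner() {
    }

    // Returns input PDF -> output XML, ordered by path so runs are reproducible.
    static Map<Path, Path> planBatch(Path inputDir, Path outputDir) {
        List<Path> pdfs = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(inputDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                // Accept ".pdf" in any letter case; skip subdirectories and other files
                if (Files.isRegularFile(entry) && name.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                    pdfs.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(pdfs);

        Map<Path, Path> plan = new LinkedHashMap<>();
        for (Path pdf : pdfs) {
            String name = pdf.getFileName().toString();
            // Strip the 4-character extension and reuse the base name for the XML file
            String baseName = name.substring(0, name.length() - ".pdf".length());
            plan.put(pdf, outputDir.resolve(baseName + ".xml"));
        }
        return plan;
    }
}
```

Each entry of the returned map would then be passed to extractor.recognize(...) and result.saveToXML(...) as in the example above. This page does not state the parameter types of those methods, so convert the Path values (for example with toString()) to whatever your pdf2Data version expects.

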
.NET

For .NET, pdf2Data is distributed as a NuGet package, available at NuGet.org or at iText Artifactory.

You can browse for the desired NuGet package manually or install it with the NuGet Package Manager command Install-Package itext7.pdf2data.

Example of how pdf2Data can be used in code:

// Make sure to load license file before invoking any code
LicenseKey.LoadLicenseFile(pathToLicenseFile);

// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.ParseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.Recognize(pathToFileToParse);

// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.SaveToXML(pathToOutXmlFile);

Installation instructions for the data field editor application

If you want to use the data field editor in your environment, follow these installation instructions:

Prerequisites

  • Apache Tomcat 7 (≥ 7.0.77) or 8
  • Java 8

Installation steps

  1. Download the WAR file for the version you need from iText Artifactory.
  2. Create a properties file with the following contents:
    # Set temporary directory for resources
    dir.temp=your_folder_for_resources

    # Path to iText license file, e.g. licensekey=/home/user/license.xml
    licensekey=path_to_license_file.xml
  3. Create an environment variable PDF2DATA_PROPERTIES and set it to the path of the file from the previous step.
  4. Deploy the application on the installed Tomcat server. In most cases, it is sufficient to copy the WAR file into the webapps subdirectory of the Tomcat directory.
  5. Start the Tomcat server if it is not already running, and you are ready to go.
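
Before deploying, it can help to check that the properties file from step 2 is readable and that both settings point to something that exists. The sketch below does this with java.util.Properties. The class name is hypothetical, the editor may validate its configuration differently, and treating a missing dir.temp directory as an error is an assumption that may be stricter than the editor requires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-deployment check (not part of pdf2Data) for the
// dir.temp and licensekey settings described in step 2.
final class Pdf2DataPropertiesCheck {

    private Pdf2DataPropertiesCheck() {
    }

    // Returns human-readable problems; an empty list means the file looks usable.
    static List<String> check(Path propertiesFile) {
        List<String> problems = new ArrayList<>();
        if (!Files.isRegularFile(propertiesFile)) {
            problems.add("properties file not found: " + propertiesFile);
            return problems;
        }

        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(propertiesFile)) {
            props.load(in); // standard .properties syntax; lines starting with '#' are comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        String tempDir = props.getProperty("dir.temp");
        if (tempDir == null || tempDir.trim().isEmpty()) {
            problems.add("dir.temp is not set");
        } else if (!Files.isDirectory(Paths.get(tempDir.trim()))) {
            problems.add("dir.temp is not an existing directory: " + tempDir);
        }

        String license = props.getProperty("licensekey");
        if (license == null || license.trim().isEmpty()) {
            problems.add("licensekey is not set");
        } else if (!Files.isRegularFile(Paths.get(license.trim()))) {
            problems.add("licensekey does not point to a file: " + license);
        }
        return problems;
    }
}
```

Once PDF2DATA_PROPERTIES is set, calling check(Paths.get(System.getenv("PDF2DATA_PROPERTIES"))) tests the same file the environment variable from step 3 points to. If the editor reads the file as a standard Java properties file, note that a backslash is an escape character in that format, so Windows paths should use forward slashes or doubled backslashes.
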

Command Line Interface

You can use pdf2Data from the command line as long as Java 7 or 8 is available.
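
To confirm which version a given java executable provides, you can run java -version, or check from code as in the sketch below. It reads the standard java.specification.version system property, which is "1.7" or "1.8" on those releases and a plain number such as "11" from Java 9 on. The class name is hypothetical.

```java
// Hypothetical helper (not part of pdf2Data): checks whether the running JVM
// is Java 7 or 8, the versions this section lists for the CLI.
final class CliJavaVersionCheck {

    private CliJavaVersionCheck() {
    }

    // Converts a java.specification.version value to a major version:
    // "1.7" -> 7 and "1.8" -> 8 (pre-Java 9 scheme), "11" -> 11 (Java 9+ scheme).
    static int majorVersion(String specVersion) {
        String v = specVersion.trim();
        if (v.startsWith("1.")) {
            v = v.substring(2);
        }
        int dot = v.indexOf('.');
        return Integer.parseInt(dot >= 0 ? v.substring(0, dot) : v);
    }

    static boolean isSupportedForCli(String specVersion) {
        int major = majorVersion(specVersion);
        return major == 7 || major == 8;
    }

    // Reads the property from the JVM this code is running in.
    static boolean currentJvmSupported() {
        return isSupportedForCli(System.getProperty("java.specification.version"));
    }
}
```

Run such a check with the same java executable you use for cli.jar, since a machine can have several Java installations.
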

You can download the CLI application from iText Artifactory.

The steps are similar to the ones you would typically perform in code. The output format for data extraction is XML.

Creating a template entity from a template PDF

java -jar cli.jar preprocess -t template.pdf -x template.xml -l license.xml

File recognition

java -jar cli.jar parse -t template.xml -f file_for_parsing.pdf -r recognized.pdf -x recognized.xml

Help information

java -jar cli.jar help preprocess
java -jar cli.jar help parse
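
If you call the CLI from a Java program, for example to parse many files against one preprocessed template, you can build the same command lines as argument lists and start them with ProcessBuilder, which avoids shell quoting problems with paths that contain spaces. The class and method names below are hypothetical; the subcommands and flags are exactly those in the examples above, and nothing else about the CLI is assumed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper (not part of pdf2Data): builds the command lines shown
// above, using only the subcommands and flags that appear there.
final class Pdf2DataCli {

    private Pdf2DataCli() {
    }

    // java -jar cli.jar preprocess -t template.pdf -x template.xml -l license.xml
    static List<String> preprocess(String cliJar, String templatePdf,
                                   String templateXml, String licenseXml) {
        return new ArrayList<>(Arrays.asList("java", "-jar", cliJar, "preprocess",
                "-t", templatePdf, "-x", templateXml, "-l", licenseXml));
    }

    // java -jar cli.jar parse -t template.xml -f file.pdf -r recognized.pdf -x recognized.xml
    static List<String> parse(String cliJar, String templateXml, String inputPdf,
                              String recognizedPdf, String recognizedXml) {
        return new ArrayList<>(Arrays.asList("java", "-jar", cliJar, "parse",
                "-t", templateXml, "-f", inputPdf, "-r", recognizedPdf, "-x", recognizedXml));
    }

    // Starts the command, shares this process's console, and waits for the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

For a batch, run preprocess once, then run parse for each input file with distinct -r and -x output names. This page does not document the CLI's exit codes, so inspect the generated XML as well as the value returned by run.
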