Introduction

The fastest way to start with pdf2Data is to create a data field template in the online editor. Upload your template PDF in the first step, use the data field editor to mark the entities to be recognized, then download the template and use it in your automated environment to extract data from a series of PDF files.

Refer to the videos section to get familiar with the user interface quickly.

License key

To use pdf2Data in your environment, you need a license key. The license key is an XML file that you must load into the license key library before calling any API.

If you also use other iText add-ons, your license keys might be stored in multiple files, especially if you purchased the add-ons separately. In that case, you can load the licenses into the license key library one by one, or pass an array of license keys to the license key library.
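When several license files are involved, it can help to gather their paths in one place first. The following is a minimal sketch, assuming all license key files are stored as .xml files in a single directory. The class LicenseFileCollector and its method are hypothetical helpers, not part of pdf2Data or the license key library, and the sketch deliberately stops short of the actual load call.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of pdf2Data or the license key library.
final class LicenseFileCollector {

    // Returns the absolute paths of all *.xml files directly inside licenseDir, sorted.
    // Assumption: every .xml file in that directory is a license key file.
    static String[] collect(Path licenseDir) throws IOException {
        List<String> paths = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(licenseDir, "*.xml")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    paths.add(file.toAbsolutePath().toString());
                }
            }
        }
        Collections.sort(paths);
        return paths.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LicenseFileCollector <directory with license files>
        if (args.length == 1) {
            for (String path : collect(Paths.get(args[0]))) {
                System.out.println(path);
            }
        }
    }
}
```

The resulting array can be iterated to load the files one by one, or handed to the license key library as a whole, depending on which of the two approaches you prefer.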

To get a free trial license, please fill out this form. For pricing information, use the request a quote form or contact us directly.

Using pdf2Data in code

Java

The preferred way to set up pdf2Data in Java is to use a build system such as Maven or Gradle and download the pdf2Data artifacts from the iText Artifactory at https://repo.itextsupport.com/pdf2data/

The groupId is com.duallab.pdf2data, and the artifactId is pdf2data.

In Maven, the configuration looks similar to the example below:

<repository>
	<id>pdf2Data</id>
	<name>pdf2Data Maven Repository</name>
	<url>https://repo.itextsupport.com/pdf2data</url>
</repository>


<dependency>
	<groupId>com.duallab.pdf2data</groupId>
	<artifactId>pdf2data</artifactId>
	<version>2.1.3-SNAPSHOT</version>
</dependency>
                

Example of how pdf2Data can be used in code:

// Make sure to load the license file before invoking any other code
LicenseKey.loadLicenseFile(pathToLicenseFile);

// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.parseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.recognize(pathToFileToParse);

// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.saveToXML(pathToOutXmlFile);
                

.NET

For .NET, pdf2Data is distributed as a NuGet package, available from NuGet.org or from the iText Artifactory.

You can browse for the desired NuGet package manually or install it with the NuGet Package Manager command Install-Package itext7.pdf2data

Example of how pdf2Data can be used in code:

// Make sure to load the license file before invoking any other code
LicenseKey.LoadLicenseFile(pathToLicenseFile);

// Parse template into an object that will be used later on
Template template = Pdf2DataExtractor.ParseTemplateFromPDF(pathToPdfTemplate);

// Create an instance of Pdf2DataExtractor for the parsed template
Pdf2DataExtractor extractor = new Pdf2DataExtractor(template);

// Feed file to be parsed against the template. Can be called multiple times for different files
ParsingResult result = extractor.Recognize(pathToFileToParse);

// Save result to XML or explore the ParsingResult object to fetch information programmatically
result.SaveToXML(pathToOutXmlFile);
                

Installation instructions for the data field editor application

If you want to use the data field editor in your environment, follow these installation instructions:

Prerequisites

  • Apache Tomcat 7 (≥ 7.0.77) or 8
  • Java 8

Installation steps

  1. Download the war file of the version you are interested in from the iText Artifactory.
  2. Create a properties file with the following contents:
    # Set temporary directory for resources
    dir.temp=your_folder_for_resources

    # Path to iText license file, e.g. licensekey=/home/user/license.xml
    licensekey=path_to_license_file.xml
  3. Create an environment variable PDF2DATA_PROPERTIES and set it to the path of the file from the previous step.
  4. Deploy the application on the installed Tomcat server. In most cases, it is sufficient to copy the war file into the webapps subdirectory of the Tomcat directory.
  5. Start the Tomcat server if it is not already running, and you are ready to go.
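
Before starting Tomcat, it may be worth checking that the properties file from the steps above is readable and complete. The following is a minimal sketch using only the standard java.util.Properties class. The class EditorConfigCheck is a hypothetical helper that is not shipped with pdf2Data, and the checks that dir.temp must be an existing directory and that licensekey must point to an existing file are assumptions about what the editor expects.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check; not shipped with pdf2Data.
final class EditorConfigCheck {

    // Loads the properties file and returns a list of problems; an empty list means no problem was found.
    static List<String> validate(Path propertiesFile) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(propertiesFile)) {
            props.load(in);
        }
        List<String> problems = new ArrayList<>();

        // Assumption: the temporary directory has to exist before the editor starts.
        String tempDir = props.getProperty("dir.temp");
        if (tempDir == null || tempDir.trim().isEmpty()) {
            problems.add("dir.temp is not set");
        } else if (!Files.isDirectory(Paths.get(tempDir.trim()))) {
            problems.add("dir.temp is not an existing directory: " + tempDir);
        }

        // Assumption: licensekey has to point to an existing file.
        String license = props.getProperty("licensekey");
        if (license == null || license.trim().isEmpty()) {
            problems.add("licensekey is not set");
        } else if (!Files.isRegularFile(Paths.get(license.trim()))) {
            problems.add("licensekey does not point to a file: " + license);
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Reads the same environment variable that step 3 defines.
        String location = System.getenv("PDF2DATA_PROPERTIES");
        if (location == null || location.isEmpty()) {
            System.err.println("PDF2DATA_PROPERTIES is not set");
            return;
        }
        for (String problem : validate(Paths.get(location))) {
            System.err.println(problem);
        }
    }
}
```

Note that java.util.Properties treats backslashes as escape characters, so Windows paths in the properties file are safer written with forward slashes or doubled backslashes.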

Command Line Interface

pdf2Data can be used from the command line, provided Java 7 or 8 is installed.

You can download the CLI application from the iText Artifactory.

The steps mirror the ones you would typically perform in code. The output format for data extraction is XML.

Creating a template entity from a template PDF

java -jar cli.jar preprocess -t template.pdf -x template.xml -l license.xml

File recognition

java -jar cli.jar parse -t template.xml -s file_for_parsing.pdf -p recognized.pdf -x recognized.xml
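
To run the parse command over several files, the call can be scripted. The following is a minimal sketch that builds the argument list with the flags shown above and launches it through ProcessBuilder. The class CliParseRunner and the output naming scheme (input name plus _recognized) are illustrative assumptions, and the sketch makes no claim about what exit codes the CLI returns.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical wrapper around the CLI "parse" command; names are illustrative.
final class CliParseRunner {

    // Builds the argument list for one input file, using only the flags from the documented command.
    static List<String> parseCommand(String cliJar, String templateXml, String sourcePdf) {
        // Strip a trailing ".pdf" so the output names can be derived from the input name.
        String base = sourcePdf.toLowerCase(Locale.ROOT).endsWith(".pdf")
                ? sourcePdf.substring(0, sourcePdf.length() - 4)
                : sourcePdf;
        return new ArrayList<>(Arrays.asList(
                "java", "-jar", cliJar, "parse",
                "-t", templateXml,
                "-s", sourcePdf,
                "-p", base + "_recognized.pdf",
                "-x", base + "_recognized.xml"));
    }

    // Starts the command, forwards its console output, and returns the process exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java CliParseRunner cli.jar template.xml first.pdf second.pdf ...
        for (int i = 2; i < args.length; i++) {
            List<String> command = parseCommand(args[0], args[1], args[i]);
            System.out.println(String.join(" ", command) + " -> exit code " + run(command));
        }
    }
}
```

Keeping command construction separate from execution makes it possible to log or inspect the exact command line before anything is launched.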
                

Help information

java -jar cli.jar help preprocess
java -jar cli.jar help parse