Invoice Extraction and Learning in Kofax Transformation

RPI’s Senior Consultants, Brian Ayres and Derreck Mayer, provide an exploration of invoice extraction with Kofax Transformation (formerly Kofax KTM). See how Transformation extraction rates improve over time using Learning Server.


Brian Ayres:

Good morning everybody, thanks for joining our summer webinar series on KTM Learning Server. I’ll start with some quick info about RPI. We’ve been working with Perceptive for about 16 years. Our imaging practice has grown, and we’re up to about 20 members now. The larger portion of our practice is actually a Lawson practice; we have about 70 consultants on the Lawson side. Our main offices are in Baltimore, Tampa, and Kansas City, where I actually am today.

As far as the work we do with Perceptive, it’s pretty much everything. We do eForms, iScript designs, workflow designs, redesigns, and upgrades, which we’ve been doing a lot of recently, as I’m sure you’re aware if you use Perceptive. We also do health checks, security audits, and migrations, and we work in the clinical and HL7 space. That’s a little bit about RPI.

A little bit about myself. My name is Brian Ayres, I’m a senior Kofax consultant at RPI. I’ve been with RPI for about 11 years. The majority of that time I’ve been working with the Perceptive tools as well as Kofax. It’s kind of given me a unique view into the Kofax software, I’ve seen it grow over the years. One of their bragging points usually for their sales is how much money they put into their research and development and I can definitely verify that that’s true. We’ve seen the software grow a lot over the last couple years which has made this smart learning a very powerful tool.

Okay, so now we’ll jump into the good stuff here. Today we’re going to take a look inside the Black Box that is smart learning and data extraction. A lot of you have probably seen Kofax demos or have seen data capture demos. Some of you may already own the software and you do demos, or you see it work as an end-user. It just magically works. We want to show how to set up your projects to maximize the smart learning. We’ll actually get into some technical stuff to see within Project Builder what settings you want to have set up, what locators you want to use to accomplish this. We’ll take a look at that functionality.

Before we get into that, I want to talk a little bit just about the Kofax software, the different pieces of Kofax. There’s three main pieces I want to discuss. The Kofax Transformation Modules is the smart learning portion of it, so that’s where we’re going to spend a lot of our time today, but the Kofax Capture piece is basically the front end scanning where you do have a scanner hooked up and you scan in documents. Also if you’re going to be ingesting emails, things like that, that all happens in the capture portion of Kofax. Then the Transformation Modules is where all the smart learning happens, where you design your validation forms, you build in all your field validation, and formatters.

Those two pieces usually go hand-in-hand. When we install a Kofax solution, our customers typically get Kofax Capture and Kofax Transformation Modules installed. I’ll probably be calling the latter KTM the rest of the way here, so if I say KTM, it’s Kofax Transformation Modules.

I did just want to mention Kofax Total Agility, though … It’s a newer platform for Kofax, so a lot of the newer installations are using Total Agility. It basically allows you to have some more workflow and business process capabilities within Kofax. It allows you to do mobile capture and a lot of advanced analytics around business processes. The newer installations are going to have Kofax Capture and Kofax Transformation Modules kind of built into Total Agility, so they’re modules within that application. Or Kofax Capture and Transformation Modules can stand alone as well. I just wanted to mention that.

The typical lifecycle of an invoice or document through Kofax … You can use Kofax for many different types of documents. I’m going to specifically be talking about invoices today and use that kind of as my example. This is a typical flow of data that would come through Kofax. You have your scan step, which is going to be manual if you’re actually using a physical scanner. If you are ingesting documents through email or file imports, or even fax, if people still fax, they can be ingested through the scan module as well. Those would be kind of automatic steps where you’d have an email box that’s monitored to bring documents in.

After they come into Kofax, they go onto the KTM server. That’s really where all the magic happens. All the data gets extracted, all the rules and locators that you’ve built are applied. That is an automated step. Nobody’s touching it when it’s going through that step.

The validation module, that’s where your end-users, your validators, we’ll call them, will go in and actually validate the data that gets extracted off of the invoices or documents. We’re going to see how they can actually make the system smarter as well through that step, and flag things for learning. After that step, there’s another automated step to the knowledge base learning server, and that is where the system will get smarter over time. It will learn the invoice samples that you’re running through Kofax. The next time a batch gets scanned in, you’ll have the advantage of that knowledge that was added to your knowledge base.

Then the last step is export. That’s where data’s passed off to any other applications that need it. If you have any third-party applications, another ERP, document imaging, reporting warehouses, things like that … we can build exports to connect to those. The export in Kofax is very flexible. We’ve done SOAP and REST web service interfaces, flat file exports, database updates. You can really interface with any application that allows you to ingest data. We can format it the way that that application needs it.
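To make the point about export formats concrete, here is a minimal Python sketch of how one set of extracted fields could be rendered two ways: as a delimited flat-file row and as a JSON body for a web service. This is not the Kofax export API. The function names, the field names, and the pipe delimiter are all assumptions chosen for illustration.

```python
import csv
import io
import json


def to_flat_file_row(fields, columns, delimiter="|"):
    """Render the chosen columns as one delimited line (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    # Missing fields are written as empty strings so the column count stays fixed.
    writer.writerow([fields.get(column, "") for column in columns])
    return buffer.getvalue()


def to_json_body(fields):
    """Render the same fields as a JSON string for a REST-style interface."""
    return json.dumps(fields, sort_keys=True)


# Illustrative field names only; a real project defines its own.
extracted = {"invoice_number": "48213", "po_number": "PO-77", "total": "100.00"}
flat_row = to_flat_file_row(extracted, ["po_number", "invoice_number", "total"])
json_body = to_json_body(extracted)
```

The extracted data stays the same in both cases; only the rendering changes to suit whatever the receiving application expects.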

Next I wanted to show you a sample validation form real quick. I think probably a lot of you are a little bit familiar with Kofax, but I want to show you what a validation form looks like that we’ve designed. This was actually for a PO solution. We have basically all of our PO header data, our vendor remit to data, the invoice header information that we’re able to extract. Any discounts, then exceptions or comments that need to be added to the invoice. We actually have two forms set up here. This is a PO solution as I mentioned, so we have a PO header, which is everything we’re seeing here. Then we have a PO line matching tab as well where you can actually do all the PO line matching.

The one thing I did want to mention on this form is the button up here that’s highlighted in red. That’s actually your specific online learning button. That will allow you to submit this document to that knowledge base learning server, where the learning will actually take place.

We don’t necessarily train every document that comes through Kofax. Sometimes when we’re setting up new installations and new builds we’ll do that for user-acceptance testing to get the knowledge base built up with documents, but typically when the validators are seeing these, we’re not going to be learning every document. They’re going to flag specific ones. If we have a PO number that comes in that Kofax doesn’t recognize, but it’s clear to us where that PO number is on the invoice, we’re going to have the validator … we call it lasso, but basically highlight that data point on the invoice, and then they’ll flag that. They just click that specific online learning button and then Kofax will know to learn this document. That’s how the validators actually make the projects smarter as time goes on.
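The flag-for-learning idea can be pictured with a toy Python model. It is only a sketch and does not reflect how Kofax stores or applies its knowledge base. The class, the function, the per-vendor keying, and the (x, y, width, height) region tuple are all hypothetical. The one behavior it mirrors from the description above is that a document is flagged for learning only when the validator had to lasso a field.

```python
class ToyKnowledgeBase:
    """Toy stand-in: remembers where a field was lassoed for a given vendor."""

    def __init__(self):
        self._regions = {}

    def learn(self, vendor, field, region):
        self._regions[(vendor, field)] = region

    def lookup(self, vendor, field):
        return self._regions.get((vendor, field))


def validate(kb, vendor, extracted, corrections):
    """Apply validator corrections and learn only the lassoed fields.

    corrections maps field name -> (value, region) for fields the validator
    had to highlight by hand. Returns (final_fields, flagged_for_learning).
    """
    final_fields = dict(extracted)
    for field, (value, region) in corrections.items():
        final_fields[field] = value
        kb.learn(vendor, field, region)
    return final_fields, bool(corrections)


kb = ToyKnowledgeBase()
# Extraction missed the PO number, so the validator lassoes it on the image.
fields, flagged = validate(
    kb,
    "ACME",
    {"invoice_number": "48213", "po_number": None},
    {"po_number": ("PO-77", (400, 120, 80, 20))},
)
```

A document that needs no corrections returns `flagged` as `False`, which matches the point that not every document is sent for learning.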

Okay, so now we’re getting into the good stuff here. The types of locators that we’re going to use to extract data off the invoices … there’s really four main types. There’s a lot of different configurations you can do within Kofax to pull data off of invoices. These four are the four that I’ve typically seen used. I’ll go through and explain each one and then kind of explain how I’ve been setting up projects recently when I build them in Kofax.

As I mentioned, over the years Kofax has definitely improved their software and the locators that we have available to us. We’ve kind of always had that first one here, the out-of-the-box knowledge base from Kofax that ships … basically has, I think, around 1400 invoice samples that have been submitted to it. It’s basically a grouping of invoice header information, so you have your invoice number or your PO number, you’ll have all your amounts, different amounts on the invoice that Kofax has already learned. You can use that locator to populate those fields. You could actually add to that. That’s how the smart learning used to work, you’d actually add to those knowledge bases and grow them as you submit documents. That’s what we had for a long time and that worked great. We still use it.

The next one is the format locators. This is a little bit more of a manual locator. When you set this up, you are going to define usually a dictionary full of keywords. If we’re talking about invoice numbers, you might have a dictionary that would have terms like invoice number, or invoice num, or INV#, or anything that would resemble what a vendor might put on their invoice to identify the invoice number itself. Then Kofax looks for those phrases on the invoice and then it will grab the data. If you say check to the right of anything you see that says invoice number, and you find a number, it’s probably your invoice number. That’s how those locators work. They are a little bit more manual to set up, but they’re kind of a good last line of defense if we can’t find a field any other way.
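The keyword-then-look-right idea can be sketched in a few lines of Python. This is an illustration of the technique only, not how a Kofax format locator is configured or implemented. The keyword list, the function name, and the rule "first token containing a digit to the right of the keyword" are all assumptions.

```python
import re

# Hypothetical keyword dictionary; a real project would maintain its own list.
INVOICE_NUMBER_KEYWORDS = ["invoice number", "invoice num", "invoice no", "inv#"]

# A candidate value: letters, digits, or hyphens, with at least one digit.
CANDIDATE = re.compile(r"[A-Za-z0-9-]*\d[A-Za-z0-9-]*")


def find_invoice_number(lines, keywords=INVOICE_NUMBER_KEYWORDS):
    """Return the first digit-bearing token to the right of a keyword, or None."""
    # Try longer keywords first so "invoice number" wins over "invoice num".
    ordered = sorted(keywords, key=len, reverse=True)
    for line in lines:
        lowered = line.lower()
        for keyword in ordered:
            position = lowered.find(keyword)
            if position == -1:
                continue
            # Only inspect text to the right of the keyword on the same line.
            right_side = line[position + len(keyword):]
            match = CANDIDATE.search(right_side)
            if match:
                return match.group(0)
    return None


page_text = [
    "ACME Corp",
    "Invoice Number: 48213   Date: 01/02/2020",
    "Total 100.00",
]
invoice_number = find_invoice_number(page_text)
```

A page with no matching keyword returns `None`, which is the case where a different locator would have to supply the field.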

The trainable group locator is a newer locator that Kofax has come out with. It’s probably maybe five years old at this point, but this one was kind of a game changer when it came to extracting data. Similar to the out-of-the-box knowledge base that Kofax shipped with, you can train it, but you can define your own fields. We were a little limited with the out-of-the-box knowledge base that came originally with Kofax. You had a set number of fields that they were extracting, they already trained, and that was pretty much all you had. With the trainable group locator though, you can define all the fields you want to extract off the invoice.

They don’t h