Document Text Extraction
This article describes the legacy invoice data extraction with the invoice coding API. For new projects, we suggest that you use Kaunt Document AI for invoice data extraction.
Our Document Extraction solution currently uses the prebuilt-invoice model provided by Azure Cognitive Services, API version 3.1, to extract data from invoices in formats such as PDF, PNG, JPEG, and TIFF. The model uses machine learning to extract structured data from unstructured documents. Kaunt maps the extracted data to the Kaunt data model and creates a new invoice from it. In this way, Kaunt can support a wide variety of invoice formats and layouts and process invoices for customers that have a large number of unstructured invoice formats. The following illustration shows a high-level overview of the flow when using the document text extraction service by Kaunt.
High-level illustration of the flow when using the document text extraction service by Kaunt.
While we strive to provide accurate and reliable data extraction services, Kaunt does not take any responsibility for any inaccuracies or errors in the extracted data. The accuracy of the extracted data is dependent on the quality and format of the input document, as well as the performance of the underlying machine learning models. It is the responsibility of the user to verify the accuracy of the extracted data and make any necessary corrections.
Target Audience
The primary target audience for the Document Extraction endpoint comprises users dealing with invoices presented in diverse, unstructured formats and layouts. This endpoint is designed to facilitate the extraction of data from such invoices and intelligently map it to the Kaunt data model. This capability empowers users to process invoices that might not be natively supported by the Kaunt API.
However, if you already possess well-defined invoice formats and layouts, you may not require the Document Extraction endpoint. In such cases, we recommend utilizing the Kaunt API directly with the supported invoice formats and layouts. For a comprehensive list of these supported invoice formats and layouts, please visit our Invoice Formats documentation.
Mapping Between Form Recognizer and Kaunt Data Model
Azure Form Recognizer | Kaunt |
---|---|
CustomerName | Buyer.Name |
CustomerId | N/A |
PurchaseOrder | OrderNumber |
InvoiceId | VendorInvoiceNumber |
InvoiceDate | InvoiceDate |
DueDate | DueDate |
VendorName | Vendor.Name |
VendorAddress | Vendor.Address |
VendorAddressRecipient | Vendor.Contact.Name |
CustomerAddress | Buyer.Address |
CustomerAddressRecipient | Buyer.Contact.Name |
BillingAddress | N/A |
BillingAddressRecipient | N/A |
ShippingAddress | DeliveryAddress |
ShippingAddressRecipient | DeliveryContact.Name |
SubTotal | AmountExVAT |
TotalDiscount | Not Mapped |
TotalTax | vatAmount |
InvoiceTotal | AmountInclVAT |
AmountDue | AmountInclVAT (If InvoiceTotal not present) |
PreviousUnpaidBalance | Not Mapped |
RemittanceAddress | Not Mapped |
RemittanceAddressRecipient | Not Mapped |
ServiceAddress | Not Mapped |
ServiceAddressRecipient | Not Mapped |
ServiceStartDate | Not Mapped |
ServiceEndDate | Not Mapped |
VendorTaxId | Vendor.TaxIdentificationNumber |
CustomerTaxId | Buyer.TaxIdentificationNumber |
PaymentTerm | PaymentInformation.PaymentTermsId |
PaymentDetails | Not Mapped |
PaymentDetails.* | Not Mapped |
PaymentDetails.*.IBAN | Not Mapped |
PaymentDetails.*.SWIFT | Not Mapped |
TaxDetails | Not Mapped |
TaxDetails.* | Not Mapped |
TaxDetails.*.Amount | Not Mapped |
TaxDetails.*.Rate | Not Mapped |
Items.*.Amount | LineAmountInclVAT |
Items.*.Date | Not Mapped |
Items.*.Description | Description |
Items.*.Quantity | Quantity |
Items.*.ProductCode | Name |
Items.*.Tax | LineVatAmount |
Items.*.TaxRate | Not Mapped |
Items.*.Unit | UnitOfMeasure |
Items.*.UnitPrice | Not Mapped |
Items.*.Amount - Items.*.Tax | LineAmountExVAT |
For a description of the fields extracted by Azure Form Recognizer, see the official Azure Form Recognizer Invoice Model Docs.
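The mapping table can be read as a simple lookup plus two derived values. The following is a minimal sketch of that idea, not Kaunt's actual implementation. It assumes the extracted fields have already been flattened into a plain dict of field name to parsed value (the real Form Recognizer response is nested), and the helper names `set_path`, `map_header`, and `map_line` are hypothetical. Only a subset of the header fields is shown, and computing LineAmountExVAT only when both Amount and Tax are present is an assumption.

```python
from decimal import Decimal
from typing import Any

# Subset of the header rows from the table: Form Recognizer field -> Kaunt path.
HEADER_FIELD_MAP = {
    "CustomerName": "Buyer.Name",
    "PurchaseOrder": "OrderNumber",
    "InvoiceId": "VendorInvoiceNumber",
    "InvoiceDate": "InvoiceDate",
    "DueDate": "DueDate",
    "VendorName": "Vendor.Name",
    "SubTotal": "AmountExVAT",
    "TotalTax": "vatAmount",
    "InvoiceTotal": "AmountInclVAT",
}

# Line item rows from the table: Items.*.<field> -> Kaunt line field.
LINE_FIELD_MAP = {
    "Amount": "LineAmountInclVAT",
    "Description": "Description",
    "Quantity": "Quantity",
    "ProductCode": "Name",
    "Tax": "LineVatAmount",
    "Unit": "UnitOfMeasure",
}


def set_path(target: dict, dotted_path: str, value: Any) -> None:
    """Write value into a nested dict, creating intermediate dicts as needed."""
    *parents, leaf = dotted_path.split(".")
    node = target
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def map_header(fields: dict) -> dict:
    """Map flattened header fields (assumed input shape) to Kaunt paths."""
    invoice: dict = {}
    for source, destination in HEADER_FIELD_MAP.items():
        if source in fields:
            set_path(invoice, destination, fields[source])
    # Per the table, AmountDue is used only if InvoiceTotal is not present.
    if "InvoiceTotal" not in fields and "AmountDue" in fields:
        invoice["AmountInclVAT"] = fields["AmountDue"]
    return invoice


def map_line(item: dict) -> dict:
    """Map one flattened line item (assumed input shape) to Kaunt line fields."""
    line = {dst: item[src] for src, dst in LINE_FIELD_MAP.items() if src in item}
    # LineAmountExVAT is Items.*.Amount - Items.*.Tax.
    # Assumption: it is derived only when both values were extracted.
    if "Amount" in item and "Tax" in item:
        line["LineAmountExVAT"] = item["Amount"] - item["Tax"]
    return line
```

For example, `map_line({"Amount": Decimal("125.00"), "Tax": Decimal("25.00")})` yields a line whose LineAmountExVAT is `Decimal("100.00")`. Fields marked N/A or Not Mapped in the table are simply absent from the lookup dicts, so they are ignored.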
Invoices without Line Items
Some invoices do not contain line items, although Kaunt requires at least one. If Azure Form Recognizer cannot identify any line items on an invoice, Kaunt automatically creates a single invoice line from the header amounts. This invoice line has its lineNumber set to GeneratedHeaderLine.
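The fallback for invoices without line items can be sketched as follows. This is an illustration under assumptions, not Kaunt's implementation: the `Lines` key and the function name `ensure_invoice_line` are hypothetical, and the choice of which header amounts are copied to which line fields is inferred from the mapping table rather than stated by the source.

```python
from decimal import Decimal

GENERATED_LINE_NUMBER = "GeneratedHeaderLine"


def ensure_invoice_line(invoice: dict) -> dict:
    """Return the invoice unchanged if it has lines, else add one generated line.

    Assumption: lines live under a hypothetical "Lines" key, and the header
    amounts AmountExVAT, vatAmount, and AmountInclVAT feed the line amounts.
    """
    if invoice.get("Lines"):
        return invoice
    header_line = {
        "lineNumber": GENERATED_LINE_NUMBER,
        "LineAmountExVAT": invoice.get("AmountExVAT"),
        "LineVatAmount": invoice.get("vatAmount"),
        "LineAmountInclVAT": invoice.get("AmountInclVAT"),
    }
    # Build a new dict so the caller's invoice is not mutated.
    return {**invoice, "Lines": [header_line]}
```

A consumer of the API could use the same marker to tell generated lines apart from extracted ones, for example by checking `line["lineNumber"] == "GeneratedHeaderLine"` before showing line-level details to a reviewer.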
How do I get access to the Document Text Extraction endpoint?
By default, access to the document text extraction endpoint is disabled for all users. To request access to this endpoint, please reach out to your designated contact at Kaunt or send an email to support@kaunt.com to be signed up for our partner portal.