New Generative AI features in UiPath Document Understanding
UiPath has offered Document Understanding (DU), a leader in the intelligent document processing space, for many years. Document Understanding is a robust intelligent automation tool that can digitise and process scanned or digital documents, whether typed or handwritten. Just about any kind of document can be processed, including invoices, statements, applications, receipts, contracts, and forms.
The latest release is an exciting one as it brings Generative AI features into the mix.
Document Understanding’s new Generative AI components leverage machine learning (ML) and natural language processing (NLP) to understand and interpret documents in a more human-like manner by paying attention to context and meaning. This allows the tool to handle a wide range of document formats and structures, making it a versatile solution for businesses dealing with large volumes of data. It understands the context and semantics of the text, enabling it to accurately extract the required information with little-to-no training.
DU can classify documents and extract data based on a series of key phrases and prompts, leveraging UiPath’s LLMs (Large Language Models), DocPath and CommPath. These LLMs have been trained for their specific tasks: document processing and communications, respectively.
In UiPath’s own testing, DocPath error rates were 45% to 76% lower than those of leading GenAI models. Moreover, in interpreting complex table structures, UiPath DocPath outperformed other intelligent document processing (IDP) and GenAI vendors, registering 30-65% fewer extraction errors. GenAI within UiPath IDP also dramatically reduces the effort it takes to train models to understand specific documents and forms, cutting the time by up to 80%, from weeks to hours or a day or two.
Automations can handle multiple types of document formats with incredibly quick setup, as each format does not need to be defined to the process. The UiPath automation flow can then be designed to take actions based on a document’s classification and the information contained within it.
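As a rough illustration of acting on a document's classification, the following Python sketch routes each document to a handler by its type. It is not UiPath code: the handler functions, the `HANDLERS` table, and the field names are all hypothetical, and a real automation would be built from UiPath workflow activities instead.

```python
# Hypothetical handlers: each returns a description of the action taken.
def handle_invoice(fields: dict) -> str:
    return f"post invoice {fields.get('invoice_number', '?')} to accounts payable"


def handle_receipt(fields: dict) -> str:
    return f"file receipt for {fields.get('total', '?')} under expenses"


def handle_unknown(fields: dict) -> str:
    return "send to a human reviewer"


# Hypothetical mapping from classification label to handler.
HANDLERS = {
    "invoice": handle_invoice,
    "receipt": handle_receipt,
}


def route_document(doc_type: str, fields: dict) -> str:
    """Pick the action for a classified document, falling back to manual review."""
    handler = HANDLERS.get(doc_type, handle_unknown)
    return handler(fields)
```

Supporting a new document type in this sketch means adding one entry to the table, and anything unrecognised falls through to manual review.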
With Generative AI, major improvements have been made to Document Understanding, specifically in document classification and data extraction, increasing accuracy and reducing setup and implementation time.
UiPath Document Understanding gets better every month. See the latest release notes here – https://docs.uipath.com/document-understanding/automation-cloud/latest
Prior to the Generative AI release, DU required significant training to recognise the document type and the data to be extracted from the document. The tool typically needed to be trained on hundreds of sample document layouts to develop pattern recognition with a high degree of accuracy and confidence. Once the data was retrieved, it was formatted, if necessary, for further processing within an automation or downstream application.
How has Generative AI improved OCR and document digitisation?
Less Training Time, Better ROI
Prior to the introduction of Generative AI, Document Understanding required a good deal of time to set up and to train the model to identify different document formats. With the AI-based document classifier and data extractor, the time required to define a document’s taxonomy and train the bot has been greatly reduced, by as much as 50-80%.
The speed to implement is incredibly important because it reduces the overall development cost. Lower implementation costs increase ROI, which makes Document Understanding even more cost-effective and suitable for a greater number of use cases.
Easier Setup and Data Retrieval
Data retrieval and formatting rules can now be created with key phrases and prompts, with AI used to interpret the text within the documents and help locate and extract the desired data. For example, the automation might need to locate the invoice date in a document. The LLM reviews the document and can recognise “Invoice Date” even when it is labelled differently (date, date of invoice, inv# date, etc.) or appears in a different place from one invoice to the next.
Using prompts, a developer can easily define that the data field associated with Invoice Date should be retrieved and also specify in which format (DD/MM/YYYY) the date should be retrieved. This greatly reduces training time and the amount of data manipulation which must otherwise be developed in order to format data correctly for processing.
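To give a sense of the data manipulation that a format instruction in a prompt can replace, here is a minimal Python sketch of the hand-written alternative: normalising an extracted invoice date to DD/MM/YYYY. It is not part of UiPath; the function name and the list of candidate input formats are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical set of formats an invoice date might arrive in.
# Ambiguous inputs such as 03/05/2024 are read as day first (an assumption).
CANDIDATE_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y", "%d %b %Y", "%B %d, %Y"]


def normalise_invoice_date(raw: str) -> str:
    """Return the date as DD/MM/YYYY, trying each candidate format in turn."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"Unrecognised date format: {raw!r}")
```

Every new supplier layout can add another entry to a list like this, which is the maintenance burden that specifying the output format in the prompt is meant to reduce.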
Higher Confidence in Results
Using NLP to analyse and define context, the results of data extraction are categorised at a much higher confidence level. The generative AI extractor is built on machine learning models that continuously improve over time. As it processes more documents, it becomes better at understanding and extracting data, enhancing its performance and reliability.
Although human validation may still be needed at some levels, the amount of required human oversight will continue to decrease over time.
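One common way to decide when human validation is needed is to compare each field's confidence score with a threshold. The Python sketch below shows that idea only. The `ExtractedField` type, the 0.0 to 1.0 confidence scale, and the 0.85 threshold are assumptions for illustration and are not taken from UiPath's API.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def fields_needing_review(fields: list[ExtractedField], threshold: float = 0.85) -> list[str]:
    """Return the names of fields whose confidence falls below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


def needs_human_validation(fields: list[ExtractedField], threshold: float = 0.85) -> bool:
    """A document goes to a human if any of its fields is below the threshold."""
    return bool(fields_needing_review(fields, threshold))
```

With this approach, the share of documents sent to a reviewer falls as extraction confidence rises, without any change to the workflow itself.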
Expanded Use Cases – Multiple Document Types
Before the integration of Generative AI with Document Understanding, document types had to be explicitly defined for the tool. Now, with Natural Language Processing (NLP), multiple document types and formats can be processed, with classification determined by key phrases. This enhancement expands the capability to process a wider range of documents and form types, making Document Understanding valuable for every industry.
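The idea of classifying by key phrases can be illustrated with a deliberately naive Python sketch that counts phrase matches per document type. This is plain substring matching, not the LLM-based classifier described above, and the function name and phrase lists are hypothetical.

```python
def classify_by_key_phrases(text: str, phrases_by_type: dict[str, list[str]]) -> str:
    """Return the document type whose key phrases match most often, or "unknown"."""
    lowered = text.lower()
    scores = {
        doc_type: sum(1 for phrase in phrases if phrase.lower() in lowered)
        for doc_type, phrases in phrases_by_type.items()
    }
    if not scores:
        return "unknown"
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else "unknown"


# Hypothetical key phrases for two document types.
KEY_PHRASES = {
    "invoice": ["invoice number", "amount due", "bill to"],
    "receipt": ["receipt", "paid", "change due"],
}
```

A literal matcher like this misses synonyms and rewordings, which is the gap that a language model working from context and meaning is intended to close.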
Seamless Integration with Existing Automations
The tool integrates seamlessly with existing UiPath automation workflows. This allows businesses to incorporate the generative extractor into their current processes without significant disruptions or the need for extensive development effort.
In Summary
UiPath Document Understanding can help to eliminate the manual data entry used to process information locked within documents, freeing employees to focus on more strategic tasks. Data can be efficiently extracted from various document types, including invoices, receipts, contracts, financial statements, shipping and delivery documents, medical records, lab reports, and insurance forms. This capability allows industries across the board to leverage this powerful tool.
And with the latest release integrating Generative AI, OCR and document digitisation have become even easier, faster, and more cost-effective.