
LARA is the automatic document processing module of SIATEL's EDMS software range. It mainly integrates ADR (Automatic Document Reading, or Recognition) functionality.

LARA is designed to capture and automatically process information coming from various media, and then export it to databases or management systems. LARA provides maximum automation for document processing, with the high quality and efficiency that determine a solution's success and profitability. LARA is easy to install, user-friendly, and very flexible. It is designed to meet user requirements for security and adaptability.

LARA may be used in many ways, mainly:

  • As a standalone solution for automatic capture of data coming from various kinds of documents or forms sharing the same architecture,
  • As an add-in dedicated to reading and indexing documents intended to be stored in the EDMS solution Gargantua or processed with the workflow module NORA.

LARA reads printed characters (OCR), handwritten characters (ICR), marks (OMR), and various bar codes and CMC7 codes. Its many application areas include invoice processing and form processing of any kind. LARA can also process bank deposit slips, questionnaires, and many other document types.

Used as a standalone solution, LARA is designed to scan batches of homogeneous or heterogeneous documents, capture any kind of electronic medium (faxes, office files...), process files to improve image quality, recognize and verify data using different methods, and finally export the data for later use.

Used as an add-in running with the EDMS solution Gargantua, LARA indexes documents automatically, one by one or in batches.

To optimize form processing, LARA's standard version offers a form design module.

In any kind of organisation, LARA improves document processing by providing quick access to contents. The results are better service quality, greater competitiveness, a rapid return on investment thanks to increased productivity, more resources that can be dedicated to high value-added operations, and improved security.

LARA is made up of different subsets which provide a complete range of functions for quickly and easily implementing a powerful solution for document reading, processing, and automatic management.


1) Creation of processing templates

LARA includes a form design module. As forms are specific templates, this module is described in section 8.

LARA integrates a template editing wizard, very simple to use, which allows the user to establish rules for extracting data from documents and forms. To be processed, each kind of document must have an associated template. A template database therefore builds up and is enriched gradually over time.

A template is composed of data blocks and automatic validation rules. Within a template, blocks are important insofar as they are used to identify which element is to be recognized and which is not. Blocks are also used to select reference marks which associate a document with its template, or to reposition the image in order to obtain a better recognition rate.


Figure 1: LARA’s template editor

Example: The figure above shows a step in the definition of template blocks. The red-bordered blocks are treated as “image” areas, which are not recognized. The green-bordered blocks contain text to be recognized. The orange-bordered blocks contain check boxes. The rectangles surrounding several blocks represent an associated rule.

The main block properties include block types (text, mark, bar code, image); letter types (capital, lower case); number or data types (example: a phone number must be composed of seven digits); dictionaries and lists; and recognition and verification options. The main purpose of these properties is to improve the extraction of information from data blocks. They also optimize the process as a whole.
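The block properties listed above could be modelled roughly as follows. This is a minimal sketch in Python, not LARA's actual template format: the names `BlockType`, `Block`, `pattern`, and `accepts` are hypothetical, and a regular expression is assumed as the way to express a data type such as the seven-digit phone number.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BlockType(Enum):
    TEXT = "text"
    MARK = "mark"
    BAR_CODE = "bar code"
    IMAGE = "image"  # image blocks are not recognized


@dataclass
class Block:
    name: str
    block_type: BlockType
    pattern: Optional[str] = None  # hypothetical: expected data format as a regex

    def accepts(self, value: str) -> bool:
        """Return True if a recognized value matches the block's expected format."""
        if self.block_type is BlockType.IMAGE:
            return False  # nothing is recognized in an image block
        if self.pattern is None:
            return True
        return re.fullmatch(self.pattern, value) is not None


# The phone-number example from the text: exactly seven digits.
phone = Block("phone", BlockType.TEXT, pattern=r"\d{7}")
```

With this sketch, `phone.accepts("5551234")` is true, while a six-digit or eight-digit string is rejected.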

Templates can also define automatic validation rules. A validation rule is a group of conditions that the recognized data must fulfil (example: a recognized “day” must be a number between 1 and 31, with a lower maximum when the month is 02, etc.). Rules are defined in order to:

  • Verify the format of the recognized data and standardize it if necessary (example: a rule may be defined to verify and standardize dates),
  • Compare recognition results with database contents or lists of authorised values,
  • Verify links between several blocks (examples: verify that a numeric value corresponds to its spelled-out value, verify that several values add up correctly...),
  • Verify the results of arithmetic operations or field merges, etc.
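Two of the rule kinds listed above can be illustrated with a short Python sketch. It is only an assumption of how such rules might be written: the function names `validate_date` and `validate_total`, the YYYY-MM-DD output format, and the tolerance value are hypothetical and are not taken from LARA.

```python
import calendar
from typing import Optional, Sequence


def validate_date(day: str, month: str, year: str) -> Optional[str]:
    """Return the date standardized as YYYY-MM-DD, or None if the rule fails."""
    if not (day.isdecimal() and month.isdecimal() and year.isdecimal()):
        return None
    d, m, y = int(day), int(month), int(year)
    if not 1 <= m <= 12 or not 1 <= y <= 9999:
        return None
    # monthrange gives the number of days in the month, so February is
    # limited to 28 or 29 days depending on the year.
    last_day = calendar.monthrange(y, m)[1]
    if not 1 <= d <= last_day:
        return None
    return f"{y:04d}-{m:02d}-{d:02d}"


def validate_total(amounts: Sequence[float], total: float,
                   tolerance: float = 0.005) -> bool:
    """Cross-block rule: check that several values add up to the recognized total."""
    return abs(sum(amounts) - total) <= tolerance
```

The first function both verifies and standardizes a date; the second checks a link between several blocks.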


Figure 2: Defining template validation rules

LARA works on batches in order to simplify the process of automatic extraction of information from the forms. A batch includes pages and templates. Each page represents the image of a digitized document. A template describes a page: it contains the information which is required to identify and recognize the page's content. A batch can contain up to 99 templates, and therefore as many different document structures.
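The relationship between batches, pages, and templates described above can be sketched as a small data model. The class and attribute names below are hypothetical; only the limit of 99 templates per batch comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_TEMPLATES_PER_BATCH = 99  # limit stated in the text


@dataclass
class Page:
    image_path: str                      # image of one digitized document
    template_name: Optional[str] = None  # set once the page is identified


@dataclass
class Batch:
    name: str
    pages: List[Page] = field(default_factory=list)
    template_names: List[str] = field(default_factory=list)

    def add_template(self, template_name: str) -> None:
        """Attach a template to the batch, enforcing the 99-template limit."""
        if template_name in self.template_names:
            return  # already part of the batch
        if len(self.template_names) >= MAX_TEMPLATES_PER_BATCH:
            raise ValueError("a batch can contain at most 99 templates")
        self.template_names.append(template_name)
```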


Figure 3: Defining what a batch is composed of.
In this example, the batch is composed of two templates.

Within a template, it is also necessary to define export rules for the collected data. The data can be exported in several simple text formats, such as XML, or through OLE objects, to other applications for later use.
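As an illustration of an XML export, the sketch below serializes the fields of one page with Python's standard library. The element and attribute names (`page`, `field`, `template`, `name`) and the sample values are assumptions made for this example; the source does not describe LARA's actual export schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def export_page_to_xml(template_name: str, fields: Dict[str, str]) -> str:
    """Serialize the validated fields of one page as an XML string."""
    page = ET.Element("page", attrib={"template": template_name})
    for name, value in fields.items():
        field = ET.SubElement(page, "field", attrib={"name": name})
        field.text = value
    return ET.tostring(page, encoding="unicode")


# Hypothetical invoice data, used only to show the call.
xml_text = export_page_to_xml("invoice", {"number": "F-1024", "total": "118.40"})
```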

2) Production architecture and administration

Document processing as a whole includes several stages: digitization, recognition, verification, validation, and export of data for later use. Each stage can be performed by the same operator on the same workstation, or by different operators on different workstations. Stages can also be grouped and sequenced.

LARA is a very flexible and scalable product. For example, if a recognition stage requires high computing power or takes a long time, the stage can be run on several dedicated workstations working in parallel. The architecture may therefore depend on several criteria such as volumes, deadlines, etc.

The administration workstation is used to configure the whole process, to define templates and batches, and to create forms if necessary. This workstation is also dedicated to process supervision.
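The idea of running one costly stage in parallel can be sketched as follows. This is not how LARA distributes work: threads on a single machine stand in for separate workstations, and `recognize_page` is a hypothetical placeholder for the real recognition step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def recognize_page(image_path: str) -> str:
    """Placeholder for a costly recognition step (hypothetical)."""
    return f"recognized:{image_path}"


def recognize_batch(image_paths: List[str], workers: int = 4) -> List[str]:
    """Spread the recognition of a batch over several workers.

    Threads on one machine stand in here for separate workstations.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input pages.
        return list(pool.map(recognize_page, image_paths))
```

The number of workers plays the role of the number of dedicated workstations, which would be chosen according to volumes and deadlines.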


Figure 4: Process block diagram

Typically, the following kinds of workstations can be set up:

  • The digitization workstation is the entry point of the process. It produces batches containing digitized documents.
  • The recognition workstation receives digitized batches. Each image of a batch is automatically identified (by associating a recognition template with it) and recognized (by applying that template). Manual intervention by an operator should only be necessary in case of a page identification error, which is very unlikely. In such a case, the page is placed in a post-processing bin.
    The batches are then identified and recognized.
  • The validation and verification workstation receives identified and recognized batches.
    Validation is a two-step process:
    • The first step is automatic. It consists of checking the collected data using validation rules, databases, dictionaries, and other references.
    • The second step is manual and concerns uncertain points. An operator checks doubtful characters and those specifically flagged as “to be verified”, and corrects any recognition errors. Here again, validation rules make manual verification easier and may also be used to modify recognition results.

    The batches are then validated.

  • The export workstation receives validated batches. It exports data according to previously defined rules. The export process is entirely automatic and does not need any manual intervention.
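The hand-off between workstations described above amounts to a batch moving through a fixed sequence of states. The sketch below illustrates that sequence; the names `BatchState`, `WORKSTATIONS`, and `process` are hypothetical and do not reflect LARA's internal design.

```python
from enum import Enum


class BatchState(Enum):
    DIGITIZED = 1
    RECOGNIZED = 2   # identified and recognized
    VALIDATED = 3
    EXPORTED = 4


# Which state each workstation accepts, and which state it produces.
WORKSTATIONS = {
    "recognition": (BatchState.DIGITIZED, BatchState.RECOGNIZED),
    "validation": (BatchState.RECOGNIZED, BatchState.VALIDATED),
    "export": (BatchState.VALIDATED, BatchState.EXPORTED),
}


def process(state: BatchState, workstation: str) -> BatchState:
    """Advance a batch through one workstation, refusing out-of-order steps."""
    expected, produced = WORKSTATIONS[workstation]
    if state is not expected:
        raise ValueError(f"{workstation} workstation expects a {expected.name} batch")
    return produced
```

For example, a batch that has only been digitized cannot be sent to the export workstation: `process(BatchState.DIGITIZED, "export")` raises an error.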

3) Capture

LARA uses GARGANTUA's digitization modules and integrated drivers. There is therefore no need to add external modules: LARA directly drives many scanners, from the simplest to the highest-performance models. This allows direct and immediate integration of documents into the workspace, with no extra processing. TWAIN mode is available for scanners with no integrated driver.

To make digitization peripherals easier to use, parameters such as brightness, contrast, compression, etc. can be saved so that you don't have to specify them again each time. A single click applies the right parameters, optimized for paper type and quality. When digitizing heterogeneous documents, you can apply detection functions or automatic threshold adjustment functions to each of them.


Figure 5: Online specification of scanner parameters

After being digitized, a document is stored in a specific file format and compressed with the mode and format suited to the type of document. The standard version of LARA includes the main compression formats such as CCITT G4, JPEG, GIF, PNG, etc., used for black and white, color, and grey-scale image input. The file format and compression format can be changed with a dedicated function.

Whatever the digitization mode, images are decompressed instantly and the content of the batches can be viewed page by page in a quality control module.


Figure 6: Compression options and modes for two different file formats

After digitization, imaging functions may be applied to a document to improve its legibility and quality. LARA integrates a wide range of imaging tools, including automatic deskew, rotation by an arbitrary angle, rotation by 90, 180 or 270 degrees, page orientation detection, offset and repositioning, despeckle, spot removal, contrast threshold detection, etc.
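Two of the simpler operations named above, rotation by a multiple of 90 degrees and thresholding, can be sketched in pure Python. This is an illustration only: the image is assumed to be a list of rows of grey-scale values, the fixed threshold of 128 is an arbitrary choice, and none of these function names come from LARA.

```python
from typing import List

Image = List[List[int]]  # rows of grey-scale pixel values, 0 (black) to 255 (white)


def rotate_90_clockwise(image: Image) -> Image:
    """Rotate an image by 90 degrees clockwise."""
    # Reversing the rows and then transposing turns the first column
    # (read bottom to top) into the first row.
    return [list(row) for row in zip(*image[::-1])]


def rotate(image: Image, degrees: int) -> Image:
    """Rotate clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    for _ in range((degrees // 90) % 4):
        image = rotate_90_clockwise(image)
    return image


def binarize(image: Image, threshold: int = 128) -> Image:
    """Convert a grey-scale image to black and white using a fixed threshold."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in image]
```

A production tool would detect the threshold from the image contents rather than use a fixed value.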

4) Identification

Identification is the first stage of the process. It is an automatic operation executed on each page of each batch. Within a batch of varied documents, LARA identifies the structure of each one using page structure recognition algorithms. Documents may then be processed according to their structure. This corresponds to the so-called ADR functions, which can identify an unlimited number of formats or structures, with a success rate near 100%.

Examples:

  • Multipage form processing: during digitization, a form generates several files which are not always in the proper order (documents put in the wrong order in the scanner's feeder, for example). These files must then be put in the right order before they are exported.
  • Supplier invoice processing: given the diversity of templates, invoices must be identified before recognition. Deskew or rotate functions may be applied to them if necessary, in order to extract relevant data: supplier ID, invoice number, date, type of supplies, amounts, and other information, for example.
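The multipage example above comes down to sorting pages once each one has been identified. The sketch below assumes that identification yields a form identifier and a page number for every scanned file; the `IdentifiedPage` structure and its field names are hypothetical.

```python
from typing import List, NamedTuple


class IdentifiedPage(NamedTuple):
    file_name: str
    form_id: str       # hypothetical identifier read from the page
    page_number: int   # position of the page within its form


def reorder(pages: List[IdentifiedPage]) -> List[IdentifiedPage]:
    """Group pages by form and restore the page order inside each form."""
    return sorted(pages, key=lambda p: (p.form_id, p.page_number))
```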

For a batch, the operation is considered finished when all the images have been identified. Any errors are submitted to the operator.

5) Recognition

Recognition is the main stage of the process. It is an automatic operation executed on each identified page of a batch. LARA integrates leading recognition engines based on artificial intelligence and multi-level analysis algorithms. Multi-level analysis (MDA) combines different classifiers and recognition engines to analyse data at multiple levels: page, table, cell, paragraph, image, line, word, character... Objects are analysed individually but also in their context. This technology is used for OCR (printed characters), ICR (handwritten letters and characters), and OMR (marks, circles...). It is complemented by bar code reading functions.

Recognition is performed by automatically detecting the text type and by comparing the areas defined in the templates with the digitized documents. Integrated and external dictionaries help improve recognition quality.


Figure 7: Identification and recognition result

The “Template” column shows the name of the template associated with each page. The “Characters...” column indicates the percentage of uncertain characters after the recognition process. NB: An uncertain character is not necessarily due to a recognition engine error.
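A percentage of uncertain characters like the one shown in that column could be computed as in the sketch below. It assumes that the recognition engine returns a confidence value between 0.0 and 1.0 for each character and that a character counts as uncertain below a chosen minimum; both the representation and the 0.8 default are assumptions, not LARA's documented behaviour.

```python
from typing import List, Tuple

RecognizedChar = Tuple[str, float]  # (character, confidence between 0.0 and 1.0)


def uncertain_percentage(chars: List[RecognizedChar],
                         min_confidence: float = 0.8) -> float:
    """Percentage of characters whose confidence is below the given minimum."""
    if not chars:
        return 0.0
    uncertain = sum(1 for _, confidence in chars if confidence < min_confidence)
    return 100.0 * uncertain / len(chars)
```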

For a batch, the operation is considered finished when all the images have been recognized. The batch itself is then considered recognized and is ready for the next stage.

6) Validation and verification

LARA uses different verification modes in order to produce reliable and relevant data. The main goal of the verification and validation stages is to transform raw data coming from the recognition process into validated data that can be exported and used in other applications.

During the validation stage, LARA applies the rules defined in the templates to automatically correct or complete the values contained in fields. It may also verify that the recognized value of a field meets certain criteria (example: a date is in a date field, a word is in a word list, a key is found in a database). The verification stage consists of manually eliminating uncertain characters.

The template indicates which of the validation and verification stages is to be performed first. Verification may be done before, after, or during validation. It is also possible to configure the uncertainty rate applicable to the recognition of a character or a field. This rate determines whether or not an uncertain character or field should be manually verified.

During the verification stage, if a field's uncertainty rate exceeds the limit specified in its parameters, the field is isolated so that the operator can correct or validate it. The fields are shown to the operator one by one so that the operator can concentrate exclusively on the current field rather than on the whole page.
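The selection of fields for manual verification can be sketched as a simple comparison against per-field limits. The function name, the dictionaries, and the default limit of 0.10 are hypothetical choices for this example; the text only states that a field is isolated when its uncertainty rate exceeds its configured limit.

```python
from typing import Dict, List


def fields_to_verify(uncertainty: Dict[str, float], limits: Dict[str, float],
                     default_limit: float = 0.10) -> List[str]:
    """Return the fields whose uncertainty rate exceeds their configured limit."""
    return [
        name
        for name, rate in uncertainty.items()
        if rate > limits.get(name, default_limit)
    ]
```

The returned list is the queue of fields that would be shown to the operator one by one.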


Figure 8: Verification. Uncertain characters appear in red font.
The operator must correct them if necessary in order to validate the field’s value.

Validation rules are executed in sequence. If one of them fails, the page is marked “not validated”. At the end of the process, the pages are shown one by one to the operator to be corrected. The batch is considered valid when all the pages have been verified and validated.
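The sequential execution of rules can be sketched as follows. Each rule is assumed to be a pair made of a failure message and a check function, and every rule is run so that one message can be reported per failed rule; these names and this representation are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Fields = Dict[str, str]
Rule = Tuple[str, Callable[[Fields], bool]]  # (message shown on failure, check)


def run_rules(fields: Fields, rules: List[Rule]) -> List[str]:
    """Run every rule in order and return the messages of the rules that failed."""
    return [message for message, check in rules if not check(fields)]


def batch_is_valid(pages: List[Fields], rules: List[Rule]) -> bool:
    """A batch is valid only when no rule fails on any of its pages."""
    return all(not run_rules(page, rules) for page in pages)
```

A page for which `run_rules` returns a non-empty list corresponds to a page marked “not validated”.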

Through an intuitive interface, the operator can check the progress of the batch at any time.


Figure 9: Result of verification and validation
In this example, validation rules have failed on most pages, as indicated by the red flags in the “Rules” column.


Figure 10: Rule validation
Three rules have failed on this page. A message is displayed for each of them, and
the related fields are bordered in red so that the operator can identify and eliminate the cause of the error.

7) Export

Export is a fully automatic operation applied only to “validated” batches. LARA creates files compatible with most destination applications. This last step ensures interfacing with other solutions. The file, whose type and format are specified in the rules, is generated and transferred to the destination application.

8) FORM Creator module

To optimize the automatic information extraction process, it is very important that forms be properly designed, since the other stages (digitization, recognition, verification, export) depend on the legibility and quality of the original document.

LARA's FORM CREATOR module allows the creation of forms intended to be read and processed by automatic systems. These forms are printed, distributed, filled in, collected, digitized, and finally processed by LARA's different modules.

LARA's FORM CREATOR has an intuitive and user-friendly interface and includes tools for creating the usual form elements (check boxes, text areas, labels) which are meant to be interpreted by machines. These elements are designed from the beginning to be processed later. The program therefore checks and adjusts each area separately (element size, spacing...) as well as the whole finalized form so that it can be processed successfully by computer. Forms can thus be created easily and rapidly, since the program handles the most tedious tasks.


Figure 11: FORM CREATOR module

9) Specific Developments

LARA Automation API is an SDK that allows control of the process, from the digitization stage through the export stage. The API follows the COM standard and may be used in Visual Basic, C, and C++ applications or in scripting environments.

SIATEL's software range is the result of close cooperation between users and the development staff. Since Electronic Document Management Systems and Workflow software do not always exactly fit a customer's specific needs, SIATEL provides a team of engineers specialized in developing specific applications and interconnecting them with other products.

10) Peripheral software

LARA exports collected data and can be integrated into any existing system if necessary.

LARA also includes direct interfaces to other software such as:

  • GARGANTUA GEDD
  • NORA Workflow