LARA is designed to capture
and automatically process information coming
from various media, and then export it
to databases or management systems.
LARA provides maximum automation for document
processing, with the high quality and efficiency
that determine the solution's
success and profitability. LARA is
the ideal solution: easy
to install, user-friendly and very
flexible. It anticipates
user requirements concerning
security and adaptability.
LARA may be used in many ways,
mainly:
- As a standalone solution for the
automatic capture of data coming from
various kinds of documents or forms with
the same architecture,
- As an add-in dedicated to reading and indexing
documents intended to be stored in the EDMS
solution Gargantua or processed
with the workflow module NORA.
LARA reads printed documents
(OCR), as well as handwritten
characters (ICR), marks
(OMR), various bar
codes and CMC7 codes. Its many application
areas include invoice
processing and form processing of any kind.
LARA can also process bank deposit slips,
questionnaires and many other document types.
Used as a standalone solution, LARA is
designed to scan batches of homogeneous or
heterogeneous documents, capture any kind
of electronic medium (faxes, office files...),
process files in order to improve image quality,
recognize and verify data according to different
methods and finally export them for later
use.
Used as an add-in running with the EDMS
solution Gargantua, LARA indexes
documents automatically, one by one or
in batches.
In order to optimize form processing, LARA's
standard version offers a form design module.
In any kind of organisation, LARA improves
document processing by providing quick access
to contents. The results are better service
quality, greater competitiveness, a rapid return
on investment thanks to increased productivity,
more resources that can be dedicated
to high value-added operations, and improved
security.
LARA is made up of different subsets which
provide a complete range of functions for quickly
and easily implementing a powerful solution for reading, processing
and automatically managing documents.
1) Creation of processing templates
LARA includes a form design
module. As forms are specific templates, this
module is described in paragraph 8.
LARA integrates a template
editing wizard, very simple to use, which allows
the user to establish rules for extracting
data from documents and forms. To be processed,
each kind of document must have an associated
template. A template database therefore builds
up gradually over time.
A template is composed of data blocks and automatic
validation rules. Within a template, blocks are
important insofar as they are used to identify
which element is to be recognized and which is
not. Blocks are also used to select reference
marks which associate a document with its template,
or to reposition the image in order to obtain
a better recognition rate.

Figure 1: LARA’s template editor
Example:
The figure above shows one step in defining template
blocks. The red-bordered blocks are treated
as “image” areas, which are not recognized.
The green-bordered blocks contain text to be
recognized. The orange-bordered blocks contain
check boxes. The rectangles surrounding several
blocks represent an associated rule.
The main block properties
include block types (text, mark, bar code,
image); letter types (capital, lower case); number
or data types (example:
a phone number must be composed of seven digits);
dictionaries and lists; and recognition and verification
options. The main goal of these properties is to improve
the extraction of information from data blocks.
They also optimize the process as a
whole.
In the templates it is possible to define automatic
validation rules. A validation rule is a group
of conditions that have to be fulfilled by the
recognized data (example:
a recognized “day” must be a number
between 1 and 31, with a lower limit when the month is 02, etc.).
Rules are defined in order to:
- Verify the format of the recognized data and standardize
it if necessary (example:
a rule may be defined in order to verify and
standardize dates),
- Compare recognition results with database
contents or lists of authorised values,
- Verify links between several blocks (examples:
verify that a numeric value corresponds to
its spelled-out value, verify that several
values add up correctly...),
- Verify the results of arithmetic operations or field
merges.

Figure 2: Defining template validation
rules
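As an illustration only, rules of the kinds listed above could be written as small functions. The sketch below is in Python and does not reproduce LARA's own rule syntax, which this document does not show; the function names, the accepted date formats and the tolerance value are all assumptions.

```python
from datetime import datetime
from typing import Iterable, Optional, Set

# Hypothetical rule helpers: illustrative names, not part of LARA.

def standardize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no assumed format matches.

    strptime also rejects impossible dates such as 30/02/2004.
    """
    for fmt in ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def in_authorised_list(value: str, allowed: Set[str]) -> bool:
    """Compare a recognized value with a list of authorised values."""
    return value.strip().upper() in allowed


def amounts_add_up(lines: Iterable[float], total: float,
                   tolerance: float = 0.005) -> bool:
    """Check that several recognized amounts sum to the recognized total."""
    return abs(sum(lines) - total) <= tolerance
```

Each function returns a value that a caller can use to accept, correct or reject a recognized field.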
LARA works on batches in order to simplify the
process of automatic extraction of information
from the forms. A batch includes pages and templates.
Each page represents the image of a digitized
document. A template describes a page: it contains
the information which is required to identify
and recognize the page's content. A batch can
contain up to 99 templates, and therefore as
many different document structures.

Figure 3: Defining what a batch is composed
of.
In this example, the
batch is composed of two templates.
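The relationship between batches, pages and templates described above can be pictured with a small data model. This is a sketch in Python with hypothetical class and field names; only the limit of 99 templates per batch comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_TEMPLATES_PER_BATCH = 99  # limit stated in the text


@dataclass
class Template:
    """Describes one document structure (hypothetical minimal form)."""
    name: str


@dataclass
class Page:
    """Image of one digitized document."""
    image_path: str
    template: Optional[Template] = None  # unknown until the page is identified


@dataclass
class Batch:
    name: str
    templates: List[Template] = field(default_factory=list)
    pages: List[Page] = field(default_factory=list)

    def add_template(self, template: Template) -> None:
        """Attach a template, refusing to exceed the per-batch limit."""
        if len(self.templates) >= MAX_TEMPLATES_PER_BATCH:
            raise ValueError("a batch can contain up to 99 templates")
        self.templates.append(template)
```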
Within a template, it is also necessary to define
export rules for collected data. The data can
be exported in several simple text formats
such as XML, or through OLE objects, to
other applications for later use.
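To give an idea of what a simple XML export of collected data might look like, here is a minimal Python sketch using the standard library. The element and attribute names (`page`, `field`, `template`, `name`) are assumptions chosen for the example; the actual layout of LARA's export files is not described in this document.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def export_page_to_xml(template_name: str, fields: Dict[str, str]) -> str:
    """Serialize the validated fields of one page as an XML string."""
    root = ET.Element("page", attrib={"template": template_name})
    for name, value in fields.items():
        element = ET.SubElement(root, "field", attrib={"name": name})
        element.text = value  # special characters are escaped by ElementTree
    return ET.tostring(root, encoding="unicode")


xml_text = export_page_to_xml(
    "invoice", {"supplier_id": "S-001", "total": "125.40"}
)
```

A destination application could then parse `xml_text` with any standard XML parser.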
2) Production architecture and administration
Document processing as a whole includes several
stages: digitization, recognition, verification,
validation and export of data for later use.
Each stage can be run by the same operator
on the same workstation, or by different operators
on different workstations. Stages can also be grouped
and sequenced.
LARA is a very flexible and
fully scalable product. For example,
if a recognition stage requires high computing
power or takes a long time, it is possible to run
the stage on several workstations all dedicated
to it and working in parallel. The architecture
may therefore depend on several criteria such
as volumes, deadlines, etc.
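The idea of spreading a costly recognition stage over several workers can be sketched as follows. This is a Python illustration only: threads in one process stand in for dedicated workstations, and `recognize_page` is a hypothetical placeholder, not a LARA function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def recognize_page(page_id: str) -> str:
    """Placeholder for an expensive recognition step (assumption)."""
    return f"text of {page_id}"


def recognize_batch(page_ids: List[str], workers: int = 4) -> List[str]:
    """Process the pages of a batch in parallel.

    The pool's map() returns results in the same order as the input pages.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize_page, page_ids))
```

Raising `workers` corresponds to dedicating more workstations to the stage when volumes or deadlines require it.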
The administration workstation is used to configure
the whole process, to define templates and batches,
and to create forms if necessary. This workstation
is also dedicated to process supervision.

Figure 4: Process block diagram
Typically, different kinds of workstations can
be set up:
- The digitization workstation is the entry
point of the process. It produces batches containing
digitized documents.
- The recognition workstation receives digitized
batches. Each image of a batch is automatically
identified (by associating a recognition
template) and recognized (by applying
the previously associated recognition template).
Manual intervention by an operator is only
necessary in case of a page identification error,
which is very unlikely. In such a case
the page is placed in a post-processing
bin.
The batches are then identified and recognized.
- The export workstation receives validated
batches. It exports data according to previously
defined rules. The export process is entirely
automatic and does not need any manual intervention.
3) Capture
LARA uses GARGANTUA's
digitization modules and integrated drivers.
There is therefore no need to add external modules:
it directly drives many scanners, from the
simplest to the most powerful ones. This allows
direct and immediate document integration
into the workspace, with no extra processing.
TWAIN mode is available for scanners with no integrated
driver.
To make digitization peripherals easier
to use, it is possible to save parameters
such as brightness, contrast, compression, etc.,
so that you don't have to specify them again
every time you need them. Just click once to
apply the right parameters, optimized according
to paper type and quality. During the digitization
of heterogeneous documents you can apply detection
functions or automatic threshold adjustment functions
to each of them.

Figure 5: Online specification of scanner
parameters
After being digitized, a document is stored
in a specific file format and compressed with
the mode and format appropriate to the type
of document. The standard version of LARA includes
the main compression formats such as CCITT G4, JPEG,
GIF, PNG, etc., used for black and white, color
and grey scale image input. File format and compression
format can be changed using a specific
function.
In any kind of digitization the images are instantaneously
decompressed and the content of the batches can
be viewed page by page thanks to a quality control
module.


Figure 6: Compression options and modes
for two different file formats
After digitization, imaging functions may be
applied to a document in order to improve its
legibility and quality. LARA integrates a very
large palette of imaging tools, including: automatic
deskew, rotate-by-degree, rotate by 90, 180 or
270 degrees, page orientation detection, offset
and repositioning, despeckle, spot removal, contrast
threshold detection, etc.
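Of the imaging functions listed above, rotation by 90, 180 or 270 degrees is the simplest to illustrate. The Python sketch below treats an image as a list of pixel rows; this representation and the function names are assumptions made for the example and say nothing about how LARA implements the operation.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values (assumed representation)


def rotate_90_clockwise(image: Image) -> Image:
    """Rotate by 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(row) for row in zip(*image[::-1])]


def rotate(image: Image, degrees: int) -> Image:
    """Rotate clockwise by 0, 90, 180 or 270 degrees; returns a new image."""
    if degrees not in (0, 90, 180, 270):
        raise ValueError("only multiples of 90 degrees are supported here")
    result = [list(row) for row in image]
    for _ in range(degrees // 90):
        result = rotate_90_clockwise(result)
    return result
```

Rotation by an arbitrary angle and deskewing require pixel interpolation and are outside the scope of this sketch.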
4) Identification
Identification is the first stage of the process.
It's an automatic operation executed on each
page of each batch. Within a batch of varied
documents LARA identifies the structure of each
one using page structure recognition algorithms.
Documents may then be processed according to
their structure. This corresponds to the so-called
ADR functions, which can identify an unlimited
number of formats or structures, with a success
rate near 100%.
Examples:
- Multipage form processing: during digitization,
a form generates several files which are not
always in the proper order (documents put in
the wrong order in the scanner's feeder, for example).
It is then necessary to put these files in the
right order before exporting them.
- Supplier invoice processing:
considering the diversity of templates it is
necessary to identify them before recognition.
Skew or rotate functions may be applied to
them if necessary, in order to extract relevant
data: supplier ID, invoice number, date, type
of supplies, amounts and other information,
for example.
For a batch, the operation is considered
finished when all the images have been identified.
Any errors are submitted to the operator.
5) Recognition
Recognition is the main stage of the process.
It's an automatic operation executed on each
identified page of a batch. LARA integrates the
best recognition engines on the market, based
on artificial intelligence and multi-level analysis
algorithms. Multi-level analysis (MDA) is a
mix of different classifiers and recognition
engines, meant to analyse data at multiple levels:
page, table, cell, paragraph, image, line, word,
character... Objects are analysed individually
but also within their context. This technology
is used for OCR (printed characters), ICR (handwritten
letters and characters) and OMR (marks, circles...).
It is complemented by bar code reading functions.
Recognition is performed by automatically detecting
the text type and by comparing the areas defined
in the templates with the digitized documents.
Integrated and external dictionaries help improve
the recognition quality.

Figure 7: Identification and recognition
result
The “Template” column shows the name
of the template associated with each page. The “Characters...” column
indicates the percentage of uncertain characters after
the recognition process. NB: An
uncertain character is not necessarily due to
a recognition engine error.
For a batch, the operation is considered
finished when all the images have been recognized.
The batch itself is then considered recognized and is
ready for the next stage.
6) Validation and verification
LARA uses different verification
modes in order to produce reliable and relevant
data. The main goal of the verification and validation
stages is to transform raw data coming from the
recognition process into validated data that
can be exported and used within other applications.
During the validation stage LARA applies the
rules defined in the templates to automatically
correct or complete the values contained in fields.
It may also verify that the recognized value
of a field meets certain criteria (example:
a date is in a date field, a word is in a word
list, a key is located in a database).
The verification stage consists of manually resolving
uncertain characters.
The template indicates which of the validation and
verification stages is to be run first.
Verification may be done before,
after or during validation. It is also possible
to set the uncertainty rate applicable
to the recognition of a character or a field.
This rate determines whether or not an uncertain character
or field should be manually verified.
During the verification stage, if a field's uncertainty
rate exceeds the limit specified in its parameters,
it is isolated in order to be corrected or validated
by the operator. The fields are shown to the
operator one by one so that the operator can
concentrate exclusively on the current field, not on the
whole page.

Figure 8: Verification
Uncertain characters appear in red font.
The operator must correct
them if necessary in order to validate the
field’s value.
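The selection of fields for manual verification can be sketched in a few lines of Python. The class, the field names and the representation of the uncertainty rate as a fraction between 0 and 1 are assumptions for the example; the text only states that a field is isolated when its rate exceeds the configured limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedField:
    name: str
    value: str
    uncertainty: float  # assumed scale: 0.0 (certain) to 1.0 (fully uncertain)


def fields_to_verify(fields: List[RecognizedField],
                     limit: float) -> List[RecognizedField]:
    """Return, in page order, the fields whose uncertainty exceeds the limit."""
    return [f for f in fields if f.uncertainty > limit]


page_fields = [
    RecognizedField("invoice_number", "A-1042", 0.00),
    RecognizedField("date", "12/O3/2004", 0.10),
    RecognizedField("total", "1Z5.40", 0.17),
]
queue = fields_to_verify(page_fields, limit=0.05)
```

Here `queue` holds the `date` and `total` fields, which would be presented to the operator one at a time.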
Validation rules are executed in sequence. If
one of them fails, the page is marked “not
validated”. At the end of the process,
the pages are shown one by one to the operator
to be corrected. The batch is considered valid
when all the pages have been verified and validated.
Through an intuitive interface, the operator can
check the batch's processing progress at any time.

Figure 9: Result of verification and validation
In this example validation
rules have failed on most pages, as indicated
by the red flags in the “Rules” column.

Figure 10: Rule validation
Three rules have failed
on this page. A message is displayed for each
of them and
the related fields are bordered in red so
that the operator can identify and eliminate
the cause of the error.
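The behaviour described in this section (rules run in sequence, one message per failed rule, a page marked “not validated” as soon as a rule fails) can be sketched as follows. This is a Python illustration with hypothetical rule definitions and field names; it covers only the validation side and leaves out manual verification.

```python
from typing import Callable, Dict, List, Tuple

Fields = Dict[str, str]
Rule = Tuple[str, Callable[[Fields], bool]]  # (failure message, check)


def failed_rules(fields: Fields, rules: List[Rule]) -> List[str]:
    """Run the rules in order and return the message of each one that fails."""
    return [message for message, check in rules if not check(fields)]


def page_status(fields: Fields, rules: List[Rule]) -> str:
    """A page is "validated" only if no rule fails."""
    return "not validated" if failed_rules(fields, rules) else "validated"


def batch_is_valid(pages: List[Fields], rules: List[Rule]) -> bool:
    """A batch passes validation only when every page does."""
    return all(page_status(page, rules) == "validated" for page in pages)


# Example rules (assumptions for illustration only).
RULES: List[Rule] = [
    ("invoice number is missing",
     lambda f: bool(f.get("invoice_number"))),
    ("total must be numeric",
     lambda f: f.get("total", "").replace(".", "", 1).isdigit()),
]
```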
7) Export
Export is a totally automatic operation
applied only to “validated” batches. LARA
creates files compatible with most destination
applications. This last step guarantees
interfacing with other solutions. The file, whose
type and format are specified in the rules, is
generated and transferred to the destination application.
8) FORM CREATOR module
To optimize the automatic information extraction
process it is very important that forms be properly
designed, since the other stages (digitization,
recognition, verification, export) depend on
the legibility and quality of the original document.
LARA's FORM CREATOR module
allows the creation of forms intended to be read
and processed by automatic systems. These forms
are then printed, distributed, filled in and
then collected, digitized and finally processed
by LARA's different modules.
LARA's FORM CREATOR has an intuitive
and user-friendly interface and includes tools
for creating the usual form elements (such as check
boxes, text areas and labels) which are meant to be
interpreted by machines. These elements are designed
and created from the beginning in order to be processed
later. The program therefore checks and adjusts
each area separately (element size, spacing...)
as well as the whole finalized form in order to
ensure successful computer processing. It is thus
possible to create forms easily and rapidly, since
the program deals with the most tedious tasks.

Figure 11: FORM CREATOR module
9) Specific developments
LARA Automation API is an SDK
that allows control of the process, from
the digitization stage up to the export stage.
The API is built on the COM standard and
may be used in Visual Basic, C and C++ applications
or in script environments.
SIATEL's software range is
the result of close cooperation between users
and the development staff. Since Electronic Document
Management Systems and Workflow software do not
always exactly fit customers' specific needs, SIATEL places
at their disposal a team of engineers specialized
in developing specific applications and interconnecting
them with other products.
10) Peripheral software
LARA exports collected data
and can be integrated into any existing system
if necessary.
LARA also includes direct interfaces
to other software such as:
- GARGANTUA GEDD
- NORA Workflow