Difference between revisions of "OCR templates"

Revision as of 11:29, 8 August 2014

Since OpenKM 6.4.2, the plug-in system lets you quickly extend the functionality offered by the platform, including the available OCR field parsers, without rebuilding the system to add or change functionality. Refer to Extend OCR field parsers if you need to extend the OCR field parsers feature on 6.4.2.

OCR templates let you create zonal OCR templates that recognise and extract structured text from scanned images.

Images should be scanned at a resolution of at least 200 dpi for the OCR engine to recognise text well.
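As a rough illustration of the 200 dpi guideline, the sketch below estimates the effective resolution of a scan from its pixel width and the physical width of the page. The function names and the A4 example are assumptions made for illustration; they are not part of OpenKM.

```python
MM_PER_INCH = 25.4
MIN_DPI = 200  # minimum resolution recommended above


def effective_dpi(pixels, millimetres):
    """Resolution implied by a pixel count along a side of known physical length."""
    return pixels / (millimetres / MM_PER_INCH)


def is_scan_resolution_ok(pixel_width, page_width_mm, min_dpi=MIN_DPI):
    """True when the scan meets the minimum recommended resolution."""
    return effective_dpi(pixel_width, page_width_mm) >= min_dpi


if __name__ == "__main__":
    # An A4 page is 210 mm wide; the pixel widths are example values.
    for width in (1240, 1654, 2480):
        print(width, round(effective_dpi(width, 210)), is_scan_resolution_ok(width, 210))
```

A check like this can be run on a batch of scans before they are used to build or test a template.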

Template creation

Open the OCR template administration option.

Then click on the new OCR template icon.

Fill in the form and click the create button.

Add field zones

Field	Description
Name	Template field name
Type	The field type ( String, Date, etc. )
Property	The property group parameter name ( example: okp:property.name )
Pattern	Java regular expression pattern ( take a look at http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html )
Custom	Commands for processing images before OCR. It can contain several commands, one per line ( example: /usr/bin/convert ${fileIn} -matte ( +clone -fuzz 50% -transparent red ) -compose DstOut -composite -negate ${fileOut} )
OCR	OCR command line ( example: /usr/bin/tesseract ${fileIn} ${fileOut} digits )
Use to recognise	Indicates whether the field is also used to identify the document or only for data extraction.
Zone	Here, the cropped image is shown.
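The Custom and OCR values in the table use the placeholders ${fileIn} and ${fileOut}, with one command per line. The sketch below shows one way such a value could be expanded into argument lists. How OpenKM itself expands and runs these commands is not described here, so `expand_commands` and the file paths are hypothetical. Python's `string.Template` is used only because it happens to share the `${name}` syntax.

```python
import shlex
from string import Template


def expand_commands(value, file_in, file_out):
    """Turn a multi-line Custom/OCR value into a list of argument lists."""
    commands = []
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between commands
        # Split first, substitute afterwards, so a path containing spaces
        # still ends up as a single argument.
        tokens = shlex.split(line)
        commands.append([
            Template(token).safe_substitute(fileIn=file_in, fileOut=file_out)
            for token in tokens
        ])
    return commands


if __name__ == "__main__":
    ocr = "/usr/bin/tesseract ${fileIn} ${fileOut} digits"
    # The paths below are made-up examples.
    for argv in expand_commands(ocr, "/tmp/zone.png", "/tmp/zone"):
        print(argv)
```

Each resulting list could be handed to `subprocess.run` without a shell, which keeps the parentheses in the ImageMagick example as plain arguments.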

Then click on the fields icon.

Fill in the form, select the zone, and click the create button.

Test OCR template

Click on the check icon to display all the data fields extracted by zone.
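The value shown for a field can be pictured as the field's Pattern applied to the text recognised in its zone. The sketch below assumes that the first match is taken as the field value, which this page does not confirm. It uses Python's `re` module as a stand-in for the Java regular expression engine; the simple pattern shown behaves the same in both. The field name and sample text are invented.

```python
import re


def extract_field(pattern, zone_text):
    """Return the first match of pattern in the zone's OCR text, or None."""
    match = re.search(pattern, zone_text)
    return match.group(0) if match else None


def extract_all(patterns, zone_texts):
    """Apply each field's pattern to the text recognised in its zone."""
    return {
        name: extract_field(pattern, zone_texts.get(name, ""))
        for name, pattern in patterns.items()
    }


if __name__ == "__main__":
    patterns = {"invoice_date": r"\d{2}/\d{2}/\d{4}"}       # hypothetical field
    zone_texts = {"invoice_date": "Invoice date: 08/08/2014\n"}  # sample OCR output
    print(extract_all(patterns, zone_texts))
```

A field whose pattern finds nothing comes back as `None`, which usually points to a zone drawn in the wrong place or a pattern that is too strict for the OCR output.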

Recognise testing

From the main OCR template list, click the recognise icon.

Fill in the form, selecting a scanned image to test recognition.

Click the recognise button.
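Recognition relies on the fields marked "Use to recognise". One plausible reading, sketched below, is that a template is recognised when every such field's pattern matches the text found in its zone. This matching rule is an assumption, not OpenKM's documented algorithm, and all names in the code are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    pattern: str
    use_to_recognise: bool


def template_matches(fields, zone_texts):
    """True when every identifying field matches the text of its zone."""
    identifying = [f for f in fields if f.use_to_recognise]
    if not identifying:
        return False  # nothing to identify the document with
    return all(re.search(f.pattern, zone_texts.get(f.name, "")) for f in identifying)


def recognise(templates, ocr_by_template):
    """Return the name of the first matching template, or None."""
    for name, fields in templates.items():
        if template_matches(fields, ocr_by_template.get(name, {})):
            return name
    return None


if __name__ == "__main__":
    templates = {
        "invoice": [
            Field("title", r"INVOICE", True),
            Field("total", r"\d+\.\d{2}", False),
        ],
    }
    ocr = {"invoice": {"title": "INVOICE 2014", "total": "120.50"}}
    print(recognise(templates, ocr))
```

Fields used only for extraction do not influence the result, so a poorly recognised amount does not stop the document from being identified.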

Enable Zonal OCR data capture

OCR data capture can be enabled from profiles and automation.

Profiles

To enable zonal OCR data capture from profiles, check the OCR data capture option.

Automation

Automation is divided into two kinds of operations: validations and actions.

There is a validation called IsOCRDataCaptureFile, which checks whether the OCR data capture engine supports the image format.

There are two actions: OCRDataCapture and AddOCRDataCaptureToWizard. OCRDataCapture captures the data and stores it as metadata. AddOCRDataCaptureToWizard enables the end-user wizard to show the live OCR data capture process.

For more information, take a look at Automation.
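The relationship between a validation and its actions can be sketched as a simple rule: the actions run only when the validation passes. The code below is a conceptual illustration, not OpenKM's automation API. The list of supported formats is a hypothetical placeholder, and the action is a stand-in function.

```python
import mimetypes

# Hypothetical list: the formats the engine really supports are not given here.
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/tiff"}


def is_ocr_data_capture_file(filename):
    """Stand-in for the IsOCRDataCaptureFile validation."""
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type in SUPPORTED_MIME_TYPES


def run_rule(filename, actions):
    """Run every action only when the validation passes."""
    if not is_ocr_data_capture_file(filename):
        return []
    return [action(filename) for action in actions]


def ocr_data_capture(filename):
    """Stand-in for the OCRDataCapture action."""
    return "captured " + filename


if __name__ == "__main__":
    print(run_rule("scan.png", [ocr_data_capture]))
    print(run_rule("notes.txt", [ocr_data_capture]))
```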
