Difference between revisions of "OCR templates"

From OpenKM Documentation
m (Add field zones)
 
(31 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
{{TOCright}} __TOC__
 +
 +
{{Note|Since OpenKM 6.4.2, the OpenKM plug-in system lets you extend the available OCR field parsers without rebuilding the system to add or change functionality. Refer to [[Extend OCR field parsers]] if you need to extend the OCR field parsers feature in 6.4.2.}}
 +
 
The OCR templates feature lets you create zonal OCR templates, which recognise and extract structured text from scanned images.
 
The OCR templates feature lets you create zonal OCR templates, which recognise and extract structured text from scanned images.
  
==Create template ==
+
{{Note|Images should be scanned at a resolution of at least 200 dpi to get good text recognition from the OCR engine.}}
 +
 
 +
==Template creation ==
 
Open the OCR template administration option.
 
Open the OCR template administration option.
  
[[File:Okm_user_guide_378.png|center]]
+
[[File:Okm_user_guide_378.png|center|800px]]
  
 
Then click on the [[File:add.png]] '''new OCR template icon'''.
 
Then click on the [[File:add.png]] '''new OCR template icon'''.
Line 15: Line 21:
  
 
=== Add field zones ===
 
=== Add field zones ===
 +
{| border="1" cellpadding="2" cellspacing="0"
 +
|- style="color:green;"
 +
|'''Field'''
 +
|'''Description'''
 +
|-
 +
|'''Name'''
 +
|Template field name
 +
|-
 +
|'''Type'''
 +
|The field type ( String, Date, etc. )
 +
|-
 +
|'''Property'''
 +
|The property group parameter name ( example okp:property.name )
 +
|-
 +
|'''Pattern'''
 +
|Java regular expression pattern ( take a look at http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html )
 +
|-
 +
|'''Custom'''
 +
|Commands to process images before OCR. It can contain several commands, one per line ( example: '''/usr/bin/convert ${fileIn} -matte ( +clone  -fuzz 50% -transparent red ) -compose DstOut -composite -negate ${fileOut}''' )
 +
|-
 +
|'''OCR'''
 +
|OCR command line ( example: '''/usr/bin/tesseract ${fileIn} ${fileOut} digits''' )
 +
|-
 +
|'''Use to recognise'''
 +
|Indicate if a field is also used to identify the document or only for extraction purpose.
 +
|-
 +
|'''Zone'''
 +
|Here, the cropped image is shown.
 +
|}
 +
 +
 
Then click on the [[File:Params.png]] '''fields icon'''
 
Then click on the [[File:Params.png]] '''fields icon'''
  
 
[[File:Okm_user_guide_381.png|center]]
 
[[File:Okm_user_guide_381.png|center]]
  
Fill the form and select the zone.
 
  
[[File:Okm_user_guide_382.png|center]]
+
[[File:Okm_user_guide_382.png|center|800px]]
 +
 
 +
Fill in the form, select the zone, and click the '''create''' button.
 +
 
 +
[[File:Okm_user_guide_383.png|center|800px]]
 +
 
 +
=== Test OCR template ===
 +
Click on the [[File:check.png]] '''check icon''' to show all the data fields extracted by zone.
 +
 
 +
 
 +
[[File:Okm_user_guide_384.png|center|800px]]
 +
 
 +
== Recognise testing==
 +
From the main OCR template list, click the [[File:Recognize.png]] '''recognise icon'''.
 +
 
 +
Fill in the form, selecting a scanned image to test recognition.
 +
 
 +
[[File:Okm_user_guide_385.png|center]]
 +
 
 +
Click the '''recognise''' button.
 +
 
 +
[[File:Okm_user_guide_386.png|center]]
 +
 
 +
== Enable Zonal OCR data capture ==
 +
OCR data capture can be enabled from profiles and automation.
 +
 
 +
=== Profiles ===
 +
To enable zonal OCR data capture from profiles, enable the OCR data capture check box.
 +
 
 +
[[File:Okm_user_guide_387.png|center]]
 +
 
 +
=== Automation ===
 +
Automation is divided into two kinds of operations: validations and actions.
 +
 
 +
There is a validation called '''IsOCRDataCaptureFile''', which checks whether the OCR data capture engine supports the image format.
 +
 
 +
[[File:Okm_user_guide_388.png|center|800px]]
  
 +
There are two actions, '''OCRDataCapture''' and '''AddOCRDataCaptureToWizard'''. '''OCRDataCapture''' captures the data and stores it as metadata. '''AddOCRDataCaptureToWizard''' adds OCR data capture to the end-user wizard, so the user can see the capture process live.
  
 +
[[File:Okm_user_guide_389.png|center|800px]]
  
 +
For more information, see [[Automation]].
  
 
[[Category: Administration Guide]]
 
[[Category: Administration Guide]]

Latest revision as of 10:30, 8 August 2014


Note: Since OpenKM 6.4.2, the OpenKM plug-in system lets you extend the available OCR field parsers without rebuilding the system to add or change functionality. Refer to Extend OCR field parsers if you need to extend the OCR field parsers feature in 6.4.2.

The OCR templates feature lets you create zonal OCR templates, which recognise and extract structured text from scanned images.


Note: Images should be scanned at a resolution of at least 200 dpi to get good text recognition from the OCR engine.
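
As a quick sanity check against the 200 dpi guideline, the effective resolution of a scan can be estimated from its pixel width and the physical width of the page. The Java sketch below is illustrative only: the class and method names are hypothetical (they are not part of OpenKM), and the A4 page width of 210 mm is just an example value.

```java
// Hypothetical helper, not part of OpenKM: estimates scan resolution.
class ScanResolutionSketch {

    // Minimum resolution suggested by the note above for good text recognition.
    static final double MIN_DPI = 200.0;

    // Effective resolution: pixels along one side divided by that side's length in inches.
    static double effectiveDpi(int pixels, double inches) {
        return pixels / inches;
    }

    // True when the scan width corresponds to at least MIN_DPI.
    static boolean meetsMinimum(int widthPx, double widthInches) {
        return effectiveDpi(widthPx, widthInches) >= MIN_DPI;
    }

    public static void main(String[] args) {
        double a4WidthInches = 210 / 25.4; // A4 is 210 mm wide; 25.4 mm per inch
        System.out.println(meetsMinimum(1654, a4WidthInches)); // roughly 200 dpi
        System.out.println(meetsMinimum(1240, a4WidthInches)); // roughly 150 dpi
    }
}
```

The same calculation works for the page height; a scan that falls below the threshold is likely better rescanned than upscaled.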

Template creation

Open the OCR template administration option.

Okm user guide 378.png

Then click on the Add.png new OCR template icon.

Okm user guide 379.png

Fill in the form and click the create button.

Okm user guide 380.png

Add field zones

{| border="1" cellpadding="2" cellspacing="0"
|- style="color:green;"
|'''Field'''
|'''Description'''
|-
|'''Name'''
|Template field name.
|-
|'''Type'''
|The field type (String, Date, etc.).
|-
|'''Property'''
|The property group parameter name (example: okp:property.name).
|-
|'''Pattern'''
|Java regular expression pattern (see http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html).
|-
|'''Custom'''
|Commands to process the image before OCR. It can contain several commands, one per line (example: '''/usr/bin/convert ${fileIn} -matte ( +clone -fuzz 50% -transparent red ) -compose DstOut -composite -negate ${fileOut}''').
|-
|'''OCR'''
|OCR command line (example: '''/usr/bin/tesseract ${fileIn} ${fileOut} digits''').
|-
|'''Use to recognise'''
|Indicates whether the field is also used to identify the document or only for extraction.
|-
|'''Zone'''
|The cropped image is shown here.
|}
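
To make the Pattern, Custom, and OCR fields more concrete, here is a minimal Java sketch of the two ideas involved: filling in the ${fileIn} and ${fileOut} placeholders of a command line, and applying a Java regular expression to the text read from a zone. It is an illustration under assumptions, not OpenKM's implementation: the class name, method names, file paths, and sample text are hypothetical, and whether OpenKM uses a partial match (as here) or requires the whole zone text to match is not stated in this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; these names are not part of the OpenKM API.
class ZoneFieldSketch {

    // Replace the ${fileIn} and ${fileOut} placeholders in a command template.
    // String.replace treats its arguments literally, so "$" needs no escaping.
    static String expandCommand(String template, String fileIn, String fileOut) {
        return template.replace("${fileIn}", fileIn).replace("${fileOut}", fileOut);
    }

    // Return the first substring of ocrText that matches the pattern, or null if none does.
    static String extract(String pattern, String ocrText) {
        Matcher matcher = Pattern.compile(pattern).matcher(ocrText);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        String command = expandCommand(
                "/usr/bin/tesseract ${fileIn} ${fileOut} digits",
                "/tmp/zone.png", "/tmp/zone");
        System.out.println(command);

        // A date-like pattern applied to sample zone text.
        System.out.println(extract("\\d{2}/\\d{2}/\\d{4}", "Invoice date: 08/08/2014\n"));
    }
}
```

A partial match is used in the sketch because OCR output often carries stray whitespace or labels around the value of interest; a stricter pattern with anchors (^ and $) can be used when the zone is cropped tightly.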


Then click on the Params.png fields icon

Okm user guide 381.png


Okm user guide 382.png

Fill in the form, select the zone, and click the create button.

Okm user guide 383.png

Test OCR template

Click on the Check.png check icon to show all the data fields extracted by zone.


Okm user guide 384.png

Recognise testing

From the main OCR template list, click the Recognize.png recognise icon.

Fill in the form, selecting a scanned image to test recognition.

Okm user guide 385.png

Click the recognise button.

Okm user guide 386.png
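
The recognise test relies on the fields marked "Use to recognise" to identify the document. This page does not describe the exact matching rule, so the Java sketch below simply assumes that a template is considered a match when every such field's pattern finds a match in the text read from its zone. All names and sample values are hypothetical and not part of the OpenKM API.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; these names are not part of the OpenKM API.
class RecogniseSketch {

    // Returns true when every recognition field's pattern finds a match in the
    // text read from that field's zone. This rule is an assumption made for
    // illustration. Note that an empty pattern map trivially returns true.
    static boolean matchesTemplate(Map<String, String> patternByField,
                                   Map<String, String> zoneTextByField) {
        for (Map.Entry<String, String> entry : patternByField.entrySet()) {
            String text = zoneTextByField.get(entry.getKey());
            if (text == null || !Pattern.compile(entry.getValue()).matcher(text).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Field names, patterns, and zone text are made-up sample values.
        Map<String, String> patterns = Map.of("title", "(?i)invoice", "number", "\\d{6}");
        Map<String, String> scanned = Map.of("title", "INVOICE", "number", "No. 004217");
        System.out.println(matchesTemplate(patterns, scanned));
    }
}
```

Under this assumed rule, choosing distinctive recognition fields (a fixed title or a strictly formatted number) reduces the chance that two templates match the same image.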

Enable Zonal OCR data capture

OCR data capture can be enabled from profiles and automation.

Profiles

To enable zonal OCR data capture from profiles, enable the OCR data capture check box.

Okm user guide 387.png

Automation

Automation is divided into two kinds of operations: validations and actions.

There is a validation called IsOCRDataCaptureFile, which checks whether the OCR data capture engine supports the image format.

Okm user guide 388.png

There are two actions, OCRDataCapture and AddOCRDataCaptureToWizard. OCRDataCapture captures the data and stores it as metadata. AddOCRDataCaptureToWizard adds OCR data capture to the end-user wizard, so the user can see the capture process live.

Okm user guide 389.png

For more information, see Automation.