Third-party software integration: OCR Cuneiform
From OpenKM Documentation
CuneiForm is an OCR tool. It was originally developed by Cognitive Technologies and, after a few years without development, released as freeware on December 12, 2007. The kernel of the OCR engine was released under the open-source BSD license at the beginning of April 2008.
You can grab binaries from these sites:
- http://en.openocr.org
- http://pkgs.org/search/?keyword=cuneiform
- http://ftp.es.debian.org/debian/pool/non-free/c/cuneiform
- http://notesalexp.net/
If you are using a Debian or Ubuntu system, installation is much simpler:
$ aptitude install cuneiform
Compile from source code
You can download the source code from https://launchpad.net/cuneiform-linux and compile it yourself. Also download the language files you need and uncompress them in the same folder as the application.
$ aptitude install cmake g++ imagemagick libmagick++-dev
$ tar xjvf cuneiform-linux-1.0.0.tar.bz2
$ cd cuneiform-linux-1.0.0
$ mkdir builddir
$ cd builddir
$ cmake -DCMAKE_BUILD_TYPE=release ..
$ make install
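Once installed, edit the file /etc/bash.bashrc and add the following line at the end so that the dynamic linker can find the Cuneiform libraries installed in /usr/local/lib64: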
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64
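If you script this step (for example when provisioning several servers), you may want it to be idempotent so the line is not appended again on every run. The following is a minimal sketch; the function name `add_cuneiform_libpath` is hypothetical, and it assumes the target file defaults to /etc/bash.bashrc, which requires root to modify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the LD_LIBRARY_PATH export line to a bashrc
# file only if that exact line is not already present.
add_cuneiform_libpath() {
    # Target file; defaults to the system-wide bashrc used in this guide.
    local bashrc="${1:-/etc/bash.bashrc}"
    # Single quotes keep $LD_LIBRARY_PATH literal, so it is expanded later,
    # when the bashrc is read by a new shell, not when this function runs.
    local line='export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib64'
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
    # If the file does not exist, grep fails and the line is appended (creating it).
    if ! grep -qxF "$line" "$bashrc" 2>/dev/null; then
        printf '%s\n' "$line" >> "$bashrc"
    fi
}
```

Running it twice leaves a single copy of the line. The setting only takes effect in shells started after the change, so open a new terminal (or re-read the file) before running Cuneiform.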
The Cuneiform executable will be located at:
/usr/local/bin/cuneiform
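To check the installation, you can run the executable on a scanned image. The sketch below wraps a single call in a shell function; the names `ocr_to_text` and `CUNEIFORM` are hypothetical, and it assumes the Cuneiform command-line options `-l` (recognition language), `-f` (output format) and `-o` (output file), with `eng` as the language and `text` as the format.

```shell
#!/usr/bin/env bash
# Path to the Cuneiform binary; defaults to the location given above and
# can be overridden by setting CUNEIFORM in the environment.
CUNEIFORM="${CUNEIFORM:-/usr/local/bin/cuneiform}"

# Hypothetical wrapper: OCR one image into a plain-text file.
# Usage: ocr_to_text IMAGE OUTPUT [LANGUAGE]
ocr_to_text() {
    local image="$1" output="$2" lang="${3:-eng}"
    # Fail early with a clear message if the binary is missing or not executable.
    if [ ! -x "$CUNEIFORM" ]; then
        echo "cuneiform not found at $CUNEIFORM" >&2
        return 1
    fi
    # -l selects the recognition language, -f the output format,
    # -o the file that receives the recognized text.
    "$CUNEIFORM" -l "$lang" -f text -o "$output" "$image"
}
```

For example, `ocr_to_text scan.tif scan.txt` should write the recognized text of scan.tif to scan.txt. If the command fails with an error about missing shared libraries, check that the LD_LIBRARY_PATH setting above is active in the current shell.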