Third-party software integration

From OpenKM Documentation
Revision as of 10:44, 22 January 2010 by Pavila

Apache

Exposing OpenKM directly from JBoss can be dangerous if the application must be reachable from the Internet. Port 8080, which JBoss uses by default, may also be blocked by a firewall. For these reasons, it is a good idea to expose your OpenKM installation through the standard web port 80. The following steps explain how to configure Apache to handle these requests and forward them to the JBoss application server using the AJP13 protocol.

From the Apache documentation: The AJP13 protocol is packet-oriented. A binary format was presumably chosen over the more readable plain text for reasons of performance. The web server communicates with the servlet container over TCP connections. To cut down on the expensive process of socket creation, the web server will attempt to maintain persistent TCP connections to the servlet container, and to reuse a connection for multiple request/response cycles.

The first step is to install the required Apache software. On Debian / Ubuntu you can install Apache with a single command:

$ sudo aptitude install apache2

Edit the file called /etc/apache2/apache2.conf and configure a ServerName to prevent warnings in the Apache startup process:

ServerRoot "/etc/apache2"
ServerName "your-domain.com"

Enable the proxy module, needed to forward requests to JBoss:

$ sudo a2enmod proxy_ajp

Now create the configuration file /etc/apache2/sites-available/openkm.cfg with this content:

<VirtualHost *>
    ServerName openkm.your-domain.com
    RedirectMatch ^/$ /OpenKM
    <Location /OpenKM>
        ProxyPass ajp://127.0.0.1:8009/OpenKM
        ProxyPassReverse http://openkm.your-domain.com/OpenKM
    </Location>
    CustomLog /var/log/apache2/openkm-access.log combined
</VirtualHost>

The VirtualHost ServerName must differ from the ServerName in the main Apache configuration. Enable this site configuration:

$ sudo a2ensite openkm.cfg

You have to explicitly enable proxy access by editing the Apache configuration file /etc/apache2/mods-available/proxy.conf:

<Proxy *>
    AddDefaultCharset off
    Order deny,allow
    Allow from all
    #Deny from all
    #Allow from .example.com
</Proxy>

Finally restart Apache:

$ sudo /etc/init.d/apache2 restart
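
A syntax error in the new site file would leave Apache down after a restart. As a minimal sketch, the restart can be guarded by a configuration test. It assumes the apache2ctl control script from the Debian / Ubuntu package is on the PATH; the function name restart_if_valid and the APACHE_CTL override are illustrative and not part of Apache or OpenKM.

```shell
#!/bin/sh
# Restart Apache only when its configuration parses cleanly.
# APACHE_CTL is a hypothetical override used here for illustration;
# it defaults to apache2ctl, the Debian / Ubuntu control script.
restart_if_valid() {
    ctl="${APACHE_CTL:-apache2ctl}"
    if "$ctl" configtest; then
        "$ctl" restart
    else
        echo "Configuration test failed; Apache was not restarted." >&2
        return 1
    fi
}

# Usage (as root):
#   restart_if_valid
```

If the test fails, the running Apache instance is left untouched and the function returns a non-zero status.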

Now you can access your OpenKM installation at http://openkm.your-domain.com/. Another advantage of using Apache is that you can log OpenKM access and generate web statistics.

For more info, visit:

OCR

Tesseract is an open source OCR engine adopted by Google. It works really well. It reads TIFF documents natively and achieves a high recognition rate with images scanned at 300 dpi and converted to line art (1-bit color).

You can download the source code from http://code.google.com/p/tesseract-ocr/ and compile it yourself. Also download the language files you need and uncompress them in the same folder as the application.

On a computer running Debian / Ubuntu, the installation is much simpler:

$ aptitude install tesseract-ocr

To add support for the English language, also install:

$ aptitude install tesseract-ocr-eng

Now you have to tell OpenKM to use this OCR application. Edit the file OpenKM.cfg:

$ vim OpenKM.cfg

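Set the system.ocr property to the path of the tesseract executable. The Debian / Ubuntu package installs it as /usr/bin/tesseract, while a build from source with the default prefix installs it under /usr/local/bin: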

system.ocr=/usr/local/bin/tesseract
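
Instead of editing the file by hand, the property can be set from a script. The following is a minimal sketch, assuming OpenKM.cfg holds one key=value pair per line as shown above. The helper name set_property is hypothetical, and the helper assumes that the value contains no `|` or `&` characters.

```shell
#!/bin/sh
# set_property FILE KEY VALUE
# Replace an existing "KEY=..." line in FILE, or append one if it is absent.
# Assumes VALUE contains no "|" or "&" characters (true for ordinary paths).
set_property() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        # Rewrite through a temporary file to avoid the non-portable sed -i.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage:
#   set_property OpenKM.cfg system.ocr /usr/bin/tesseract
```

Running the helper twice with the same arguments leaves a single system.ocr line in the file.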

For more info, go to http://code.google.com/p/tesseract-ocr/.

There is also another interesting free OCR application called OCRopus. It offers many improvements over Tesseract but is at an early stage of development. The latest released version (0.3.1) is quite usable and works very well, but it has to be compiled, which is currently a difficult task. Visit http://code.google.com/p/ocropus/ for more info.

OpenOffice.org

OpenKM can convert some document types to PDF. This is a great help if you need to read a Microsoft Office / OpenOffice.org document and you don't have the software installed on your computer.

You need an OpenOffice.org installation on the OpenKM server, and this OpenOffice.org instance has to be running in server mode (also known as headless). On Debian / Ubuntu, depending on your OpenOffice.org version, you may have to install an X11 virtual server:

$ apt-get install xvfb

And start it using this command:

$ xvfb-run /usr/lib/openoffice/program/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard

Since OpenOffice.org 2.3 the X11 virtual server is no longer necessary, but you should install these packages:

$ aptitude install openoffice.org-headless openoffice.org-java openoffice.org

Before installing them, you must enable a couple of repositories:

 deb http://en.archive.ubuntu.com/ubuntu/ hardy-updates universe
 deb http://en.archive.ubuntu.com/ubuntu/ hardy-updates multiverse
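
The text does not say where these lines go; the sketch below assumes the standard APT source list, /etc/apt/sources.list. The helper name add_line_once is hypothetical. It appends a line only if the file does not already contain it, so the script can be run repeatedly.

```shell
#!/bin/sh
# add_line_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage (as root), followed by a refresh of the package index:
#   add_line_once /etc/apt/sources.list "deb http://en.archive.ubuntu.com/ubuntu/ hardy-updates universe"
#   add_line_once /etc/apt/sources.list "deb http://en.archive.ubuntu.com/ubuntu/ hardy-updates multiverse"
#   apt-get update
```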

This script simplifies the start process (for security reasons, you should not start OpenOffice.org as root):

#!/bin/sh
unset DISPLAY
/usr/lib/openoffice/program/soffice "-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -nologo \
    -headless -nofirststartwizard

OpenOffice.org will listen on port 8100, so you can check that the application has started by running this:

$ netstat -putan | grep 8100
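
A plain grep for 8100 also matches process IDs or remote ports that happen to contain those digits. As a rough sketch, the netstat output can be filtered on the local address and state columns instead. The helper name port_listening is hypothetical, and the column positions assume the usual Linux netstat layout (local address in column 4, state in column 6).

```shell
#!/bin/sh
# port_listening PORT
# Reads `netstat -tan` style output on stdin and succeeds only if a TCP
# socket is in the LISTEN state on PORT.
port_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   netstat -tan | port_listening 8100 && echo "OpenOffice.org is listening"
```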

You can also configure OpenOffice.org as a service by saving this script as /etc/init.d/openoffice:

#!/bin/bash
# openoffice.org headless server script
#
# chkconfig: 2345 80 30
# description: headless openoffice server script
# processname: openoffice
#
# Author: Vic Vijayakumar
# Modified by Paco Avila and Federico Ch. Tomasczik
#
SOFFICE=/usr/bin/soffice
PIDFILE=/var/run/openoffice-server.pid
set -e
case "$1" in
    start)
        if [ -f $PIDFILE ]; then
            echo "OpenOffice headless server has already started."
            sleep 5
            exit
        fi
        echo "Starting OpenOffice headless server"
        $SOFFICE -headless -nologo -nofirststartwizard -accept="socket,host=127.0.0.1,port=8100;urp" > /dev/null 2>&1 &
        touch $PIDFILE
        ;;
    stop)
        if [ -f $PIDFILE ]; then
            echo "Stopping OpenOffice headless server."
            # Kill both process names; do not abort if one of them is not running.
            killall -9 soffice soffice.bin || true
            rm -f $PIDFILE
            exit
        fi
        echo "Openoffice headless server is not running."
        exit
        ;;
    *)
        echo "Usage: $0 {start|stop}"
        exit 1
esac
exit 0

Change the permissions of this file:

$ chmod 0755 /etc/init.d/openoffice

Install the openoffice init script links:

$ update-rc.d openoffice defaults

The script will now launch OpenOffice.org on every system boot. You can also launch it manually this way:

$ /etc/init.d/openoffice start
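
The init script returns immediately, while OpenOffice.org may take a few seconds before it accepts connections. The following sketch retries a check once per second until it succeeds or the attempts run out. The helper name wait_until is hypothetical, and the check passed to it in the usage comment is only one possible way to test the port.

```shell
#!/bin/sh
# wait_until ATTEMPTS COMMAND [ARGS...]
# Run COMMAND up to ATTEMPTS times, one second apart, and stop at the
# first success. Returns 1 if every attempt fails.
wait_until() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Usage:
#   wait_until 30 sh -c 'netstat -tan | grep -q ":8100 .*LISTEN"' \
#       || echo "OpenOffice.org did not start listening on port 8100" >&2
```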

More info at:

Antivirus

OpenKM can check whether a submitted document is infected. It works with an open source antivirus called ClamAV. Edit OpenKM.cfg and add this line:

system.antivir=/path/to/clamscan

This screenshot shows an error message from OpenKM because the submitted document is infected by a virus:

To install ClamAV on a Debian / Ubuntu distribution:

 $ sudo aptitude install clamav

Installing ClamAV on CentOS 5.2 takes more work. First create a file named /etc/yum.repos.d/dag.repo with this content:

[dag]
name=Dag RPM Repository for Red Hat Enterprise Linux
baseurl=http://apt.sw.be/redhat/el$releasever/en/$basearch/dag/
gpgcheck=1
gpgkey=http://dag.wieers.com/packages/RPM-GPG-KEY.dag.txt
enabled=1

Now install the program as root:

$ yum install clamd.i386

Start the daemon:

$ /etc/init.d/clamd start

And update the virus database:

$ freshclam
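
Once the database is up to date, the scanner can be tried from the command line before relying on it from OpenKM. The sketch below interprets the clamscan exit status (0 means no virus found, 1 means a virus was found, any other value is an error). The function names and the CLAMSCAN override are hypothetical, and how OpenKM itself invokes the scanner is not described here.

```shell
#!/bin/sh
# verdict STATUS
# Translate a clamscan exit status into a single word.
verdict() {
    case "$1" in
        0) echo "clean" ;;
        1) echo "infected" ;;
        *) echo "error" ;;
    esac
}

# scan_file PATH
# CLAMSCAN is a hypothetical override; it defaults to clamscan on the PATH.
scan_file() {
    status=0
    "${CLAMSCAN:-clamscan}" --no-summary "$1" > /dev/null 2>&1 || status=$?
    verdict "$status"
}

# Usage:
#   scan_file /path/to/document.pdf
```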