Difference between revisions of "Third-party software integration"
Revision as of 10:44, 22 January 2010
Apache
Exposing OpenKM directly from JBoss can be dangerous if the application must be reachable from the Internet. Port 8080 may also be blocked by a firewall. For these reasons, it is a good idea to expose your OpenKM installation through the standard web port 80. The following steps explain how to configure Apache to handle these requests and forward them to the JBoss application server using the AJP13 protocol.
From the Apache documentation: The AJP13 protocol is packet-oriented. A binary format was presumably chosen over the more readable plain text for reasons of performance. The web server communicates with the servlet container over TCP connections. To cut down on the expensive process of socket creation, the web server will attempt to maintain persistent TCP connections to the servlet container, and to reuse a connection for multiple request/response cycles.
The first thing to do is install the required Apache software. On Debian / Ubuntu you can install Apache with a single command:
$ sudo aptitude install apache2
Edit the file called /etc/apache2/apache2.conf and configure a ServerName to prevent warnings in the Apache startup process:
ServerRoot "/etc/apache2"
ServerName "your-domain.com"
Enable the AJP proxy module, needed to forward requests to JBoss:
$ sudo a2enmod proxy_ajp
Now create the configuration file /etc/apache2/sites-available/openkm.cfg with this content:
<VirtualHost *>
ServerName openkm.your-domain.com
RedirectMatch ^/$ /OpenKM
<Location /OpenKM>
ProxyPass ajp://127.0.0.1:8009/OpenKM
ProxyPassReverse http://openkm.your-domain.com/OpenKM
</Location>
CustomLog /var/log/apache2/openkm-access.log combined
</VirtualHost>
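If you need the same configuration for more than one host name, the file can be generated instead of typed by hand. The following is only a sketch: render_openkm_vhost is a hypothetical helper name, and it assumes JBoss listens for AJP on 127.0.0.1, with 8009 as the default port as in the example above.

```shell
#!/bin/sh
# Print an OpenKM virtual host definition on standard output.
# render_openkm_vhost is a hypothetical helper, not part of OpenKM or Apache.
#   $1: host name for ServerName
#   $2: AJP port of JBoss (optional, defaults to 8009)
render_openkm_vhost() {
    host=$1
    ajp_port=${2:-8009}
    cat <<EOF
<VirtualHost *>
ServerName $host
RedirectMatch ^/\$ /OpenKM
<Location /OpenKM>
ProxyPass ajp://127.0.0.1:$ajp_port/OpenKM
ProxyPassReverse http://$host/OpenKM
</Location>
CustomLog /var/log/apache2/openkm-access.log combined
</VirtualHost>
EOF
}
```

For example, `render_openkm_vhost openkm.your-domain.com | sudo tee /etc/apache2/sites-available/openkm.cfg` would write the file shown above.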
The VirtualHost ServerName must differ from the ServerName in the main Apache configuration. Enable this site configuration:
$ sudo a2ensite openkm.cfg
You have to explicitly enable proxy access by editing the Apache configuration file /etc/apache2/mods-available/proxy.conf:
<Proxy *>
AddDefaultCharset off
Order deny,allow
Allow from all
Deny from all
#Allow from .example.com
</Proxy>
Finally restart Apache:
$ sudo /etc/init.d/apache2 restart
Now you can access your OpenKM installation from http://openkm.your-domain.com/. Another advantage of using Apache is that you can log OpenKM access and generate web statistics.
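A quick way to check the setup from the command line is to look at the status code Apache returns. This sketch assumes curl is installed; http_status is a hypothetical helper name that only extracts the code from the first response header line.

```shell
#!/bin/sh
# Read HTTP response headers on standard input and print the status code
# taken from the first line (for example "HTTP/1.1 302 Found" gives 302).
# http_status is a hypothetical helper name.
http_status() {
    awk 'NR == 1 { print $2 }'
}

# Example (not run here): request only the headers of the site root.
#   curl -sI http://openkm.your-domain.com/ | http_status
```

With the RedirectMatch rule above, a request for / should be answered with a redirect to /OpenKM (typically status 302). A 503 usually suggests that Apache cannot reach JBoss on the AJP port.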
For more info, visit:
- http://httpd.apache.org/docs/2.2/mod/mod_proxy.html
- http://httpd.apache.org/docs/2.2/mod/mod_proxy_ajp.html
OCR
Tesseract is an open source OCR engine adopted by Google. It works really well. It natively reads TIFF documents and achieves a high recognition rate with images at 300 dpi resolution converted to line art (1-bit color).
You can download the source code from http://code.google.com/p/tesseract-ocr/ and compile it yourself. Also download the language files you need and uncompress them in the same folder as the application.
If you are using a computer with Debian / Ubuntu, the installation is much simpler:
$ aptitude install tesseract-ocr
To add support for the English language, also install:
$ aptitude install tesseract-ocr-eng
Now you have to tell OpenKM to use this OCR application. Edit the file OpenKM.cfg:
$ vim OpenKM.cfg
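And set the system.ocr property to the path of the tesseract executable (for example /usr/local/bin/tesseract after compiling from source; the Debian / Ubuntu package normally installs it as /usr/bin/tesseract):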
system.ocr=/usr/local/bin/tesseract
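Because the path depends on how Tesseract was installed, it can be looked up rather than guessed. The following sketch uses a hypothetical helper, property_for_tool, which prints a ready-made property line for any executable found on the PATH.

```shell
#!/bin/sh
# Print "key=/absolute/path" for an executable found on PATH.
# Returns a non-zero status if the tool is not installed.
# property_for_tool is a hypothetical helper name, not an OpenKM command.
property_for_tool() {
    key=$1
    tool=$2
    path=$(command -v "$tool") || return 1
    printf '%s=%s\n' "$key" "$path"
}

# Example (not run here):
#   property_for_tool system.ocr tesseract
```

The printed line, such as system.ocr=/usr/bin/tesseract, can then be copied into OpenKM.cfg.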
For more info, go to http://code.google.com/p/tesseract-ocr/.
There is also another interesting free OCR application called OCRopus. It has many improvements over Tesseract but is at an early stage of development. The latest released version (0.3.1) is quite usable and works very well, but it has to be compiled, which is currently a difficult task. Visit http://code.google.com/p/ocropus/ for more info.
OpenOffice.org
OpenKM can convert some document types to PDF. This is a great help if you need to read a Microsoft Office / OpenOffice.org document and you don't have the software installed on your computer.
You need an OpenOffice.org installation on the OpenKM server, and this OpenOffice.org application has to be running in server mode (also known as headless). On Debian / Ubuntu, depending on your OpenOffice.org version, you may have to install an X11 virtual server:
$ apt-get install xvfb
And start it using this command:
$ xvfb-run /usr/lib/openoffice/program/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard
From OpenOffice.org 2.3 onwards the X11 virtual server is not necessary, but you should install these packages:
$ aptitude install openoffice.org-headless openoffice.org-java openoffice.org
Before that, you must enable a couple of repositories in /etc/apt/sources.list:
deb http://en.archive.ubuntu.com/ubuntu/ hardy-updates universe
deb http://en.archive.ubuntu.com/ubuntu/ hardy-updates multiverse
This script simplifies the start process (for security reasons, you should not start OpenOffice.org as root):
#!/bin/sh
unset DISPLAY
/usr/lib/openoffice/program/soffice "-accept=socket,host=localhost,port=8100;urp;StarOffice.ServiceManager" -nologo \
-headless -nofirststartwizard
OpenOffice.org will listen on port 8100, so you can check that the application has started by running:
$ netstat -putan | grep 8100
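In a script, the same check can be turned into a yes/no answer. This is a minimal sketch: has_listener is a hypothetical helper that reads netstat output on standard input and assumes the usual Linux column layout (local address in column 4, state in column 6).

```shell
#!/bin/sh
# Read `netstat -tan` style output on standard input and succeed only if
# some socket is in LISTEN state on the given local port.
# has_listener is a hypothetical helper name.
has_listener() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Example (not run here):
#   netstat -tan | has_listener 8100 && echo "OpenOffice.org is listening"
```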
Also you can configure OpenOffice.org as a service with this script:
#!/bin/bash
# openoffice.org headless server script
#
# chkconfig: 2345 80 30
# description: headless openoffice server script
# processname: openoffice
#
# Author: Vic Vijayakumar
# Modified by Paco Avila and Federico Ch. Tomasczik
#
SOFFICE=/usr/bin/soffice
PIDFILE=/var/run/openoffice-server.pid
set -e
case "$1" in
  start)
    if [ -f "$PIDFILE" ]; then
      echo "OpenOffice headless server has already started."
      sleep 5
      exit
    fi
    echo "Starting OpenOffice headless server"
    # Redirect the output first, then send the process to the background
    $SOFFICE -headless -nologo -nofirststartwizard -accept="socket,host=127.0.0.1,port=8100;urp" > /dev/null 2>&1 &
    touch "$PIDFILE"
    ;;
  stop)
    if [ -f "$PIDFILE" ]; then
      echo "Stopping OpenOffice headless server."
      killall -9 soffice && killall -9 soffice.bin
      rm -f "$PIDFILE"
      exit
    fi
    echo "OpenOffice headless server is not running."
    exit
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
exit 0
Save the script as /etc/init.d/openoffice and make it executable:
$ chmod 0755 /etc/init.d/openoffice
Install openoffice init script links:
$ update-rc.d openoffice defaults
This script will now launch OpenOffice.org on every system boot. You can also launch it manually:
$ /etc/init.d/openoffice start
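The init script above only creates an empty PID file as a marker and stops the server with killall. As an alternative, the PID of the background process can be recorded and used for stopping. This is a sketch under assumptions: start_with_pidfile and stop_with_pidfile are hypothetical helpers, and the approach only works when the started command keeps running under the recorded PID, which may not hold if soffice is a wrapper that launches soffice.bin as a separate process.

```shell
#!/bin/sh
# Start a command in the background and store its PID in a file.
#   $1: PID file, remaining arguments: command to run
# Hypothetical helper, not part of the init script above.
start_with_pidfile() {
    pidfile=$1
    shift
    "$@" > /dev/null 2>&1 &
    echo $! > "$pidfile"
}

# Stop the process recorded in the PID file and remove the file.
# Returns 1 if the PID file does not exist.
stop_with_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    kill "$(cat "$pidfile")" 2> /dev/null || true
    rm -f "$pidfile"
}

# Example (not run here):
#   start_with_pidfile /var/run/openoffice-server.pid /usr/bin/soffice -headless -nologo
#   stop_with_pidfile /var/run/openoffice-server.pid
```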
More info at:
- http://www.artofsolving.com/node/10
- http://www.oooforum.org/forum/viewtopic.phtml?t=11890
- http://code.google.com/p/openmeetings/wiki/OpenOfficeConverter
Antivirus
OpenKM can check whether a submitted document is infected. It works with an open source antivirus called ClamAV. Edit OpenKM.cfg and add this line:
system.antivir=/path/to/clamscan
If a submitted document is infected by a virus, OpenKM shows an error message.
To install ClamAV on a Debian / Ubuntu distribution:
$ sudo aptitude install clamav
Installing ClamAV on CentOS 5.2 takes more work. First create a file named /etc/yum.repos.d/dag.repo with this content:
[dag]
name=Dag RPM Repository for Red Hat Enterprise Linux
baseurl=http://apt.sw.be/redhat/el$releasever/en/$basearch/dag/
gpgcheck=1
gpgkey=http://dag.wieers.com/packages/RPM-GPG-KEY.dag.txt
enabled=1
Now install the program as root:
$ yum install clamd.i386
Start the daemon:
$ /etc/init.d/clamd start
And update the virus database:
$ freshclam
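Once ClamAV is installed, you can scan a file by hand to confirm that it works. clamscan reports its result through the exit status: 0 when no virus is found, 1 when a virus is found, and 2 when an error occurred. The sketch below turns that status into a readable word; describe_clamscan_status is a hypothetical helper name, and how OpenKM itself interprets the result is not covered here.

```shell
#!/bin/sh
# Translate a clamscan exit status into a short description.
# describe_clamscan_status is a hypothetical helper name.
describe_clamscan_status() {
    case "$1" in
        0) echo "clean" ;;
        1) echo "infected" ;;
        *) echo "error" ;;
    esac
}

# Example (not run here):
#   clamscan --no-summary /path/to/document
#   describe_clamscan_status $?
```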