Automatic key extraction full example
Revision as of 16:27, 20 September 2010
SVN checkout modules
To create the KEA model, you must check out the openkm and thesaurus modules:
Select the SVN type and enter the URL https://openkm.svn.sourceforge.net/svnroot/openkm/trunk/openkm to reference the openkm module.
Select the SVN type and enter the URL https://openkm.svn.sourceforge.net/svnroot/openkm/trunk/thesaurus to reference the thesaurus module.
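For readers who prefer a terminal over the IDE wizard, the same two checkouts can be sketched with the svn command-line client. This is only an illustration: the local directory names (openkm, thesaurus) and the helper name checkout_commands are assumptions, and the script prints the commands instead of running them.

```shell
#!/bin/sh
# Base URL taken from the text above; both modules live under trunk.
BASE_URL=https://openkm.svn.sourceforge.net/svnroot/openkm/trunk

# Print one checkout command per module (dry run).
# The target directory names are an assumption, not part of the original page.
checkout_commands() {
  for module in openkm thesaurus; do
    echo svn checkout "$BASE_URL/$module" "$module"
  done
}

checkout_commands
```

To perform the checkout for real, remove the leading `echo` inside the loop; this requires the svn client and network access.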
Installing openkm classes into the Maven repository
Make sure openkm is installed in your local Maven repository. To do so, you can run the command:
mvn clean package install -Dmaven.test.skip=true
Downloading the AGROVOC thesaurus
We'll use AGROVOC for testing purposes. You can download it from http://oaei.ontologymatching.org/2007/environment/ (please read the terms of use).
Copy the file ag_skos_20070219.rdf into the thesaurus/src/test/resources/vocabulary folder.
The vocabulary folder contains a testdocs folder with some AGROVOC training documents for creating the KEA model.
Create runtime configuration
Now we can create the runtime configuration, which executes the ModelBuilder class with some parameters.
To train the KEA model, execute the ModelBuilder class with these parameters, in this order:
sourceFolder trainingFolder vocabularyFile vocabularyType stopwordFile modelFileName porterStemmerClass stopwordClass language documentEncoding testDocs
In my case:
sourceFolder=/home/jllort/softwareFactoryGalileo/thesaurus/vocabulary (all paths are relative to sourceFolder)
trainingFolder=testdocs/en/train
vocabularyFile=ag_skos_20070219.rdf
vocabularyType=skos
stopwordFile=stopwords_en.txt
modelFileName=ag_skos_20070219.model
porterStemmerClass=com.openkm.kea.stemmers.PorterStemmer
stopwordClass=com.openkm.kea.stopwords.StopwordsEnglish
language=en
documentEncoding=UTF-8
testDocs=testdocs/en/test
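Because the parameters are positional, their order matters. As a minimal sketch, the values above can be kept in shell variables and joined in the documented order to produce the program-arguments string. The helper name build_args is hypothetical, and the script only prints the string; it does not launch ModelBuilder.

```shell
#!/bin/sh
# Values copied from the example above; adjust sourceFolder for your machine.
sourceFolder=/home/jllort/softwareFactoryGalileo/thesaurus/vocabulary
trainingFolder=testdocs/en/train
vocabularyFile=ag_skos_20070219.rdf
vocabularyType=skos
stopwordFile=stopwords_en.txt
modelFileName=ag_skos_20070219.model
porterStemmerClass=com.openkm.kea.stemmers.PorterStemmer
stopwordClass=com.openkm.kea.stopwords.StopwordsEnglish
language=en
documentEncoding=UTF-8
testDocs=testdocs/en/test

# Join the eleven values, separated by single spaces, in the positional
# order listed for ModelBuilder (build_args is a hypothetical helper).
build_args() {
  echo "$sourceFolder $trainingFolder $vocabularyFile $vocabularyType" \
       "$stopwordFile $modelFileName $porterStemmerClass $stopwordClass" \
       "$language $documentEncoding $testDocs"
}

build_args
```

Keeping each value in a named variable makes it easier to spot a swapped or missing argument before pasting the result into the run configuration.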
The parameters to execute the ModelBuilder class are "/home/jllort/softwareFactoryGalileo/thesaurus/vocabulary testdocs/en/train ag_skos_20070219.rdf skos stopwords_en.txt ag_skos_20070219.model com.openkm.kea.stemmers.PorterStemmer com.openkm.kea.stopwords.StopwordsEnglish en UTF-8 testdocs/en/test", and the VM argument is "-Xmx526M", as you can see in the next screenshot.
The classpath must look as follows: