Automatic key extraction
{{Warning|Automatic key extraction was removed from OpenKM Community 6.2.4 and OpenKM Professional 6.2.15 because it depended on obsolete libraries that have since been removed.}}
{{TOCright}} __TOC__

OpenKM uses '''KEA''' to extract keyphrases from text documents. By default, '''KEA''' can be used either for free indexing or for indexing with a controlled vocabulary, but in OpenKM '''a controlled vocabulary is mandatory'''. OpenKM's automatic keyphrase extraction is based on KEA 5.0.

To run KEA in OpenKM, a correctly configured vocabulary (Thesaurus) is required.

'''KEA''' is a module that must be trained, and it uses a Thesaurus as its controlled vocabulary. To configure the OpenKM Thesaurus, see [[Thesaurus]] in the installation guide.

To create a KEA model, first check out the openkm and thesaurus modules.

Select the svn repository type and enter the URL https://openkm.svn.sourceforge.net/svnroot/openkm/trunk/openkm to reference the openkm module.

Select the svn repository type and enter the URL https://openkm.svn.sourceforge.net/svnroot/openkm/trunk/thesausus to reference the thesaurus module.

The KEA web page offers a download that includes examples of how to create a KEA model. In a similar way, use the ModelBuilder class in the thesaurus module to create a KEA model based on a controlled vocabulary (Thesaurus).

To train the KEA module, run the ModelBuilder class with the following parameters:
* sourceFolder
* trainingFolder
* vocabularyFile
* vocabularyType
* stopwordFile
* modelFileName
* porterStemmerClass
* stopwordClass
* language
* documentEncoding

To configure the OpenKM thesaurus correctly, set the following [[OpenKM.cfg]] entries:
* kea.thesaurus.skos.file
* kea.thesaurus.vocabulary.serql
* kea.model.file
* kea.stopwords.file
* kea.automatic.keyword.extraction.number
* kea.automatic.keyword.extraction.restriction
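
Because all six entries are needed, it can help to check a configuration file before starting OpenKM. The following is a minimal sketch of such a check, written as a standalone Python helper that is not part of OpenKM. It assumes the file uses plain <code>key=value</code> lines; line continuations and escape sequences are not handled, and the function names are hypothetical.

```python
# Hypothetical helper, not part of OpenKM: reports which KEA entries
# are missing (or empty) in the text of an OpenKM.cfg file.
REQUIRED_KEYS = [
    "kea.thesaurus.skos.file",
    "kea.thesaurus.vocabulary.serql",
    "kea.model.file",
    "kea.stopwords.file",
    "kea.automatic.keyword.extraction.number",
    "kea.automatic.keyword.extraction.restriction",
]


def parse_properties(text):
    """Parse simple key=value lines; blank lines and comments are skipped."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may contain '=' themselves
        # (the SeRQL query does).
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def missing_kea_keys(text):
    """Return the required KEA keys that are absent or have an empty value."""
    entries = parse_properties(text)
    return [key for key in REQUIRED_KEYS if not entries.get(key)]
```

To use it, read the contents of your OpenKM.cfg into a string and pass it to <code>missing_kea_keys</code>; an empty list means every entry listed above has a value.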

== Setting the SKOS file ==
 kea.thesaurus.skos.file=file.rdf

== Setting the vocabulary query ==
 kea.thesaurus.vocabulary.serql=SELECT X,UID FROM {X} skos:prefLabel {UID} WHERE lang(UID) ="en"
 USING NAMESPACE rdf=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
 skos=<http://www.w3.org/2004/02/skos/core#>,rdfs=<http://www.w3.org/2000/01/rdf-schema#>,
 dc=<http://purl.org/dc/elements/1.1/>, dcterms=<http://purl.org/dc/terms/>, foaf=<http://xmlns.com/foaf/0.1/>
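
The query selects every subject <code>X</code> that has a <code>skos:prefLabel</code> literal <code>UID</code> tagged as English. To preview which terms such a query would return from a thesaurus file, the same selection can be approximated outside OpenKM. The sketch below uses only the Python standard library and assumes a flat RDF/XML layout in which each concept is a direct child of <code>rdf:RDF</code> and carries <code>xml:lang</code> on the label element itself. The function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data in the assumed flat RDF/XML layout.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/1">
    <skos:prefLabel xml:lang="en">invoice</skos:prefLabel>
    <skos:prefLabel xml:lang="es">factura</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(rdf_xml, lang="en"):
    """Return (subject URI, label) pairs for skos:prefLabel literals in `lang`."""
    root = ET.fromstring(rdf_xml)
    pairs = []
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        for label in node.findall("{%s}prefLabel" % SKOS_NS):
            # Only labels that carry the requested language tag are kept.
            if label.get(XML_LANG) == lang:
                pairs.append((subject, label.text))
    return pairs
```

Calling <code>pref_labels(SAMPLE)</code> yields the English label only. A real RDF store evaluates the SeRQL query against the parsed graph, so files that use nested or abbreviated RDF/XML forms need a proper RDF parser rather than this approximation.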

== Setting the model file ==
 kea.model.file=file.model

== Setting stop words ==
 kea.stopwords.file=stopwords.txt

== Setting the maximum number of extracted keywords ==
 kea.automatic.keyword.extraction.number=10

== Setting the dictionary restriction ==
When this option is on, only words from the dictionary are allowed as keywords.
 kea.automatic.keyword.extraction.restriction=on
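
The effect of the last two settings can be pictured with a small sketch. It is only an illustration of what a keyword limit and a dictionary restriction mean, not KEA's scoring algorithm or OpenKM's code. It assumes that candidates arrive as (phrase, score) pairs and that the dictionary is the list of thesaurus terms; the function name and its parameters are hypothetical.

```python
def select_keywords(scored, vocabulary, limit=10, restrict=True):
    """Pick up to `limit` phrases by descending score.

    scored     -- iterable of (phrase, score) pairs (assumed input shape)
    vocabulary -- dictionary terms; used only when `restrict` is true
    """
    allowed = {term.lower() for term in vocabulary}
    candidates = [
        (phrase, score)
        for phrase, score in scored
        # With the restriction on, phrases outside the dictionary are dropped.
        if not restrict or phrase.lower() in allowed
    ]
    # Highest score first; the sort is stable, so ties keep their input order.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [phrase for phrase, _ in candidates[:limit]]
```

For example, with candidates <code>[("misc", 0.95), ("invoice", 0.9), ("contract", 0.5)]</code> and a dictionary containing only "Invoice" and "Contract", the restricted call drops "misc" even though it has the highest score.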

[[Automatic key extraction full example]] [[File:Padlock.gif]]

You may also be interested in:
* KEA [http://www.nzdl.org/Kea/index.html]
* WEKA - Data mining with Open Source machine learning in Java [http://www.cs.waikato.ac.nz/~ml/weka/]
* Aperture framework [http://aperture.sourceforge.net/]
* RDF2GO [http://mavenrepo.fzi.de/semweb4j.org/site/rdf2go/]
* OpenRDF [http://www.openrdf.org/]

[[Category: Installation Guide]]
Latest revision as of 10:02, 16 April 2013