Automatic key extraction

From OpenKM Documentation
Note: Automatic key extraction is only available in OpenKM 5.0 and later.
Latest revision as of 10:02, 16 April 2013


Warning: Automatic key extraction was removed from OpenKM Community 6.2.4 and OpenKM Professional 6.2.15 because the libraries it depended on became obsolete and were removed.

OpenKM uses KEA to extract keyphrases from text documents. By default, KEA can be used either for free indexing or for indexing with a controlled vocabulary, but in OpenKM a controlled vocabulary is mandatory. OpenKM automatic keyphrase extraction is based on KEA 5.0.

To run KEA in OpenKM, you need a properly configured vocabulary (Thesaurus).

KEA is a trainable module that uses a Thesaurus as its controlled vocabulary. To configure the OpenKM Thesaurus, see Thesaurus in the installation guide.

To create a KEA model, you must check out the openkm and thesaurus modules:

Select the svn type and enter the URL https://openkm.svn.sourceforge.net/svnroot/openkm/trunk/openkm for the openkm module:

Select the svn type and enter the URL https://openkm.svn.sourceforge.net/svnroot/openkm/trunk/thesausus for the thesaurus module:

The KEA web page offers a download that includes examples of how to create a KEA model. In a similar way, the KEA model must be created with the ModelBuilder class in the thesaurus module, based on a controlled vocabulary (Thesaurus).

To train the KEA module, run the ModelBuilder class with these parameters:

sourceFolder 
trainingFolder 
vocabularyFile 
vocabularyType
stopwordFile 
modelFileName 
porterStemmerClass 
stopwordClass 
language 
documentEncoding
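As a minimal sketch of how the ten parameters above fit together, the following Java class maps a positional argument array onto the parameter names and rejects an incomplete set. It does not call the real ModelBuilder: the class name ModelBuilderParams, the positional order, and every sample value in main are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OpenKM or KEA.
class ModelBuilderParams {
    // Parameter names exactly as listed in the text above.
    static final List<String> NAMES = List.of(
        "sourceFolder", "trainingFolder", "vocabularyFile", "vocabularyType",
        "stopwordFile", "modelFileName", "porterStemmerClass", "stopwordClass",
        "language", "documentEncoding");

    // Assumption: the values are supplied positionally, in the listed order.
    static Map<String, String> fromArgs(String[] args) {
        if (args.length != NAMES.size()) {
            throw new IllegalArgumentException(
                "Expected " + NAMES.size() + " parameters but got " + args.length);
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.size(); i++) {
            if (args[i] == null || args[i].trim().isEmpty()) {
                throw new IllegalArgumentException("Empty value for " + NAMES.get(i));
            }
            params.put(NAMES.get(i), args[i]);
        }
        return params;
    }

    public static void main(String[] args) {
        // All values below are placeholders, not documented defaults.
        String[] sample = {
            "/tmp/kea/source", "/tmp/kea/training", "/tmp/kea/file.rdf", "skos",
            "/tmp/kea/stopwords.txt", "/tmp/kea/file.model",
            "example.StemmerClass", "example.StopwordClass", "en", "UTF-8"
        };
        fromArgs(sample).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Checking the parameter set before training starts makes a missing vocabulary or stopword file show up immediately rather than partway through model building.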


To configure the OpenKM thesaurus correctly, you must set these OpenKM.cfg entries:

kea.thesaurus.skos.file
kea.thesaurus.vocabulary.serql
kea.model.file
kea.stopwords.file
kea.automatic.keyword.extraction.number
kea.automatic.keyword.extraction.restriction


Setting the SKOS file

kea.thesaurus.skos.file=file.rdf

Setting vocabulary query

kea.thesaurus.vocabulary.serql=SELECT X,UID FROM {X} skos:prefLabel {UID} WHERE lang(UID) ="en" USING NAMESPACE rdf=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>, skos=<http://www.w3.org/2004/02/skos/core#>,rdfs=<http://www.w3.org/2000/01/rdf-schema#>, dc=<http://purl.org/dc/elements/1.1/>, dcterms=<http://purl.org/dc/terms/>, foaf=<http://xmlns.com/foaf/0.1/>
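The query above selects every concept X together with its skos:prefLabel value UID, keeps only labels tagged as English, and declares the namespace prefixes it uses. As a rough sketch, the same property value can be assembled from a label language and a prefix map, which makes it easier to see where a different language would go. The class and method names here are hypothetical; only the query text comes from the entry above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that rebuilds the SeRQL string shown above.
class SerqlQuerySketch {
    static String build(String lang, Map<String, String> namespaces) {
        // Each namespace is written as prefix=<uri>, separated by commas.
        StringJoiner ns = new StringJoiner(", ");
        for (Map.Entry<String, String> e : namespaces.entrySet()) {
            ns.add(e.getKey() + "=<" + e.getValue() + ">");
        }
        return "SELECT X,UID FROM {X} skos:prefLabel {UID} "
            + "WHERE lang(UID) = \"" + lang + "\" "
            + "USING NAMESPACE " + ns;
    }

    public static void main(String[] args) {
        // Prefixes and URIs copied from the configuration entry above.
        Map<String, String> ns = new LinkedHashMap<>();
        ns.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        ns.put("skos", "http://www.w3.org/2004/02/skos/core#");
        ns.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        ns.put("dc", "http://purl.org/dc/elements/1.1/");
        ns.put("dcterms", "http://purl.org/dc/terms/");
        ns.put("foaf", "http://xmlns.com/foaf/0.1/");
        System.out.println("kea.thesaurus.vocabulary.serql=" + build("en", ns));
    }
}
```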

Setting model file

kea.model.file=file.model

Setting stop words

kea.stopwords.file=stopwords.txt

Setting the maximum number of extracted keywords

kea.automatic.keyword.extraction.number=10

Setting dictionary restriction

With this restriction on, only dictionary words are allowed:

kea.automatic.keyword.extraction.restriction=on
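The six entries can be checked together before starting OpenKM. The sketch below parses the entries with java.util.Properties from the standard library and reports any missing key. It is not OpenKM's own configuration loader: the class KeaConfigCheck is hypothetical, and reading any value other than "on" as a disabled restriction is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical checker for the KEA entries of OpenKM.cfg.
class KeaConfigCheck {
    static final List<String> REQUIRED = List.of(
        "kea.thesaurus.skos.file",
        "kea.thesaurus.vocabulary.serql",
        "kea.model.file",
        "kea.stopwords.file",
        "kea.automatic.keyword.extraction.number",
        "kea.automatic.keyword.extraction.restriction");

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the required keys that are absent or blank.
    static List<String> missing(Properties props) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    static int maxKeywords(Properties props) {
        String value = props.getProperty("kea.automatic.keyword.extraction.number");
        if (value == null) {
            throw new IllegalStateException("kea.automatic.keyword.extraction.number is not set");
        }
        return Integer.parseInt(value.trim());
    }

    // Assumption: "on" enables the restriction and any other value disables it.
    static boolean restricted(Properties props) {
        return "on".equalsIgnoreCase(
            props.getProperty("kea.automatic.keyword.extraction.restriction", "").trim());
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
            "kea.thesaurus.skos.file=file.rdf",
            "kea.thesaurus.vocabulary.serql=SELECT X,UID FROM {X} skos:prefLabel {UID} "
                + "WHERE lang(UID) =\"en\" USING NAMESPACE "
                + "skos=<http://www.w3.org/2004/02/skos/core#>",
            "kea.model.file=file.model",
            "kea.stopwords.file=stopwords.txt",
            "kea.automatic.keyword.extraction.number=10",
            "kea.automatic.keyword.extraction.restriction=on");
        Properties props = parse(cfg);
        System.out.println("Missing keys: " + missing(props));
        System.out.println("Max keywords: " + maxKeywords(props));
        System.out.println("Dictionary restriction: " + restricted(props));
    }
}
```

The query in main is shortened to the skos namespace to keep the example compact; a real configuration would carry the full namespace list.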


Automatic key extraction full example


You may also be interested in:

  • KEA
  • WEKA - Data mining with Open Source machine learning in Java
  • Aperture framework
  • RDF2GO
  • OpenRDF