wiki:Peregrine SKOS CLI

Download

The Peregrine SKOS CLI can be downloaded from the downloads page.

Running Peregrine SKOS CLI

  • Download the production.properties file.
  • In that file, change the normalizer.lvg.properties and normalizer.lvg.binaryCache properties to point to the LVG installation path.

  • Run Peregrine SKOS CLI:
    java -Dperegrine.config.location=/path/to/production.properties -jar peregrine-skos-cli.jar thesaurus.ttl abstract_list.txt output.ttl
    

Replace '/path/to/production.properties' with the path to your production.properties file. 'thesaurus.ttl' is the SKOS thesaurus to use.

'abstract_list.txt' is a file that lists one document location per line.
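As an illustration, a list file of this shape could be generated with a few lines of Python. This is only a sketch: the helper name write_abstract_list, the "*.txt" pattern, and the choice of absolute file paths as document locations are assumptions, not something Peregrine prescribes.

```python
from pathlib import Path


def write_abstract_list(abstract_dir, list_path, pattern="*.txt"):
    """Write one document location per line and return the locations written.

    Assumption: a "document location" is an absolute file path.
    """
    # Collect the matching files first, sorted for a stable order.
    locations = sorted(str(p.resolve()) for p in Path(abstract_dir).glob(pattern))
    # One location per line, each terminated by a newline.
    Path(list_path).write_text(
        "".join(loc + "\n" for loc in locations), encoding="utf-8"
    )
    return locations
```

Writing the list file outside the directory that holds the abstracts avoids the list picking itself up on a later run.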

'output.ttl' is the output file. It is written in Turtle (actually N-Triples) and gives, for each concept URI (subject), the document URI (object) in which the concept occurs.
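To show how such an output file might be consumed, the sketch below groups document URIs by concept URI. It assumes every line is a plain N-Triples statement whose subject, predicate and object are all IRIs in angle brackets. The predicate is ignored because this page does not say which one Peregrine emits, and the function name documents_per_concept is hypothetical.

```python
import re
from collections import defaultdict

# Matches a simple N-Triples line whose three terms are all IRIs: <s> <p> <o> .
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def documents_per_concept(lines):
    """Map each concept URI (subject) to the set of document URIs (objects)."""
    index = defaultdict(set)
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip blank lines, comments, and anything with literals
        concept, _predicate, document = match.groups()
        index[concept].add(document)
    return dict(index)
```

An open file can be passed directly, for example documents_per_concept(open("output.ttl", encoding="utf-8")).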

Three example files have been attached below.

SKOS Format

The SKOS Format of the ontology file is a direct translation of the legacy ErasmusMC ontology file format.

Peregrine matches terms case-sensitively by default. There are a few options for term matching. In short:

  • default (pref|altLabel): the term is converted to lower case.
  • normalized (NO, pref|altLabel_NO): the term is passed through the normalizer.
  • case insensitive (CI, pref|altLabel_CI): if only the first letter of a word is a capital, the word is converted to lower case; otherwise the original string is returned (compare a common name like "Atlantic" with a chemical formula like "C1H5O2").
  • both normalized and case insensitive (NO,CI, pref|altLabel_NO_CI).
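The case-insensitive (CI) rule above can be sketched in Python as follows. This is one reading of the rule, not Peregrine's actual implementation: the function names are hypothetical, and applying the rule word by word after splitting on spaces is an assumption.

```python
def ci_word(word):
    """Lowercase a word only if its first letter is its sole capital."""
    first_is_capital = word[:1].isupper()
    rest_has_capital = any(ch.isupper() for ch in word[1:])
    if first_is_capital and not rest_has_capital:
        return word.lower()
    return word  # mixed case, all caps, or already lower case: keep as-is


def ci_term(term):
    """Apply the CI rule to each space-separated word of a term (assumed tokenization)."""
    return " ".join(ci_word(word) for word in term.split(" "))


print(ci_word("Atlantic"))  # atlantic
print(ci_word("C1H5O2"))    # C1H5O2
```

Under this reading a capitalized common name is folded to lower case, while a formula or identifier with capitals beyond the first letter is left untouched.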

These options can be indicated per term using the subproperties of prefLabel and altLabel that we defined. This can be used to distinguish between chemical formulas, identifiers, abbreviations, and normal words. Peregrine doesn't discriminate between prefLabels and altLabels; the difference between them matters only to human readers. According to the SKOS definition, each concept should have at most one prefLabel per language; all other labels are altLabels.

Last modified on Mar 6, 2014, 8:40:52 PM

Attachments (3)