wiki:DisambiguationSteps

Version 10 (modified by rob.hooft@…, 11 years ago) (diff)

Overall process

After the results are retrieved from the indexing engine, a special procedure is applied to ensure that the results are accurate in the given context. For example, the same term may be assigned to different concepts (e.g. H1C1 may refer to a gene, but may also be an alternative name for a disease). The process of resolving such conflicts is called disambiguation.

In the current Peregrine architecture the disambiguation process is divided into two stages:

  1. Indexing results are fed to the disambiguator. A disambiguator implementation can call helpers, or chain the request through other disambiguators and merge the results.
  2. Disambiguation results are fed to a disambiguation decision maker, which applies some logic to filter out indexing results based on the disambiguation results.
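
The two stages can be sketched in Python as follows. This is only an illustration, not the Peregrine code: the names `chain` and `run_pipeline`, the use of plain strings as indexing results, and the merge rule (keep the highest weight per result) are all assumptions, since the text does not say how chained results are merged.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: an indexing result is modelled as a plain string key.
IndexingResult = str
Weights = Dict[IndexingResult, float]
Disambiguator = Callable[[List[IndexingResult]], Weights]
DecisionMaker = Callable[[List[IndexingResult], Weights], List[IndexingResult]]


def chain(*disambiguators: Disambiguator) -> Disambiguator:
    """Combine several disambiguators into one.

    Assumption: results are merged by keeping the highest weight per
    indexing result; the source does not specify the merge rule.
    """
    def chained(results: List[IndexingResult]) -> Weights:
        merged: Weights = {}
        for disambiguate in disambiguators:
            for result, weight in disambiguate(results).items():
                merged[result] = max(weight, merged.get(result, 0.0))
        return merged
    return chained


def run_pipeline(
    results: List[IndexingResult],
    disambiguator: Disambiguator,
    decision_maker: DecisionMaker,
) -> List[IndexingResult]:
    """Stage 1 weighs the indexing results, stage 2 filters them."""
    weights = disambiguator(results)         # stage 1: disambiguation
    return decision_maker(results, weights)  # stage 2: decision making
```

Keeping both stages behind simple callables means a different disambiguator or decision maker can be swapped in without touching the other stage.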

The above procedure is illustrated in the following diagram:

Disambiguator

Two disambiguator implementations are available. Which disambiguator to use is defined in the DI field of the ontology.

Loose disambiguator

This disambiguator was formerly called UMLSDisambiguator. It follows this algorithm:

  • If the concept has synonyms, the assigned weight depends on the minimal distance to the closest synonym ([0.75 .. 0.8]).
  • If the term is a preferred term for the concept, weight [0.7] (SURE_WEIGHT) is assigned.
  • If the concept has no homonyms, weight [0.65] (PRETTY_SURE_WEIGHT) is assigned.
  • Otherwise, weight [0.5] (UNCERTAIN_WEIGHT) is assigned.
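
The rules above can be read as a first-match cascade, sketched below in Python. Several details are assumptions rather than facts from the Peregrine source: the rules are taken to apply in the listed order, "has synonyms" is modelled as an optional distance to the closest synonym, and the mapping from distance to weight is a linear interpolation over a hypothetical window `MAX_DISTANCE`. The names `LooseCandidate`, `proximity_weight` and `loose_weight` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

SURE_WEIGHT = 0.7
PRETTY_SURE_WEIGHT = 0.65
UNCERTAIN_WEIGHT = 0.5
MAX_DISTANCE = 100  # hypothetical window size; not given in the source


def proximity_weight(distance: int, low: float, high: float,
                     max_distance: int = MAX_DISTANCE) -> float:
    """Map a distance to [low, high]: closer means a higher weight.

    Assumption: linear interpolation; the real mapping is not documented.
    """
    closeness = 1.0 - min(distance, max_distance) / max_distance
    return low + (high - low) * closeness


@dataclass
class LooseCandidate:
    is_preferred_term: bool
    has_homonyms: bool
    # None models "no synonym available" (an assumption).
    closest_synonym_distance: Optional[int] = None


def loose_weight(c: LooseCandidate) -> float:
    """Apply the loose rules in the order listed; the first match wins."""
    if c.closest_synonym_distance is not None:
        return proximity_weight(c.closest_synonym_distance, 0.75, 0.8)
    if c.is_preferred_term:
        return SURE_WEIGHT
    if not c.has_homonyms:
        return PRETTY_SURE_WEIGHT
    return UNCERTAIN_WEIGHT
```

For instance, `loose_weight(LooseCandidate(is_preferred_term=True, has_homonyms=True))` falls through the synonym rule and returns `SURE_WEIGHT`.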

Strict disambiguator

This disambiguator was formerly called GeneDisambiguator. It follows this algorithm:

  • If the concept under consideration has no homonyms or the term is a preferred term, and the term is complex, weight [0.9] (POSITIVE_WEIGHT) is assigned.
  • If the concept has synonyms, the assigned weight depends on the minimal distance to the closest synonym ([0.75 .. 0.8]).
  • If the concept has keywords, the assigned weight depends on the minimal distance to the closest keyword ([0.70 .. 0.75]).
  • Otherwise, weight [0.1] (NEGATIVE_WEIGHT) is assigned.
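
A minimal Python sketch of the strict rules follows, under stated assumptions. The first rule is read as "(no homonyms or preferred term) and complex term", which is one possible reading of the wording. What makes a term "complex" is not defined here, so it is passed in as a flag. The distance-to-weight mapping is a linear interpolation over a hypothetical window `MAX_DISTANCE`. The names `StrictCandidate`, `proximity_weight` and `strict_weight` are illustrative, not Peregrine identifiers.

```python
from dataclasses import dataclass
from typing import Optional

POSITIVE_WEIGHT = 0.9
NEGATIVE_WEIGHT = 0.1
MAX_DISTANCE = 100  # hypothetical window size; not given in the source


def proximity_weight(distance: int, low: float, high: float,
                     max_distance: int = MAX_DISTANCE) -> float:
    """Map a distance to [low, high]: closer means a higher weight.

    Assumption: linear interpolation; the real mapping is not documented.
    """
    closeness = 1.0 - min(distance, max_distance) / max_distance
    return low + (high - low) * closeness


@dataclass
class StrictCandidate:
    has_homonyms: bool
    is_preferred_term: bool
    is_complex: bool  # how "complex" is decided is not specified here
    # None models "no synonym / no keyword available" (an assumption).
    closest_synonym_distance: Optional[int] = None
    closest_keyword_distance: Optional[int] = None


def strict_weight(c: StrictCandidate) -> float:
    """Apply the strict rules in the order listed; the first match wins."""
    if (not c.has_homonyms or c.is_preferred_term) and c.is_complex:
        return POSITIVE_WEIGHT
    if c.closest_synonym_distance is not None:
        return proximity_weight(c.closest_synonym_distance, 0.75, 0.80)
    if c.closest_keyword_distance is not None:
        return proximity_weight(c.closest_keyword_distance, 0.70, 0.75)
    return NEGATIVE_WEIGHT
```

Unlike the loose variant, a candidate that matches none of the positive rules ends up at `NEGATIVE_WEIGHT`, well below any of the other weights.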

Disambiguation decision maker

Currently, only a trivial disambiguation decision maker is implemented: it removes an indexing result if the corresponding disambiguation result has a weight below [0.5].
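
The threshold rule amounts to a simple filter. The sketch below is illustrative only: the function name `filter_indexing_results` and the representation of results as `(result, weight)` pairs are assumptions, not the Peregrine API.

```python
from typing import List, Tuple

WEIGHT_THRESHOLD = 0.5


def filter_indexing_results(
    weighted_results: List[Tuple[str, float]],
    threshold: float = WEIGHT_THRESHOLD,
) -> List[str]:
    """Keep results whose weight is not below the threshold.

    A weight exactly equal to the threshold is kept, because only
    weights "less than" the threshold lead to removal.
    """
    return [result for result, weight in weighted_results
            if weight >= threshold]
```

With the weights listed earlier, a result carrying UNCERTAIN_WEIGHT (0.5) from the loose disambiguator would be kept, while one carrying NEGATIVE_WEIGHT (0.1) from the strict disambiguator would be removed.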
