Version 13 (modified by hailiang.mei@…, 11 years ago)

Steps in the Disambiguation

Overall process

After the results are retrieved from the indexing engine, a special procedure ensures that the results are accurate in the given context. For example, the same term may be assigned to different concepts (e.g. H1C1 may refer to a gene, but may also be an alternative name of a disease). The process of resolving such conflicts is called disambiguation.

In the current Peregrine architecture, the disambiguation process is divided into two stages:

  1. Indexing results are fed to the disambiguator. A disambiguator implementation can call helpers, or chain the request through other disambiguators and merge the results.
  2. Disambiguation results are fed to a disambiguation decision maker, which applies logic to filter out indexing results based on the disambiguation results.
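
The two stages can be sketched as a small pipeline. This is a minimal illustration in Python, not Peregrine's actual API: the names `IndexingResult`, `WeightedResult`, `run_disambiguation`, the toy disambiguator and decision maker, and the concept identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class IndexingResult:
    """One term-to-concept match from the indexing engine (hypothetical shape)."""
    term: str
    concept_id: str


@dataclass(frozen=True)
class WeightedResult:
    """An indexing result plus the weight a disambiguator assigned to it."""
    result: IndexingResult
    weight: float


# Stage 1 and stage 2 are modelled as plain callables (an assumption).
Disambiguator = Callable[[List[IndexingResult]], List[WeightedResult]]
DecisionMaker = Callable[[List[WeightedResult]], List[IndexingResult]]


def run_disambiguation(
    results: List[IndexingResult],
    disambiguator: Disambiguator,
    decision_maker: DecisionMaker,
) -> List[IndexingResult]:
    # Stage 1: the disambiguator assigns a weight to every indexing result.
    weighted = disambiguator(results)
    # Stage 2: the decision maker filters indexing results using those weights.
    return decision_maker(weighted)


def toy_disambiguator(results: List[IndexingResult]) -> List[WeightedResult]:
    # Toy rule for illustration only: gene concepts get a high weight.
    return [
        WeightedResult(r, 0.9 if r.concept_id.startswith("gene:") else 0.1)
        for r in results
    ]


def toy_decision_maker(weighted: List[WeightedResult]) -> List[IndexingResult]:
    # Toy rule for illustration only: keep results weighted at 0.5 or above.
    return [w.result for w in weighted if w.weight >= 0.5]
```

Calling `run_disambiguation` with two hits for the term H1C1, one on a gene concept and one on a disease concept, would keep only the gene hit under these toy rules.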

The above procedure is illustrated in the following diagram:

Disambiguator

Two disambiguator implementations are available. Which disambiguator to use is defined in the DI field of the ontology.

Loose disambiguator

This disambiguator was formerly called UMLSDisambiguator. It follows this algorithm:

  • If a concept has synonyms, then weight [0.9] (POSITIVE_WEIGHT) is assigned.
  • If a term is a preferred term for a concept, then weight [0.7] (SURE_WEIGHT) is assigned.
  • If a concept has no homonyms, then weight [0.65] (PRETTY_SURE_WEIGHT) is assigned.
  • Otherwise, weight [0.5] (UNCERTAIN_WEIGHT) is assigned.
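
The loose rules can be read as a cascade. The sketch below assumes the rules are checked in the order listed and that the first match wins; the page does not state this explicitly. The function name `loose_weight` and its boolean parameters are hypothetical, and only the constant names and values come from the list above.

```python
POSITIVE_WEIGHT = 0.9
SURE_WEIGHT = 0.7
PRETTY_SURE_WEIGHT = 0.65
UNCERTAIN_WEIGHT = 0.5


def loose_weight(has_synonyms: bool, is_preferred_term: bool, has_homonyms: bool) -> float:
    """Return the weight for one term/concept pair (assumes first match wins)."""
    if has_synonyms:
        return POSITIVE_WEIGHT
    if is_preferred_term:
        return SURE_WEIGHT
    if not has_homonyms:
        return PRETTY_SURE_WEIGHT
    # None of the rules applied.
    return UNCERTAIN_WEIGHT
```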

Strict disambiguator

This disambiguator was formerly called GeneDisambiguator. It follows this algorithm:

  • If the concept under consideration has no homonyms or the term is a preferred term, and if the term is complex, then weight [0.9] (POSITIVE_WEIGHT) is assigned.
  • If the concept has synonyms, then the assigned weight depends on the minimal distance to the closest synonym ([0.75 .. 0.8]).
  • If the concept has keywords, then the assigned weight depends on the minimal distance to the closest keyword ([0.70 .. 0.75]).
  • Otherwise, weight [0.1] (NEGATIVE_WEIGHT) is assigned.
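
A rough sketch of the strict rules follows, under several assumptions that the page does not confirm: the first rule is read as "(no homonyms or preferred term) and complex term"; rules are checked in order; a distance of `None` means no synonym or keyword was found; and the weight falls linearly from the top of each range to the bottom as the distance grows up to a hypothetical `MAX_DISTANCE`. The actual distance-to-weight mapping is not described here, so the linear interpolation is purely illustrative.

```python
from typing import Optional

POSITIVE_WEIGHT = 0.9
NEGATIVE_WEIGHT = 0.1
SYNONYM_RANGE = (0.75, 0.80)  # range given for the closest synonym
KEYWORD_RANGE = (0.70, 0.75)  # range given for the closest keyword
MAX_DISTANCE = 100            # hypothetical cut-off, not from the source


def _scale(distance: int, low: float, high: float) -> float:
    """Map a distance to [low, high]; closer means higher (assumed linear)."""
    clamped = min(max(distance, 0), MAX_DISTANCE)
    return high - (high - low) * clamped / MAX_DISTANCE


def strict_weight(
    has_homonyms: bool,
    is_preferred_term: bool,
    is_complex_term: bool,
    synonym_distance: Optional[int] = None,
    keyword_distance: Optional[int] = None,
) -> float:
    # Assumed reading of the first rule: (no homonyms OR preferred) AND complex.
    if (not has_homonyms or is_preferred_term) and is_complex_term:
        return POSITIVE_WEIGHT
    if synonym_distance is not None:
        return _scale(synonym_distance, *SYNONYM_RANGE)
    if keyword_distance is not None:
        return _scale(keyword_distance, *KEYWORD_RANGE)
    return NEGATIVE_WEIGHT
```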

Disambiguation decision maker

The rules

  • When the concept weight >= DEFAULT_DISAMBIGUATION_ALWAYS_ACCEPTED_WEIGHT (80), the concept is kept.
  • When the concept weight < DEFAULT_DISAMBIGUATION_MINIMAL_WEIGHT (50), the concept is removed.
  • When the weight is an in-between value, then:
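
The two threshold rules can be sketched as follows. The handling of in-between weights is not given in the list above, so this sketch only labels such cases as undecided instead of guessing at the logic. The `Decision` enum and `decide` function are hypothetical names, and the weight is assumed to be on the same 0 to 100 scale as the thresholds; how the disambiguator weights such as 0.9 map onto that scale is not stated here.

```python
from enum import Enum

DEFAULT_DISAMBIGUATION_ALWAYS_ACCEPTED_WEIGHT = 80
DEFAULT_DISAMBIGUATION_MINIMAL_WEIGHT = 50


class Decision(Enum):
    KEEP = "keep"
    REMOVE = "remove"
    UNDECIDED = "undecided"  # in-between rule is not specified in the source


def decide(weight: float) -> Decision:
    """Classify a concept weight (assumed to be on a 0-100 scale)."""
    if weight >= DEFAULT_DISAMBIGUATION_ALWAYS_ACCEPTED_WEIGHT:
        return Decision.KEEP
    if weight < DEFAULT_DISAMBIGUATION_MINIMAL_WEIGHT:
        return Decision.REMOVE
    return Decision.UNDECIDED
```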
