Keyword

A keyword is a token that is rarely used across all concepts. A good keyword:

  1. should be part of a multi-token term of the concept to be disambiguated.
    • set in IndexedOntology.java: if (normalizedToken == term.getText() || !isComplex.isAComplexKeyword(normalizedToken)) {IsNotAKeywordToken = true;}
  2. should be complex (i.e. longer than 5 characters, or containing at least one digit and one letter)
    • set in IsComplexRule.java: isAComplexKeyword().
  3. should appear fewer than e.g. 100 times in the ontology
    • set in PeregrineImpl.java as DEFAULT_KEYWORD_THRESHOLD
  4. should not belong to a homonym term of the concept. When concepts are homonyms (i.e. they share at least one term), none of the tokens belonging to that homonym term should be used as keywords for these concepts. (However, these tokens may still be used as keywords for other concepts that do not have this homonym term as one of their concept terms.)
    • implement a map (isPartOfHomonyms@IndexedOntology.java) and a flag (TokensShouldBeSkippedAsKeywordForThisConcept) to filter out keywords that are part of a homonym.
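
The four rules above can be combined into a single check. The following is a minimal sketch, not the actual Peregrine code: the class KeywordSketch, the method isKeyword and its parameters are hypothetical, tokens are assumed to be already normalized, and terms are assumed to be split on whitespace. Only the names isAComplexKeyword and DEFAULT_KEYWORD_THRESHOLD are taken from this page, and the value 100 is the example threshold mentioned in rule 3.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical illustration of the keyword rules; not the real implementation.
class KeywordSketch {

    // Rule 3: example threshold from this page ("e.g. 100").
    static final int DEFAULT_KEYWORD_THRESHOLD = 100;

    // Rule 2: complex means longer than 5 characters,
    // or containing at least one digit and one letter.
    static boolean isAComplexKeyword(String token) {
        if (token.length() > 5) {
            return true;
        }
        boolean hasDigit = false;
        boolean hasLetter = false;
        for (char c : token.toCharArray()) {
            if (Character.isDigit(c)) {
                hasDigit = true;
            } else if (Character.isLetter(c)) {
                hasLetter = true;
            }
        }
        return hasDigit && hasLetter;
    }

    // token:             normalized token under consideration
    // termText:          normalized text of the term the token comes from
    // countInOntology:   how often the token occurs in the whole ontology
    // homonymTokens:     tokens of homonym terms of this concept (rule 4)
    static boolean isKeyword(String token, String termText,
                             int countInOntology, Set<String> homonymTokens) {
        // Rule 1: the token must be one token of a multi-token term,
        // so it may not be identical to the whole term.
        boolean partOfTerm =
                Arrays.asList(termText.trim().split("\\s+")).contains(token);
        if (!partOfTerm || token.equals(termText)) {
            return false;
        }
        // Rule 2: the token must be complex.
        if (!isAComplexKeyword(token)) {
            return false;
        }
        // Rule 3: the token must be rare in the ontology.
        if (countInOntology >= DEFAULT_KEYWORD_THRESHOLD) {
            return false;
        }
        // Rule 4: tokens of a homonym term are skipped for this concept.
        return !homonymTokens.contains(token);
    }
}
```

Under these assumptions, KeywordSketch.isKeyword("kinase", "tyrosine kinase", 12, Set.of()) accepts the token, while the same call with termText "kinase" rejects it because the token is the whole term. Note that the sketch compares strings with equals(), which tests content rather than object identity.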
Last modified on Sep 2, 2011, 1:42:37 PM