Class LocationChooserRule

java.lang.Object
org.opensextant.extractors.geo.rules.GeocodeRule
org.opensextant.extractors.geo.rules.LocationChooserRule
All Implemented Interfaces:
org.opensextant.data.MatchSchema

public class LocationChooserRule extends GeocodeRule implements org.opensextant.data.MatchSchema
A final geocoding pass or two. Loop through candidates and choose the location that best fits the context. As needed, cache chosen entries to optimize, e.g., co-referenced places mentioned earlier in the document.

Ideally, choose the best place for a particular instance of a name, then propagate that choice to the other mentions of the same name. If it is the same place, there is no need to disambiguate it multiple times at this point.
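
The caching idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual implementation: ChosenPlaceCache, choose, disambiguationCount, and clear are hypothetical names, and places are represented as plain strings rather than org.opensextant.data.Place objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper, not part of OpenSextant: remembers the place chosen for each name. */
final class ChosenPlaceCache {
    // Maps the matched name text to the place chosen for its first mention.
    private final Map<String, String> chosenByName = new HashMap<>();
    // Counts how often the (potentially expensive) chooser actually ran.
    private int disambiguations = 0;

    /** Returns the cached choice for a name; the chooser runs only on the first mention. */
    String choose(String name, Function<String, String> chooser) {
        return chosenByName.computeIfAbsent(name, key -> {
            disambiguations++;
            return chooser.apply(key);
        });
    }

    int disambiguationCount() {
        return disambiguations;
    }

    /** Forgets all per-document state, in the spirit of a per-document reset. */
    void clear() {
        chosenByName.clear();
        disambiguations = 0;
    }
}
```

With such a cache, a second mention of the same name reuses the first choice, so the chooser runs once per distinct name in a document rather than once per mention.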

Author:
ubaldino
  • Field Details

    • ADMIN_CONTAINS_PLACE_WT

      protected static final double ADMIN_CONTAINS_PLACE_WT
    • COUNTRY_CONTAINS_PLACE_WT

      protected static final double COUNTRY_CONTAINS_PLACE_WT
    • PREF_COUNTRY

      public static final String PREF_COUNTRY
      Preferred country or location: when the user supplies context that may be missing from the text, we accept it and weight such a preference higher.
    • PREF_LOCATION

      public static final String PREF_LOCATION
    • COUNTRY_CONTAINS

      public static final String COUNTRY_CONTAINS
    • ADMIN_CONTAINS

      public static final String ADMIN_CONTAINS
    • MATCHCONF_BARE_ACRONYM

      public static final int MATCHCONF_BARE_ACRONYM
    • MATCHCONF_MINIMUM

      public static final int MATCHCONF_MINIMUM
      The bare minimum confidence. If rules subtract confidence points, the final confidence may fall below 20.
    • MATCHCONF_MANY_LOC

      public static final int MATCHCONF_MANY_LOC
      Absolute Confidence: Many locations matched a single name. No country is in scope (no country is mentioned in the document), so this is very low confidence.
    • MATCHCONF_MANY_COUNTRIES

      public static final int MATCHCONF_MANY_COUNTRIES
      Absolute Confidence: Many locations matched, with multiple countries in scope, i.e., many countries are mentioned in the document.
    • MATCHCONF_MANY_COUNTRY

      public static final int MATCHCONF_MANY_COUNTRY
      Absolute Confidence: Many locations matched, but only one country is in scope, i.e., one country is mentioned in the document.
    • MATCHCONF_NAME_REGION

      public static final int MATCHCONF_NAME_REGION
      Absolute Confidence: Name, Region; City, State; Capital, Country; etc. Patterns of qualified places.
    • MATCHCONF_ONE_LOC

      public static final int MATCHCONF_ONE_LOC
      Absolute Confidence: Unique name in gazetteer. Confidence is high; however, this needs to be tempered by the number of gazetteers, their coverage, and their diversity.
    • MATCHCONF_GEODETIC

      public static final int MATCHCONF_GEODETIC
      Absolute Confidence: The geographic location of a named place lines up with a coordinate in scope.
    • MATCHCONF_QUALIFIER_MAJOR_PLACE

      public static final int MATCHCONF_QUALIFIER_MAJOR_PLACE
      Confidence Qualifier: The chosen place happens to be a major place, e.g., large city.
    • MATCHCONF_QUALIFIER_COUNTRY_MENTIONED

      public static final int MATCHCONF_QUALIFIER_COUNTRY_MENTIONED
      Confidence Qualifier: The chosen place happens to be in a country mentioned in the document.
    • MATCHCONF_QUALIFIER_AMBIGUOUS_NAME

      public static final int MATCHCONF_QUALIFIER_AMBIGUOUS_NAME
      Confidence Qualifier: The name is ambiguous.
    • MATCHCONF_QUALIFIER_UNIQUE_COUNTRY

      public static final int MATCHCONF_QUALIFIER_UNIQUE_COUNTRY
      Confidence Qualifier: Name appears in only one country.
    • MATCHCONF_QUALIFIER_HIGH_SCORE

      public static final int MATCHCONF_QUALIFIER_HIGH_SCORE
      Confidence Qualifier: The chosen place scored high compared to the runner-up.
    • MATCHCONF_QUALIFIER_LOWERCASE

      public static final int MATCHCONF_QUALIFIER_LOWERCASE
      Confidence Qualifier: Start here if you have a lowercase term that may be a place. Lowercase matches lose 10 points or more; however, feat_class P and A win back 5 points, while other feature classes are less likely to be places.
    • MATCHCONF_PREFERRED

      public static final int MATCHCONF_PREFERRED
      A subtle boost for locations that were preferred -- especially helps when there is no inherent context and we must rely on the caller's intuition.
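
The lowercase qualifier described for MATCHCONF_QUALIFIER_LOWERCASE can be illustrated with a small sketch. The point values (-10 and +5) are taken from the description above; the class name, method name, and the exact test for "lowercase" are assumptions made for illustration and are not the library's actual logic.

```java
import java.util.Locale;

/** Hypothetical illustration of the lowercase qualifier; not OpenSextant code. */
final class LowercaseQualifierSketch {
    // Assumed from the field description: lowercase matches lose 10 points...
    static final int LOWERCASE_PENALTY = -10;
    // ...and feature classes P and A win back 5 points.
    static final int PLACE_OR_ADMIN_BONUS = 5;

    /** Returns the confidence adjustment for a matched term and its feature class. */
    static int lowercaseAdjustment(String matchedText, String featureClass) {
        // A term counts as lowercase here if it has cased letters and none are uppercase.
        boolean isLowercase = matchedText.equals(matchedText.toLowerCase(Locale.ROOT))
                && !matchedText.equals(matchedText.toUpperCase(Locale.ROOT));
        if (!isLowercase) {
            return 0;
        }
        int adjustment = LOWERCASE_PENALTY;
        if ("P".equals(featureClass) || "A".equals(featureClass)) {
            adjustment += PLACE_OR_ADMIN_BONUS;
        }
        return adjustment;
    }
}
```

Under these assumptions, a lowercase match on a populated place (class P) nets -5, a lowercase match on any other class nets -10, and a capitalized match is unaffected.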
  • Constructor Details

    • LocationChooserRule

      public LocationChooserRule()
  • Method Details

    • reset

      public void reset()
      Description copied from class: GeocodeRule
      No-op, unless overridden.
      Overrides:
      reset in class GeocodeRule
    • evaluate

      public void evaluate(List<PlaceCandidate> names)
      Overrides:
      evaluate in class GeocodeRule
      Parameters:
      names - list of found place names
    • evaluate

      public void evaluate(List<PlaceCandidate> names, org.opensextant.processing.Parameters preferences)
      Walk the entire list.
    • inferCountry

      public void inferCountry(String cc)
    • getInferredCountryCount

      public int getInferredCountryCount(String cc)
      How likely is it that this country is substantially relevant to the document, based on hard geography rather than trivial mentions of country names or codes? "Inferred" countries are more telling than short codes, for example.
      Parameters:
      cc - country code
      Returns:
    • evaluate

      public void evaluate(PlaceCandidate name, org.opensextant.data.Place geo)
      Evaluates a location that has not yet been chosen. Consider the given evidence first, creating some weight there, then introduce innate properties of the possible locations, thereby amplifying the differences between the candidates.
      Specified by:
      evaluate in class GeocodeRule
      Parameters:
      name - matched name in text
      geo - gazetteer entry or location
    • assessConfidence

      public void assessConfidence(PlaceCandidate pc)
      Confidence of your final chosen location for a given name is assembled as the sum of some absolute metric plus some additional qualifiers. The absolute provides some context at the document level, whereas the qualifiers are refinements.
        conf = A + Q1 + Q2...  // this may change.
       
      Parameters:
      pc - place candidate whose chosen location is being assessed
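
The additive formula conf = A + Q1 + Q2... can be sketched as below. This is only an illustration of the arithmetic: the class and method names are hypothetical, and the numbers used in the usage note are placeholders, since the actual MATCHCONF_* constant values are not listed on this page. No lower bound is applied, consistent with the note on MATCHCONF_MINIMUM that confidence may go below the minimum when rules subtract points.

```java
/** Hypothetical illustration of conf = A + Q1 + Q2...; not OpenSextant code. */
final class ConfidenceSketch {
    /**
     * Sums one absolute, document-level confidence with any number of qualifiers.
     * Qualifiers may be negative, so the result is deliberately not clamped.
     */
    static int confidence(int absolute, int... qualifiers) {
        int conf = absolute;
        for (int qualifier : qualifiers) {
            conf += qualifier;
        }
        return conf;
    }
}
```

For example, with a placeholder absolute value of 40 and placeholder qualifiers of +5 and -10, confidence(40, 5, -10) yields 35. The formula itself is marked "this may change" above, so the sketch should be read the same way.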