Class GeocodeRule

java.lang.Object
org.opensextant.extractors.geo.rules.GeocodeRule
Direct Known Subclasses:
ContextualOrganizationRule, CoordinateAssociationRule, CountryRule, FeatureRule, HeatMapRule, LocationChooserRule, MajorPlaceRule, NameCodeRule, NameRule, NonLatinNameRule, NonsenseFilter, PersonNameFilter, PostalCodeAssociationRule, PostalCodeFilter, PostalLocationChooser, ProvinceAssociationRule, ProvinceNameSetter

public abstract class GeocodeRule extends Object
  • Field Details

  • Constructor Details

    • GeocodeRule

      public GeocodeRule()
  • Method Details

    • logMsg

      protected void logMsg(String msg, String val)
    • setCountryObserver

      public void setCountryObserver(CountryObserver o)
    • setLocationObserver

      public void setLocationObserver(LocationObserver o)
    • setBoundaryObserver

      public void setBoundaryObserver(BoundaryObserver o)
    • setDefaultMethod

      public void setDefaultMethod(String m)
    • textCase

      public int textCase(org.opensextant.data.TextInput t)
      Determines the text case of the input for the purposes of geocoder rule reasoning.
    • setTextCase

      public void setTextCase(org.opensextant.data.TextInput t)
    • isRelevant

      public boolean isRelevant()
      Override if the rule instance has its own view of relevance. For example, when a coordinate rule finds no coordinates, rule.isRelevant() is false.
      Returns:
    • isShort

      public static boolean isShort(int matchLen)
    • internalPlaceID

      protected String internalPlaceID(org.opensextant.data.Place p)
      Create a location ID useful for tracking distinct named features by location. This is not generalizable. It produces a looser identity such as "the city at location": P/PPL/f57yah5
      Parameters:
      p -
      Returns:
      feature+location hash
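
      A minimal sketch of how such a feature+location ID might be composed, assuming the three parts of the example (P/PPL/f57yah5) are a feature class, a feature code, and a geohash. The class and method names here are hypothetical stand-ins that take plain strings rather than a Place, and the actual implementation may differ.

```java
final class PlaceIdSketch {
    /**
     * Joins feature class, feature code, and geohash with "/".
     * Assumption: this mirrors the shape of the documented example ID only.
     */
    static String internalPlaceId(String featureClass, String featureCode, String geohash) {
        return String.join("/", featureClass, featureCode, geohash);
    }
}
```

      Two gazetteer entries with the same feature type and the same geohash cell would then map to the same ID, which is the "looser identity" the description refers to.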
    • sameCountry

      public boolean sameCountry(org.opensextant.data.Place p1, org.opensextant.data.Place p2)
    • sameCountry

      public boolean sameCountry(String cc1, String cc2)
      Compares places by their country code alone. Useful for checking whether a city mention and a country mention share a code (CITY.cc =? COUNTRY.cc).
      Parameters:
      cc1 - code 1
      cc2 - code 2
      Returns:
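
      As an illustration, a country-code comparison of this kind could look like the sketch below. Treating a null code as "not the same country" and ignoring case are assumptions made for the example; the source does not state how the real method handles either.

```java
final class CountryCodeSketch {
    /**
     * Compares two country codes.
     * Assumptions: null never matches anything; comparison ignores case.
     */
    static boolean sameCountry(String cc1, String cc2) {
        if (cc1 == null || cc2 == null) {
            return false;
        }
        return cc1.equalsIgnoreCase(cc2);
    }
}
```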
    • sameBoundary

      public boolean sameBoundary(org.opensextant.data.Place p1, org.opensextant.data.Place p2)
      Quick test to see if two places are contained within the same boundary.
      Parameters:
      p1 -
      p2 -
      Returns:
    • setGeohash

      public static void setGeohash(org.opensextant.data.Place loc)
    • sameLexicalName

      public void sameLexicalName(PlaceCandidate name, org.opensextant.data.Place geo)
      Increment the score for lexical matches accordingly:
        - non-ASCII match: 2.5 pts
        - ASCII match: 1.5 pts
        - case-insensitive match: 0.5 pts
      Simple example, with one char in the name match and one char in the geo name:
        ø = ø   2.5
        o = o   1.5
        O = O   1.5
        O = o   0.5
      Mention ? Gazetteer Entry:
        Boston != Bøstøn, no points.
        Bøstøn == Bøstøn, 2.5 points.
      Parameters:
      name -
      geo -
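
      The scoring tiers above can be sketched as a standalone function. This is an illustration only: it works on plain strings rather than PlaceCandidate and Place, returns the increment instead of applying it to a candidate, and the names lexicalScore and isAscii are hypothetical. How the real method scores a case-insensitive match that contains non-ASCII characters is not stated; the sketch gives it 0.5.

```java
final class LexicalMatchSketch {
    /** True if every char is in the 7-bit ASCII range. */
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    /**
     * Score increment for a mention compared with a gazetteer name:
     * exact match containing non-ASCII chars = 2.5, exact ASCII match = 1.5,
     * match only when ignoring case = 0.5, otherwise 0.
     */
    static double lexicalScore(String mention, String gazetteerName) {
        if (mention.equals(gazetteerName)) {
            return isAscii(mention) ? 1.5 : 2.5;
        }
        if (mention.equalsIgnoreCase(gazetteerName)) {
            return 0.5;
        }
        return 0.0;
    }
}
```

      With this sketch, lexicalScore("Boston", "Bøstøn") yields no points and lexicalScore("Bøstøn", "Bøstøn") yields 2.5, matching the documented example.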
    • filterByNameOnly

      public boolean filterByNameOnly(PlaceCandidate name)
      Override here as needed.
      Parameters:
      name -
      Returns:
    • evaluate

      public void evaluate(List<PlaceCandidate> names)
      Parameters:
      names - list of found place names
    • filterOutByFrequency

      protected boolean filterOutByFrequency(PlaceCandidate name, org.opensextant.data.Place geo)
      Certain names appear often around the world. In such cases we can pare back and evaluate only significant places (e.g., cities and states) and avoid, say, streams and roadways of the same name. If a name, N, occurs in more than a threshold number of places (somewhere between 100 and 250), then consider only feature classes A and P. The exact distinct count is up for debate. A lower count means random places are filtered out sooner for common city/village names.
      Parameters:
      name -
      geo -
      Returns:
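
      A minimal sketch of the frequency filter described above, assuming a threshold of 100 distinct locations (the source gives only the range 100 to 250). The method takes a precomputed count and a feature class string instead of PlaceCandidate and Place, and the constant name is hypothetical.

```java
final class FrequencyFilterSketch {
    /** Assumed threshold; the documented range is 100 to 250. */
    static final int MAX_DISTINCT_LOCATIONS = 100;

    /**
     * Returns true if the location should be filtered out: the name is
     * very common and the feature is not administrative (A) or populated (P).
     */
    static boolean filterOutByFrequency(int distinctLocationCount, String featureClass) {
        if (distinctLocationCount <= MAX_DISTINCT_LOCATIONS) {
            return false; // name is not common enough to prune
        }
        return !("A".equals(featureClass) || "P".equals(featureClass));
    }
}
```

      Lowering MAX_DISTINCT_LOCATIONS makes the filter kick in for less common names, which is the trade-off the description mentions.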
    • evaluate

      public abstract void evaluate(PlaceCandidate name, org.opensextant.data.Place geo)
      The one evaluation scheme that all rules must implement. Given a single text match and a location, consider if the geo is a good geocoding for the match.
      Parameters:
      name - matched name in text
      geo - gazetteer entry or location
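
      To illustrate the shape of a rule, the sketch below uses hypothetical stand-in types (Mention, Geo, Rule) rather than the real PlaceCandidate, Place, and GeocodeRule classes, so it compiles on its own. It assumes the list form of evaluate skips irrelevant rules and calls the per-location evaluate for each candidate location of each name; the source does not describe that dispatch, so treat it as one plausible arrangement.

```java
import java.util.ArrayList;
import java.util.List;

final class RuleSketch {
    /** Stand-in for a gazetteer entry; not org.opensextant.data.Place. */
    static final class Geo {
        final String countryCode;
        Geo(String countryCode) { this.countryCode = countryCode; }
    }

    /** Stand-in for a text match and its candidate locations; not PlaceCandidate. */
    static final class Mention {
        final String text;
        final List<Geo> candidates = new ArrayList<>();
        Mention(String text) { this.text = text; }
    }

    /** Stand-in base rule with the two evaluate overloads. */
    abstract static class Rule {
        boolean isRelevant() { return true; }

        /** Assumed dispatch: visit every candidate location of every name. */
        void evaluate(List<Mention> names) {
            if (!isRelevant()) {
                return;
            }
            for (Mention name : names) {
                for (Geo geo : name.candidates) {
                    evaluate(name, geo);
                }
            }
        }

        /** The one scheme each concrete rule implements. */
        abstract void evaluate(Mention name, Geo geo);
    }

    /** Hypothetical rule: counts locations in a previously observed country. */
    static final class CountryMatchCounter extends Rule {
        final String observedCountry;
        int matches = 0;

        CountryMatchCounter(String observedCountry) { this.observedCountry = observedCountry; }

        @Override
        boolean isRelevant() { return observedCountry != null; }

        @Override
        void evaluate(Mention name, Geo geo) {
            if (observedCountry.equalsIgnoreCase(geo.countryCode)) {
                matches++;
            }
        }
    }
}
```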
    • reset

      public void reset()
      No-op unless overridden.