Class GeocodeRule
java.lang.Object
org.opensextant.extractors.geo.rules.GeocodeRule
Direct Known Subclasses:
ContextualOrganizationRule, CoordinateAssociationRule, CountryRule, FeatureRule, HeatMapRule, LocationChooserRule, MajorPlaceRule, NameCodeRule, NameRule, NonLatinNameRule, NonsenseFilter, PersonNameFilter, PostalCodeAssociationRule, PostalCodeFilter, PostalLocationChooser, ProvinceAssociationRule, ProvinceNameSetter
-
Field Summary
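Fields (Modifier and Type, Field):
static final int AVG_WORD_LEN
protected BoundaryObserver boundaryObserver
protected LocationObserver coordObserver
protected CountryObserver countryObserver
protected String defaultMethod
static final String LEX1
static final String LEX2
protected boolean locationOnly
protected final org.slf4j.Logger log
static final int LOWERCASE
protected int textCase
static final int UPPERCASE
int weight
-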
Constructor Summary
Constructors
GeocodeRule()
-
Method Summary
Methods (Modifier and Type, Method, Description):
void evaluate(List<PlaceCandidate> names)
abstract void evaluate(PlaceCandidate name, org.opensextant.data.Place geo): The one evaluation scheme that all rules must implement.
boolean filterByNameOnly(...): Override here as needed.
protected boolean filterOutByFrequency(PlaceCandidate name, org.opensextant.data.Place geo): Certain names appear often around the world...
protected String internalPlaceID(org.opensextant.data.Place p): Create a location ID useful for tracking distinct named features by location.
boolean isRelevant(): Override if rule instance has another view of relevance, e.g.
static boolean isShort(int matchLen)
protected void logMsg(...)
void reset(): No-op, unless overridden.
boolean sameBoundary(org.opensextant.data.Place p1, org.opensextant.data.Place p2): Quick test to see if two places are contained within the same boundary.
boolean sameCountry(String cc1, String cc2): To compare places by their country code alone.
boolean sameCountry(org.opensextant.data.Place p1, org.opensextant.data.Place p2)
void sameLexicalName(PlaceCandidate name, org.opensextant.data.Place geo): Increment score for lexical matches accordingly: non-ASCII match 2.5 pts, ASCII match 1.5 pts, case-insensitive match 0.5 pts.
void setBoundaryObserver(...)
void setCountryObserver(...)
void setDefaultMethod(...)
static void setGeohash(org.opensextant.data.Place loc)
void setLocationObserver(...)
void setTextCase(org.opensextant.data.TextInput t)
int textCase(org.opensextant.data.TextInput t): For the purposes of geocoder rule reasoning, determine the case.
-
Field Details
-
AVG_WORD_LEN
public static final int AVG_WORD_LEN
-
UPPERCASE
public static final int UPPERCASE
-
LOWERCASE
public static final int LOWERCASE
-
LEX1
public static final String LEX1
-
LEX2
public static final String LEX2
-
weight
public int weight -
NAME
-
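defaultMethod
protected String defaultMethod
-
countryObserver
protected CountryObserver countryObserver
-
coordObserver
protected LocationObserver coordObserver
-
boundaryObserver
protected BoundaryObserver boundaryObserver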
-
log
protected final org.slf4j.Logger log -
locationOnly
protected boolean locationOnly -
textCase
protected int textCase
-
-
Constructor Details
-
GeocodeRule
public GeocodeRule()
-
-
Method Details
-
logMsg
-
setCountryObserver
-
setLocationObserver
-
setBoundaryObserver
-
setDefaultMethod
-
textCase
public int textCase(org.opensextant.data.TextInput t)
For the purposes of geocoder rule reasoning, determine the case of the text.
-
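One way to picture a case determination like this is the sketch below. It is an illustration only: the class name, the MIXED constant, and the numeric values are hypothetical (this page does not show the values of UPPERCASE and LOWERCASE), and it takes a plain String rather than a TextInput.

```java
/** Hypothetical sketch; not the OpenSextant implementation. */
final class TextCaseSketch {
    // Assumed values for illustration; the real constant values are not shown on this page.
    static final int MIXED = 0;
    static final int LOWERCASE = 1;
    static final int UPPERCASE = 2;

    /** Classify text as all-uppercase, all-lowercase, or mixed, looking at cased letters only. */
    static int textCase(String text) {
        boolean sawUpper = false;
        boolean sawLower = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isUpperCase(c)) {
                sawUpper = true;
            } else if (Character.isLowerCase(c)) {
                sawLower = true;
            }
        }
        if (sawUpper && !sawLower) {
            return UPPERCASE;
        }
        if (sawLower && !sawUpper) {
            return LOWERCASE;
        }
        return MIXED; // both cases present, or no cased letters at all
    }
}
```

A rule could consult such a value before trusting capitalization as evidence, since in all-uppercase or all-lowercase text the case of a mention says little about whether it is a proper name.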
setTextCase
public void setTextCase(org.opensextant.data.TextInput t) -
isRelevant
public boolean isRelevant()
Override if the rule instance has another view of relevance, e.g. coordinate rule: no coords found, so rule.isRelevant() is FALSE.
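The coordinate example can be sketched with stand-in classes. Everything below is hypothetical: the class names, the addCoordinate helper, and the assumption that the base implementation returns true are not taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a rule base class; not GeocodeRule itself. */
abstract class RelevanceSketchRule {
    /** Assumed default: a rule is relevant unless a subclass says otherwise. */
    public boolean isRelevant() {
        return true;
    }
}

/** Hypothetical coordinate-driven rule: relevant only once a coordinate has been seen. */
final class CoordinateSketchRule extends RelevanceSketchRule {
    private final List<double[]> coords = new ArrayList<>();

    void addCoordinate(double lat, double lon) {
        coords.add(new double[] {lat, lon});
    }

    @Override
    public boolean isRelevant() {
        return !coords.isEmpty(); // no coords found, so the rule is not relevant
    }
}
```

A caller can then skip evaluating a rule whose isRelevant() is false for the current document.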
-
isShort
public static boolean isShort(int matchLen) -
internalPlaceID
protected String internalPlaceID(org.opensextant.data.Place p)
Create a location ID useful for tracking distinct named features by location. This is not generalizable. It produces a looser identity such as "the city at location": P/PPL/f57yah5
Parameters: p
Returns: feature+location hash
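The shape of the example ID, P/PPL/f57yah5, suggests a feature class, a feature code, and a short location hash joined by slashes. The sketch below assumes exactly that and takes the three parts as plain strings; the class name and the reading of the last segment as a geohash are assumptions, and the H/STM entry is a made-up second feature.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of a feature+location identity; not the OpenSextant implementation. */
final class PlaceIdSketch {
    /** Join feature class, feature code, and location hash, e.g. "P/PPL/f57yah5". */
    static String internalPlaceID(String featureClass, String featureCode, String locationHash) {
        return featureClass + "/" + featureCode + "/" + locationHash;
    }

    public static void main(String[] args) {
        // Track distinct features by location: repeated entries collapse to one ID.
        Set<String> seen = new LinkedHashSet<>();
        seen.add(internalPlaceID("P", "PPL", "f57yah5"));
        seen.add(internalPlaceID("P", "PPL", "f57yah5")); // same kind of feature, same cell
        seen.add(internalPlaceID("H", "STM", "f57yah5")); // different feature in the same cell
        System.out.println(seen); // prints [P/PPL/f57yah5, H/STM/f57yah5]
    }
}
```

Because the ID ignores the place name, two gazetteer entries for the same city under different spellings would share one ID, which is the "looser identity" described above.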
-
sameCountry
public boolean sameCountry(org.opensextant.data.Place p1, org.opensextant.data.Place p2) -
sameCountry
public boolean sameCountry(String cc1, String cc2)
To compare places by their country code alone. Useful for CITY.cc =? COUNTRY.cc mentions.
Parameters:
cc1 - code 1
cc2 - code 2
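A comparison by country code alone might look like the sketch below. The null handling and the case-insensitive comparison are assumptions made for illustration; this page does not state how the real method treats either.

```java
/** Hypothetical sketch; not the OpenSextant implementation. */
final class CountryCodeSketch {
    /** True only if both codes are present and equal, ignoring case (assumed behavior). */
    static boolean sameCountry(String cc1, String cc2) {
        if (cc1 == null || cc2 == null) {
            return false; // an unknown country never matches
        }
        return cc1.equalsIgnoreCase(cc2);
    }
}
```

For a CITY.cc =? COUNTRY.cc check, cc1 would be the country code of a candidate city and cc2 the code of a country mentioned elsewhere in the text.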
-
sameBoundary
public boolean sameBoundary(org.opensextant.data.Place p1, org.opensextant.data.Place p2)
Quick test to see if two places are contained within the same boundary.
Parameters: p1, p2
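As a rough illustration, a boundary test can compare a country code together with a first-level administrative code. That definition of "boundary", the nested stand-in class P, and its field names are all assumptions; this page does not say how the real method decides.

```java
/** Hypothetical sketch; not the OpenSextant implementation. */
final class BoundarySketch {
    /** Stand-in for a place, carrying only the two fields this sketch needs. */
    static final class P {
        final String countryCode;
        final String adm1;

        P(String countryCode, String adm1) {
            this.countryCode = countryCode;
            this.adm1 = adm1;
        }
    }

    /** Assumed rule: same boundary means same country code and same ADM1 code. */
    static boolean sameBoundary(P p1, P p2) {
        if (p1 == null || p2 == null) {
            return false;
        }
        if (p1.countryCode == null || p1.adm1 == null) {
            return false;
        }
        // equalsIgnoreCase(null) is false, so missing values on p2 also fail the test.
        return p1.countryCode.equalsIgnoreCase(p2.countryCode)
                && p1.adm1.equalsIgnoreCase(p2.adm1);
    }
}
```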
-
setGeohash
public static void setGeohash(org.opensextant.data.Place loc) -
sameLexicalName
public void sameLexicalName(PlaceCandidate name, org.opensextant.data.Place geo)
Increment score for lexical matches accordingly:
- non-ASCII match: 2.5 pts
- ASCII match: 1.5 pts
- Case-insensitive match: 0.5 pts
Simple example, with one char in the name match and one char in the geo name:
ø = ø 2.5
o = o 1.5
O = O 1.5
O = o 0.5
Mention ? Gazetteer Entry
Boston != Bøstøn, no points.
Bøstøn == Bøstøn, 2.5 points
Parameters: name, geo
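The point scheme above can be sketched as a small scoring function over two strings. This is one reading of the description, not the library's code: it scores the whole name at once, treats a name as non-ASCII if any character is above code point 127, and uses a hypothetical class name.

```java
/** Hypothetical sketch of the lexical scoring described above; not the OpenSextant implementation. */
final class LexicalScoreSketch {
    /** Points for how closely a mention matches a gazetteer name. */
    static double score(String mention, String gazetteerName) {
        if (mention == null || gazetteerName == null) {
            return 0.0;
        }
        if (mention.equals(gazetteerName)) {
            // Exact match: worth more when the name carries non-ASCII characters.
            return isAscii(mention) ? 1.5 : 2.5;
        }
        if (mention.equalsIgnoreCase(gazetteerName)) {
            return 0.5; // matches only when case is ignored
        }
        return 0.0; // e.g. Boston vs. B\u00f8st\u00f8n: diacritics differ, no points
    }

    /** True if every char is in the 7-bit ASCII range. */
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }
}
```

The ordering of the checks matters: an exact match is tested first so that it is never downgraded to the 0.5-point case-insensitive score.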
-
filterByNameOnly
Override here as needed.
Parameters: name
-
evaluate
public void evaluate(List<PlaceCandidate> names)
Parameters:
names - list of found place names
-
filterOutByFrequency
protected boolean filterOutByFrequency(PlaceCandidate name, org.opensextant.data.Place geo)
Certain names appear often around the world. In such cases we can pare back and evaluate only significant places (e.g., cities and states) and avoid, say, streams and roadways by the same name. If a name, N, occurs in more than a threshold number of places (somewhere between 100 and 250), then consider only feature classes A and P. The exact distinct count is up for debate; a lower count means random places are filtered out sooner for common city/village names.
Parameters: name, geo
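The filtering idea can be sketched as follows. The threshold of 250 is simply the upper end of the range mentioned above, chosen for illustration; the class name, the constant, and the use of a plain count and feature-class string in place of PlaceCandidate and Place are all assumptions.

```java
/** Hypothetical sketch of frequency-based filtering; not the OpenSextant implementation. */
final class FrequencyFilterSketch {
    // Assumed cutoff; the description only says "100 to 250" and that the number is debatable.
    static final int MAX_DISTINCT_PLACES = 250;

    /**
     * Return true if a gazetteer entry should be skipped: the name is very common
     * and the entry is not in feature class A (administrative) or P (populated place).
     */
    static boolean filterOut(int distinctPlaceCount, String featureClass) {
        if (distinctPlaceCount <= MAX_DISTINCT_PLACES) {
            return false; // name is not common enough to trigger the filter
        }
        boolean significant = "A".equals(featureClass) || "P".equals(featureClass);
        return !significant;
    }
}
```

Lowering MAX_DISTINCT_PLACES makes the filter kick in for less common names, which matches the note that a lower count filters out random places sooner.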
-
evaluate
public abstract void evaluate(PlaceCandidate name, org.opensextant.data.Place geo)
The one evaluation scheme that all rules must implement. Given a single text match and a location, consider if the geo is a good geocoding for the match.
Parameters:
name - matched name in text
geo - gazetteer entry or location
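The relationship between the two evaluate methods can be pictured as a template-method pattern: a list-level loop that hands each (name, geo) pair to the abstract per-pair method. The sketch below uses hypothetical stand-in types rather than PlaceCandidate and Place, and the loop body is an assumption about how a list-level evaluate could delegate; it is not the library's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a gazetteer entry. */
final class SketchPlace {
    final String name;

    SketchPlace(String name) {
        this.name = name;
    }
}

/** Hypothetical stand-in for a name matched in text, with its candidate locations. */
final class SketchCandidate {
    final String text;
    final List<SketchPlace> places = new ArrayList<>();
    int exactHits = 0; // simplified evidence counter for this sketch

    SketchCandidate(String text) {
        this.text = text;
    }
}

/** Hypothetical rule base: the list-level evaluate delegates to the per-pair evaluate. */
abstract class SketchRule {
    public void evaluate(List<SketchCandidate> names) {
        for (SketchCandidate name : names) {
            for (SketchPlace geo : name.places) {
                evaluate(name, geo);
            }
        }
    }

    /** The one method each concrete rule must supply. */
    public abstract void evaluate(SketchCandidate name, SketchPlace geo);
}

/** Example rule: count locations whose gazetteer name equals the matched text exactly. */
final class ExactNameSketchRule extends SketchRule {
    @Override
    public void evaluate(SketchCandidate name, SketchPlace geo) {
        if (name.text.equals(geo.name)) {
            name.exactHits++;
        }
    }
}
```

With this split, a concrete rule only states what makes one location a good geocoding for one match, and the iteration over all matches lives in one place.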
-
reset
public void reset()
No-op, unless overridden.
-