Class GeocodeRule
java.lang.Object
org.opensextant.extractors.geo.rules.GeocodeRule
- Direct Known Subclasses:
ContextualOrganizationRule, CoordinateAssociationRule, CountryRule, FeatureRule, HeatMapRule, LocationChooserRule, MajorPlaceRule, NameCodeRule, NameRule, NonLatinNameRule, NonsenseFilter, PersonNameFilter, PostalCodeAssociationRule, PostalCodeFilter, PostalLocationChooser, ProvinceAssociationRule, ProvinceNameSetter
-
Field Summary
Modifier and Type                   Field
static final int                    AVG_WORD_LEN
protected BoundaryObserver          boundaryObserver
protected LocationObserver          coordObserver
protected CountryObserver           countryObserver
protected String                    defaultMethod
static final String                 LEX1
static final String                 LEX2
protected boolean                   locationOnly
protected final org.slf4j.Logger    log
static final int                    LOWERCASE
protected int                       textCase
static final int                    UPPERCASE
int                                 weight
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
void evaluate(List<PlaceCandidate> names)
abstract void evaluate(PlaceCandidate name, org.opensextant.data.Place geo)
    The one evaluation scheme that all rules must implement.
boolean filterByNameOnly
    Override here as needed.
protected boolean filterOutByFrequency(PlaceCandidate name, org.opensextant.data.Place geo)
    Certain names appear often around the world...
protected String internalPlaceID(org.opensextant.data.Place p)
    Create a location ID useful for tracking distinct named features by location.
boolean isRelevant()
    Override if a rule instance has another view of relevance, e.g.
static boolean isShort(int matchLen)
protected void logMsg
void reset()
    No-op, unless overridden.
boolean sameBoundary(org.opensextant.data.Place p1, org.opensextant.data.Place p2)
    Quick test to see if two places are contained within the same boundary.
boolean sameCountry(String cc1, String cc2)
    To compare places by their country code alone.
boolean sameCountry(org.opensextant.data.Place p1, org.opensextant.data.Place p2)
void sameLexicalName(PlaceCandidate name, org.opensextant.data.Place geo)
    Increment the score for lexical matches accordingly: non-ASCII match, 2.5 pts; ASCII match, 1.5 pts; case-insensitive match, 0.5 pts. Simple example, with one character in the name match and one character in the geo name: ø = ø 2.5; o = o 1.5; O = O 1.5; O = o 0.5. Mention vs. gazetteer entry: Boston != Bøstøn, no points.
void setBoundaryObserver
void setCountryObserver
void setDefaultMethod
static void setGeohash(org.opensextant.data.Place loc)
void setLocationObserver
void setTextCase(org.opensextant.data.TextInput t)
int textCase(org.opensextant.data.TextInput t)
    Determines the text case for the purposes of geocoder rule reasoning.
-
Field Details
-
AVG_WORD_LEN
public static final int AVG_WORD_LEN
-
UPPERCASE
public static final int UPPERCASE
-
LOWERCASE
public static final int LOWERCASE
-
LEX1
-
LEX2
-
weight
public int weight
-
NAME
-
defaultMethod
-
countryObserver
-
coordObserver
-
boundaryObserver
-
log
protected final org.slf4j.Logger log
-
locationOnly
protected boolean locationOnly
-
textCase
protected int textCase
-
-
Constructor Details
-
GeocodeRule
public GeocodeRule()
-
-
Method Details
-
logMsg
-
setCountryObserver
-
setLocationObserver
-
setBoundaryObserver
-
setDefaultMethod
-
textCase
public int textCase(org.opensextant.data.TextInput t)
Determines the text case for the purposes of geocoder rule reasoning.
-
setTextCase
public void setTextCase(org.opensextant.data.TextInput t)
-
isRelevant
public boolean isRelevant()
Override if a rule instance has another view of relevance, e.g., a coordinate rule: no coordinates were found, so rule.isRelevant() is false.
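The override pattern described here can be sketched as follows. This is a minimal illustration, not the library's code: `RuleSketch`, `CoordinateRuleSketch`, and `addCoordinate` are hypothetical stand-ins, and the default of `true` in the base class is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a rule base class; not the real GeocodeRule. */
abstract class RuleSketch {
    /** Assumption: a rule is relevant unless a subclass says otherwise. */
    public boolean isRelevant() {
        return true;
    }
}

/** Hypothetical coordinate-driven rule: relevant only once coordinates were seen. */
class CoordinateRuleSketch extends RuleSketch {
    private final List<double[]> coordinates = new ArrayList<>();

    /** Records a coordinate found in the text (illustrative helper). */
    void addCoordinate(double lat, double lon) {
        coordinates.add(new double[] {lat, lon});
    }

    /** No coordinates found means the rule has nothing to contribute. */
    @Override
    public boolean isRelevant() {
        return !coordinates.isEmpty();
    }
}
```

A caller could check `rule.isRelevant()` before invoking the rule and skip it when the answer is false.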
-
isShort
public static boolean isShort(int matchLen)
-
internalPlaceID
Create a location ID useful for tracking distinct named features by location. This is not generalizable. It produces a looser identity such as "the city at location": P/PPL/f57yah5
Parameters:
p
Returns:
feature+location hash
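To illustrate the shape of such an ID, the sketch below joins a feature class, a feature code, and a geohash with slashes. It assumes the three parts of the sample ID are exactly those values; `PlaceIdSketch` and `placeId` are hypothetical names, and the real method takes a Place rather than three strings.

```java
/** Hypothetical helper; the real internalPlaceID takes a Place object. */
final class PlaceIdSketch {
    private PlaceIdSketch() {}

    /** Joins feature class, feature code, and geohash with '/' separators. */
    static String placeId(String featureClass, String featureCode, String geohash) {
        return String.join("/", featureClass, featureCode, geohash);
    }

    public static void main(String[] args) {
        // Prints P/PPL/f57yah5, the sample ID from the description above.
        System.out.println(placeId("P", "PPL", "f57yah5"));
    }
}
```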
-
sameCountry
public boolean sameCountry(org.opensextant.data.Place p1, org.opensextant.data.Place p2)
-
sameCountry
Compares places by their country code alone. Useful for CITY.cc =? COUNTRY.cc mentions.
Parameters:
cc1 - code 1
cc2 - code 2
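A comparison by country code alone might look like the sketch below. Treating a missing code as "not the same" and ignoring letter case are both assumptions, since the description does not say how the real method handles them; `CountryCodeSketch` is a hypothetical name.

```java
/** Hypothetical helper showing a country-code-only comparison. */
final class CountryCodeSketch {
    private CountryCodeSketch() {}

    /**
     * Returns true when both codes are present and equal.
     * Assumptions: null never matches, and case is ignored ("us" equals "US").
     */
    static boolean sameCountry(String cc1, String cc2) {
        if (cc1 == null || cc2 == null) {
            return false;
        }
        return cc1.equalsIgnoreCase(cc2);
    }
}
```

With this check, a city entry whose code is "FR" and a country mention whose code is "FR" compare as the same country without looking at any other attribute.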
-
sameBoundary
public boolean sameBoundary(org.opensextant.data.Place p1, org.opensextant.data.Place p2)
Quick test to see if two places are contained within the same boundary.
Parameters:
p1
p2
-
setGeohash
public static void setGeohash(org.opensextant.data.Place loc)
-
sameLexicalName
Increment the score for lexical matches accordingly:
- non-ASCII match: 2.5 pts
- ASCII match: 1.5 pts
- Case-insensitive match: 0.5 pts
Simple example, with one character in the name match and one character in the geo name:
ø = ø 2.5
o = o 1.5
O = O 1.5
O = o 0.5
Mention vs. gazetteer entry:
Boston != Bøstøn, no points.
Bøstøn == Bøstøn, 2.5 points
Parameters:
name
geo
-
filterByNameOnly
Override here as needed.
Parameters:
name
-
evaluate
Parameters:
names - list of found place names
-
filterOutByFrequency
Certain names appear often around the world. In such cases we can pare back and evaluate only significant places (e.g., cities and states) and avoid, say, streams and roadways by the same name. If a name, N, occurs in more than 100 to 250 places, then consider only feature classes A and P. The exact distinct count is up for debate. A lower count means we filter out random places sooner for common city/village names.
Parameters:
name
geo
-
evaluate
The one evaluation scheme that all rules must implement. Given a single text match and a location, consider if the geo is a good geocoding for the match.
Parameters:
name - matched name in text
geo - gazetteer entry or location
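How a concrete rule might plug into this scheme is sketched below with stand-in types. It assumes the list-based `evaluate` visits every candidate location of every name and delegates to the per-pair method; that loop, and all class names here (`GeoSketch`, `MentionSketch`, `EvaluatingRuleSketch`, `ExactNameRuleSketch`), are illustrative and not taken from the library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a gazetteer entry; not org.opensextant.data.Place. */
class GeoSketch {
    final String id;
    final String name;
    GeoSketch(String id, String name) { this.id = id; this.name = name; }
}

/** Stand-in for a matched name with candidate locations; not PlaceCandidate. */
class MentionSketch {
    final String text;
    final List<GeoSketch> candidates = new ArrayList<>();
    final Map<String, Integer> scores = new HashMap<>();
    MentionSketch(String text) { this.text = text; }
    /** Adds points to the running score for one candidate location. */
    void addScore(GeoSketch geo, int points) { scores.merge(geo.id, points, Integer::sum); }
}

/** Assumed template: the list form delegates each (name, geo) pair to the rule. */
abstract class EvaluatingRuleSketch {
    void evaluate(List<MentionSketch> names) {
        for (MentionSketch name : names) {
            for (GeoSketch geo : name.candidates) {
                evaluate(name, geo);
            }
        }
    }

    /** The one method each concrete rule must implement. */
    abstract void evaluate(MentionSketch name, GeoSketch geo);
}

/** Example rule: reward candidates whose name equals the matched text exactly. */
class ExactNameRuleSketch extends EvaluatingRuleSketch {
    @Override
    void evaluate(MentionSketch name, GeoSketch geo) {
        if (name.text.equals(geo.name)) {
            name.addScore(geo, 1);
        }
    }
}
```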
-
reset
public void reset()
No-op, unless overridden.
-