Class MajorPlaceRule
java.lang.Object
org.opensextant.extractors.geo.rules.GeocodeRule
org.opensextant.extractors.geo.rules.MajorPlaceRule
Major Place rule. Fire this rule after the Country rule.
Try to find all countries in scope first, then major places.
If you try to infer the country from major places first, you get many false positives; the country name space is smaller and more reliable.
Caveats: these rules enforce the notion that country names are the drivers here, and major places amplify them.
If we see a national capital, we can infer a country, provided no countries have been seen in the document.
If we see a major place, add that evidence, weighting it higher if the country of that major place is also mentioned in the document.
Author: ubaldino
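The two inference rules in the class description can be sketched as follows. This is a minimal illustration, not the OpenSextant implementation: the `MajorPlaceSketch` class, its nested `Place` type, and the weight constants are all hypothetical names and values chosen only to make the ordering visible.

```java
import java.util.Optional;
import java.util.Set;

/** Illustrative only; not the OpenSextant MajorPlaceRule. */
final class MajorPlaceSketch {

    /** Hypothetical stand-in for a gazetteer entry. */
    static final class Place {
        final String name;
        final String countryCode;
        final boolean nationalCapital;

        Place(String name, String countryCode, boolean nationalCapital) {
            this.name = name;
            this.countryCode = countryCode;
            this.nationalCapital = nationalCapital;
        }
    }

    // Assumed weights; the real rule's values are not given in this page.
    static final double BASE_WEIGHT = 1.0;
    static final double COUNTRY_MENTIONED_BONUS = 1.0;

    /**
     * Infer a country from a national capital only when the document
     * has mentioned no country at all.
     */
    static Optional<String> inferCountry(Place place, Set<String> countriesSeen) {
        if (place.nationalCapital && countriesSeen.isEmpty()) {
            return Optional.of(place.countryCode);
        }
        return Optional.empty();
    }

    /**
     * Weigh a major place; the evidence counts for more when the
     * place's own country is also mentioned in the document.
     */
    static double weigh(Place place, Set<String> countriesSeen) {
        double weight = BASE_WEIGHT;
        if (countriesSeen.contains(place.countryCode)) {
            weight += COUNTRY_MENTIONED_BONUS;
        }
        return weight;
    }
}
```

The country set is passed in rather than computed here, which mirrors the requirement that the Country rule has already run before major places are considered.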
Field Summary
Modifier and Type      Field                Description
static final String    ADMIN
static final String    CAPITAL
static final String    MENTIONED_COUNTRY
static final String    POP
Fields inherited from class org.opensextant.extractors.geo.rules.GeocodeRule
AVG_WORD_LEN, boundaryObserver, coordObserver, countryObserver, defaultMethod, LEX1, LEX2, locationOnly, log, LOWERCASE, NAME, textCase, UPPERCASE, weight
Constructor Summary
Constructor    Description
MajorPlaceRule(Map<String, Integer> populationStats)
    Major Place assigns a score to places that are national capitals, provinces, or cities with sizable population.
Method Summary
Modifier and Type    Method    Description
void evaluate(List<PlaceCandidate> names)
void evaluate(PlaceCandidate name, org.opensextant.data.Place geo)
    Attach either a Capital or Admin region ID, giving it some weight based on various properties or context.
static boolean isRuleFor(PlaceCandidate pc)
    Determine if this rule was applied to the candidate.
void reset()
    No-op, unless overridden.
Methods inherited from class org.opensextant.extractors.geo.rules.GeocodeRule
filterByNameOnly, filterOutByFrequency, internalPlaceID, isRelevant, isShort, logMsg, sameBoundary, sameCountry, sameCountry, sameLexicalName, setBoundaryObserver, setCountryObserver, setDefaultMethod, setGeohash, setLocationObserver, setTextCase, textCase
Field Details
static final String CAPITAL
static final String ADMIN
static final String POP
static final String MENTIONED_COUNTRY
Constructor Details
MajorPlaceRule
Major Place assigns a score to places that are national capitals, provinces, or cities with sizable population. Log(population) adds up to one point to the place weight. Population data is indexed by location/grid using geohash. Source: geonames.org.
Population stats are deterministic: they do not change during processing and they are not context-specific. Population is therefore assessed once per location, not once per mention.
Parameters:
populationStats - optional population stats.
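The population scoring described for the constructor can be sketched as follows. This is a rough illustration under stated assumptions, not the library's formula: the page only says that Log(population) adds up to one point, so the base-10 logarithm, the `POP_CEILING` normalizer, the use of full geohash strings as map keys, and the `PopulationScoreSketch` class itself are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; not the OpenSextant MajorPlaceRule. */
final class PopulationScoreSketch {

    // Assumed ceiling: populations at or above this earn the full point.
    static final int POP_CEILING = 10_000_000;

    // Assumed key format: a geohash string for the grid cell.
    private final Map<String, Integer> populationStats;

    // Scores are cached per location because population does not depend
    // on the mention or its context.
    private final Map<String, Double> scoreByGeohash = new HashMap<>();

    PopulationScoreSketch(Map<String, Integer> populationStats) {
        // Stats are optional; treat a missing map as "no data".
        this.populationStats =
            populationStats == null ? Collections.<String, Integer>emptyMap() : populationStats;
    }

    /** Map a population to a score in [0, 1] on a log scale. */
    static double logScore(int population) {
        if (population <= 1) {
            return 0.0;
        }
        double scaled = Math.log10(population) / Math.log10(POP_CEILING);
        return Math.min(1.0, scaled);
    }

    /** Score a location once; later lookups reuse the cached value. */
    double scoreFor(String geohash) {
        return scoreByGeohash.computeIfAbsent(
            geohash, gh -> logScore(populationStats.getOrDefault(gh, 0)));
    }
}
```

Caching by geohash reflects the note that population is assessed once per location: repeated mentions of the same place reuse the stored score instead of recomputing it.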
Method Details
reset
public void reset()
Description copied from class: GeocodeRule
No-op, unless overridden.
Overrides:
reset in class GeocodeRule
evaluate
void evaluate(List<PlaceCandidate> names)
Overrides:
evaluate in class GeocodeRule
Parameters:
names - list of found place names
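isRuleFor
static boolean isRuleFor(PlaceCandidate pc)
Determine if this rule was applied to the candidate.
Parameters:
pc - the candidate to check
Returns:
true if this rule was applied to the candidate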
evaluate
void evaluate(PlaceCandidate name, org.opensextant.data.Place geo)
Attach either a Capital or Admin region ID, giving it some weight based on various properties or context.
Specified by:
evaluate in class GeocodeRule
Parameters:
name - matched name in text
geo - gazetteer entry or location