Class LocationChooserRule
java.lang.Object
  org.opensextant.extractors.geo.rules.GeocodeRule
    org.opensextant.extractors.geo.rules.LocationChooserRule
- All Implemented Interfaces:
- org.opensextant.data.MatchSchema
A final geocoding pass or two: loop through candidates and choose the location that best fits the context. As needed, cache chosen entries to optimize, e.g., co-referenced places aforementioned in the document.
 
Ideally, choose the best place for one particular instance of a name, then percolate that choice to the other mentions of the same name. Is it the same place? If so, there is no need to disambiguate it multiple times at this point.
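The caching idea above can be sketched as a per-name memo: resolve a name once, then reuse that choice for later mentions. This is a minimal illustration, not the class's implementation. The class `CoreferenceCacheSketch`, its methods, the case-insensitive key, and the use of a String as a stand-in for a gazetteer entry are all hypothetical, and it assumes that equal names in one document refer to the same place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: choose a location once per distinct name and reuse it
 * for every later mention of that name in the same document.
 * Assumes mentions with the same (case-insensitive) name refer to the same place.
 */
class CoreferenceCacheSketch {
    // Normalized name -> chosen location ID (a String stands in for a gazetteer entry).
    private final Map<String, String> chosen = new HashMap<>();
    // How many times the (expensive) chooser actually ran.
    private int disambiguations = 0;

    /** Returns one location ID per mention; the chooser runs once per distinct name. */
    List<String> resolveAll(List<String> mentions, Function<String, String> chooser) {
        List<String> resolved = new ArrayList<>();
        for (String mention : mentions) {
            String key = mention.toLowerCase(Locale.ROOT);
            String location = chosen.get(key);
            if (location == null) {
                location = chooser.apply(key); // disambiguate only the first occurrence
                disambiguations++;
                chosen.put(key, location);
            }
            resolved.add(location);
        }
        return resolved;
    }

    int disambiguationCount() {
        return disambiguations;
    }
}
```

For mentions "Paris", "Lyon", "paris", the chooser runs twice and the third mention reuses the first choice. Whether two equal names really are the same place is exactly the open question the description raises; this sketch simply assumes they are.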
- Author:
- ubaldino
Field Summary
Fields (Modifier and Type, Field, Description):
- static final String ADMIN_CONTAINS
- protected static final double ADMIN_CONTAINS_PLACE_WT
- static final String COUNTRY_CONTAINS
- protected static final double COUNTRY_CONTAINS_PLACE_WT
- static final int MATCHCONF_BARE_ACRONYM
- static final int MATCHCONF_GEODETIC
  Absolute Confidence: Geographic location of a named place lines up with a coordinate in scope.
- static final int MATCHCONF_MANY_COUNTRIES
  Absolute Confidence: Many locations matched, with multiple countries in scope, i.e., many countries are mentioned in the document.
- static final int MATCHCONF_MANY_COUNTRY
  Absolute Confidence: Many locations matched, but one country is in scope.
- static final int MATCHCONF_MANY_LOC
  Absolute Confidence: Many locations matched a single name.
- static final int MATCHCONF_MINIMUM
  The bare minimum confidence -- if rules negate confidence points, confidence may go below 20.
- static final int MATCHCONF_NAME_REGION
  Absolute Confidence: Name, Region; City, State; Capital, Country; etc.
- static final int MATCHCONF_ONE_LOC
  Absolute Confidence: Unique name in the gazetteer.
- static final int MATCHCONF_PREFERRED
  A subtle boost for locations that were preferred -- especially helps when there is no inherent context and we must rely on the caller's intuition.
- static final int MATCHCONF_QUALIFIER_AMBIGUOUS_NAME
  Confidence Qualifier: Ambiguous.
- static final int MATCHCONF_QUALIFIER_COUNTRY_MENTIONED
  Confidence Qualifier: The chosen place happens to be in a country mentioned in the document.
- static final int MATCHCONF_QUALIFIER_HIGH_SCORE
  Confidence Qualifier: The chosen place scored high compared to the runner-up.
- static final int MATCHCONF_QUALIFIER_LOWERCASE
  Confidence Qualifier: Start here if you have a lowercase term that may be a place.
- static final int MATCHCONF_QUALIFIER_MAJOR_PLACE
  Confidence Qualifier: The chosen place happens to be a major place, e.g., a large city.
- static final int MATCHCONF_QUALIFIER_UNIQUE_COUNTRY
  Confidence Qualifier: Name appears in only one country.
- static final String PREF_COUNTRY
  Preferred Country or Location -- when the user supplies context that may be missing...
- static final String PREF_LOCATION

Fields inherited from class org.opensextant.extractors.geo.rules.GeocodeRule:
AVG_WORD_LEN, boundaryObserver, coordObserver, countryObserver, defaultMethod, LEX1, LEX2, locationOnly, log, LOWERCASE, NAME, textCase, UPPERCASE, weight
Fields inherited from interface org.opensextant.data.MatchSchema:
VAL_COORD, VAL_COUNTRY, VAL_PLACE, VAL_POSTAL, VAL_TAXON

Constructor Summary
Constructors:
- LocationChooserRule()

Method Summary
Methods (Modifier and Type, Method, Description):
- void assessConfidence(PlaceCandidate pc)
  Confidence of your final chosen location for a given name is assembled as the sum of some absolute metric plus some additional qualifiers.
- void evaluate(List<PlaceCandidate> names)
- void evaluate(List<PlaceCandidate> names, org.opensextant.processing.Parameters preferences)
  Walk the entire list.
- void evaluate(PlaceCandidate name, org.opensextant.data.Place geo)
  Yet unchosen location.
- int getInferredCountryCount(String cc)
  How likely it is that this country is substantially relevant to the document, based on hard geography.
- void inferCountry(String cc)
- void reset()
  No-op, unless overridden.

Methods inherited from class org.opensextant.extractors.geo.rules.GeocodeRule:
filterByNameOnly, filterOutByFrequency, internalPlaceID, isRelevant, isShort, logMsg, sameBoundary, sameCountry, sameCountry, sameLexicalName, setBoundaryObserver, setCountryObserver, setDefaultMethod, setGeohash, setLocationObserver, setTextCase, textCase

Field Details

ADMIN_CONTAINS_PLACE_WT
protected static final double ADMIN_CONTAINS_PLACE_WT
 
COUNTRY_CONTAINS_PLACE_WT
protected static final double COUNTRY_CONTAINS_PLACE_WT
 
PREF_COUNTRY
public static final String PREF_COUNTRY
Preferred Country or Location -- when the user supplies context that may be missing, we accept it and weight that preference higher.
 
PREF_LOCATION
public static final String PREF_LOCATION
 
COUNTRY_CONTAINS
public static final String COUNTRY_CONTAINS
 
ADMIN_CONTAINS
public static final String ADMIN_CONTAINS
 
MATCHCONF_BARE_ACRONYM
public static final int MATCHCONF_BARE_ACRONYM
 
MATCHCONF_MINIMUM
public static final int MATCHCONF_MINIMUM
The bare minimum confidence -- if rules negate confidence points, confidence may go below 20.
 
MATCHCONF_MANY_LOC
public static final int MATCHCONF_MANY_LOC
Absolute Confidence: Many locations matched a single name. No country is in scope (no country is mentioned in the document), so this is very low confidence.
 
MATCHCONF_MANY_COUNTRIES
public static final int MATCHCONF_MANY_COUNTRIES
Absolute Confidence: Many locations matched, with multiple countries in scope, i.e., many countries are mentioned in the document.
 
MATCHCONF_MANY_COUNTRY
public static final int MATCHCONF_MANY_COUNTRY
Absolute Confidence: Many locations matched, but one country is in scope, i.e., one country is mentioned in the document.
 
MATCHCONF_NAME_REGION
public static final int MATCHCONF_NAME_REGION
Absolute Confidence: Name, Region; City, State; Capital, Country; etc. Patterns of qualified places.
 
MATCHCONF_ONE_LOC
public static final int MATCHCONF_ONE_LOC
Absolute Confidence: Unique name in the gazetteer. Confidence is high; however, this needs to be tempered by the number of gazetteers, their coverage, and their diversity.
 
MATCHCONF_GEODETIC
public static final int MATCHCONF_GEODETIC
Absolute Confidence: Geographic location of a named place lines up with a coordinate in scope.
 
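The absolute-confidence fields above describe mutually exclusive situations, so choosing the absolute term A can be sketched as a decision ladder. The precedence order and every numeric value below are hypothetical placeholders, not the actual values of the `MATCHCONF_*` constants; `AbsoluteConfidenceSketch` and its parameters are illustrative names only.

```java
/**
 * Hypothetical sketch of picking the absolute confidence term A.
 * Values are placeholders, NOT the real MATCHCONF_* constants,
 * and the precedence order (strongest evidence first) is assumed.
 */
final class AbsoluteConfidenceSketch {
    static final int MANY_LOC = 30;       // many locations, no country in scope
    static final int MANY_COUNTRIES = 40; // many locations, several countries in scope
    static final int MANY_COUNTRY = 50;   // many locations, exactly one country in scope
    static final int ONE_LOC = 70;        // unique name in the gazetteer
    static final int NAME_REGION = 75;    // qualified pattern such as "City, State"
    static final int GEODETIC = 90;       // place lines up with an in-scope coordinate

    private AbsoluteConfidenceSketch() {}

    static int absolute(int locationCount, int countriesInScope,
                        boolean qualifiedPattern, boolean nearCoordinate) {
        if (nearCoordinate) return GEODETIC;
        if (qualifiedPattern) return NAME_REGION;
        if (locationCount == 1) return ONE_LOC;
        if (countriesInScope == 1) return MANY_COUNTRY;
        if (countriesInScope > 1) return MANY_COUNTRIES;
        return MANY_LOC; // ambiguous name and no country context at all
    }
}
```

The ladder mirrors the field descriptions: the less document context there is, the lower the starting point, with a bare ambiguous name landing at the bottom.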
MATCHCONF_QUALIFIER_MAJOR_PLACE
public static final int MATCHCONF_QUALIFIER_MAJOR_PLACE
Confidence Qualifier: The chosen place happens to be a major place, e.g., a large city.
 
MATCHCONF_QUALIFIER_COUNTRY_MENTIONED
public static final int MATCHCONF_QUALIFIER_COUNTRY_MENTIONED
Confidence Qualifier: The chosen place happens to be in a country mentioned in the document.
 
MATCHCONF_QUALIFIER_AMBIGUOUS_NAME
public static final int MATCHCONF_QUALIFIER_AMBIGUOUS_NAME
Confidence Qualifier: Ambiguous.
 
MATCHCONF_QUALIFIER_UNIQUE_COUNTRY
public static final int MATCHCONF_QUALIFIER_UNIQUE_COUNTRY
Confidence Qualifier: Name appears in only one country.
 
MATCHCONF_QUALIFIER_HIGH_SCORE
public static final int MATCHCONF_QUALIFIER_HIGH_SCORE
Confidence Qualifier: The chosen place scored high compared to the runner-up.
 
MATCHCONF_QUALIFIER_LOWERCASE
public static final int MATCHCONF_QUALIFIER_LOWERCASE
Confidence Qualifier: Start here if you have a lowercase term that may be a place. Lowercase matches lose 10 points or more; however, feat_class P and A win back 5 points, since other feature classes are less likely to be places.
 
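The lowercase rule can be sketched directly from the numbers given: a lowercase match starts at -10, and feature classes P (populated places) and A (administrative areas) win back 5. The "or more" extra penalty for other classes is not specified, so it is not modeled here. `LowercaseQualifierSketch` and its method are hypothetical names, and treating "lowercase" as "the whole matched text is lowercase" is an assumption.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the lowercase qualifier. The -10 and +5 values come
 * from the field description; any further penalty ("or more") is not modeled.
 */
final class LowercaseQualifierSketch {
    static final int LOWERCASE_PENALTY = -10;
    static final int PA_RECOVERY = 5;

    private LowercaseQualifierSketch() {}

    static int qualifier(String matchedText, String featClass) {
        // Assumption: a match counts as lowercase if it equals its own lowercase form.
        boolean isLower = matchedText.equals(matchedText.toLowerCase(Locale.ROOT));
        if (!isLower) {
            return 0; // capitalized matches are not penalized by this qualifier
        }
        int q = LOWERCASE_PENALTY;
        if ("P".equals(featClass) || "A".equals(featClass)) {
            q += PA_RECOVERY; // populated places and admin areas win back 5 points
        }
        return q;
    }
}
```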
MATCHCONF_PREFERRED
public static final int MATCHCONF_PREFERRED
A subtle boost for locations that were preferred -- especially helps when there is no inherent context and we must rely on the caller's intuition.
 
 
Constructor Details

LocationChooserRule
public LocationChooserRule()
 
Method Details

reset
public void reset()
Description copied from class: GeocodeRule
No-op, unless overridden.
- Overrides:
- reset in class GeocodeRule
 
evaluate
public void evaluate(List<PlaceCandidate> names)
- Overrides:
- evaluate in class GeocodeRule
- Parameters:
- names - list of found place names
 
evaluate
public void evaluate(List<PlaceCandidate> names, org.opensextant.processing.Parameters preferences)
Walk the entire list.

inferCountry
public void inferCountry(String cc)

getInferredCountryCount
public int getInferredCountryCount(String cc)
How likely it is that this country is substantially relevant to the document, based on hard geography rather than trivial mentions of country names or codes. For example, "inferred" countries are more telling than short codes.
- Parameters:
- cc
- Returns:
 
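One way to read `inferCountry` and `getInferredCountryCount` together is as a tally: each time resolved geography implies a country, that country's count goes up, separately from explicit country-name mentions. The sketch below is a hypothetical illustration under that assumption; the class `InferredCountrySketch` is invented, and although its method names mirror the documented ones, the bodies are not the library's implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: tally countries inferred from resolved places
 * (e.g., the country of a chosen city), kept apart from bare country mentions.
 */
class InferredCountrySketch {
    // Upper-cased country code -> number of times it was inferred.
    private final Map<String, Integer> inferred = new HashMap<>();

    void inferCountry(String cc) {
        if (cc == null || cc.isEmpty()) {
            return; // ignore missing codes
        }
        inferred.merge(cc.toUpperCase(Locale.ROOT), 1, Integer::sum);
    }

    int getInferredCountryCount(String cc) {
        if (cc == null) {
            return 0;
        }
        return inferred.getOrDefault(cc.toUpperCase(Locale.ROOT), 0);
    }

    void reset() {
        inferred.clear(); // start fresh for the next document
    }
}
```

A higher count suggests the country is backed by actual geography in the text rather than by a passing mention.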
evaluate
public void evaluate(PlaceCandidate name, org.opensextant.data.Place geo)
Yet unchosen location. Consider the given evidence first, creating some weight there, then introduce innate properties of the possible locations, thereby amplifying the differences among candidates.
- Specified by:
- evaluate in class GeocodeRule
- Parameters:
- name - matched name in text
- geo - gazetteer entry or location
 
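The two-stage scoring described for this method (evidence first, then innate properties that amplify differences) can be sketched as an additive evidence score scaled by an innate factor. Multiplication is one assumed way to "amplify"; the weights, the `CandidateScoreSketch` class, and its parameters are hypothetical. The preferred-country weight is set higher than the plain country weight only to echo the `PREF_COUNTRY` note that user preferences are weighted higher.

```java
import java.util.Set;

/**
 * Hypothetical sketch of scoring one candidate location: document evidence is
 * summed first, then an innate property (major place) scales that score.
 * All weights are illustrative placeholders.
 */
final class CandidateScoreSketch {
    static final double COUNTRY_WT = 2.0;     // place lies in a country seen in the document
    static final double ADMIN_WT = 3.0;       // place lies in a province/state seen in the document
    static final double PREFERRED_WT = 4.0;   // caller-supplied preferred country, weighted higher
    static final double MAJOR_PLACE_WT = 1.5; // innate: large city, capital, etc.

    private CandidateScoreSketch() {}

    static double score(String countryCode, String adminCode, boolean majorPlace,
                        Set<String> countriesInDoc, Set<String> adminsInDoc,
                        Set<String> preferredCountries) {
        // Stage 1: evidence from the document and the caller.
        double evidence = 0.0;
        if (countriesInDoc.contains(countryCode)) evidence += COUNTRY_WT;
        if (adminsInDoc.contains(adminCode)) evidence += ADMIN_WT;
        if (preferredCountries.contains(countryCode)) evidence += PREFERRED_WT;
        // Stage 2: innate properties multiply the evidence-based score, so they
        // magnify existing support instead of overriding it.
        double innate = majorPlace ? MAJOR_PLACE_WT : 1.0;
        return (1.0 + evidence) * innate;
    }
}
```

With these placeholders, a major place in a mentioned country scores 4.5, while the same place with both its country and province mentioned scores 9.0, so the gap between candidates widens as evidence accumulates.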
assessConfidence
public void assessConfidence(PlaceCandidate pc)
Confidence of your final chosen location for a given name is assembled as the sum of an absolute metric plus additional qualifiers. The absolute term provides context at the document level, whereas the qualifiers are refinements:
conf = A + Q1 + Q2 + ...   (this formula may change)
- Parameters:
- pc
 
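The formula conf = A + Q1 + Q2 + ... can be sketched as one absolute term plus signed qualifiers. The qualifier values, the negative sign for the ambiguous-name qualifier, and the "scored high" ratio are hypothetical placeholders, as is the class `ConfidenceSketch`. No floor is applied, consistent with the `MATCHCONF_MINIMUM` note that confidence may go below 20 when rules subtract points.

```java
/**
 * Hypothetical sketch of conf = A + Q1 + Q2 + ...
 * Qualifier values and the high-score ratio are placeholders.
 * No floor is applied: confidence may drop below the minimum baseline.
 */
final class ConfidenceSketch {
    static final int Q_COUNTRY_MENTIONED = 5;
    static final int Q_MAJOR_PLACE = 5;
    static final int Q_HIGH_SCORE = 5;
    static final int Q_AMBIGUOUS = -5;          // assumed to subtract points
    static final double HIGH_SCORE_RATIO = 1.5; // assumed "clearly beats runner-up" threshold

    private ConfidenceSketch() {}

    static int assess(int absolute, double bestScore, double runnerUpScore,
                      boolean countryMentioned, boolean majorPlace, boolean ambiguous) {
        int conf = absolute; // A: document-level starting point
        if (countryMentioned) conf += Q_COUNTRY_MENTIONED;
        if (majorPlace) conf += Q_MAJOR_PLACE;
        if (ambiguous) conf += Q_AMBIGUOUS;
        // High-score qualifier: the winner clearly beats the runner-up, or had no competitor.
        if (runnerUpScore <= 0 || bestScore >= HIGH_SCORE_RATIO * runnerUpScore) {
            conf += Q_HIGH_SCORE;
        }
        return conf;
    }
}
```

For example, an absolute of 20 with an ambiguous name and a close runner-up yields 15, illustrating how qualifiers can push confidence below the baseline.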
 