Class LocationChooserRule
java.lang.Object
org.opensextant.extractors.geo.rules.GeocodeRule
org.opensextant.extractors.geo.rules.LocationChooserRule
- All Implemented Interfaces:
org.opensextant.data.MatchSchema
A final geocoding pass or two. Loop through candidates and choose the location that best fits the context.
As needed, cache chosen entries to optimize, e.g. co-referenced places already mentioned in the document.
Ideally, consider choosing a best place for the particular instance of a name, but percolate that to the other mentions of that same name. Is it the same place? No need to disambiguate it multiple times at this point.
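The caching idea described above might be sketched as follows. This is only an illustration under stated assumptions: the class `ChosenPlaceCache`, its `choose` method, and the use of plain strings as place identifiers are hypothetical and are not part of this class's documented API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: remember the place chosen for a name so later mentions reuse it. */
final class ChosenPlaceCache {
    // Key: normalized name text; value: identifier of the place chosen for that name.
    private final Map<String, String> chosen = new HashMap<>();
    // Counts how often the expensive chooser actually ran.
    private int disambiguations = 0;

    /** Returns the cached choice, or runs the chooser once and caches its result. */
    String choose(String name, Function<String, String> chooser) {
        String key = name.trim().toLowerCase(Locale.ROOT);
        return chosen.computeIfAbsent(key, k -> {
            disambiguations++;
            return chooser.apply(k);
        });
    }

    int disambiguationCount() {
        return disambiguations;
    }

    /** Clears per-document state, analogous in spirit to a rule's reset(). */
    void reset() {
        chosen.clear();
        disambiguations = 0;
    }
}
```

The sketch assumes that every mention of a name within one document refers to the same place, which is exactly the simplification the description raises as a question.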
- Author:
- ubaldino
-
Field Summary
Fields
Modifier and Type Field - Description
static final String ADMIN_CONTAINS
protected static final double ADMIN_CONTAINS_PLACE_WT
static final String COUNTRY_CONTAINS
protected static final double COUNTRY_CONTAINS_PLACE_WT
static final int MATCHCONF_BARE_ACRONYM
static final int MATCHCONF_GEODETIC - Absolute Confidence: Geographic location of a named place lines up with a coordinate in-scope
static final int MATCHCONF_MANY_COUNTRIES - Absolute Confidence: Many locations matched, with multiple countries in scope. So, many countries mentioned in document
static final int MATCHCONF_MANY_COUNTRY - Absolute Confidence: Many locations matched, but one country in scope.
static final int MATCHCONF_MANY_LOC - Absolute Confidence: Many Locations matched a single name.
static final int MATCHCONF_MINIMUM - The bare minimum confidence -- if rules negate confidence points, confidence may go below 20.
static final int MATCHCONF_NAME_REGION - Absolute Confidence: Name, Region; City, State; Capital, Country; etc.
static final int MATCHCONF_ONE_LOC - Absolute Confidence: Unique name in gazetteer.
static final int MATCHCONF_PREFERRED - A subtle boost for locations that were preferred -- especially helps when there is no inherent context and we must rely on the caller's intuition.
static final int MATCHCONF_QUALIFIER_AMBIGUOUS_NAME - Confidence Qualifier: Ambiguous
static final int MATCHCONF_QUALIFIER_COUNTRY_MENTIONED - Confidence Qualifier: The chosen place happens to be in a country mentioned in the document
static final int MATCHCONF_QUALIFIER_HIGH_SCORE - Confidence Qualifier: The chosen place scored high compared to the runner up
static final int MATCHCONF_QUALIFIER_LOWERCASE - Confidence Qualifier: Start here if you have a lower case term that may be a place.
static final int MATCHCONF_QUALIFIER_MAJOR_PLACE - Confidence Qualifier: The chosen place happens to be a major place, e.g., large city.
static final int MATCHCONF_QUALIFIER_UNIQUE_COUNTRY - Confidence Qualifier: Name appears in only one country.
static final String PREF_COUNTRY - Preferred Country or Location -- when user supplies the context that may be missing....
static final String PREF_LOCATION
Fields inherited from class org.opensextant.extractors.geo.rules.GeocodeRule
AVG_WORD_LEN, boundaryObserver, coordObserver, countryObserver, defaultMethod, LEX1, LEX2, locationOnly, log, LOWERCASE, NAME, textCase, UPPERCASE, weight
Fields inherited from interface org.opensextant.data.MatchSchema
VAL_COORD, VAL_COUNTRY, VAL_PLACE, VAL_POSTAL, VAL_TAXON
-
Constructor Summary
Constructors
LocationChooserRule()
-
Method Summary
Modifier and Type Method - Description
void assessConfidence - Confidence of your final chosen location for a given name is assembled as the sum of some absolute metric plus some additional qualifiers.
void evaluate(List<PlaceCandidate> names)
void evaluate(List<PlaceCandidate> names, org.opensextant.processing.Parameters preferences) - Walk the entire list.
void evaluate(PlaceCandidate name, org.opensextant.data.Place geo) - Yet unchosen location.
int getInferredCountryCount - How likely is it that this country is substantially relevant to the document based on hard geography.
void inferCountry(String cc)
void reset() - no-op, unless overridden.
Methods inherited from class org.opensextant.extractors.geo.rules.GeocodeRule
filterByNameOnly, filterOutByFrequency, internalPlaceID, isRelevant, isShort, logMsg, sameBoundary, sameCountry, sameCountry, sameLexicalName, setBoundaryObserver, setCountryObserver, setDefaultMethod, setGeohash, setLocationObserver, setTextCase, textCase
-
Field Details
-
ADMIN_CONTAINS_PLACE_WT
protected static final double ADMIN_CONTAINS_PLACE_WT
-
COUNTRY_CONTAINS_PLACE_WT
protected static final double COUNTRY_CONTAINS_PLACE_WT
-
PREF_COUNTRY
public static final String PREF_COUNTRY
Preferred Country or Location: when the user supplies context that may otherwise be missing, we accept it and weight such preferences higher.
-
PREF_LOCATION
public static final String PREF_LOCATION
-
COUNTRY_CONTAINS
public static final String COUNTRY_CONTAINS
-
ADMIN_CONTAINS
public static final String ADMIN_CONTAINS
-
MATCHCONF_BARE_ACRONYM
public static final int MATCHCONF_BARE_ACRONYM
-
MATCHCONF_MINIMUM
public static final int MATCHCONF_MINIMUM
The bare minimum confidence. If rules negate confidence points, confidence may go below 20.
-
MATCHCONF_MANY_LOC
public static final int MATCHCONF_MANY_LOC
Absolute Confidence: Many locations matched a single name. No country is in scope and no country is mentioned in the document, so this is very low confidence.
-
MATCHCONF_MANY_COUNTRIES
public static final int MATCHCONF_MANY_COUNTRIES
Absolute Confidence: Many locations matched, with multiple countries in scope; that is, many countries are mentioned in the document.
-
MATCHCONF_MANY_COUNTRY
public static final int MATCHCONF_MANY_COUNTRY
Absolute Confidence: Many locations matched, but one country in scope; that is, one country is mentioned in the document.
-
MATCHCONF_NAME_REGION
public static final int MATCHCONF_NAME_REGION
Absolute Confidence: Name, Region; City, State; Capital, Country; etc. Patterns of qualified places.
-
MATCHCONF_ONE_LOC
public static final int MATCHCONF_ONE_LOC
Absolute Confidence: Unique name in gazetteer. Confidence is high; however, this needs to be tempered by the number of gazetteers, their coverage, and their diversity.
-
MATCHCONF_GEODETIC
public static final int MATCHCONF_GEODETIC
Absolute Confidence: Geographic location of a named place lines up with a coordinate in scope.
-
MATCHCONF_QUALIFIER_MAJOR_PLACE
public static final int MATCHCONF_QUALIFIER_MAJOR_PLACE
Confidence Qualifier: The chosen place happens to be a major place, e.g., large city.
-
MATCHCONF_QUALIFIER_COUNTRY_MENTIONED
public static final int MATCHCONF_QUALIFIER_COUNTRY_MENTIONED
Confidence Qualifier: The chosen place happens to be in a country mentioned in the document.
-
MATCHCONF_QUALIFIER_AMBIGUOUS_NAME
public static final int MATCHCONF_QUALIFIER_AMBIGUOUS_NAME
Confidence Qualifier: Ambiguous
-
MATCHCONF_QUALIFIER_UNIQUE_COUNTRY
public static final int MATCHCONF_QUALIFIER_UNIQUE_COUNTRY
Confidence Qualifier: Name appears in only one country.
-
MATCHCONF_QUALIFIER_HIGH_SCORE
public static final int MATCHCONF_QUALIFIER_HIGH_SCORE
Confidence Qualifier: The chosen place scored high compared to the runner-up.
-
MATCHCONF_QUALIFIER_LOWERCASE
public static final int MATCHCONF_QUALIFIER_LOWERCASE
Confidence Qualifier: Start here if you have a lower case term that may be a place. Lower case matches cost 10 points or more; however, feat_class P and A win back 5 points, while other feature classes are less likely to be places.
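The lower case adjustment described for this qualifier could be sketched roughly as below. The class and method names are hypothetical, and the two point values are taken only from the wording of the description (10 points off, 5 points won back); the actual constant value and how the rule applies it are not shown on this page.

```java
/** Hypothetical sketch of the lower case qualifier; not the library's implementation. */
final class LowercaseQualifierSketch {
    // Assumed from the description: "-10 points or more for lower case matches".
    static final int LOWERCASE_PENALTY = -10;
    // Assumed from the description: "feat_class P and A win back 5 points".
    static final int PLACE_OR_ADMIN_BONUS = 5;

    /** Returns the qualifier to add to a confidence score for one match. */
    static int qualifier(boolean isLowercase, String featureClass) {
        if (!isLowercase) {
            return 0; // The qualifier only applies to lower case terms.
        }
        int q = LOWERCASE_PENALTY;
        if ("P".equals(featureClass) || "A".equals(featureClass)) {
            q += PLACE_OR_ADMIN_BONUS; // Populated places and admin areas are likelier places.
        }
        return q;
    }
}
```

Under these assumptions a lower case match on a feature of class P nets -5, while any other feature class keeps the full -10.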
-
MATCHCONF_PREFERRED
public static final int MATCHCONF_PREFERRED
A subtle boost for locations that were preferred. It especially helps when there is no inherent context and we must rely on the caller's intuition.
-
-
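To make the "Absolute Confidence" tiers above easier to compare, here is a minimal sketch of how one tier might be selected from the evidence each description mentions. Everything in it is an assumption for illustration: the class `AbsoluteTierSketch`, the `Tier` enum, the input flags, and especially the order in which the conditions are checked. The numeric values of the MATCHCONF_* constants are not given on this page, so the sketch returns a tier name rather than a number.

```java
/** Hypothetical sketch: map document evidence to one absolute-confidence tier. */
final class AbsoluteTierSketch {
    // Tier names mirror the MATCHCONF_* field names documented above.
    enum Tier { GEODETIC, NAME_REGION, ONE_LOC, MANY_COUNTRY, MANY_COUNTRIES, MANY_LOC }

    /**
     * @param nearCoordinate     the named place lines up with a coordinate in scope
     * @param qualifiedPattern   the name appears in a pattern such as "City, State"
     * @param candidateLocations number of gazetteer locations matching the name
     * @param countriesInScope   number of countries mentioned in the document
     */
    static Tier pick(boolean nearCoordinate, boolean qualifiedPattern,
                     int candidateLocations, int countriesInScope) {
        // The precedence below is an assumption, not documented behavior.
        if (nearCoordinate) return Tier.GEODETIC;
        if (qualifiedPattern) return Tier.NAME_REGION;
        if (candidateLocations == 1) return Tier.ONE_LOC;
        if (countriesInScope == 1) return Tier.MANY_COUNTRY;
        if (countriesInScope > 1) return Tier.MANY_COUNTRIES;
        return Tier.MANY_LOC; // Many locations and no country in scope.
    }
}
```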
Constructor Details
-
LocationChooserRule
public LocationChooserRule()
-
-
Method Details
-
reset
public void reset()
Description copied from class: GeocodeRule
no-op, unless overridden.
- Overrides:
reset in class GeocodeRule
-
evaluate
public void evaluate(List<PlaceCandidate> names)
- Overrides:
evaluate in class GeocodeRule
- Parameters:
names - list of found place names
-
evaluate
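public void evaluate(List<PlaceCandidate> names, org.opensextant.processing.Parameters preferences)
Walk the entire list.
-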
inferCountry
public void inferCountry(String cc)
-
getInferredCountryCount
How likely is it that this country is substantially relevant to the document, based on hard geography rather than trivial mentions of country names or codes? "Inferred" countries are more telling than short codes, for example.
- Parameters:
cc - country code
- Returns:
-
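One plausible reading of `inferCountry` and `getInferredCountryCount` is a per-document tally of how often geographic evidence points at a country. The sketch below illustrates that reading only; the class `InferredCountrySketch` and its internal map are hypothetical, and the real bookkeeping is not described on this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: tally countries inferred from geographic evidence. */
final class InferredCountrySketch {
    // Key: country code; value: number of times that country was inferred.
    private final Map<String, Integer> inferred = new HashMap<>();

    /** Records one more inference for the given country code. */
    void inferCountry(String cc) {
        inferred.merge(cc, 1, Integer::sum);
    }

    /** Returns how many times the country was inferred, or 0 if never. */
    int getInferredCountryCount(String cc) {
        return inferred.getOrDefault(cc, 0);
    }

    /** Clears the tally between documents. */
    void reset() {
        inferred.clear();
    }
}
```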
evaluate
public void evaluate(PlaceCandidate name, org.opensextant.data.Place geo)
Yet unchosen location. Consider the given evidence first, creating some weight there, then introduce innate properties of the possible locations, thereby amplifying the differences among the candidates.
- Specified by:
evaluate in class GeocodeRule
- Parameters:
name - matched name in text
geo - gazetteer entry or location
-
assessConfidence
Confidence of your final chosen location for a given name is assembled as the sum of some absolute metric plus some additional qualifiers. The absolute provides some context at the document level, whereas the qualifiers are refinements.
conf = A + Q1 + Q2... // this may change.
- Parameters:
pc-
-
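The formula conf = A + Q1 + Q2... can be illustrated with a small worked sketch. The class `ConfidenceSketch` and all three numeric values are assumptions chosen for the arithmetic only; the actual values of the MATCHCONF_* constants are not listed on this page, and the description itself notes that the formula may change.

```java
/** Hypothetical sketch of conf = A + Q1 + Q2...; all numbers are illustrative only. */
final class ConfidenceSketch {
    // Assumed values; they are NOT the real MATCHCONF_* constants.
    static final int ABSOLUTE_MANY_LOC = 40;
    static final int QUALIFIER_COUNTRY_MENTIONED = 5;
    static final int QUALIFIER_AMBIGUOUS_NAME = -7;

    /** Sums one absolute metric (A) and any number of qualifiers (Q1, Q2, ...). */
    static int assess(int absolute, int... qualifiers) {
        int conf = absolute;
        for (int q : qualifiers) {
            conf += q; // Qualifiers may be positive boosts or negative penalties.
        }
        return conf;
    }
}
```

With these assumed values, `ConfidenceSketch.assess(ABSOLUTE_MANY_LOC, QUALIFIER_COUNTRY_MENTIONED, QUALIFIER_AMBIGUOUS_NAME)` evaluates to 40 + 5 - 7 = 38.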