Class LocationChooserRule
java.lang.Object
org.opensextant.extractors.geo.rules.GeocodeRule
org.opensextant.extractors.geo.rules.LocationChooserRule
- All Implemented Interfaces:
org.opensextant.data.MatchSchema
A final geocoding pass or two. Loop through candidates and choose the location that best fits the context.
As needed, cache chosen entries to optimize, e.g., co-referenced places mentioned earlier in the document.
Ideally, choose the best place for one particular instance of a name, then propagate that choice to the other mentions of the same name. Is it the same place? If so, there is no need to disambiguate it multiple times at this point.
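The caching idea can be illustrated with a minimal sketch. Everything here is hypothetical and not part of the OpenSextant API: the class name ChosenPlaceCache, the use of plain strings for names and place IDs, and the decision to key the cache on the trimmed name text are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: resolve each distinct name once per document and
 * reuse that choice for every later mention of the same name.
 */
final class ChosenPlaceCache {
    // Assumption: names and chosen places are represented as plain strings.
    private final Map<String, String> chosenByName = new HashMap<>();
    private int disambiguations = 0;

    /** Returns the chosen place for a mention; the chooser runs only on a cache miss. */
    String choose(String mention, Function<String, String> chooser) {
        String key = mention.trim();
        return chosenByName.computeIfAbsent(key, k -> {
            disambiguations++;
            return chooser.apply(k);
        });
    }

    /** Number of times the (possibly expensive) chooser actually ran. */
    int disambiguationCount() {
        return disambiguations;
    }

    /** Clears the cache, e.g. between documents. */
    void reset() {
        chosenByName.clear();
        disambiguations = 0;
    }
}
```

In this sketch a second mention of the same name returns the cached choice without invoking the chooser again, and reset() is the natural place to discard choices before the next document.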
- Author:
- ubaldino
-
Field Summary
Modifier and Type | Field | Description
static final String | ADMIN_CONTAINS |
protected static final double | ADMIN_CONTAINS_PLACE_WT |
static final String | COUNTRY_CONTAINS |
protected static final double | COUNTRY_CONTAINS_PLACE_WT |
static final int | MATCHCONF_BARE_ACRONYM |
static final int | MATCHCONF_GEODETIC | Absolute Confidence: Geographic location of a named place lines up with a coordinate in-scope
static final int | MATCHCONF_MANY_COUNTRIES | Absolute Confidence: Many locations matched, with multiple countries in scope; that is, many countries mentioned in document
static final int | MATCHCONF_MANY_COUNTRY | Absolute Confidence: Many locations matched, but one country in scope.
static final int | MATCHCONF_MANY_LOC | Absolute Confidence: Many Locations matched a single name.
static final int | MATCHCONF_MINIMUM | The bare minimum confidence -- if rules negate confidence points, confidence may go below 20.
static final int | MATCHCONF_NAME_REGION | Absolute Confidence: Name, Region; City, State; Capital, Country; etc.
static final int | MATCHCONF_ONE_LOC | Absolute Confidence: Unique name in gazetteer.
static final int | MATCHCONF_PREFERRED | A subtle boost for locations that were preferred -- especially helps when there is no inherent context and we must rely on the caller's intuition.
static final int | MATCHCONF_QUALIFIER_AMBIGUOUS_NAME | Confidence Qualifier: Ambiguous
static final int | MATCHCONF_QUALIFIER_COUNTRY_MENTIONED | Confidence Qualifier: The chosen place happens to be in a country mentioned in the document
static final int | MATCHCONF_QUALIFIER_HIGH_SCORE | Confidence Qualifier: The chosen place scored high compared to the runner up
static final int | MATCHCONF_QUALIFIER_LOWERCASE | Confidence Qualifier: Start here if you have a lower case term that may be a place.
static final int | MATCHCONF_QUALIFIER_MAJOR_PLACE | Confidence Qualifier: The chosen place happens to be a major place, e.g., large city.
static final int | MATCHCONF_QUALIFIER_UNIQUE_COUNTRY | Confidence Qualifier: Name appears in only one country.
static final String | PREF_COUNTRY | Preferred Country or Location -- when user supplies the context that may be missing....
static final String | PREF_LOCATION |
Fields inherited from class org.opensextant.extractors.geo.rules.GeocodeRule
AVG_WORD_LEN, boundaryObserver, coordObserver, countryObserver, defaultMethod, LEX1, LEX2, locationOnly, log, LOWERCASE, NAME, textCase, UPPERCASE, weight
Fields inherited from interface org.opensextant.data.MatchSchema
VAL_COORD, VAL_COUNTRY, VAL_PLACE, VAL_POSTAL, VAL_TAXON
-
Constructor Summary
LocationChooserRule()
-
Method Summary
Modifier and Type | Method | Description
void | assessConfidence | Confidence of your final chosen location for a given name is assembled as the sum of some absolute metric plus some additional qualifiers.
void | evaluate(List<PlaceCandidate> names) |
void | evaluate(List<PlaceCandidate> names, org.opensextant.processing.Parameters preferences) | Walk the entire list.
void | evaluate(PlaceCandidate name, org.opensextant.data.Place geo) | Yet unchosen location.
int | getInferredCountryCount | How likely is it that this country is substantially relevant to the document based on hard geography.
void | inferCountry(String cc) |
void | reset() | no-op, unless overridden.
Methods inherited from class org.opensextant.extractors.geo.rules.GeocodeRule
filterByNameOnly, filterOutByFrequency, internalPlaceID, isRelevant, isShort, logMsg, sameBoundary, sameCountry, sameCountry, sameLexicalName, setBoundaryObserver, setCountryObserver, setDefaultMethod, setGeohash, setLocationObserver, setTextCase, textCase
-
Field Details
-
ADMIN_CONTAINS_PLACE_WT
protected static final double ADMIN_CONTAINS_PLACE_WT
-
COUNTRY_CONTAINS_PLACE_WT
protected static final double COUNTRY_CONTAINS_PLACE_WT
-
PREF_COUNTRY
Preferred Country or Location -- when the user supplies context that may otherwise be missing. We accept that and weight such a preference higher.
-
PREF_LOCATION
-
COUNTRY_CONTAINS
-
ADMIN_CONTAINS
-
MATCHCONF_BARE_ACRONYM
public static final int MATCHCONF_BARE_ACRONYM
-
MATCHCONF_MINIMUM
public static final int MATCHCONF_MINIMUM
The bare minimum confidence -- if rules negate confidence points, confidence may go below 20.
-
MATCHCONF_MANY_LOC
public static final int MATCHCONF_MANY_LOC
Absolute Confidence: Many Locations matched a single name. No country is in scope, that is, no country is mentioned in the document, so this is very low confidence.
-
MATCHCONF_MANY_COUNTRIES
public static final int MATCHCONF_MANY_COUNTRIES
Absolute Confidence: Many locations matched, with multiple countries in scope; that is, many countries are mentioned in the document.
-
MATCHCONF_MANY_COUNTRY
public static final int MATCHCONF_MANY_COUNTRY
Absolute Confidence: Many locations matched, but one country in scope; that is, one country is mentioned in the document.
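The three "many locations" tiers differ only in how many countries are in scope for the document. The following is a minimal sketch of that selection, assuming the number of countries in scope is already known. The class, enum, and method names are hypothetical, and the rule's actual logic may weigh more evidence than this.

```java
/** Hypothetical sketch: pick an absolute-confidence tier for a name with many matching locations. */
final class AbsoluteTierSketch {
    // Assumption: tiers are modeled as an enum here; the class documents them as int constants.
    enum Tier { MANY_LOC, MANY_COUNTRY, MANY_COUNTRIES }

    /** Maps the count of countries mentioned in the document to a tier. */
    static Tier tierForAmbiguousName(int countriesInScope) {
        if (countriesInScope <= 0) {
            return Tier.MANY_LOC;        // no country in scope: very low confidence
        }
        if (countriesInScope == 1) {
            return Tier.MANY_COUNTRY;    // exactly one country in scope
        }
        return Tier.MANY_COUNTRIES;      // several countries in scope
    }
}
```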
-
MATCHCONF_NAME_REGION
public static final int MATCHCONF_NAME_REGION
Absolute Confidence: Name, Region; City, State; Capital, Country; etc. Patterns of qualified places.
-
MATCHCONF_ONE_LOC
public static final int MATCHCONF_ONE_LOC
Absolute Confidence: Unique name in gazetteer. Confidence is high; however, this needs to be tempered by the number of gazetteers, their coverage, and their diversity.
-
MATCHCONF_GEODETIC
public static final int MATCHCONF_GEODETIC
Absolute Confidence: Geographic location of a named place lines up with a coordinate in-scope.
-
MATCHCONF_QUALIFIER_MAJOR_PLACE
public static final int MATCHCONF_QUALIFIER_MAJOR_PLACE
Confidence Qualifier: The chosen place happens to be a major place, e.g., large city.
-
MATCHCONF_QUALIFIER_COUNTRY_MENTIONED
public static final int MATCHCONF_QUALIFIER_COUNTRY_MENTIONED
Confidence Qualifier: The chosen place happens to be in a country mentioned in the document.
-
MATCHCONF_QUALIFIER_AMBIGUOUS_NAME
public static final int MATCHCONF_QUALIFIER_AMBIGUOUS_NAME
Confidence Qualifier: Ambiguous
-
MATCHCONF_QUALIFIER_UNIQUE_COUNTRY
public static final int MATCHCONF_QUALIFIER_UNIQUE_COUNTRY
Confidence Qualifier: Name appears in only one country.
-
MATCHCONF_QUALIFIER_HIGH_SCORE
public static final int MATCHCONF_QUALIFIER_HIGH_SCORE
Confidence Qualifier: The chosen place scored high compared to the runner up.
-
MATCHCONF_QUALIFIER_LOWERCASE
public static final int MATCHCONF_QUALIFIER_LOWERCASE
Confidence Qualifier: Start here if you have a lower case term that may be a place. Lower case matches lose 10 points or more; however, feat_class P and A win back 5 points. Other feature classes are less likely to be places.
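The arithmetic described for MATCHCONF_QUALIFIER_LOWERCASE can be sketched as follows. This is an illustration only: it assumes a flat -10 penalty (the description says "-10 points or more"), and the class, method, and constant names are hypothetical rather than taken from the library.

```java
/** Hypothetical sketch of the lower case qualifier arithmetic. */
final class LowercaseQualifierSketch {
    // Assumption: a flat penalty; the real rule may subtract more than 10 points.
    static final int LOWERCASE_PENALTY = -10;
    // Feature classes P (populated place) and A (administrative) win back 5 points.
    static final int PLACE_OR_ADMIN_CREDIT = 5;

    /** Returns the points to add to the confidence for a match of the given case and feature class. */
    static int lowercaseQualifier(boolean isLowercase, String featureClass) {
        if (!isLowercase) {
            return 0; // qualifier applies only to lower case matches
        }
        int points = LOWERCASE_PENALTY;
        if ("P".equals(featureClass) || "A".equals(featureClass)) {
            points += PLACE_OR_ADMIN_CREDIT;
        }
        return points;
    }
}
```

Under these assumptions a lower case match on a populated place nets -5 points, while a lower case match on any other feature class keeps the full -10.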
-
MATCHCONF_PREFERRED
public static final int MATCHCONF_PREFERRED
A subtle boost for locations that were preferred -- especially helps when there is no inherent context and we must rely on the caller's intuition.
-
-
Constructor Details
-
LocationChooserRule
public LocationChooserRule()
-
-
Method Details
-
reset
public void reset()
Description copied from class: GeocodeRule
no-op, unless overridden.
- Overrides:
- reset in class GeocodeRule
-
evaluate
- Overrides:
- evaluate in class GeocodeRule
- Parameters:
- names - list of found place names
-
evaluate
Walk the entire list.
-
inferCountry
-
getInferredCountryCount
How likely is it that this country is substantially relevant to the document based on hard geography, not trivial mentions of country names/codes. "Inferred" countries are more telling than short codes, for example.
- Parameters:
- cc
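One way to picture the relationship between inferCountry and getInferredCountryCount is a per-document tally keyed by country code. The sketch below is an assumption about the bookkeeping, not the class's actual implementation: it supposes that each inferCountry call records one piece of geographic evidence and that the count is the number of such calls. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: tally how often each country was inferred from geographic evidence. */
final class InferredCountryCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records one inference for the given country code; blank codes are ignored. */
    void inferCountry(String cc) {
        if (cc == null || cc.isEmpty()) {
            return;
        }
        counts.merge(cc, 1, Integer::sum);
    }

    /** Returns how many times the country was inferred, or 0 if never. */
    int getInferredCountryCount(String cc) {
        return counts.getOrDefault(cc, 0);
    }

    /** Clears all tallies, e.g. between documents. */
    void reset() {
        counts.clear();
    }
}
```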
-
evaluate
Yet unchosen location. Consider the given evidence first, creating some weight there, then introduce innate properties of possible locations, thereby amplifying the differences between the candidates.
- Specified by:
- evaluate in class GeocodeRule
- Parameters:
- name - matched name in text
- geo - gazetteer entry or location
-
assessConfidence
Confidence of your final chosen location for a given name is assembled as the sum of some absolute metric plus some additional qualifiers. The absolute provides some context at the document level, whereas the qualifiers are refinements.
conf = A + Q1 + Q2... // this may change.
- Parameters:
- pc
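The formula conf = A + Q1 + Q2... amounts to a plain sum of one absolute value and any number of qualifiers. A minimal sketch, assuming integer point values and no clamping (the MATCHCONF_MINIMUM description notes that confidence may drop below the minimum when rules subtract points). The class and method names are hypothetical, and the numbers in the usage note are illustrative, not the library's constant values.

```java
/** Hypothetical sketch of the additive confidence model: conf = A + Q1 + Q2 + ... */
final class ConfidenceSketch {
    /**
     * Sums one absolute confidence value with zero or more qualifiers.
     * Qualifiers may be negative, so the result is deliberately not clamped.
     */
    static int assessConfidence(int absolute, int... qualifiers) {
        int conf = absolute;
        for (int q : qualifiers) {
            conf += q;
        }
        return conf;
    }
}
```

For example, with illustrative values, an absolute confidence of 50 combined with qualifiers of +5 and -10 sums to 45.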
-
-