Class PostalGeocoder
- All Implemented Interfaces:
org.opensextant.data.MatchSchema, org.opensextant.extraction.Extractor, BoundaryObserver, CountryObserver
For example, the postal code "11111" in two different countries is two distinct codes: we assume a postal code is unique within a country, but the same code may occur in more than one country.
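Because a code is only unique within its country, a lookup keyed on the code string alone would conflate "11111" in one country with "11111" in another. A minimal sketch, assuming a hypothetical `PostalKey` class (not part of the Xponents API), keys postal codes on the pair (country code, postal code). The gazetteer values are placeholders, not real data.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: a postal code is identified by (country, code),
// because the same code string may occur in more than one country.
final class PostalKey {
    final String countryCode;
    final String postalCode;

    PostalKey(String countryCode, String postalCode) {
        // Normalize case and whitespace so "us"/"US" and " 11111" match.
        this.countryCode = countryCode.toUpperCase(Locale.ROOT);
        this.postalCode = postalCode.trim();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PostalKey)) return false;
        PostalKey k = (PostalKey) o;
        return countryCode.equals(k.countryCode) && postalCode.equals(k.postalCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(countryCode, postalCode);
    }

    @Override
    public String toString() {
        return countryCode + ":" + postalCode;
    }
}

class PostalKeyDemo {
    public static void main(String[] args) {
        Map<PostalKey, String> gazetteer = new HashMap<>();
        // Placeholder entries: same code string, two countries, two distinct keys.
        gazetteer.put(new PostalKey("US", "11111"), "place A");
        gazetteer.put(new PostalKey("DE", "11111"), "place B");
        System.out.println(gazetteer.size()); // 2
    }
}
```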
Xponents Methodology:
- "Rules" are added to PlaceCandidates to inform the caller of the basic lexical rules that fired.
- "PlaceEvidence" is NOT used to score places, because there is very little geographic association across tags.
- Confidence is assigned to a PlaceCandidate based only on the complexity of the match.
Returned "TextMatch" tags are marked filtered_out for SHORT or YEAR codes, and may or may not have a location selected.
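The SHORT and YEAR filtering can be pictured as a small predicate. This is a sketch only: the thresholds (a minimum of 4 characters, and 4-digit values from 1900 to 2100 treated as years) and the names `PostalFilter` and `filterReason` are assumptions for illustration, not the rules Xponents actually applies.

```java
// Hypothetical filter mirroring the SHORT and YEAR cases described above.
class PostalFilter {
    // Assumed thresholds; the real rules are not stated in this documentation.
    static final int MIN_LENGTH = 4;
    static final int MIN_YEAR = 1900;
    static final int MAX_YEAR = 2100;

    /** Returns "SHORT", "YEAR", or null if the code should not be filtered out. */
    static String filterReason(String code) {
        String c = code.trim();
        if (c.length() < MIN_LENGTH) {
            return "SHORT";
        }
        // A 4-digit value in a plausible year range is more likely a date than a postal code.
        if (c.length() == 4 && c.chars().allMatch(Character::isDigit)) {
            int value = Integer.parseInt(c);
            if (value >= MIN_YEAR && value <= MAX_YEAR) {
                return "YEAR";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(filterReason("123"));   // SHORT
        System.out.println(filterReason("1999"));  // YEAR
        System.out.println(filterReason("94537")); // null (kept)
    }
}
```

The trade-off is deliberate: a genuine 4-digit postal code such as one in the 1900 to 2100 range would also be flagged, which is why the tag is marked filtered_out rather than dropped.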
- Author:
- ubaldino
-
Field Summary
Fields inherited from interface org.opensextant.extraction.Extractor
NO_DOC_ID
Fields inherited from interface org.opensextant.data.MatchSchema
VAL_COORD, VAL_COUNTRY, VAL_PLACE, VAL_POSTAL, VAL_TAXON
-
Constructor Summary
PostalGeocoder()
-
Method Summary
- static void associateMatches(List<PlaceCandidate> matches, List<PlaceCandidate> postalMatches): Given geotagging from a prior pass of PlaceGeocoder or another geotagger, compare and align those tags with POSTAL tags.
- void boundaryLevel1InScope(String nameNorm, org.opensextant.data.Place p): Given the name (lower case, strip quotes), the location candidate infers an ADMIN boundary.
- void boundaryLevel2InScope(String nameNorm, org.opensextant.data.Place p): Given the name (lower case, strip quotes), the location candidate infers an ADMIN boundary.
- void cleanup(): Very simple resource reporting and cleanup.
- void configure()
- void configure (two further overloads, listed under Method Details)
- int countryCount()
- void countryInScope(String cc): Use a country code to signal that a country was mentioned.
- void countryInScope(org.opensextant.data.Country C): Use a country object to signal that a country was mentioned or is in scope.
- countryMentionCount(): Calculates totals and ratios for the discovered set of countries.
- boolean countryObserved(String cc): Have you seen this country before?
- boolean countryObserved(org.opensextant.data.Country C): Have you seen this country before?
- static List<org.opensextant.extraction.TextMatch> deriveMatches(List<PlaceCandidate> postalMatches, org.opensextant.data.TextInput t): For situations of the form CITY PROV POSTAL and similar tuples.
- List<org.opensextant.extraction.TextMatch> extract(org.opensextant.data.TextInput input): Tag, choose a location if possible, and emit an array of text matches.
- List<org.opensextant.extraction.TextMatch> extract(String input)
- getName()
- static boolean linkGeography(PlaceCandidate postal, PlaceCandidate otherMention, String slot, String featPrefix)
- placeMentionCount(): Calculates totals and ratios for the discovered set of boundaries, inferred or explicit.
- void reset()
- void setGeneralMatches(List<org.opensextant.extraction.TextMatch> arr): OPTIMIZATION: Set the general-purpose matches (geo, taxons, etc.) from a prior processing step.
- static boolean unqualifiedPostalLocation
-
Field Details
-
VERSION
-
METHOD_DEFAULT
-
-
Constructor Details
-
PostalGeocoder
public PostalGeocoder()
-
-
Method Details
-
getName
- Specified by:
getName
in interface org.opensextant.extraction.Extractor
-
configure
public void configure() throws org.opensextant.ConfigException
- Specified by:
configure
in interface org.opensextant.extraction.Extractor
- Throws:
org.opensextant.ConfigException
-
configure
- Specified by:
configure
in interface org.opensextant.extraction.Extractor
- Throws:
org.opensextant.ConfigException
-
configure
- Specified by:
configure
in interface org.opensextant.extraction.Extractor
- Throws:
org.opensextant.ConfigException
-
setGeneralMatches
OPTIMIZATION: Set the general-purpose matches (geo, taxons, etc.) from a prior processing step. This lets PostalGeocoder avoid re-running the same tagging. Only call this if the matches array includes the output of running PlaceGeocoder.
- Parameters:
arr
- matches from a prior processing step, including PlaceGeocoder output
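The optimization amounts to reuse: if the caller has already run the general-purpose tagger, the postal tagger consumes those matches instead of tagging the text again. The sketch below uses hypothetical names (`ReusingTagger`, a `Function` standing in for a PlaceGeocoder pass) and plain strings in place of TextMatch; it illustrates the pattern, not the Xponents implementation. Clearing the supplied matches after each call is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of reusing matches from a prior processing step.
class ReusingTagger {
    private final Function<String, List<String>> generalTagger; // stands in for a PlaceGeocoder pass
    private List<String> generalMatches;                        // set by the caller, if available
    int generalTaggerCalls = 0;                                 // counter for demonstration only

    ReusingTagger(Function<String, List<String>> generalTagger) {
        this.generalTagger = generalTagger;
    }

    /** Caller supplies matches it already computed with the general tagger. */
    void setGeneralMatches(List<String> matches) {
        this.generalMatches = matches;
    }

    List<String> extract(String text) {
        List<String> general = generalMatches;
        if (general == null) {
            // No prior matches were supplied: run the general tagger ourselves.
            generalTaggerCalls++;
            general = generalTagger.apply(text);
        }
        List<String> result = new ArrayList<>(general);
        // ... postal tagging and alignment with `general` would happen here ...
        generalMatches = null; // this sketch clears the cache so state does not leak into the next call
        return result;
    }

    public static void main(String[] args) {
        ReusingTagger t = new ReusingTagger(text -> List.of("place:" + text));
        t.setGeneralMatches(List.of("place:cached"));
        System.out.println(t.extract("Fremont CA 94537")); // [place:cached]
        System.out.println(t.generalTaggerCalls);          // 0
    }
}
```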
-
extract
public List<org.opensextant.extraction.TextMatch> extract(org.opensextant.data.TextInput input) throws org.opensextant.extraction.ExtractionException
Tag, choose a location if possible, and emit an array of text matches.
INPUT: Free text that may contain postal addresses.
OUTPUT: A TextMatch array where each match may be:
- high confidence: an admin code plus a postal code that makes sense
- low confidence: a postal code alone
There is little in between, for example:
..... CA 94537 ...   # a valid ZIP code in California next to the "CA" postal abbreviation. HIGH confidence
..... 94537 ....     # a bare 5-digit number. LOW confidence
..... SA6 19DN ...   # a bare alphanumeric postal code. MED confidence
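The three example cases can be approximated with a few regular expressions. This is a hedged sketch: the patterns (a two-letter abbreviation followed by a 5-digit code, a loose UK-style alphanumeric shape), the `Confidence` enum, and the three-entry ZIP-prefix table are assumptions chosen to mirror the examples, not the rules PostalGeocoder uses. "Makes sense" is modeled as the abbreviation agreeing with the state implied by the ZIP prefix.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical confidence tiers mirroring the examples in the extract() documentation.
enum Confidence { HIGH, MED, LOW, NONE }

class PostalConfidence {
    // Tiny illustrative subset of US ZIP prefixes (first three digits) to state codes.
    private static final Map<String, String> ZIP_PREFIX_TO_STATE =
            Map.of("100", "NY", "606", "IL", "945", "CA");

    // "CA 94537": two-letter abbreviation followed by a 5-digit code.
    private static final Pattern STATE_ZIP = Pattern.compile("\\b([A-Z]{2})\\s+(\\d{5})\\b");
    // "SA6 19DN": assumed loose UK-like shape of letters and digits in two parts.
    private static final Pattern ALNUM =
            Pattern.compile("\\b[A-Z]{1,2}\\d[A-Z\\d]?\\s+\\d{1,2}[A-Z]{2}\\b");
    // "94537": a bare 5-digit number.
    private static final Pattern BARE_DIGITS = Pattern.compile("\\b\\d{5}\\b");

    static Confidence classify(String text) {
        Matcher m = STATE_ZIP.matcher(text);
        while (m.find()) {
            String state = ZIP_PREFIX_TO_STATE.get(m.group(2).substring(0, 3));
            if (m.group(1).equals(state)) {
                return Confidence.HIGH; // admin code and postal code agree
            }
        }
        if (ALNUM.matcher(text).find()) {
            return Confidence.MED; // distinctive alphanumeric code on its own
        }
        if (BARE_DIGITS.matcher(text).find()) {
            return Confidence.LOW; // could be any 5-digit number
        }
        return Confidence.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify("..... CA 94537 ...")); // HIGH
        System.out.println(classify("..... 94537 ...."));   // LOW
        System.out.println(classify("..... SA6 19DN ...")); // MED
    }
}
```

Note that a mismatched pair such as "NY 94537" falls through to LOW, since only the bare number remains as evidence.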
NOTE: Not thread-safe. A single call keeps some internal state, and a second simultaneous call would disrupt it.
- Specified by:
extract
in interface org.opensextant.extraction.Extractor
- Parameters:
input
- TextInput
- Returns:
- array of TextMatch
- Throws:
org.opensextant.extraction.ExtractionException
- if extraction fails (Solr or Lucene errors) or the rules mechanics fail.
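Because one instance keeps per-call state, a multi-threaded caller can give each thread its own instance instead of sharing one. One common pattern is a `ThreadLocal`. The sketch below uses a hypothetical `StatefulTagger` stand-in rather than the real PostalGeocoder; with the real class, construction and `configure()` would go in the supplier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a tagger that keeps internal state during a call.
class StatefulTagger {
    private final List<String> scratch = new ArrayList<>(); // per-call working state

    List<String> extract(String text) {
        scratch.clear();
        for (String token : text.split("\\s+")) {
            if (token.matches("\\d{5}")) {
                scratch.add(token);
            }
        }
        return new ArrayList<>(scratch); // copy so callers never see the shared scratch list
    }
}

class PerThreadTaggers {
    // Each thread lazily gets its own instance, so concurrent calls never share state.
    private static final ThreadLocal<StatefulTagger> TAGGER =
            ThreadLocal.withInitial(StatefulTagger::new);

    static List<String> extract(String text) {
        return TAGGER.get().extract(text);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<List<String>>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            String doc = "Fremont CA 9453" + i;
            results.add(pool.submit(() -> extract(doc)));
        }
        for (Future<List<String>> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

The alternative, serializing calls on one shared instance with a lock, is simpler but removes any parallelism.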
-
extract
public List<org.opensextant.extraction.TextMatch> extract(String input) throws org.opensextant.extraction.ExtractionException
- Specified by:
extract
in interface org.opensextant.extraction.Extractor
- Throws:
org.opensextant.extraction.ExtractionException
-
cleanup
public void cleanup()
Very simple resource reporting and cleanup.
- Specified by:
cleanup
in interface org.opensextant.extraction.Extractor
-
reset
public void reset()
-
boundaryLevel1InScope
Description copied from interface: BoundaryObserver
Given the name (lower case, strip quotes), the location candidate infers an ADMIN boundary.
- Specified by:
boundaryLevel1InScope
in interface BoundaryObserver
-
boundaryLevel2InScope
Description copied from interface: BoundaryObserver
Given the name (lower case, strip quotes), the location candidate infers an ADMIN boundary.
- Specified by:
boundaryLevel2InScope
in interface BoundaryObserver
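Both boundary callbacks receive an already normalized name. A minimal sketch of that normalization (lower case, quotes stripped) follows; the exact set of quote characters removed here, and the extra whitespace trimming, are assumptions. The usage shows why normalizing matters: differently quoted mentions of one boundary collapse to a single key.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper mirroring the "lower case, strip quotes" convention for nameNorm.
class BoundaryNames {
    // Assumed set of quote characters: ASCII double/single quotes and typographic quotes.
    private static final String QUOTES = "[\"'\u2018\u2019\u201C\u201D]";

    static String normalize(String name) {
        return name.replaceAll(QUOTES, "").trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, Integer> mentions = new HashMap<>();
        for (String raw : new String[] {"Virginia", "\"Virginia\"", "'virginia'"}) {
            mentions.merge(normalize(raw), 1, Integer::sum); // count mentions per normalized name
        }
        System.out.println(mentions); // {virginia=3}
    }
}
```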
-
placeMentionCount
Description copied from interface: BoundaryObserver
Calculates totals and ratios for the discovered set of boundaries, inferred or explicit.
- Specified by:
placeMentionCount
in interface BoundaryObserver
- Returns:
- counts for boundary places mentioned or inferred
-
countryInScope
Description copied from interface: CountryObserver
Use a country code to signal that a country was mentioned.
- Specified by:
countryInScope
in interface CountryObserver
- Parameters:
cc
- country code
-
countryInScope
public void countryInScope(org.opensextant.data.Country C)
Description copied from interface: CountryObserver
Use a country object to signal that a country was mentioned or is in scope.
- Specified by:
countryInScope
in interface CountryObserver
- Parameters:
C
- country object
-
countryObserved
Description copied from interface: CountryObserver
Have you seen this country before?
- Specified by:
countryObserved
in interface CountryObserver
- Parameters:
cc
- country code
- Returns:
- true if observer saw country
-
countryObserved
public boolean countryObserved(org.opensextant.data.Country C)
Description copied from interface: CountryObserver
Have you seen this country before?
- Specified by:
countryObserved
in interface CountryObserver
- Parameters:
C
- country object
- Returns:
- true if observer saw country
-
countryCount
public int countryCount()
- Specified by:
countryCount
in interface CountryObserver
-
countryMentionCount
Description copied from interface: CountryObserver
Calculates totals and ratios for the discovered set of countries.
- Specified by:
countryMentionCount
in interface CountryObserver
- Returns:
- map of country code : counts
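Taken together, the CountryObserver callbacks amount to a running tally. A sketch, assuming a hypothetical `CountryTally` class that keys on upper-cased country codes and reports each country's share of all mentions as its "ratio". The method `countryMentionRatios` is an invented name; the metric type that `countryMentionCount()` actually returns is not shown on this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tally behind countryInScope / countryObserved / countryCount.
class CountryTally {
    private final Map<String, Integer> counts = new HashMap<>();

    void countryInScope(String cc) {
        counts.merge(cc.toUpperCase(Locale.ROOT), 1, Integer::sum); // one more mention
    }

    boolean countryObserved(String cc) {
        return counts.containsKey(cc.toUpperCase(Locale.ROOT));
    }

    int countryCount() {
        return counts.size(); // number of distinct countries seen
    }

    /** Country code to share of all mentions (0.0 to 1.0), sorted by code. */
    Map<String, Double> countryMentionRatios() {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> ratios = new TreeMap<>();
        counts.forEach((cc, n) -> ratios.put(cc, n.doubleValue() / total));
        return ratios;
    }

    void reset() {
        counts.clear();
    }

    public static void main(String[] args) {
        CountryTally t = new CountryTally();
        t.countryInScope("US");
        t.countryInScope("us");
        t.countryInScope("GB");
        t.countryInScope("FR");
        System.out.println(t.countryCount());         // 3
        System.out.println(t.countryObserved("gb"));  // true
        System.out.println(t.countryMentionRatios()); // {FR=0.25, GB=0.25, US=0.5}
    }
}
```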
-
associateMatches
public static void associateMatches(List<PlaceCandidate> matches, List<PlaceCandidate> postalMatches)
Given geotagging from a prior pass of PlaceGeocoder or another geotagger, compare and align those tags with POSTAL tags.
-
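One way to picture aligning prior geotags with POSTAL tags is by character offsets: a place mention that ends just before, or starts just after, a postal match is a candidate for association. This sketch uses a hypothetical `TagSpan` class and an assumed maximum gap of 3 characters; the real method works on PlaceCandidate objects, and its alignment criteria are not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tagged span; stands in for PlaceCandidate.
class TagSpan {
    final String text;
    final String type; // e.g. "CITY", "PROV", "POSTAL"
    final int start;   // inclusive character offset
    final int end;     // exclusive character offset

    TagSpan(String text, String type, int start, int end) {
        this.text = text; this.type = type; this.start = start; this.end = end;
    }
}

class Associator {
    static final int MAX_GAP = 3; // assumed: allows ", " or " " between tokens

    /** Character gap between two non-overlapping spans, or -1 if they overlap. */
    static int gap(TagSpan a, TagSpan b) {
        if (a.end <= b.start) return b.start - a.end;
        if (b.end <= a.start) return a.start - b.end;
        return -1;
    }

    /** Pairs each postal span with the prior geotags that sit within MAX_GAP characters of it. */
    static List<String> associate(List<TagSpan> matches, List<TagSpan> postalMatches) {
        List<String> links = new ArrayList<>();
        for (TagSpan postal : postalMatches) {
            for (TagSpan m : matches) {
                int g = gap(m, postal);
                if (g >= 0 && g <= MAX_GAP) {
                    links.add(m.text + " <-> " + postal.text);
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Text: "Fremont, CA 94537"
        List<TagSpan> geo = List.of(new TagSpan("Fremont", "CITY", 0, 7), new TagSpan("CA", "PROV", 9, 11));
        List<TagSpan> postal = List.of(new TagSpan("94537", "POSTAL", 12, 17));
        System.out.println(associate(geo, postal)); // [CA <-> 94537]
    }
}
```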
linkGeography
public static boolean linkGeography(PlaceCandidate postal, PlaceCandidate otherMention, String slot, String featPrefix)
-
deriveMatches
public static List<org.opensextant.extraction.TextMatch> deriveMatches(List<PlaceCandidate> postalMatches, org.opensextant.data.TextInput t)
For situations of the form:
CITY PROV POSTAL
CITY PROV POSTAL COUNTRY
PROV POSTAL COUNTRY
etc., where PROV is either a name or an ADM1 postal code, and POSTAL may appear in any position in the tuple.
Do the following:
(a) generate a new span (PlaceCandidate) match;
(b) set the chosen location to the City or Province, whichever is the finer resolution;
(c) insert the new match into the original array, and return the superset of all matches.
This makes use of linkGeography.
- Parameters:
postalMatches
-
t
-
- Returns:
- all postal matches, now with derived ones added.
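The derivation steps can be sketched as: walk the tags in text order, group runs of adjacent tags that contain a POSTAL tag, emit one covering span per run, and choose the finest-resolution place (CITY before PROV) as its location. The `Tag` and `Derived` classes, the 3-character adjacency gap, and the resolution ranking are assumptions for illustration; unlike the real method, this sketch returns only the derived spans rather than merging them into the original list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical tag in text order; stands in for PlaceCandidate.
class Tag {
    final String type; // "CITY", "PROV", "POSTAL", or "COUNTRY"
    final int start, end;
    Tag(String type, int start, int end) { this.type = type; this.start = start; this.end = end; }
}

// A derived match: the covering span plus the chosen (finest) location type.
class Derived {
    final int start, end;
    final String chosen;
    Derived(int start, int end, String chosen) { this.start = start; this.end = end; this.chosen = chosen; }
    @Override public String toString() { return "[" + start + "," + end + ") " + chosen; }
}

class TupleDeriver {
    static final int MAX_GAP = 3; // assumed adjacency threshold in characters
    // Lower rank = finer resolution; only City or Province may be chosen, per the description above.
    static final Map<String, Integer> RANK = Map.of("CITY", 0, "PROV", 1);

    static List<Derived> derive(List<Tag> tags) {
        List<Tag> sorted = new ArrayList<>(tags);
        sorted.sort(Comparator.comparingInt(t -> t.start));
        List<Derived> out = new ArrayList<>();
        List<Tag> run = new ArrayList<>();
        for (Tag t : sorted) {
            // A gap larger than MAX_GAP closes the current run of adjacent tags.
            if (!run.isEmpty() && t.start - run.get(run.size() - 1).end > MAX_GAP) {
                emit(run, out);
                run.clear();
            }
            run.add(t);
        }
        emit(run, out);
        return out;
    }

    private static void emit(List<Tag> run, List<Derived> out) {
        boolean hasPostal = run.stream().anyMatch(t -> t.type.equals("POSTAL"));
        if (!hasPostal) return; // only tuples containing a postal code are derived
        String finest = run.stream()
                .filter(t -> RANK.containsKey(t.type))
                .min(Comparator.comparingInt(t -> RANK.get(t.type)))
                .map(t -> t.type)
                .orElse(null);
        if (finest == null) return; // no City or Province to anchor the location
        out.add(new Derived(run.get(0).start, run.get(run.size() - 1).end, finest));
    }

    public static void main(String[] args) {
        // "Fremont, CA 94537 USA" -> CITY(0,7) PROV(9,11) POSTAL(12,17) COUNTRY(18,21)
        List<Tag> tags = List.of(
                new Tag("COUNTRY", 18, 21),
                new Tag("CITY", 0, 7),
                new Tag("PROV", 9, 11),
                new Tag("POSTAL", 12, 17));
        System.out.println(derive(tags)); // [[0,21) CITY]
    }
}
```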
-
unqualifiedPostalLocation
-