Package org.opensextant.extractors.geo
Class PostalTagger
java.lang.Object
org.opensextant.extraction.SolrMatcherSupport
org.opensextant.extractors.geo.GazetteerMatcher
org.opensextant.extractors.geo.PostalTagger
- All Implemented Interfaces:
 Closeable, AutoCloseable, org.opensextant.data.MatchSchema, org.opensextant.extraction.Extractor
public class PostalTagger
extends GazetteerMatcher
implements org.opensextant.data.MatchSchema, org.opensextant.extraction.Extractor
PostalTagger tags and returns any alphanumeric token or phrase that resembles a postal code
 or abbreviation.  It applies simple filter rules only and does not attempt geocoding.
- Author:
 - ubaldino
 
- 
Field Summary
Fields
METHOD_DEFAULT, VERSION
Fields inherited from class org.opensextant.extractors.geo.GazetteerMatcher
AR_TAG_FIELD, CJK_TAG_FIELD, DEFAULT_TAG_FIELD, filter, lang2nameField
Fields inherited from class org.opensextant.extraction.SolrMatcherSupport
DEFAULT_TAG_LIMIT, getNamesTime, log, requestHandler, solr, tagNamesTime, totalTime
Fields inherited from interface org.opensextant.extraction.Extractor
NO_DOC_ID
Fields inherited from interface org.opensextant.data.MatchSchema
VAL_COORD, VAL_COUNTRY, VAL_PLACE, VAL_POSTAL, VAL_TAXON
- 
Constructor Summary
Constructors
PostalTagger()
- 
Method Summary
Modifier and Type    Method    Description
void    cleanup()    Very simple resource reporting and cleanup.
void    configure()
void    configure
void    configure
List<org.opensextant.extraction.TextMatch>    extract(String input)
List<org.opensextant.extraction.TextMatch>    extract(org.opensextant.data.TextInput input)    Tag, choose location if possible and emit a list of text matches.
String    getCoreName()    Be explicit about the solr core to use for tagging.
String    getName()
void    setMinLen(int l)    Override the default minimum length (MIN_LEN=4) for a postal code.
Methods inherited from class org.opensextant.extractors.geo.GazetteerMatcher
createPlace, createTag, getFiltrationRatio, getGazetteer, getMatcherParameters, initialize, placesAt, reportMemory, searchAdvanced, searchAdvanced, setAllowLowerCase, setAllowLowerCaseAbbreviations, setEnableCaseFilter, setEnableCodeHunter, setMatchFilter, tagText, tagText, tagText, tagText, tagText
Methods inherited from class org.opensextant.extraction.SolrMatcherSupport
close, getRetrievingNamesTime, getTaggingNamesTime, getTotalTime, setTaggerHandler, tagTextCallSolrTagger
- 
Field Details
- 
VERSION
 - 
METHOD_DEFAULT
 
 - 
 - 
Constructor Details
- 
PostalTagger
public PostalTagger() throws org.opensextant.ConfigException
- Throws:
 org.opensextant.ConfigException
 
 - 
 - 
Method Details
- 
getName
- Specified by:
 getName in interface org.opensextant.extraction.Extractor
 - 
getCoreName
Description copied from class: SolrMatcherSupport
Be explicit about the solr core to use for tagging.
- Overrides:
 getCoreName in class GazetteerMatcher
- Returns:
 - the core name
 
 - 
configure
public void configure()
- Specified by:
 configure in interface org.opensextant.extraction.Extractor
 - 
configure
- Specified by:
 configure in interface org.opensextant.extraction.Extractor
- Throws:
 org.opensextant.ConfigException
 - 
configure
- Specified by:
 configure in interface org.opensextant.extraction.Extractor
- Throws:
 org.opensextant.ConfigException
 - 
extract
public List<org.opensextant.extraction.TextMatch> extract(org.opensextant.data.TextInput input) throws org.opensextant.extraction.ExtractionException
Tag, choose location if possible and emit a list of text matches.
INPUT: Free text that may have postal addresses.
OUTPUT: List of TextMatch for all possible postal codes that pass trivial noise filters.
- Specified by:
 extract in interface org.opensextant.extraction.Extractor
- Parameters:
 input - TextInput
- Returns:
 - list of TextMatch
- Throws:
 org.opensextant.extraction.ExtractionException - if extraction fails (Solr or Lucene errors) or the rule mechanics fail.
 - 
extract
public List<org.opensextant.extraction.TextMatch> extract(String input) throws org.opensextant.extraction.ExtractionException
- Specified by:
 extract in interface org.opensextant.extraction.Extractor
- Throws:
 org.opensextant.extraction.ExtractionException
 - 
cleanup
public void cleanup()
Very simple resource reporting and cleanup.
- Specified by:
 cleanup in interface org.opensextant.extraction.Extractor
 - 
setMinLen
public void setMinLen(int l)
Override the default minimum length (MIN_LEN=4) for a postal code. Any text match shorter than this length is filtered out. CA, FO, GB, GG, IE, IM, IS, JE and MT all have postal codes of 2 or 3 alphanumeric characters.
 -
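The effect of the minimum-length rule described for setMinLen can be sketched in isolation. The class below is a hypothetical illustration, not PostalTagger's internal code: MinLenFilterSketch, filterByMinLen and DEFAULT_MIN_LEN are made-up names, candidates are plain strings rather than TextMatch objects, and length is assumed to mean the character count of the matched text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MinLenFilterSketch {

    /** Mirrors the documented default, MIN_LEN=4 (name is an assumption). */
    public static final int DEFAULT_MIN_LEN = 4;

    /**
     * Keeps candidates whose length is at least minLen and drops the rest.
     * Hypothetical helper; the real tagger applies its filter internally.
     */
    public static List<String> filterByMinLen(List<String> candidates, int minLen) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (candidate.length() >= minLen) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("02139", "K1A", "SW1A 1AA", "AB");

        // With the default of 4, the 3-character and 2-character codes are dropped.
        System.out.println(filterByMinLen(candidates, DEFAULT_MIN_LEN));

        // Lowering the minimum to 2 keeps all four candidates.
        System.out.println(filterByMinLen(candidates, 2));
    }
}
```

Under these assumptions, lowering the minimum is what lets the short codes used in countries such as CA or GB survive the filter, at the cost of admitting more noise.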