Package org.opensextant.extractors.geo
Class PostalTagger
java.lang.Object
org.opensextant.extraction.SolrMatcherSupport
org.opensextant.extractors.geo.GazetteerMatcher
org.opensextant.extractors.geo.PostalTagger
- All Implemented Interfaces:
Closeable, AutoCloseable, org.opensextant.data.MatchSchema, org.opensextant.extraction.Extractor
public class PostalTagger
extends GazetteerMatcher
implements org.opensextant.data.MatchSchema, org.opensextant.extraction.Extractor
PostalTagger tags and returns any alphanumeric token or phrase that resembles a postal code
or abbreviation. It applies simple filter rules only and does not attempt geocoding.
- Author:
- ubaldino
-
Field Summary
Fields inherited from class org.opensextant.extractors.geo.GazetteerMatcher
AR_TAG_FIELD, CJK_TAG_FIELD, DEFAULT_TAG_FIELD, filter, lang2nameField
Fields inherited from class org.opensextant.extraction.SolrMatcherSupport
DEFAULT_TAG_LIMIT, getNamesTime, log, requestHandler, solr, tagNamesTime, totalTime
Fields inherited from interface org.opensextant.extraction.Extractor
NO_DOC_ID
Fields inherited from interface org.opensextant.data.MatchSchema
VAL_COORD, VAL_COUNTRY, VAL_PLACE, VAL_POSTAL, VAL_TAXON
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | cleanup() | Very simple resource reporting and cleanup.
void | configure() |
void | configure(…) |
void | configure(…) |
List<org.opensextant.extraction.TextMatch> | extract(String input) |
List<org.opensextant.extraction.TextMatch> | extract(org.opensextant.data.TextInput input) | Tag, choose location if possible and emit a list of text matches.
… | getCoreName() | Be explicit about the solr core to use for tagging.
… | getName() |
void | setMinLen(int l) | Override the default MIN_LEN=4 length for a postal code.
Methods inherited from class org.opensextant.extractors.geo.GazetteerMatcher
createPlace, createTag, getFiltrationRatio, getGazetteer, getMatcherParameters, initialize, placesAt, reportMemory, searchAdvanced, searchAdvanced, setAllowLowerCase, setAllowLowerCaseAbbreviations, setEnableCaseFilter, setEnableCodeHunter, setMatchFilter, tagText, tagText, tagText, tagText, tagText
Methods inherited from class org.opensextant.extraction.SolrMatcherSupport
close, getRetrievingNamesTime, getTaggingNamesTime, getTotalTime, setTaggerHandler, tagTextCallSolrTagger
-
Field Details
-
VERSION
-
METHOD_DEFAULT
-
-
Constructor Details
-
PostalTagger
public PostalTagger() throws org.opensextant.ConfigException
- Throws:
org.opensextant.ConfigException
-
-
Method Details
-
getName
- Specified by:
getName in interface org.opensextant.extraction.Extractor
-
getCoreName
Description copied from class: SolrMatcherSupport
Be explicit about the solr core to use for tagging.
- Overrides:
getCoreName in class GazetteerMatcher
- Returns:
the core name
-
configure
public void configure()
- Specified by:
configure in interface org.opensextant.extraction.Extractor
-
configure
- Specified by:
configure in interface org.opensextant.extraction.Extractor
- Throws:
org.opensextant.ConfigException
-
configure
- Specified by:
configure in interface org.opensextant.extraction.Extractor
- Throws:
org.opensextant.ConfigException
-
extract
public List<org.opensextant.extraction.TextMatch> extract(org.opensextant.data.TextInput input) throws org.opensextant.extraction.ExtractionException
Tag, choose a location if possible, and emit a list of text matches.
INPUT: Free text that may contain postal addresses.
OUTPUT: A list of TextMatch objects for all possible postal codes that pass trivial noise filters.
- Specified by:
extract in interface org.opensextant.extraction.Extractor
- Parameters:
input - TextInput
- Returns:
list of TextMatch
- Throws:
org.opensextant.extraction.ExtractionException - if extraction fails because of Solr or Lucene errors, or because of the rule mechanics.
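The input and output shape described above can be illustrated with a stand-alone sketch. It does not call any OpenSextant class and is not how PostalTagger works internally: the class `PostalCandidateSketch`, the `Candidate` holder, the `findCandidates` method, and the "must contain a digit" noise filter are all hypothetical choices made for illustration. It also ignores abbreviations, which the real tagger is documented to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not part of OpenSextant.
class PostalCandidateSketch {

    // Assumed minimal shape of a match: the matched text plus its character offsets.
    static final class Candidate {
        final String text;
        final int start; // offset of the first character
        final int end;   // offset just past the last character

        Candidate(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Runs of ASCII letters and digits are treated as tokens.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9]+");

    // Scan free text and keep tokens that pass an assumed noise filter:
    // a token must contain at least one digit to be kept.
    static List<Candidate> findCandidates(String input) {
        List<Candidate> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String token = m.group();
            boolean hasDigit = token.chars().anyMatch(Character::isDigit);
            if (hasDigit) {
                out.add(new Candidate(token, m.start(), m.end()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Candidate c : findCandidates("Ship to Springfield, IL 62704 today.")) {
            System.out.println(c.text + " [" + c.start + ", " + c.end + ")");
        }
    }
}
```

Each candidate carries offsets into the original text, so a caller can map a match back to its position in the input.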
-
extract
public List<org.opensextant.extraction.TextMatch> extract(String input) throws org.opensextant.extraction.ExtractionException
- Specified by:
extract in interface org.opensextant.extraction.Extractor
- Throws:
org.opensextant.extraction.ExtractionException
-
cleanup
public void cleanup()
Very simple resource reporting and cleanup.
- Specified by:
cleanup in interface org.opensextant.extraction.Extractor
-
setMinLen
public void setMinLen(int l)
Override the default MIN_LEN=4 length for a postal code. Any TextMatch shorter than this length is filtered out. CA, FO, GB, GG, IE, IM, IS, JE, and MT all have postal codes that are 2 or 3 alphanumeric characters long.
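The effect of the minimum length can be illustrated with a small stand-alone sketch. It does not use the OpenSextant classes: `MinLenFilterSketch` and its `filter` method are hypothetical names, and the method bodies are assumptions. Only the default of 4 and the rule that shorter matches are dropped come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of OpenSextant: mimics only the documented length rule.
class MinLenFilterSketch {
    // The documented default minimum length for a postal code match.
    private int minLen = 4;

    // Mirrors the documented setter name; the body is an assumption.
    void setMinLen(int l) {
        minLen = l;
    }

    // Keep only candidates whose length is at least minLen.
    List<String> filter(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (c.length() >= minLen) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        MinLenFilterSketch sketch = new MinLenFilterSketch();
        List<String> candidates = List.of("90210", "K1A", "101");
        System.out.println(sketch.filter(candidates)); // default minimum of 4
        sketch.setMinLen(3);
        System.out.println(sketch.filter(candidates)); // minimum lowered to 3
    }
}
```

A caller processing text from the countries listed above would presumably lower the minimum before calling extract, so that 2 or 3 character codes are not filtered out.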
-