Class AbstractFlexPat
java.lang.Object
org.opensextant.extractors.flexpat.AbstractFlexPat
- All Implemented Interfaces:
Extractor
- Direct Known Subclasses:
PatternsOfLife, XCoord, XTemporal
FlexPat Extractor -- given a set of pattern families, extract, filter and
normalize matches.
- Author:
- ubaldino
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected boolean debug
protected org.slf4j.Logger log
protected int match_width - CHARS.
protected RegexPatternManager patterns
protected String patterns_file
-
Constructor Summary
Constructors:
AbstractFlexPat()
AbstractFlexPat(boolean b)
-
Method Summary
Methods (Modifier and Type, Method, Description):
void cleanup() - Extractor interface: extractors are responsible for cleaning up after themselves.
void configure() - Configures whatever default patterns file is named.
void configure(InputStream strm, String name)
void configure(patfile) - Configure using a particular pattern file.
void configure(patfile) - Configure using a URL pointer to the pattern file.
protected abstract RegexPatternManager createPatternManager(InputStream s, String name) - Create a pattern manager given the input stream and the file name.
void disableAll()
void enableAll()
protected void set_match_id(TextMatch m, int count) - Optional.
void setMatchWidth(int w) - Match Width is the text buffer before and after a TextMatch.
-
Field Details
-
match_width
protected int match_width
Width in characters. The SHP DBF limit is 255 bytes, so SHP file outputters should assess at output time how and when to curtail match width. The most pre/post text found useful has typically been about 200-250 characters.
-
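The note on match_width leaves it to the outputter to curtail context text for the 255-byte DBF limit. One way to do that is sketched below. The class and method names are hypothetical and not part of this library, and the sketch assumes the DBF field is written as UTF-8; a different encoding would need a different byte count per character.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of the FlexPat API. */
final class DbfFieldSketch {
    /** Byte limit for a DBF text field, as stated in the match_width note. */
    static final int DBF_FIELD_LIMIT_BYTES = 255;

    /**
     * Trim text so that its UTF-8 encoding fits in maxBytes,
     * stopping at a code point boundary so no character is split.
     */
    static String truncateToBytes(String text, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // UTF-8 length of this code point (assumption: output is UTF-8).
            int size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + size > maxBytes) {
                break;
            }
            bytes += size;
            i += Character.charCount(cp);
        }
        return text.substring(0, i);
    }

    public static void main(String[] args) {
        // 300 ASCII characters of pretend match context.
        String context = new String(new char[300]).replace('\0', 'x');
        String field = truncateToBytes(context, DBF_FIELD_LIMIT_BYTES);
        System.out.println(field.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Because match_width counts characters while the DBF limit counts bytes, a width of 200-250 characters can still overflow the field when the text contains multi-byte characters, which is why the sketch counts bytes.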
log
protected org.slf4j.Logger log
-
debug
protected boolean debug
-
patterns
-
patterns_file
-
-
Constructor Details
-
AbstractFlexPat
public AbstractFlexPat()
-
AbstractFlexPat
public AbstractFlexPat(boolean b)
-
-
Method Details
-
createPatternManager
protected abstract RegexPatternManager createPatternManager(InputStream s, String name) throws IOException
Create a pattern manager given the input stream and the file name.
- Parameters:
s - stream of patterns config file
name - app name
- Returns:
- the regex pattern manager
- Throws:
IOException - Signals that an I/O exception has occurred.
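The relationship between configure(InputStream, String) and the abstract createPatternManager can be pictured as a template method: the base class owns configuration and each subclass supplies its own pattern manager. The sketch below is an illustration under assumptions, not this library's code. Every type in it (SketchPatternManager, SketchFlexPat, SketchLineExtractor) is a hypothetical stand-in, the one-rule-per-line file format with "#" comments is invented for the example, and whether the real configure methods delegate in exactly this way is assumed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for RegexPatternManager. */
class SketchPatternManager {
    final String name;
    final List<String> rules = new ArrayList<>();

    SketchPatternManager(String name) {
        this.name = name;
    }
}

/** Hypothetical stand-in for the abstract extractor base class. */
abstract class SketchFlexPat {
    protected SketchPatternManager patterns;

    /** Subclasses decide how a pattern stream becomes a manager. */
    protected abstract SketchPatternManager createPatternManager(InputStream s, String name)
            throws IOException;

    /** Assumed delegation: configuring stores whatever the subclass builds. */
    public void configure(InputStream strm, String name) throws IOException {
        patterns = createPatternManager(strm, name);
    }

    public SketchPatternManager getPatternManager() {
        return patterns;
    }
}

/** Hypothetical subclass: one rule per non-blank, non-comment line. */
class SketchLineExtractor extends SketchFlexPat {
    @Override
    protected SketchPatternManager createPatternManager(InputStream s, String name)
            throws IOException {
        SketchPatternManager mgr = new SketchPatternManager(name);
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(s, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                    mgr.rules.add(trimmed);
                }
            }
        }
        return mgr;
    }

    public static void main(String[] args) throws IOException {
        byte[] config = "# demo patterns\nrule one\nrule two\n".getBytes(StandardCharsets.UTF_8);
        SketchLineExtractor extractor = new SketchLineExtractor();
        extractor.configure(new ByteArrayInputStream(config), "demo");
        System.out.println(extractor.getPatternManager().rules);
    }
}
```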
-
getPatternManager
-
configure
Configures whatever default patterns file is named.
- Specified by:
configure in interface Extractor
- Throws:
ConfigException - config error, pattern file not found
-
configure
Configure using a particular pattern file.
- Specified by:
configure in interface Extractor
- Parameters:
patfile - a pattern file.
- Throws:
ConfigException - if pattern file not found
-
configure
Configure using a URL pointer to the pattern file.
- Specified by:
configure in interface Extractor
- Parameters:
patfile - patterns file URL
- Throws:
ConfigException - if pattern file not found
-
configure
- Throws:
ConfigException
-
setMatchWidth
public void setMatchWidth(int w)
Match Width is the text buffer before and after a TextMatch. Match buffers are used to create a match ID.
- Parameters:
w - width
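To make "text buffer before and after a TextMatch" concrete, the sketch below cuts a window of a given width on each side of a match span, clamped to the bounds of the document. The class and method names are hypothetical, and treating the match as a start/end character offset pair is an assumption for illustration.

```java
/** Hypothetical helper, not part of the FlexPat API. */
final class MatchWindowSketch {
    /** Up to width characters immediately preceding the match start. */
    static String before(String text, int start, int width) {
        int from = Math.max(0, start - width);
        return text.substring(from, start);
    }

    /** Up to width characters immediately following the match end. */
    static String after(String text, int end, int width) {
        int to = Math.min(text.length(), end + width);
        return text.substring(end, to);
    }

    public static void main(String[] args) {
        String text = "Meet at 42.36N 71.06W tomorrow";
        String match = "42.36N 71.06W";
        int start = text.indexOf(match);
        int end = start + match.length();
        System.out.println("[" + before(text, start, 5) + "]");
        System.out.println("[" + after(text, end, 5) + "]");
    }
}
```

A width larger than the available text simply yields a shorter buffer, so a match near the start or end of a document needs no special handling.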
-
set_match_id
Optional. Assign an identifier to each TextMatch found. This is an MD5 of the match in situ. If context is provided, it is used to generate the identity. If a count is provided, it is used. Otherwise, just the pattern ID + text value is used.
- Parameters:
m - a TextMatch
count - incrementor used for uniqueness
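The fallback order described for set_match_id (context, then count, then pattern ID + text) can be sketched as below. This is only an illustration of the description, not the library's implementation: the class, the method signature, the "|" separator, and the convention that a negative count means "no count provided" are all assumptions. Only java.security.MessageDigest is a real API here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper, not part of the FlexPat API. */
final class MatchIdSketch {
    /** Hex-encoded MD5 digest of a string's UTF-8 bytes. */
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard Java platforms.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Assumed fallback order: context if present, else pattern ID + text + count
     * when count is non-negative, else pattern ID + text alone.
     */
    static String matchId(String patternId, String text, String context, int count) {
        String seed;
        if (context != null && !context.isEmpty()) {
            seed = context;
        } else if (count >= 0) {
            seed = patternId + "|" + text + "|" + count;
        } else {
            seed = patternId + "|" + text;
        }
        return md5Hex(seed);
    }

    public static void main(String[] args) {
        System.out.println(matchId("DD-01", "42.36N 71.06W", null, 0));
        System.out.println(matchId("DD-01", "42.36N 71.06W", null, 1));
    }
}
```

Under these assumptions, two occurrences of the same text matched by the same pattern get distinct identifiers only when a context or a count distinguishes them, which is the role the count parameter's "incrementor used for uniqueness" description suggests.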
-
cleanup
public void cleanup()
Extractor interface: extractors are responsible for cleaning up after themselves.
-
enableAll
public void enableAll() -
disableAll
public void disableAll()
-