Class AbstractFlexPat
java.lang.Object
org.opensextant.extractors.flexpat.AbstractFlexPat
- All Implemented Interfaces:
Extractor
- Direct Known Subclasses:
PatternsOfLife, XCoord, XTemporal
FlexPat Extractor -- given a set of pattern families, extract, filter and
normalize matches.
- Author:
- ubaldino
-
Field Summary
Modifier and Type | Field | Description
protected boolean | debug |
protected org.slf4j.Logger | log |
protected int | match_width | CHARS.
protected RegexPatternManager | patterns |
protected String | patterns_file |
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | cleanup() | Extractor interface: extractors are responsible for cleaning up after themselves.
void | configure() | Configures whatever default patterns file is named.
void | configure(InputStream strm, String name) |
void | configure(patfile) | Configure using a particular pattern file.
void | configure(patfile) | Configure using a URL pointer to the pattern file.
protected abstract RegexPatternManager | createPatternManager(InputStream s, String name) | Create a pattern manager given the input stream and the file name.
void | disableAll() |
void | enableAll() |
protected void | set_match_id(TextMatch m, int count) | Optional.
void | setMatchWidth(int w) | Match Width is the text buffer before and after a TextMatch.
-
Field Details
-
match_width
protected int match_width
Match width in characters (CHARS). The SHP DBF limit is 255 bytes, so SHP file outputters should assess at output time how and when to curtail match width. The most useful pre/post text has typically been about 200-250 characters.
-
log
protected org.slf4j.Logger log
-
debug
protected boolean debug
-
patterns
protected RegexPatternManager patterns
-
patterns_file
protected String patterns_file
-
-
Constructor Details
-
AbstractFlexPat
public AbstractFlexPat()
-
AbstractFlexPat
public AbstractFlexPat(boolean b)
-
-
Method Details
-
createPatternManager
protected abstract RegexPatternManager createPatternManager(InputStream s, String name) throws IOException
Create a pattern manager given the input stream and the file name.
- Parameters:
s - stream of patterns config file
name - app name
- Returns:
the regex pattern manager
- Throws:
IOException - Signals that an I/O exception has occurred.
-
getPatternManager
-
configure
Configures whatever default patterns file is named.
- Specified by:
configure in interface Extractor
- Throws:
ConfigException - config error, pattern file not found
-
configure
Configure using a particular pattern file.
- Specified by:
configure in interface Extractor
- Parameters:
patfile - a pattern file
- Throws:
ConfigException - if pattern file not found
-
configure
Configure using a URL pointer to the pattern file.
- Specified by:
configure in interface Extractor
- Parameters:
patfile - patterns file URL
- Throws:
ConfigException - if pattern file not found
-
configure
public void configure(InputStream strm, String name) throws ConfigException
- Throws:
ConfigException
-
setMatchWidth
public void setMatchWidth(int w)
Match Width is the text buffer before and after a TextMatch. Match buffers are used to create a match ID.
- Parameters:
w - width
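To illustrate what a match width of w characters means in practice, here is a minimal sketch that slices the text before and after a match span. It assumes the width is applied symmetrically and clamped at the text boundaries. The class MatchContextSketch and its context method are hypothetical and not part of the FlexPat API.

```java
// Hypothetical helper, not part of the FlexPat API.
class MatchContextSketch {

    /**
     * Returns a two-element array {before, after} holding up to
     * `width` characters on each side of the span [start, end).
     * Assumption: the buffer is symmetric and clamped to the text bounds.
     */
    static String[] context(String text, int start, int end, int width) {
        if (start < 0 || end > text.length() || start > end) {
            throw new IllegalArgumentException("span out of range");
        }
        if (width < 0) {
            throw new IllegalArgumentException("width must be non-negative");
        }
        int from = Math.max(0, start - width);
        int to = Math.min(text.length(), end + width);
        return new String[] { text.substring(from, start), text.substring(end, to) };
    }

    public static void main(String[] args) {
        String text = "Meet at 42.30N 071.20W tomorrow at noon.";
        String match = "42.30N 071.20W";
        int start = text.indexOf(match);
        int end = start + match.length();

        // A small width keeps the buffers short for display.
        String[] ctx = context(text, start, end, 10);
        System.out.println("before=[" + ctx[0] + "] after=[" + ctx[1] + "]");
    }
}
```

A caller writing to a length-limited output field could pass a smaller width here rather than truncating the buffers afterwards.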
-
set_match_id
protected void set_match_id(TextMatch m, int count)
Optional. Assign an identifier to each TextMatch found. This is an MD5 of the match in situ. If context is provided, it is used to generate the identity. If a count is provided, it is used. Otherwise, just the pattern ID and text value are used.
- Parameters:
m - a TextMatch
count - incrementor used for uniqueness
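The fallback order described for the match ID can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the comma-joined input string, the treatment of a negative count as "not provided", and the names MatchIdSketch, md5Hex and matchId are all hypothetical. Only the use of MD5 and the context, count, pattern ID plus text ordering come from the description.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the FlexPat API.
class MatchIdSketch {

    /** Hex-encoded MD5 digest of a string, computed over its UTF-8 bytes. */
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Assumed precedence: use context when present, else the count when
     * it is non-negative, else only the pattern ID and matched text.
     */
    static String matchId(String patternId, String text, String context, int count) {
        if (context != null && !context.isEmpty()) {
            return md5Hex(patternId + "," + text + "," + context);
        }
        if (count >= 0) {
            return md5Hex(patternId + "," + text + "," + count);
        }
        return md5Hex(patternId + "," + text);
    }

    public static void main(String[] args) {
        String text = "42.30N 071.20W";
        System.out.println(matchId("coord-01", text, "Meet at 42.30N 071.20W tomorrow", 0));
        System.out.println(matchId("coord-01", text, null, 7));
        System.out.println(matchId("coord-01", text, null, -1));
    }
}
```

Including the surrounding context in the hashed string keeps two identical text values found at different places in a document from receiving the same ID, which is presumably why the match buffer feeds into the identity.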
-
cleanup
public void cleanup()
Extractor interface: extractors are responsible for cleaning up after themselves.
-
enableAll
public void enableAll()
-
disableAll
public void disableAll()
-