Class AbstractFlexPat
- java.lang.Object
- org.opensextant.extractors.flexpat.AbstractFlexPat
- All Implemented Interfaces:
Extractor
- Direct Known Subclasses:
PatternsOfLife, XCoord, XTemporal
public abstract class AbstractFlexPat extends java.lang.Object implements Extractor
FlexPat Extractor: given a set of pattern families, extract, filter, and normalize matches.
- Author:
- ubaldino
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected boolean debug
protected org.slf4j.Logger log
protected int match_width
- Match width, in characters.
protected RegexPatternManager patterns
protected java.lang.String patterns_file
-
Constructor Summary
Constructors (Constructor, Description):
AbstractFlexPat()
AbstractFlexPat(boolean b)
-
Method Summary
Methods (Modifier and Type, Method, Description):
void configure()
- Configures whatever default patterns file is named.
void configure(java.io.InputStream strm, java.lang.String name)
void configure(java.lang.String patfile)
- Configure using a particular pattern file.
void configure(java.net.URL patfile)
- Configure using a URL pointer to the pattern file.
protected abstract RegexPatternManager createPatternManager(java.io.InputStream s, java.lang.String name)
- Create a pattern manager given the input stream and the file name.
void disableAll()
void enableAll()
RegexPatternManager getPatternManager()
void markComplete()
protected void set_match_id(TextMatch m, int count)
- Optional.
void setMatchWidth(int w)
- Match Width is the text buffer before and after a TextMatch.
void updateProgress(double progress)
-
-
-
Field Detail
-
match_width
protected int match_width
Match width, in characters. The DBF attribute limit in shapefile (SHP) output is 255 bytes, so shapefile outputters should decide at output time how and when to curtail the match width. In practice, the most pre- and post-match text found useful has been about 200-250 characters.
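Because the match width is counted in characters while the DBF limit is counted in bytes, an outputter may need to trim by encoded size rather than by character count. The following is a minimal sketch of that idea, assuming UTF-8 output; the class DbfFieldSketch, the method truncateUtf8, and the constant DBF_FIELD_LIMIT are hypothetical names for illustration and are not part of this API.

```java
/** Hypothetical helper, not part of AbstractFlexPat: trims text to a byte budget. */
final class DbfFieldSketch {
    /** Byte limit taken from the 255-byte figure quoted in the field description. */
    static final int DBF_FIELD_LIMIT = 255;

    /**
     * Returns the longest prefix of text whose UTF-8 encoding fits in maxBytes,
     * cutting only on code point boundaries so no character is split.
     */
    static String truncateUtf8(String text, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // UTF-8 length of one code point: 1 to 4 bytes.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return text.substring(0, i);
    }
}
```

With this sketch, a 250-character context made of two-byte characters would be cut to 127 characters, which is why a width near the upper end of the useful range may still need trimming at output time.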
-
log
protected org.slf4j.Logger log
-
debug
protected boolean debug
-
patterns
protected RegexPatternManager patterns
-
patterns_file
protected java.lang.String patterns_file
-
-
Method Detail
-
createPatternManager
protected abstract RegexPatternManager createPatternManager(java.io.InputStream s, java.lang.String name) throws java.io.IOException
Create a pattern manager given the input stream and the file name.
- Parameters:
s
- stream of patterns config file
name
- app name
- Returns:
- the regex pattern manager
- Throws:
java.io.IOException
- Signals that an I/O exception has occurred.
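Since createPatternManager is the only abstract method listed, a subclass presumably supplies the parsing while the base class handles configuration. The sketch below illustrates that factory-method arrangement with self-contained stand-in types. PatternSetSketch, ExtractorSketch, and LineExtractorSketch are hypothetical and do not reproduce RegexPatternManager or the real base class, and the '#' comment convention is an assumption made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a pattern manager: holds the configuration lines read. */
final class PatternSetSketch {
    final String name;
    final List<String> lines = new ArrayList<>();

    PatternSetSketch(String name) {
        this.name = name;
    }
}

/** Hypothetical stand-in for the abstract base class. */
abstract class ExtractorSketch {
    protected PatternSetSketch patterns;

    /** Factory method each subclass implements. */
    protected abstract PatternSetSketch createPatternManager(InputStream s, String name)
            throws IOException;

    /** Assumed flow: configuring from a stream delegates to the factory method. */
    void configure(InputStream strm, String name) throws IOException {
        patterns = createPatternManager(strm, name);
    }
}

/** Concrete subclass that keeps non-blank lines that do not start with '#'. */
final class LineExtractorSketch extends ExtractorSketch {
    @Override
    protected PatternSetSketch createPatternManager(InputStream s, String name)
            throws IOException {
        PatternSetSketch set = new PatternSetSketch(name);
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(s, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty() && !line.startsWith("#")) {
                    set.lines.add(line);
                }
            }
        }
        return set;
    }
}
```

Keeping the stream handling in the base class and the parsing in the subclass is one way the listed subclasses could each supply their own pattern manager.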
-
getPatternManager
public RegexPatternManager getPatternManager()
-
configure
public void configure() throws ConfigException
Configures the extractor with whatever default patterns file is named.
- Specified by:
configure in interface Extractor
- Throws:
ConfigException
- config error, pattern file not found
-
configure
public void configure(java.lang.String patfile) throws ConfigException
Configure using a particular pattern file.
- Specified by:
configure in interface Extractor
- Parameters:
patfile
- a pattern file.
- Throws:
ConfigException
- if pattern file not found
-
configure
public void configure(java.net.URL patfile) throws ConfigException
Configure using a URL pointer to the pattern file.
- Specified by:
configure in interface Extractor
- Parameters:
patfile
- patterns file URL
- Throws:
ConfigException
- if pattern file not found
-
configure
public void configure(java.io.InputStream strm, java.lang.String name) throws ConfigException
- Throws:
ConfigException
-
setMatchWidth
public void setMatchWidth(int w)
Match width is the size of the text buffer kept before and after a TextMatch. Match buffers are used to create a match ID.
- Parameters:
w
- width
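To make the notion of a text buffer before and after a match concrete, here is a minimal sketch of slicing context windows of a given width around a match span, clamped to the bounds of the text. MatchContextSketch and its methods are hypothetical and only illustrate the idea; they are not how this class is known to compute its buffers.

```java
/** Hypothetical helper, not part of AbstractFlexPat: context windows around a match. */
final class MatchContextSketch {
    /** Up to width characters immediately before the match start offset. */
    static String before(String buf, int start, int width) {
        return buf.substring(Math.max(0, start - width), start);
    }

    /** Up to width characters immediately after the match end offset (exclusive). */
    static String after(String buf, int end, int width) {
        return buf.substring(end, Math.min(buf.length(), end + width));
    }
}
```

For the text "Meet at 42N 071W tomorrow" with a match spanning offsets 8 to 16, a width of 5 yields "t at " before the match, and a width of 4 yields " tom" after it.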
-
set_match_id
protected void set_match_id(TextMatch m, int count)
Optional. Assign an identifier to each TextMatch found. This is an MD5 of the match in situ. If context is provided, it is used to generate the identity. If a count is provided, it is used. Otherwise, only the pattern ID and text value are used.
- Parameters:
m
- a TextMatch
count
- incrementor used for uniqueness
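The description leaves open exactly how context, count, and pattern ID are combined. The sketch below shows one possible reading using the standard java.security.MessageDigest API: hash the context when present, otherwise the pattern ID and text, and append the count when it is non-negative. MatchIdSketch, its method names, the comma separator, and the use of a negative count to mean "no count" are all assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper, not part of AbstractFlexPat: MD5-based match identifiers. */
final class MatchIdSketch {
    /** Lowercase hex MD5 of the UTF-8 bytes of s. */
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Assumed composition: context if given, else "patternId,text";
     * a non-negative count is appended to keep repeated matches distinct.
     */
    static String matchId(String patternId, String text, String context, int count) {
        String key = (context != null) ? context : patternId + "," + text;
        if (count >= 0) {
            key = key + "," + count;
        }
        return md5Hex(key);
    }
}
```

Under this reading, two occurrences of the same text matched by the same pattern receive different identifiers only when their context or count differs.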
-
enableAll
public void enableAll()
-
disableAll
public void disableAll()
-
updateProgress
public void updateProgress(double progress)
-
markComplete
public void markComplete()
-
-