Class RegexPatternManager
- java.lang.Object
-
- org.opensextant.extractors.flexpat.RegexPatternManager
-
- Direct Known Subclasses:
PatternManager, PatternManager, PoliPatternManager
public abstract class RegexPatternManager extends java.lang.Object
This class is the culmination of various date/time extraction efforts in Python and Java. The API makes no assumptions about the input data or about execution. Features of the REGEX patterns file:
- DEFINE - a component of a pattern to match
- RULE - a complete pattern to match
See XCoord PatternManager for a good example implementation.
- Author:
- dlutz (lutzdavp), ubaldino
-
-
Field Summary
Fields (Modifier and Type, then Field):
boolean debug
protected org.slf4j.Logger log
protected java.lang.String patternFile
protected java.util.Map<java.lang.String,RegexPattern> patterns
protected java.util.List<RegexPattern> patterns_list
java.util.List<PatternTestCase> testcases
boolean testing
-
Constructor Summary
Constructors
RegexPatternManager(java.io.InputStream s, java.lang.String n)
-
Method Summary
Methods (Modifier and Type and Method, then Description):
protected abstract RegexPattern create_pattern(java.lang.String fam, java.lang.String rule, java.lang.String desc)
Implementation must create a RegexPattern given the basic RULE definition, #RULE FAMILY RID REGEX. PatternManager here adds the compiled pattern and DEFINEs.
protected abstract PatternTestCase create_testcase(java.lang.String id, java.lang.String fam, java.lang.String text)
Implementation must create TestCases given the #TEST directive, #TEST RID TID TEXT.
void disableAll()
Disables all patterns.
abstract void enable_pattern(RegexPattern p)
Enables an instance of a pattern based on the global settings.
void enable_patterns(java.lang.String name)
Default adapter; you must override.
void enableAll()
RegexPattern get_pattern(java.lang.String id)
Access the patterns by ID.
java.util.Collection<RegexPattern> get_patterns()
java.lang.String getConfigurationDebug()
Instead of relying on a logging API, we now throw exceptions for real configuration errors, and capture configuration details in a buffer if debug is on.
java.util.Map<java.lang.String,java.lang.String> group_map(RegexPattern p, java.util.regex.Matcher matched)
NOTE: We're dealing with Java 6's inability to use named groups.
java.util.Map<java.lang.String,TextEntity> group_matches(RegexPattern p, java.util.regex.Matcher matched)
Matched fields as TextEntities.
void initialize(java.io.InputStream io)
Initializes the pattern manager implementations.
protected abstract boolean validate_pattern(RegexPattern pat)
Implementation has the option to check a pattern; for now, invalid patterns are only logged.
-
-
-
Field Detail
-
log
protected org.slf4j.Logger log
-
patterns
protected java.util.Map<java.lang.String,RegexPattern> patterns
-
patterns_list
protected java.util.List<RegexPattern> patterns_list
-
patternFile
protected java.lang.String patternFile
-
debug
public boolean debug
-
testing
public boolean testing
-
testcases
public java.util.List<PatternTestCase> testcases
-
-
Method Detail
-
get_patterns
public java.util.Collection<RegexPattern> get_patterns()
- Returns:
- collection of patterns
-
get_pattern
public RegexPattern get_pattern(java.lang.String id)
Access the patterns by ID.
- Parameters:
id - pattern id
- Returns:
- found pattern or null
-
create_pattern
protected abstract RegexPattern create_pattern(java.lang.String fam, java.lang.String rule, java.lang.String desc)
Implementation must create a RegexPattern given the basic RULE definition, #RULE FAMILY RID REGEX. PatternManager here adds the compiled pattern and DEFINEs.
- Parameters:
fam - family
rule - rule ID within the family
desc - optional description
- Returns:
- pattern object
-
validate_pattern
protected abstract boolean validate_pattern(RegexPattern pat)
Implementation has the option to check a pattern; for now, invalid patterns are only logged.
- Parameters:
pat - pattern object
- Returns:
- true if pattern is valid
-
create_testcase
protected abstract PatternTestCase create_testcase(java.lang.String id, java.lang.String fam, java.lang.String text)
Implementation must create TestCases given the #TEST directive, #TEST RID TID TEXT.
- Parameters:
id - pattern id
fam - pattern family
text - text for test case
- Returns:
- test case object
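The #TEST directive named above can be illustrated with a small parser. This is only a sketch, not the library's implementation: the whitespace-delimited layout, the class `TestDirectiveSketch`, and the holder `ParsedTest` are assumptions made for the example, and the real `create_testcase` receives already-separated fields and returns a `PatternTestCase`.

```java
final class TestDirectiveSketch {

    /** Hypothetical holder for the three fields of a #TEST line; not the library's PatternTestCase. */
    static final class ParsedTest {
        final String ruleId;
        final String testId;
        final String text;

        ParsedTest(String ruleId, String testId, String text) {
            this.ruleId = ruleId;
            this.testId = testId;
            this.text = text;
        }
    }

    /** Returns the parsed fields, or null if the line is not a complete #TEST directive. */
    static ParsedTest parse(String line) {
        // Assumed layout: "#TEST RID TID TEXT", fields separated by whitespace.
        // The limit of 4 keeps any whitespace inside TEXT intact.
        String[] f = line.trim().split("\\s+", 4);
        if (f.length < 4 || !"#TEST".equals(f[0])) {
            return null;
        }
        return new ParsedTest(f[1], f[2], f[3]);
    }
}
```

Limiting the split to four fields is the one design choice worth noting: test text usually contains spaces, so only the first three delimiters are treated as field separators.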
-
enable_pattern
public abstract void enable_pattern(RegexPattern p)
Enables an instance of a pattern based on the global settings.
- Parameters:
p - the pattern object to enable
-
enable_patterns
public void enable_patterns(java.lang.String name)
Default adapter; you must override. This should be abstract, but not all pattern managers are required to support it.
- Parameters:
name - pattern name to enable
-
disableAll
public void disableAll()
Disables all patterns.
-
enableAll
public void enableAll()
-
initialize
public void initialize(java.io.InputStream io) throws java.io.IOException
Initializes the pattern manager implementations. Reads the DEFINEs and RULEs from the pattern file and does the requisite substitutions. After initialization, the patterns map will be populated.
- Parameters:
io - stream
- Throws:
java.io.IOException - if the patterns file cannot be loaded and parsed
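The DEFINE and RULE substitution described for initialize can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's parser: the whitespace-delimited `#DEFINE NAME REGEX` layout, the `<name>` placeholder syntax, the "FAMILY-RID" map key, and the names `DefineSubstitutionSketch`, `expand`, and `load` are all hypothetical. Only the `#RULE FAMILY RID REGEX` shape is taken from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DefineSubstitutionSketch {

    /** Replaces every <name> placeholder with its DEFINE regex wrapped in a capture group. */
    static String expand(String ruleRegex, Map<String, String> defines) {
        String expanded = ruleRegex;
        for (Map.Entry<String, String> d : defines.entrySet()) {
            // String.replace is literal, so backslashes in the DEFINE regex are preserved.
            expanded = expanded.replace("<" + d.getKey() + ">", "(" + d.getValue() + ")");
        }
        return expanded;
    }

    /** Collects DEFINEs, then returns each RULE's expanded regex keyed by "FAMILY-RID". */
    static Map<String, String> load(Iterable<String> lines) {
        Map<String, String> defines = new LinkedHashMap<>();
        Map<String, String> rules = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("#DEFINE")) {
                String[] f = line.split("\\s+", 3); // assumed layout: #DEFINE NAME REGEX
                if (f.length == 3) {
                    defines.put(f[1], f[2]);
                }
            } else if (line.startsWith("#RULE")) {
                String[] f = line.split("\\s+", 4); // documented shape: #RULE FAMILY RID REGEX
                if (f.length == 4) {
                    rules.put(f[1] + "-" + f[2], expand(f[3], defines));
                }
            }
        }
        return rules;
    }
}
```

In this sketch a DEFINE must appear before any RULE that uses it, because each rule is expanded as soon as it is read.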
-
getConfigurationDebug
public java.lang.String getConfigurationDebug()
Instead of relying on a logging API, we now throw exceptions for real configuration errors, and capture configuration details in a buffer if debug is on.
- Returns:
- the configuration debug
-
group_map
public java.util.Map<java.lang.String,java.lang.String> group_map(RegexPattern p, java.util.regex.Matcher matched)
NOTE: We're dealing with Java 6's inability to use named groups, so we have to track FlexPat slots in line with the Matcher fields matched. Essentially this comes down to a simple Name:Offset pairing; the limitation here is no nesting.
- Parameters:
p - pattern
matched - matcher
- Returns:
- map containing the matched groups, as deciphered by FlexPat and the definitions in the patterns file
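The Name:Offset pairing can be pictured as an ordered list of slot names in which the name at position i belongs to capture group i + 1. The sketch below assumes that reading. `GroupMapSketch`, `groupMap`, and the `slotNames` list are hypothetical stand-ins; the real method reads the slot order from the `RegexPattern` argument.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;

final class GroupMapSketch {

    /**
     * Pairs slot names with numbered capture groups.
     * The matcher must already hold a successful match (find() or matches() returned true).
     */
    static Map<String, String> groupMap(List<String> slotNames, Matcher matched) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < slotNames.size() && i < matched.groupCount(); i++) {
            String value = matched.group(i + 1); // group 0 is the whole match
            if (value != null) {                 // optional groups that did not participate are skipped
                fields.put(slotNames.get(i), value);
            }
        }
        return fields;
    }
}
```

The no-nesting limitation follows directly from this scheme: a nested group would add an extra numbered group and shift every index after it away from its slot name.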
-
group_matches
public java.util.Map<java.lang.String,TextEntity> group_matches(RegexPattern p, java.util.regex.Matcher matched)
Matched fields as TextEntities.
- Parameters:
p - pattern
matched - Java RE Matcher
- Returns:
- keyed TextEntity
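Returning entities rather than plain strings lets each matched field keep its position in the input. The sketch below shows that idea with `Matcher.start(int)` and `Matcher.end(int)`. The `Span` class is a hypothetical stand-in for `TextEntity`, whose actual fields are not documented on this page, and the `slotNames` list is likewise an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;

final class GroupSpanSketch {

    /** Hypothetical stand-in for TextEntity: matched text plus its character offsets. */
    static final class Span {
        final String text;
        final int start;
        final int end;

        Span(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Pairs slot names with capture groups, keeping offsets. Requires a successful match. */
    static Map<String, Span> groupSpans(List<String> slotNames, Matcher matched) {
        Map<String, Span> spans = new LinkedHashMap<>();
        for (int i = 0; i < slotNames.size() && i < matched.groupCount(); i++) {
            int g = i + 1; // group 0 is the whole match
            if (matched.group(g) != null) {
                spans.put(slotNames.get(i), new Span(matched.group(g), matched.start(g), matched.end(g)));
            }
        }
        return spans;
    }
}
```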
-
-