Class RegexPatternManager
java.lang.Object
org.opensextant.extractors.flexpat.RegexPatternManager
- Direct Known Subclasses:
PatternManager, PatternManager, PoliPatternManager
This is the culmination of various date/time extraction efforts in Python and Java. This API makes no assumptions about input data or about execution. Features of the REGEX patterns file:
- DEFINE - a component of a pattern to match
- RULE - a complete pattern to match
See XCoord PatternManager for a good example implementation.
- Author:
- dlutz (lutzdavp), ubaldino
-
Field Summary
Fields
Modifier and Type, Field
boolean debug
protected org.slf4j.Logger log
protected String patternFile
protected Map<String,RegexPattern> patterns
protected List<RegexPattern> patterns_list
boolean testing
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
protected abstract RegexPattern create_pattern(String fam, String rule, String desc)
    Implementation must create a RegexPattern given the basic RULE definition, #RULE FAMILY RID REGEX. The PatternManager here adds the compiled pattern and DEFINEs.
protected abstract PatternTestCase create_testcase(String id, String fam, String text)
    Implementation must create test cases given the #TEST directive, #TEST RID TID TEXT.
void disableAll()
    Disable all patterns.
abstract void enable_pattern
    Enable an instance of a pattern based on the global settings.
void enable_patterns(String name)
    Default adapter; you must override.
void enableAll()
get_pattern(String id)
    Access the patterns by ID.
get_patterns
getConfigurationDebug
    Instead of relying on a logging API, we now throw exceptions for real configuration errors, and capture configuration details in a buffer if debug is on.
group_map(RegexPattern p, Matcher matched)
    NOTE: We're dealing with Java 6's inability to use named groups.
group_matches(RegexPattern p, Matcher matched)
    Matched fields as TextEntities.
void initialize
    Initializes the pattern manager implementations.
protected abstract boolean validate_pattern
    Implementation has the option to check a pattern; for now, invalid patterns are only logged.
-
Field Details
-
log
protected org.slf4j.Logger log -
patterns
-
patterns_list
-
patternFile
-
debug
public boolean debug -
testing
public boolean testing -
testcases
-
-
Constructor Details
-
RegexPatternManager
- Throws:
IOException
-
-
Method Details
-
get_patterns
- Returns:
- collection of patterns
-
get_pattern
Access the patterns by ID.
- Parameters:
id - pattern id
- Returns:
- found pattern or null
-
create_pattern
Implementation must create a RegexPattern given the basic RULE definition, #RULE FAMILY RID REGEX. The PatternManager here adds the compiled pattern and DEFINEs.
- Parameters:
fam - family
rule - rule ID within the family
desc - optional description
- Returns:
- pattern object
-
validate_pattern
Implementation has the option to check a pattern; for now, invalid patterns are only logged.
- Parameters:
pat - pattern object
- Returns:
- true if pattern is valid
-
create_testcase
Implementation must create test cases given the #TEST directive, #TEST RID TID TEXT.
- Parameters:
id - pattern id
fam - pattern family
text - text for test case
- Returns:
- test case object
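As a rough illustration of the #TEST RID TID TEXT directive, the sketch below splits such a line into its three fields. The class TestDirectiveSketch and the method parseTest are hypothetical names chosen for this example, and whitespace-delimited fields are an assumption; this is not the FlexPat implementation of create_testcase.

```java
/** Hypothetical sketch; not part of the FlexPat API. */
class TestDirectiveSketch {

    /**
     * Splits "#TEST RID TID TEXT" into {RID, TID, TEXT}.
     * Assumes fields are whitespace-delimited and TEXT is the remainder of the line.
     */
    static String[] parseTest(String line) {
        String[] f = line.trim().split("\\s+", 4);
        if (f.length < 4 || !f[0].equals("#TEST")) {
            throw new IllegalArgumentException("Not a #TEST directive: " + line);
        }
        return new String[] {f[1], f[2], f[3]};
    }

    public static void main(String[] args) {
        String[] t = parseTest("#TEST 01 a The date was 2024-05.");
        System.out.println("rule=" + t[0] + " test=" + t[1] + " text=" + t[2]);
    }
}
```

Limiting the split to four parts keeps any whitespace inside TEXT intact.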
-
enable_pattern
Enable an instance of a pattern based on the global settings.
- Parameters:
p - the pattern object to enable
-
enable_patterns
Default adapter; you must override. This should be abstract, but not all pattern managers are required to support it.
- Parameters:
name - pattern name to enable
-
disableAll
public void disableAll()
Disable all patterns.
-
enableAll
public void enableAll() -
initialize
Initializes the pattern manager implementations. Reads the DEFINEs and RULEs from the pattern file and performs the requisite substitutions. After initialization, the patterns HashMap is populated.
- Parameters:
io - stream
- Throws:
IOException - if the patterns file cannot be loaded and parsed
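The DEFINE substitution step can be pictured with a minimal sketch. It assumes a "#DEFINE NAME REGEX" line layout and a "<NAME>" reference syntax inside RULE bodies; neither is confirmed by this page, and the class DefineRuleSketch and its load method are hypothetical names, not the FlexPat implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the FlexPat implementation. */
class DefineRuleSketch {

    /**
     * Reads "#DEFINE name regex" and "#RULE family rid regex" lines and
     * returns compiled rules keyed by "family-rid".
     * The "<name>" reference syntax is an assumption of this sketch.
     */
    static Map<String, Pattern> load(BufferedReader reader) throws IOException {
        Map<String, String> defines = new LinkedHashMap<>();
        Map<String, Pattern> rules = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.startsWith("#DEFINE")) {
                String[] f = line.split("\\s+", 3);
                if (f.length < 3) throw new IOException("Malformed DEFINE: " + line);
                defines.put(f[1], f[2]);
            } else if (line.startsWith("#RULE")) {
                String[] f = line.split("\\s+", 4);
                if (f.length < 4) throw new IOException("Malformed RULE: " + line);
                String regex = f[3];
                // Replace each <name> reference with its DEFINE body as a capture group.
                for (Map.Entry<String, String> d : defines.entrySet()) {
                    regex = regex.replace("<" + d.getKey() + ">", "(" + d.getValue() + ")");
                }
                rules.put(f[1] + "-" + f[2], Pattern.compile(regex));
            }
        }
        return rules;
    }

    public static void main(String[] args) throws IOException {
        String file = String.join("\n",
                "#DEFINE year \\d{4}",
                "#DEFINE month \\d{2}",
                "#RULE DATE 01 <year>-<month>");
        Map<String, Pattern> rules = load(new BufferedReader(new StringReader(file)));
        System.out.println(rules.get("DATE-01").pattern());
        System.out.println(rules.get("DATE-01").matcher("2024-05").matches());
    }
}
```

Malformed lines raise IOException here, mirroring the documented contract that a patterns file that cannot be parsed is reported as an IOException.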
-
getConfigurationDebug
Instead of relying on a logging API, we now throw exceptions for real configuration errors, and capture configuration details in a buffer if debug is on.
- Returns:
- the configuration debug
-
group_map
NOTE: We're dealing with Java 6's inability to use named groups, so we have to track FlexPat slots in line with the Matcher fields matched. Essentially this comes down to a simple Name:Offset pairing; our limitation here is no nesting.
- Parameters:
p - pattern
matched - matcher
- Returns:
- map containing the matched groups, as deciphered by FlexPat and the definitions in the patterns file
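The Name:Offset pairing can be sketched as follows, assuming slot names are recorded in the same order as the capture groups in the compiled regex. GroupMapSketch and groupMap are hypothetical names for this illustration; the real group_map takes a RegexPattern rather than a list of names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the FlexPat implementation. */
class GroupMapSketch {

    /**
     * Pairs the i-th slot name with capture group i + 1.
     * Assumes one flat capture group per slot, with no nested groups.
     */
    static Map<String, String> groupMap(List<String> slotNames, Matcher matched) {
        Map<String, String> fields = new LinkedHashMap<>();
        int count = Math.min(slotNames.size(), matched.groupCount());
        for (int i = 0; i < count; i++) {
            String value = matched.group(i + 1);
            if (value != null) {
                fields.put(slotNames.get(i), value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d{4})-(\\d{2})");
        Matcher m = p.matcher("Filed on 2024-05.");
        if (m.find()) {
            System.out.println(groupMap(Arrays.asList("year", "month"), m));
        }
    }
}
```

A nested group would shift the group numbering and break the positional pairing, which is why the no-nesting limitation applies.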
-
group_matches
Matched fields as TextEntities.
- Parameters:
p - pattern
matched - Java RE Matcher
- Returns:
- keyed TextEntity
-