Class RegexPatternManager
java.lang.Object
org.opensextant.extractors.flexpat.RegexPatternManager
- Direct Known Subclasses:
PatternManager, PatternManager, PoliPatternManager
This is the culmination of various date/time extraction efforts in Python and Java. This API makes no assumptions about the input data or about execution. Features of the REGEX patterns file:
- DEFINE - a component of a pattern to match
- RULE - a complete pattern to match
See XCoord PatternManager for a good example implementation.
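The DEFINE/RULE split can be pictured with a small sketch. It assumes that a RULE refers to a DEFINE as `<name>`, that directive fields are whitespace-separated, and that each substituted DEFINE becomes one capturing group. The class `DefineExpander` and its methods are hypothetical illustrations, not part of this API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads DEFINEs and expands them inside RULE regexes. */
class DefineExpander {
    // Assumed reference syntax inside a RULE: <name>
    private static final Pattern REF = Pattern.compile("<([A-Za-z_][A-Za-z0-9_]*)>");

    /** Collects "#DEFINE name regex" lines into a name-to-regex map. */
    static Map<String, String> readDefines(Iterable<String> lines) {
        Map<String, String> defines = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 3);  // keyword, name, regex
            if (parts.length == 3 && parts[0].equals("#DEFINE")) {
                defines.put(parts[1], parts[2]);
            }
        }
        return defines;
    }

    /** Replaces each <name> with its DEFINE wrapped in one capturing group. */
    static String expand(String rule, Map<String, String> defines) {
        Matcher m = REF.matcher(rule);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String def = defines.get(m.group(1));
            if (def == null) {
                // A real configuration error: fail rather than leave <name> in the regex
                throw new IllegalArgumentException("Undefined DEFINE: " + m.group(1));
            }
            // quoteReplacement keeps backslashes and '$' in the DEFINE literal
            m.appendReplacement(out, Matcher.quoteReplacement("(" + def + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Wrapping each DEFINE in its own group is what lets slot names be paired with group numbers by position later; it also means a DEFINE that contains capturing groups of its own would shift every later offset.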
- Author:
- dlutz (lutzdavp), ubaldino
Field Summary
- boolean debug
- protected org.slf4j.Logger log
- protected String patternFile
- protected Map<String, RegexPattern> patterns
- protected List<RegexPattern> patterns_list
- boolean testing
Constructor Summary
Method Summary
- protected abstract RegexPattern create_pattern(String fam, String rule, String desc): Implementation must create a RegexPattern given the basic RULE definition, #RULE FAMILY RID REGEX. PatternManager adds the compiled pattern and DEFINEs here.
- protected abstract PatternTestCase create_testcase(String id, String fam, String text): Implementation must create test cases given the #TEST directive, #TEST RID TID TEXT.
- void disableAll(): Disable all patterns.
- abstract void enable_pattern: Enable an instance of a pattern based on the global settings.
- void enable_patterns(String name): Default adapter; you must override it.
- void enableAll()
- get_pattern(String id): Access the patterns by ID.
- get_patterns
- getConfigurationDebug: Instead of relying on a logging API, we now throw exceptions for real configuration errors and capture configuration details in a buffer if debug is on.
- group_map(RegexPattern p, Matcher matched): NOTE: We're dealing with Java 6's inability to use named groups.
- group_matches(RegexPattern p, Matcher matched): Matched fields as TextEntities.
- void initialize: Initializes the pattern manager implementation.
- protected abstract boolean validate_pattern: Implementation has the option to check a pattern; for now, invalid patterns are only logged.
Field Details
log
protected org.slf4j.Logger log
patterns
patterns_list
patternFile
debug
public boolean debug
testing
public boolean testing
testcases
Constructor Details
RegexPatternManager
- Throws:
IOException
Method Details
get_patterns
- Returns:
collection of patterns
get_pattern
Access the patterns by ID.
- Parameters:
id - pattern ID
- Returns:
the found pattern, or null
create_pattern
Implementation must create a RegexPattern given the basic RULE definition, #RULE FAMILY RID REGEX. PatternManager adds the compiled pattern and DEFINEs here.
- Parameters:
fam - family
rule - rule ID within the family
desc - optional description
- Returns:
pattern object
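A minimal sketch of the division of labor described for create_pattern, using a simplified pattern type: the subclass decides only how the pattern object is built, while the base class compiles and registers it so that a lookup by ID can find it. `SimplePattern`, `SketchPatternManager`, `TimePatternManager`, `addRule`, and the `FAMILY-RULE` ID format are all hypothetical stand-ins, not this library's classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical stand-in for RegexPattern: family, ID, description, compiled regex. */
class SimplePattern {
    final String family;
    final String id;
    final String description;
    Pattern regex;           // attached by the base manager after DEFINE substitution
    boolean enabled = true;

    SimplePattern(String family, String id, String description) {
        this.family = family;
        this.id = id;
        this.description = description;
    }
}

/** Hypothetical base manager: subclasses choose how pattern objects are built. */
abstract class SketchPatternManager {
    protected final Map<String, SimplePattern> patterns = new HashMap<>();
    protected final List<SimplePattern> patterns_list = new ArrayList<>();

    /** Subclass hook, modeled loosely on create_pattern(fam, rule, desc). */
    protected abstract SimplePattern create_pattern(String fam, String rule, String desc);

    /** Base-class side: build via the hook, attach the compiled regex, register by ID. */
    void addRule(String fam, String rule, String expandedRegex, String desc) {
        SimplePattern p = create_pattern(fam, rule, desc);
        p.regex = Pattern.compile(expandedRegex);
        patterns.put(p.id, p);
        patterns_list.add(p);
    }

    /** Lookup by ID; returns null when the ID is unknown. */
    SimplePattern get_pattern(String id) {
        return patterns.get(id);
    }
}

/** Hypothetical subclass that builds IDs as FAMILY-RULE, e.g. "TIME-01". */
class TimePatternManager extends SketchPatternManager {
    @Override
    protected SimplePattern create_pattern(String fam, String rule, String desc) {
        return new SimplePattern(fam, fam + "-" + rule, desc);
    }
}
```

Keeping compilation and registration in the base class means every subclass gets the same bookkeeping, and only the pattern type varies.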
validate_pattern
Implementation has the option to check a pattern; for now, invalid patterns are only logged.
- Parameters:
pat - pattern object
- Returns:
true if the pattern is valid
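One plausible validate_pattern check, sketched under the assumption that slot names are paired with capturing groups by position (as group_map describes): the regex must compile, and its group count must equal the number of slots. `PatternValidator` is a hypothetical helper; the checks a real implementation performs may differ.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical validator for a positional slot-to-group scheme. */
class PatternValidator {
    /** True if the regex compiles and has exactly one capturing group per slot. */
    static boolean validate(String regex, List<String> slots) {
        try {
            Pattern p = Pattern.compile(regex);
            // groupCount() lives on Matcher; non-capturing (?:...) groups are not counted
            return p.matcher("").groupCount() == slots.size();
        } catch (PatternSyntaxException e) {
            return false;  // a caller would log this, per the note above
        }
    }
}
```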
create_testcase
Implementation must create test cases given the #TEST directive, #TEST RID TID TEXT.
- Parameters:
id - pattern ID
fam - pattern family
text - text for the test case
- Returns:
test case object
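A sketch of parsing one #TEST line, assuming whitespace-separated fields where everything after TID is the test text, and treating a test as passing when the rule's regex is found anywhere in that text. `TestDirective` and its pass criterion are assumptions for illustration, not the PatternTestCase API.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for one "#TEST RID TID TEXT" line. */
class TestDirective {
    final String ruleId;
    final String testId;
    final String text;

    private TestDirective(String ruleId, String testId, String text) {
        this.ruleId = ruleId;
        this.testId = testId;
        this.text = text;
    }

    /** Splits into at most 4 fields so that TEXT keeps its internal spaces. */
    static TestDirective parse(String line) {
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4 || !parts[0].equals("#TEST")) {
            throw new IllegalArgumentException("Malformed #TEST line: " + line);
        }
        return new TestDirective(parts[1], parts[2], parts[3]);
    }

    /** Assumed pass criterion: the rule's regex occurs somewhere in the text. */
    boolean passes(Pattern rule) {
        return rule.matcher(text).find();
    }
}
```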
enable_pattern
Enable an instance of a pattern based on the global settings.
- Parameters:
p - the pattern object to enable
enable_patterns
Default adapter; you must override it. This method should be abstract, but not all pattern managers are required to support it.
- Parameters:
name - pattern name to enable
disableAll
public void disableAll()
Disable all patterns.
enableAll
public void enableAll()
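A hypothetical sketch of the enable/disable bookkeeping: each pattern carries an `enabled` flag, disableAll and enableAll set it for every pattern, and enable_patterns is assumed here to take a family name. `FlagPattern` and `PatternSwitchboard` are illustrative stand-ins only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pattern entry carrying only what enabling needs. */
class FlagPattern {
    final String family;
    final String id;
    boolean enabled = true;

    FlagPattern(String family, String id) {
        this.family = family;
        this.id = id;
    }
}

/** Hypothetical enable/disable bookkeeping over a pattern list. */
class PatternSwitchboard {
    final List<FlagPattern> patterns_list = new ArrayList<>();

    void disableAll() {
        for (FlagPattern p : patterns_list) p.enabled = false;
    }

    void enableAll() {
        for (FlagPattern p : patterns_list) p.enabled = true;
    }

    /** Assumption: "name" is a family name; enables every pattern in that family. */
    void enable_patterns(String name) {
        for (FlagPattern p : patterns_list) {
            if (p.family.equals(name)) p.enabled = true;
        }
    }

    long enabledCount() {
        return patterns_list.stream().filter(p -> p.enabled).count();
    }
}
```

Under these assumptions, calling disableAll() followed by enable_patterns("TIME") restricts matching to a single family.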
initialize
Initializes the pattern manager implementation. Reads the DEFINEs and RULEs from the pattern file and performs the required substitutions. After initialization, the patterns HashMap is populated.
- Parameters:
io - stream
- Throws:
IOException - if the patterns file cannot be loaded and parsed
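A minimal sketch of an initialize-style loader, assuming line-oriented `#DEFINE name regex` and `#RULE FAMILY RID regex` directives and skipping every other line (including #TEST). It follows the approach described under getConfigurationDebug: malformed directives raise an IOException, and when debug is on each accepted directive is recorded in a text buffer. `PatternFileReader`, its fields, and the `FAMILY-RID` key are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical loader sketch: dispatch on directives, fail loudly on malformed ones. */
class PatternFileReader {
    final Map<String, String> defines = new LinkedHashMap<>();
    final Map<String, String> rules = new LinkedHashMap<>();  // key FAMILY-RID, raw (unexpanded) regex
    final StringBuilder configDebug = new StringBuilder();
    boolean debug = false;

    void initialize(InputStream io) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(io, StandardCharsets.UTF_8))) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                String s = line.trim();
                String[] head = s.split("\\s+", 2);  // directive, remainder
                String rest = head.length > 1 ? head[1] : "";
                switch (head[0]) {
                    case "#DEFINE": {
                        String[] f = fields(rest, 2, lineNo);  // name, regex
                        defines.put(f[0], f[1]);
                        break;
                    }
                    case "#RULE": {
                        String[] f = fields(rest, 3, lineNo);  // family, rule ID, regex
                        rules.put(f[0] + "-" + f[1], f[2]);
                        break;
                    }
                    default:
                        continue;  // assumption: #TEST and all other lines are skipped here
                }
                if (debug) {
                    configDebug.append(lineNo).append(": ").append(s).append('\n');
                }
            }
        }
    }

    /** Splits the remainder into n fields, or throws a configuration error. */
    private static String[] fields(String rest, int n, int lineNo) throws IOException {
        String[] f = rest.split("\\s+", n);
        if (f.length < n) {
            throw new IOException("Malformed directive at line " + lineNo);
        }
        return f;
    }
}
```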
getConfigurationDebug
Instead of relying on a logging API, we now throw exceptions for real configuration errors and capture configuration details in a buffer if debug is on.
- Returns:
the configuration debug output
group_map
NOTE: We're dealing with Java 6's inability to use named groups, so we have to track FlexPat slots in line with the Matcher fields matched. Essentially this comes down to a simple Name:Offset pairing; our limitation here is that groups cannot be nested.
- Parameters:
p - pattern
matched - matcher
- Returns:
map containing the matched groups, as deciphered by FlexPat and the definitions in the patterns file
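A sketch of the Name:Offset pairing the NOTE describes, assuming slot names are listed in the same left-to-right order as the capturing groups and that groups are not nested. `GroupMapper.groupMap` is hypothetical; on Java 7 and later, named groups such as `(?<hh>\d{2})` make this bookkeeping unnecessary.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;

/** Hypothetical sketch of positional slot-to-group pairing (no named groups). */
class GroupMapper {
    /**
     * slots.get(i) names capturing group i + 1, since group 0 is the whole match.
     * This pairing only holds when groups are not nested.
     */
    static Map<String, String> groupMap(List<String> slots, Matcher matched) {
        Map<String, String> fields = new LinkedHashMap<>();
        int n = Math.min(slots.size(), matched.groupCount());
        for (int i = 0; i < n; i++) {
            String value = matched.group(i + 1);
            if (value != null) {  // an optional group that did not participate yields null
                fields.put(slots.get(i), value);
            }
        }
        return fields;
    }
}
```

Because groupCount() ignores non-capturing `(?:...)` groups, a pattern can use them for alternation or optional parts without shifting the slot offsets.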
group_matches
Matched fields as TextEntities.
- Parameters:
p - pattern
matched - Java RE Matcher
- Returns:
keyed TextEntity
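The same positional pairing can also keep character offsets, which is roughly what turning matched fields into TextEntities requires. `SlotEntity` is a hypothetical stand-in for TextEntity, and keying the result by slot name is an assumption about what "keyed" means here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;

/** Hypothetical stand-in for TextEntity: matched text plus its character offsets. */
class SlotEntity {
    final String text;
    final int start;
    final int end;  // exclusive, as in Matcher.end(int)

    SlotEntity(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }
}

class GroupMatches {
    /** Positional slot-to-group pairing that also records offsets for each slot. */
    static Map<String, SlotEntity> groupMatches(List<String> slots, Matcher matched) {
        Map<String, SlotEntity> out = new LinkedHashMap<>();
        int n = Math.min(slots.size(), matched.groupCount());
        for (int i = 1; i <= n; i++) {
            if (matched.start(i) >= 0) {  // start is -1 when the group did not participate
                out.put(slots.get(i - 1),
                        new SlotEntity(matched.group(i), matched.start(i), matched.end(i)));
            }
        }
        return out;
    }
}
```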