Class PatternManager
java.lang.Object
org.opensextant.extractors.flexpat.RegexPatternManager
org.opensextant.extractors.xcoord.PatternManager
This is the culmination of various coordinate extraction efforts in Python and Java. This API makes no assumptions about input data or about execution.
Common Coordinate Enumeration (CCE) is a concept for enumerating coordinate representations. See XConstants for details. The basics of CCE include a family (DD, DMS, MGRS, etc.) and a style (enumerated in the patterns config file).
Features of REGEX patterns file:
- DEFINE - a component of a coord pattern to match
- RULE - a complete pattern to match
- TEST - an example of the text the pattern should match in part or whole.
The Rules file: an external text file containing rules, each built from regular expressions, that are used to identify geocoords. Below is an example of what a simple rule might look like:
// Parts of a decimal degree Latitude/Longitude
#DEFINE decDegLat \d?\d\.\d{1,20}
#DEFINE decDegLon [0-1]?\d?\d\.\d{1,20}
// TARGET: DD-xx, Decimal Deg, Preceding Hemisphere (a) H DD.DDDDDD° HDDD.DDDDDD°, optional deg symbol
#RULE DD 01 <hemiLatPre>\s?<decDegLat><degSym>?\s*<latlonSep>?\s*<hemiLonPre>\s?<decDegLon><degSym>?
#TEST DD 01 N42.3, W102.4
Here the DEFINE statements declare fields that the PatternManager will recall
at runtime. A RULE is a composition of DEFINEs, other literals, and regex
patterns. A rule must have a family and a rule ID within that family. A TEST
statement is enumerated with the same family and ID as the RULE it exercises.
At runtime all tests are further labeled with an incrementor; e.g., a TEST for
"DD-01" might be the eighth test in the pattern file, so the test will be
labeled internally as DD-01#8.
- Author:
- dlutz, MITRE creator (lutzdavp), ubaldino, MITRE adaptor, swainza
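The DEFINE, RULE, and TEST mechanics described in the class overview can be illustrated with a small, self-contained sketch. It is not the OpenSextant implementation: the class RulesFileSketch, its methods, and the four extra DEFINEs (hemiLatPre, hemiLonPre, degSym, latlonSep) are assumptions introduced only so the sample rule can be expanded and compiled. The sketch substitutes each <name> with its DEFINE, compiles the RULE under a FAMILY-RID key, and labels each TEST with a running counter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for rules-file handling; not the OpenSextant code. */
class RulesFileSketch {
    final Map<String, String> defines = new LinkedHashMap<>();  // DEFINE name -> regex fragment
    final Map<String, Pattern> rules = new LinkedHashMap<>();   // "FAMILY-RID" -> compiled rule
    final List<String[]> tests = new ArrayList<>();             // {label, text}

    private static final Pattern FIELD = Pattern.compile("<(\\w+)>");

    /** Substitutes each <name> in a rule body with its DEFINE, wrapped in a group. */
    String expand(String ruleBody) {
        Matcher m = FIELD.matcher(ruleBody);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fragment = defines.get(m.group(1));
            if (fragment == null) {
                throw new IllegalArgumentException("Undefined field: " + m.group(1));
            }
            // Non-capturing group so a trailing '?' applies to the whole fragment.
            m.appendReplacement(out, Matcher.quoteReplacement("(?:" + fragment + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Reads directives in file order; DEFINEs must precede the RULEs that use them. */
    void load(List<String> lines) {
        for (String line : lines) {
            String s = line.trim();
            if (s.startsWith("#DEFINE")) {
                String[] f = s.split("\\s+", 3);   // #DEFINE name fragment
                defines.put(f[1], f[2]);
            } else if (s.startsWith("#RULE")) {
                String[] f = s.split("\\s+", 4);   // #RULE family id regex
                rules.put(f[1] + "-" + f[2], Pattern.compile(expand(f[3])));
            } else if (s.startsWith("#TEST")) {
                String[] f = s.split("\\s+", 4);   // #TEST family id text
                String label = f[1] + "-" + f[2] + "#" + (tests.size() + 1);
                tests.add(new String[] {label, f[3]});
            }
            // Comment lines and blank lines are ignored.
        }
    }

    static List<String> sampleRules() {
        return Arrays.asList(
            "// Parts of a decimal degree Latitude/Longitude",
            "#DEFINE decDegLat \\d?\\d\\.\\d{1,20}",
            "#DEFINE decDegLon [0-1]?\\d?\\d\\.\\d{1,20}",
            "// The four DEFINEs below are assumptions made for this sketch.",
            "#DEFINE hemiLatPre [NS]",
            "#DEFINE hemiLonPre [EW]",
            "#DEFINE degSym \\xB0",
            "#DEFINE latlonSep [,/]",
            "#RULE DD 01 <hemiLatPre>\\s?<decDegLat><degSym>?\\s*<latlonSep>?\\s*"
                + "<hemiLonPre>\\s?<decDegLon><degSym>?",
            "#TEST DD 01 N42.3, W102.4");
    }

    public static void main(String[] args) {
        RulesFileSketch mgr = new RulesFileSketch();
        mgr.load(sampleRules());
        for (String[] t : mgr.tests) {
            String ruleKey = t[0].substring(0, t[0].indexOf('#'));
            boolean hit = mgr.rules.get(ruleKey).matcher(t[1]).find();
            System.out.println(t[0] + " -> " + (hit ? "match" : "no match"));
        }
        // prints "DD-01#1 -> match"
    }
}
```

Non-capturing groups are used here because the sample rule references degSym twice, and Java does not allow the same named group to appear twice in one pattern. How the real PatternManager maps DEFINE names to match groups is not covered by this sketch.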
Field Summary
Fields
Fields inherited from class org.opensextant.extractors.flexpat.RegexPatternManager
debug, log, patternFile, patterns, patterns_list, testcases, testing
Constructor Summary
Constructors
Method Summary
Modifier and Type, Method, Description:
- protected RegexPattern create_pattern(String fam, String rule, String desc): Implementation must create a RegexPattern given the basic RULE define, #RULE FAMILY RID REGEX. PatternManager here adds compiled pattern and DEFINES.
- protected PatternTestCase create_testcase(String id, String fam, String text): Implementation must create TestCases given the #TEST directive, #TEST RID TID TEXT.
- void enable_CCE_family(int cce_fam, boolean enabled)
- void enable_pattern(RegexPattern repat): enable an instance of a pattern based on the global settings.
- void initialize: Initializes the pattern manager implementations.
- protected boolean validate_pattern(RegexPattern repat): Implementation has the option to check a pattern; for now invalid patterns are only logged.
Methods inherited from class org.opensextant.extractors.flexpat.RegexPatternManager
disableAll, enable_patterns, enableAll, get_pattern, get_patterns, getConfigurationDebug, group_map, group_matches
Field Details
CCE_family_state
Constructor Details
PatternManager
- Throws:
IOException
Method Details
initialize
Description copied from class: RegexPatternManager
Initializes the pattern manager implementations. Reads the DEFINEs and RULEs from the pattern file and does the requisite substitutions. After initialization the patterns HashMap will be populated.
- Overrides:
initialize in class RegexPatternManager
- Parameters:
io - stream
- Throws:
IOException
enable_CCE_family
public void enable_CCE_family(int cce_fam, boolean enabled)
- Parameters:
cce_fam -
enabled -
enable_pattern
enable an instance of a pattern based on the global settings.
- Specified by:
enable_pattern in class RegexPatternManager
- Parameters:
repat -
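How enable_CCE_family and enable_pattern might interact can be sketched as follows. This is a guess at the mechanism, not the actual implementation: the classes FamilySwitchSketch and Pat, the integer family codes, and the rule that unlisted families default to disabled are all assumptions. Only the names enable_CCE_family, enable_pattern, and CCE_family_state come from this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-family switches applied to individual patterns. */
class FamilySwitchSketch {
    /** Minimal stand-in for a pattern record: family code, ID, enabled flag. */
    static class Pat {
        final int family;
        final String id;
        boolean enabled;

        Pat(int family, String id) {
            this.family = family;
            this.id = id;
        }
    }

    // Hypothetical family codes, chosen only for this sketch.
    static final int DD = 1;
    static final int DMS = 2;
    static final int MGRS = 3;

    // Global settings: family code -> enabled?
    final Map<Integer, Boolean> CCE_family_state = new HashMap<>();

    /** Records the global on/off setting for one coordinate family. */
    void enable_CCE_family(int cce_fam, boolean enabled) {
        CCE_family_state.put(cce_fam, enabled);
    }

    /** Enables a single pattern according to its family's global setting. */
    void enable_pattern(Pat p) {
        // Assumption: a family that was never configured counts as disabled.
        p.enabled = CCE_family_state.getOrDefault(p.family, false);
    }

    public static void main(String[] args) {
        FamilySwitchSketch mgr = new FamilySwitchSketch();
        mgr.enable_CCE_family(DD, true);
        mgr.enable_CCE_family(MGRS, false);

        Pat dd = new Pat(DD, "DD-01");
        Pat mgrs = new Pat(MGRS, "MGRS-01");
        mgr.enable_pattern(dd);
        mgr.enable_pattern(mgrs);
        System.out.println(dd.id + " enabled: " + dd.enabled);
        System.out.println(mgrs.id + " enabled: " + mgrs.enabled);
    }
}
```

Keeping the switch at the family level means a caller can turn off, say, all MGRS rules with one call, and each pattern picks up that decision when it is enabled.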
create_pattern
Implementation must create a RegexPattern given the basic RULE define, #RULE FAMILY RID REGEX. PatternManager here adds compiled pattern and DEFINES.
- Specified by:
create_pattern in class RegexPatternManager
- Parameters:
fam -
rule -
desc -
validate_pattern
Implementation has the option to check a pattern; for now invalid patterns are only logged.
- Specified by:
validate_pattern in class RegexPatternManager
- Parameters:
repat -
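The "log only" behavior of validate_pattern can be pictured with a short sketch. What the real method checks is not stated on this page; the sketch below assumes the simplest possible check, namely whether the regex compiles. The class ValidateSketch, the method validate, and the use of java.util.logging are all hypothetical.

```java
import java.util.logging.Logger;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical log-only validation; the real check may differ. */
class ValidateSketch {
    private static final Logger LOG = Logger.getLogger("ValidateSketch");

    /** Returns true if the regex compiles; otherwise logs a warning and returns false. */
    static boolean validate(String ruleId, String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Invalid patterns are reported but do not abort loading.
            LOG.warning("Invalid pattern " + ruleId + ": " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("DD-01", "\\d?\\d\\.\\d{1,20}")); // prints "true"
        System.out.println(validate("DD-99", "(\\d+"));               // prints "false"
    }
}
```

Returning a boolean instead of throwing lets the caller decide whether an invalid rule should be skipped or merely flagged.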
create_testcase
Implementation must create TestCases given the #TEST directive, #TEST RID TID TEXT.
- Specified by:
create_testcase in class RegexPatternManager
- Parameters:
id -
text -
fam -