Class PatternManager
java.lang.Object
org.opensextant.extractors.flexpat.RegexPatternManager
org.opensextant.extractors.xcoord.PatternManager
This is the culmination of various coordinate extraction efforts in Python and Java. This API makes no assumptions about input data or execution.
Common Coordinate Enumeration (CCE) is a scheme for enumerating coordinate representations; see XConstants for details. The basics of CCE are a family (DD, DMS, MGRS, etc.) and a style (enumerated in the patterns config file).
Features of the REGEX patterns file:
- DEFINE - a component of a coordinate pattern to match
- RULE - a complete pattern to match
- TEST - an example of text the pattern should match, in part or in whole.
The Rules file: an external text file containing regular-expression rules used to identify geocoordinates. Below is an example of a simple rule:
    // Parts of a decimal degree Latitude/Longitude
    #DEFINE decDegLat \d?\d\.\d{1,20}
    #DEFINE decDegLon [0-1]?\d?\d\.\d{1,20}

    // TARGET: DD-xx, Decimal Deg, Preceding Hemisphere (a) H DD.DDDDDD° HDDD.DDDDDD°, optional deg symbol
    #RULE DD 01 <hemiLatPre>\s?<decDegLat><degSym>?\s*<latlonSep>?\s*<hemiLonPre>\s?<decDegLon><degSym>?

    #TEST DD 01 N42.3, W102.4

Here the DEFINE statements declare named fields that the PatternManager recalls at runtime. The RULE is a composition of DEFINEs, other literals, and regex patterns. A rule must have a family and a rule ID within that family. The TEST statement is enumerated the same way as the RULE, by family and ID. At runtime all tests are further labeled with an incrementor: e.g., if TEST "DD-01" is the eighth test in the pattern file, it is labeled internally as DD-01#8.
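To make the substitution step concrete, here is a minimal Java sketch, not the OpenSextant implementation, that expands each <name> in a RULE into the regex of the matching DEFINE. The class DefineExpander and its expand method are hypothetical. Each field becomes a plain capturing group and its name is appended to a list, so group i+1 can be mapped back to a field name; plain groups are used instead of Java named groups because a field such as degSym can appear twice in one rule, which named groups do not allow. The sketch assumes DEFINE bodies contain no capturing groups of their own.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of DEFINE substitution for a RULE body.
 * Each <name> is replaced by "(" + DEFINE body + ")" and the name is
 * recorded, so capturing group i + 1 corresponds to groupNames.get(i).
 */
class DefineExpander {
    // <name> placeholders: a letter followed by letters or digits.
    private static final Pattern FIELD = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)>");

    static String expand(String rule, Map<String, String> defines, List<String> groupNames) {
        Matcher m = FIELD.matcher(rule);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String body = defines.get(name);
            if (body == null) {
                throw new IllegalArgumentException("No DEFINE for field: " + name);
            }
            groupNames.add(name);
            // Assumes the DEFINE body has no capturing groups of its own;
            // otherwise group numbers and groupNames would drift apart.
            // quoteReplacement keeps backslashes in the regex literal.
            m.appendReplacement(out, Matcher.quoteReplacement("(" + body + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With hypothetical DEFINEs for the fields the excerpt does not show (for example hemiLatPre = [NS], hemiLonPre = [EW], latlonSep = [,;], degSym = °), expanding RULE DD 01 and running the compiled pattern over the TEST text N42.3, W102.4 captures 42.3 in the decDegLat group and 102.4 in the decDegLon group.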
- Author:
- dlutz, MITRE creator (lutzdavp), ubaldino, MITRE adaptor, swainza
Field Summary
Fields inherited from class org.opensextant.extractors.flexpat.RegexPatternManager
debug, log, patternFile, patterns, patterns_list, testcases, testing
Constructor Summary
Method Summary
- protected RegexPattern create_pattern(String fam, String rule, String desc)
Implementation must create a RegexPattern given the basic RULE directive, #RULE FAMILY RID REGEX. PatternManager here adds the compiled pattern and DEFINEs.
- protected PatternTestCase create_testcase(String id, String fam, String text)
Implementation must create TestCases given the #TEST directive, #TEST RID TID TEXT.
- void enable_CCE_family(int cce_fam, boolean enabled)
- void enable_pattern(RegexPattern repat)
Enable an instance of a pattern based on the global settings.
- void initialize(InputStream io)
Initializes the pattern manager implementations.
- protected boolean validate_pattern(RegexPattern repat)
Implementation has the option to check a pattern; for now, invalid patterns are only logged.
Methods inherited from class org.opensextant.extractors.flexpat.RegexPatternManager
disableAll, enable_patterns, enableAll, get_pattern, get_patterns, getConfigurationDebug, group_map, group_matches
Field Details
CCE_family_state
Constructor Details
PatternManager
- Throws:
IOException
Method Details
initialize
Description copied from class: RegexPatternManager
Initializes the pattern manager implementations. Reads the DEFINEs and RULEs from the pattern file and performs the requisite substitutions. After initialization, the patterns HashMap is populated.
- Overrides:
initialize in class RegexPatternManager
- Parameters:
io - stream
- Throws:
IOException
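The following is a hedged sketch of what an initialize pass over such a file might look like: it reads the stream line by line, skips blank lines and // comments, stores DEFINEs, compiles each RULE after substitution into a HashMap keyed FAMILY-RID, and keeps each TEST under the same key. PatternFileReader and its fields are hypothetical names. The sketch assumes whitespace-separated directives, that every DEFINE appears before the RULEs that use it, and that DEFINEs do not reference other DEFINEs; malformed lines are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical reader for a pattern file of #DEFINE, #RULE and #TEST lines. */
class PatternFileReader {
    final Map<String, String> defines = new LinkedHashMap<>();
    final Map<String, Pattern> patterns = new HashMap<>(); // key FAMILY-RID, e.g. "DD-01"
    final List<String[]> tests = new ArrayList<>();        // each entry: {FAMILY-RID, text}

    void initialize(InputStream io) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(io, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("//")) {
                    continue; // blank line or comment
                }
                if (line.startsWith("#DEFINE")) {
                    String[] f = line.split("\\s+", 3); // #DEFINE name regex
                    defines.put(f[1], f[2]);
                } else if (line.startsWith("#RULE")) {
                    String[] f = line.split("\\s+", 4); // #RULE FAMILY RID regex
                    patterns.put(f[1] + "-" + f[2], Pattern.compile(substitute(f[3])));
                } else if (line.startsWith("#TEST")) {
                    String[] f = line.split("\\s+", 4); // #TEST FAMILY RID text
                    tests.add(new String[] {f[1] + "-" + f[2], f[3]});
                }
            }
        }
    }

    /** Replaces every <name> with its DEFINE body wrapped in a capturing group. */
    private String substitute(String rule) {
        String out = rule;
        for (Map.Entry<String, String> d : defines.entrySet()) {
            out = out.replace("<" + d.getKey() + ">", "(" + d.getValue() + ")");
        }
        return out;
    }
}
```

Feeding it the example rule file (plus hypothetical DEFINEs for hemiLatPre, hemiLonPre, latlonSep and degSym) leaves one compiled pattern under DD-01 and one test case, N42.3, W102.4, under the same key.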
enable_CCE_family
public void enable_CCE_family(int cce_fam, boolean enabled)
- Parameters:
cce_fam
enabled
enable_pattern
Enable an instance of a pattern based on the global settings.
- Specified by:
enable_pattern in class RegexPatternManager
- Parameters:
repat
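A minimal sketch of how enable_CCE_family and enable_pattern could fit together, assuming the global settings are a map from CCE family id to an on/off flag (the field CCE_family_state is listed in Field Details, but its type is not documented here). CoordPattern and FamilySwitches are hypothetical; the method names mirror the documented ones, but the bodies are illustrations, and families that were never configured are assumed to stay enabled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a compiled rule, reduced to what this sketch needs. */
class CoordPattern {
    final int cceFamily;
    final String id;
    boolean enabled = true;

    CoordPattern(int cceFamily, String id) {
        this.cceFamily = cceFamily;
        this.id = id;
    }
}

/** Sketch of global per-family settings applied to individual patterns. */
class FamilySwitches {
    // CCE family id -> enabled; families never configured are treated as enabled.
    private final Map<Integer, Boolean> familyState = new HashMap<>();

    void enable_CCE_family(int cceFamily, boolean enabled) {
        familyState.put(cceFamily, enabled);
    }

    /** Copies the global setting for the pattern's family onto the pattern. */
    void enable_pattern(CoordPattern p) {
        p.enabled = familyState.getOrDefault(p.cceFamily, Boolean.TRUE);
    }
}
```

After enable_CCE_family(1, false), enable_pattern switches off every family-1 pattern passed to it and leaves other families on. The ids 1 and 2 used in such a call are arbitrary; the actual CCE family constants are described in XConstants.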
create_pattern
Implementation must create a RegexPattern given the basic RULE directive, #RULE FAMILY RID REGEX. PatternManager here adds the compiled pattern and DEFINEs.
- Specified by:
create_pattern in class RegexPatternManager
- Parameters:
fam
rule
desc
- Returns:
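One way to picture what create_pattern assembles is a small holder bundling the family, the rule ID, the compiled regex, and the ordered DEFINE (field) names, so that a match can be reported field by field. CompiledRule and fieldsOf are hypothetical names and are not the actual RegexPattern class; the sketch assumes capturing group i+1 corresponds to field name i, as produced by a DEFINE substitution that wraps each field in one group.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder: family, rule id, compiled regex and ordered field names. */
class CompiledRule {
    final String family;
    final String ruleId;
    final Pattern regex;
    final List<String> fields; // fields.get(i) names capturing group i + 1

    CompiledRule(String family, String ruleId, String expandedRegex, List<String> fields) {
        this.family = family;
        this.ruleId = ruleId;
        this.regex = Pattern.compile(expandedRegex);
        this.fields = List.copyOf(fields);
    }

    /** Maps field names to matched text; optional groups that did not match are skipped. */
    Map<String, String> fieldsOf(Matcher m) {
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            String value = m.group(i + 1);
            if (value != null) {
                // A repeated field (e.g. degSym) keeps its first non-null capture.
                out.putIfAbsent(fields.get(i), value);
            }
        }
        return out;
    }
}
```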
validate_pattern
Implementation has the option to check a pattern; for now, invalid patterns are only logged.
- Specified by:
validate_pattern in class RegexPatternManager
- Parameters:
repat
- Returns:
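Because invalid patterns are only logged, a validator in that spirit returns false rather than throwing. The sketch below uses a hypothetical class, LenientValidator, that checks whether a regex compiles and whether its group count matches the number of recorded field names, logging any problem through java.util.logging. The group-count check is an assumed example of a useful test, not a documented one.

```java
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical lenient validator: problems are logged, never thrown. */
class LenientValidator {
    private static final Logger LOG = Logger.getLogger(LenientValidator.class.getName());

    /** True if regex compiles and has exactly one capturing group per field name. */
    static boolean validate(String id, String regex, List<String> fieldNames) {
        final Pattern p;
        try {
            p = Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            LOG.warning("Pattern " + id + " does not compile: " + e.getDescription());
            return false;
        }
        int groups = p.matcher("").groupCount();
        if (groups != fieldNames.size()) {
            LOG.warning("Pattern " + id + " has " + groups + " groups but "
                    + fieldNames.size() + " field names");
            return false;
        }
        return true;
    }
}
```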
create_testcase
Implementation must create TestCases given the #TEST directive, #TEST RID TID TEXT.
- Specified by:
create_testcase in class RegexPatternManager
- Parameters:
id
text
fam
- Returns:
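The DD-01#8 labeling described in the class overview can be sketched with a file-wide counter: every TEST receives the next ordinal, regardless of family, appended to its FAMILY-RID key. TestCaseCollector and LabeledTest are hypothetical names; the counter is 1-based to match the example in which the eighth test is labeled #8.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical test case: rule key, internal label, and sample text. */
class LabeledTest {
    final String ruleKey; // e.g. "DD-01"
    final String label;   // e.g. "DD-01#8"
    final String text;

    LabeledTest(String ruleKey, String label, String text) {
        this.ruleKey = ruleKey;
        this.label = label;
        this.text = text;
    }
}

/** Assigns each #TEST a file-wide ordinal, counted across all families. */
class TestCaseCollector {
    private final List<LabeledTest> tests = new ArrayList<>();
    private int seen = 0;

    LabeledTest add(String family, String ruleId, String text) {
        seen++; // 1-based: the eighth TEST in the file gets #8
        String key = family + "-" + ruleId;
        LabeledTest t = new LabeledTest(key, key + "#" + seen, text);
        tests.add(t);
        return t;
    }

    List<LabeledTest> all() {
        return Collections.unmodifiableList(tests);
    }
}
```

After seven earlier tests from any family, the next TEST DD 01 is labeled DD-01#8.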