Class PatternManager

java.lang.Object
org.opensextant.extractors.flexpat.RegexPatternManager
org.opensextant.extractors.xcoord.PatternManager

public final class PatternManager extends RegexPatternManager

This is the culmination of various coordinate extraction efforts in Python and Java. This API makes no assumptions about input data or about execution.

Common Coordinate Enumeration (CCE) is a scheme for enumerating coordinate representations. See XConstants for details. A CCE consists of a family (DD, DMS, MGRS, etc.) and a style (enumerated in the patterns config file).

Features of the REGEX patterns file:

  • DEFINE - a component of a coord pattern to match
  • RULE - a complete pattern to match
  • TEST - an example of the text the pattern should match in part or whole.

The Rules file: an external text file containing rules, written as regular expressions, that are used to identify geocoords. Below is an example of what a simple rule might look like:

 // Parts of a decimal degree Latitude/Longitude
 #DEFINE  decDegLat   \d?\d\.\d{1,20}
 #DEFINE  decDegLon   [0-1]?\d?\d\.\d{1,20}

 // TARGET: DD-xx, Decimal Deg, Preceding Hemisphere (a) H DD.DDDDDD° HDDD.DDDDDD°, optional deg symbol
 #RULE   DD      01      <hemiLatPre>\s?<decDegLat><degSym>?\s*<latlonSep>?\s*<hemiLonPre>\s?<decDegLon><degSym>?
 #TEST   DD      01      N42.3, W102.4
 
The DEFINE statements declare named fields that the PatternManager recalls at runtime. A RULE is a composition of DEFINEs, other literals and regex patterns, and must have a family and a rule ID within that family. A TEST statement carries the same family and rule ID as the RULE it exercises. At runtime each test is further labeled with an incrementing counter: for example, if a TEST for "DD-01" is the eighth test in the pattern file, it is labeled internally as DD-01#8.
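The composition of a RULE from DEFINEs can be sketched in plain Java. This is a minimal illustration, not the PatternManager implementation: the class RuleExpansionSketch, its methods, and the DEFINE bodies for hemiLatPre, hemiLonPre, degSym and latlonSep are assumptions made for the example, and wrapping each field in a non-capturing group is a choice of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names throughout; not part of the OpenSextant API.
class RuleExpansionSketch {

    // Matches a <name> placeholder inside a RULE body.
    private static final Pattern FIELD = Pattern.compile("<(\\w+)>");

    /** Replace each <name> in the rule with the regex text of the matching DEFINE. */
    static String expand(String rule, Map<String, String> defines) {
        Matcher m = FIELD.matcher(rule);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String body = defines.get(name);
            if (body == null) {
                throw new IllegalArgumentException("No DEFINE for field: " + name);
            }
            // Non-capturing group so a trailing '?' in the rule applies to the whole field.
            m.appendReplacement(sb, Matcher.quoteReplacement("(?:" + body + ")"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Build a test label of the form FAMILY-RID#N, as described in the text. */
    static String testLabel(String family, String ruleId, int ordinal) {
        return family + "-" + ruleId + "#" + ordinal;
    }

    public static void main(String[] args) {
        Map<String, String> defines = new LinkedHashMap<>();
        // These two DEFINEs come from the example rules above.
        defines.put("decDegLat", "\\d?\\d\\.\\d{1,20}");
        defines.put("decDegLon", "[0-1]?\\d?\\d\\.\\d{1,20}");
        // The remaining DEFINE bodies are assumed for this sketch only.
        defines.put("hemiLatPre", "[NS]");
        defines.put("hemiLonPre", "[EW]");
        defines.put("degSym", "\u00B0");
        defines.put("latlonSep", "[,/]");

        String rule = "<hemiLatPre>\\s?<decDegLat><degSym>?\\s*<latlonSep>?"
                + "\\s*<hemiLonPre>\\s?<decDegLon><degSym>?";
        Pattern compiled = Pattern.compile(expand(rule, defines));

        // Check the TEST text against the expanded rule and print its label.
        System.out.println(compiled.matcher("N42.3, W102.4").matches());
        System.out.println(testLabel("DD", "01", 8));
    }
}
```

An unknown field name fails fast here; whether the real manager rejects, logs, or ignores such rules is not stated in this documentation.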
Author:
dlutz, MITRE creator (lutzdavp), ubaldino, MITRE adaptor, swainza
  • Field Details

  • Constructor Details

  • Method Details

    • initialize

      public void initialize(InputStream io) throws IOException
      Description copied from class: RegexPatternManager
      Initializes the pattern manager implementation. Reads the DEFINEs and RULEs from the pattern file and performs the requisite substitutions. After initialization the patterns HashMap is populated.
      Overrides:
      initialize in class RegexPatternManager
      Parameters:
      io - stream
      Throws:
      IOException
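As a rough illustration of the reading step, the sketch below splits a patterns stream into DEFINE, RULE and TEST entries. It is a standalone approximation under stated assumptions, not the library's code: the class PatternFileSketch and its fields are hypothetical, directive fields are assumed to be whitespace-separated as in the example rules file, and any other line is simply skipped.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are not part of the OpenSextant API.
class PatternFileSketch {
    final Map<String, String> defines = new LinkedHashMap<>(); // name -> regex
    final List<String[]> rules = new ArrayList<>();            // {family, ruleId, regex}
    final List<String[]> tests = new ArrayList<>();            // {family, ruleId, text}

    /** Read directives line by line; lines that are not directives are ignored. */
    void read(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.startsWith("#DEFINE")) {
                // #DEFINE name regex
                String[] f = line.split("\\s+", 3);
                if (f.length == 3) defines.put(f[1], f[2]);
            } else if (line.startsWith("#RULE")) {
                // #RULE family ruleId regex
                String[] f = line.split("\\s+", 4);
                if (f.length == 4) rules.add(new String[] {f[1], f[2], f[3]});
            } else if (line.startsWith("#TEST")) {
                // #TEST family ruleId text (the text may itself contain spaces)
                String[] f = line.split("\\s+", 4);
                if (f.length == 4) tests.add(new String[] {f[1], f[2], f[3]});
            }
        }
    }

    /** Convenience wrapper for in-memory text. */
    static PatternFileSketch parse(String text) {
        PatternFileSketch sketch = new PatternFileSketch();
        try {
            sketch.read(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sketch;
    }

    public static void main(String[] args) {
        PatternFileSketch s = parse(
                "#DEFINE  decDegLat   \\d?\\d\\.\\d{1,20}\n"
              + "#TEST   DD      01      N42.3, W102.4\n");
        System.out.println(s.defines.keySet() + " tests=" + s.tests.size());
    }
}
```

The limit argument to split keeps the final field intact, so a TEST text such as "N42.3, W102.4" survives with its internal space.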
    • enable_CCE_family

      public void enable_CCE_family(int cce_fam, boolean enabled)
      Parameters:
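      cce_fam - the CCE family to enable or disable (see XConstants)
      enabled - true to enable the family, false to disable it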
    • enable_pattern

      public void enable_pattern(RegexPattern repat)
      Enables an instance of a pattern based on the global settings.
      Specified by:
      enable_pattern in class RegexPatternManager
      Parameters:
      repat -
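One way to picture the relationship between a per-family switch and enabling an individual pattern is sketched below. This is an assumption about the general idea, not the actual logic: the class FamilyToggleSketch, its family codes, and the default-to-enabled behaviour are all hypothetical, and the real family constants are defined in XConstants.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the OpenSextant implementation.
class FamilyToggleSketch {
    // Placeholder family codes for illustration only.
    static final int DD_FAMILY = 1;
    static final int MGRS_FAMILY = 2;

    // Global settings: family code -> enabled flag.
    private final Map<Integer, Boolean> familyEnabled = new HashMap<>();

    /** Record whether patterns of the given family should be used. */
    void enableFamily(int family, boolean enabled) {
        familyEnabled.put(family, enabled);
    }

    /** A pattern is enabled when its family is; unset families default to enabled here. */
    boolean isPatternEnabled(int patternFamily) {
        return familyEnabled.getOrDefault(patternFamily, Boolean.TRUE);
    }

    public static void main(String[] args) {
        FamilyToggleSketch settings = new FamilyToggleSketch();
        settings.enableFamily(MGRS_FAMILY, false);
        System.out.println(settings.isPatternEnabled(DD_FAMILY));
        System.out.println(settings.isPatternEnabled(MGRS_FAMILY));
    }
}
```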
    • create_pattern

      protected RegexPattern create_pattern(String fam, String rule, String desc)
      Implementation must create a RegexPattern given the basic RULE definition: #RULE FAMILY RID REGEX. PatternManager here adds the compiled pattern and DEFINEs.
      Specified by:
      create_pattern in class RegexPatternManager
      Parameters:
      fam -
      rule -
      desc -
      Returns:
    • validate_pattern

      protected boolean validate_pattern(RegexPattern repat)
      Implementation has the option to check a pattern. For now, invalid patterns are only logged.
      Specified by:
      validate_pattern in class RegexPatternManager
      Parameters:
      repat -
      Returns:
    • create_testcase

      protected PatternTestCase create_testcase(String id, String fam, String text)
      Implementation must create TestCases given the #TEST directive: #TEST RID TID TEXT
      Specified by:
      create_testcase in class RegexPatternManager
      Parameters:
      id -
      text -
      fam -
      Returns: