Class RegexPatternManager

java.lang.Object
org.opensextant.extractors.flexpat.RegexPatternManager
Direct Known Subclasses:
PatternManager, PatternManager, PoliPatternManager

public abstract class RegexPatternManager extends Object

This is the culmination of various date/time extraction efforts in Python and Java. This API makes no assumptions about input data or execution. Features of the REGEX patterns file:

  • DEFINE - a component of a pattern to match
  • RULE - a complete pattern to match
This work started in Java 6 and inherits the limitations of Java 6 regex, mainly that named groups are not available in matching.

See XCoord PatternManager for a good example implementation.

Author:
dlutz (lutzdavp), ubaldino
  • Field Details

  • Constructor Details

  • Method Details

    • get_patterns

      public Collection<RegexPattern> get_patterns()
      Returns:
      collection of patterns
    • get_pattern

      public RegexPattern get_pattern(String id)
      Access the patterns by ID.
      Parameters:
      id - pattern id
      Returns:
      found pattern or null
    • create_pattern

      protected abstract RegexPattern create_pattern(String fam, String rule, String desc)
      Implementation must create a RegexPattern given the basic RULE definition: #RULE FAMILY RID REGEX. PatternManager here adds the compiled pattern and DEFINEs.
      Parameters:
      fam - family
      rule - rule ID within the family
      desc - optional description
      Returns:
      pattern object
    • validate_pattern

      protected abstract boolean validate_pattern(RegexPattern pat)
      Implementation has the option to check a pattern. For now, invalid patterns are only logged.
      Parameters:
      pat - pattern object
      Returns:
      true if pattern is valid
    • create_testcase

      protected abstract PatternTestCase create_testcase(String id, String fam, String text)
      Implementation must create test cases given the #TEST directive: #TEST RID TID TEXT
      Parameters:
      id - pattern id
      fam - pattern family
      text - text for test case
      Returns:
      test case object
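As an illustration of the directive layout named above (#TEST RID TID TEXT), the following is a minimal sketch of splitting such a line into its three fields before they are handed to a factory method like create_testcase. The class TestDirectiveSketch and its parse method are hypothetical and not part of FlexPat. Whitespace-delimited fields are an assumption.

```java
class TestDirectiveSketch {

    /**
     * Splits a "#TEST RID TID TEXT" line into {RID, TID, TEXT}.
     * Assumption: fields are separated by whitespace and TEXT is the remainder of the line.
     */
    static String[] parse(String line) {
        // Limit of 4 keeps any whitespace inside TEXT intact.
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4 || !"#TEST".equals(parts[0])) {
            throw new IllegalArgumentException("Not a #TEST directive: " + line);
        }
        return new String[] { parts[1], parts[2], parts[3] };
    }

    public static void main(String[] args) {
        String[] fields = parse("#TEST 01 1 born on 2014-06 in Boston");
        System.out.println("rule=" + fields[0] + " test=" + fields[1] + " text=" + fields[2]);
    }
}
```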
    • enable_pattern

      public abstract void enable_pattern(RegexPattern p)
      Enables an instance of a pattern based on the global settings.
      Parameters:
      p - the pattern obj to enable
    • enable_patterns

      public void enable_patterns(String name)
      Default adapter; you must override. This should be abstract, but not all pattern managers are required to support it.
      Parameters:
      name - pattern name to enable.
    • disableAll

      public void disableAll()
      Disable all patterns.
    • enableAll

      public void enableAll()
    • initialize

      public void initialize(InputStream io) throws IOException
      Initializes the pattern manager implementations. Reads the DEFINEs and RULEs from the pattern file and does the requisite substitutions. After initialization, the patterns HashMap will be populated.
      Parameters:
      io - stream
      Throws:
      IOException - if patterns file can not be loaded and parsed
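The read-and-substitute step described for initialize can be sketched roughly as follows. This is not the FlexPat implementation. The class PatternFileSketch, the "#DEFINE NAME REGEX" layout, the <name> slot syntax and the "FAMILY-RID" key format are all assumptions made for illustration. Only the "#RULE FAMILY RID REGEX" layout is taken from the documentation above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class PatternFileSketch {

    /** Reads DEFINE and RULE lines; returns compiled rules keyed by "FAMILY-RID" (assumed key format). */
    static Map<String, Pattern> load(InputStream io) throws IOException {
        Map<String, String> defines = new LinkedHashMap<>();
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(io, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.startsWith("#DEFINE")) {
                    // Assumed layout: #DEFINE NAME REGEX
                    String[] parts = line.split("\\s+", 3);
                    if (parts.length < 3) {
                        throw new IOException("Malformed DEFINE: " + line);
                    }
                    defines.put(parts[1], parts[2]);
                } else if (line.startsWith("#RULE")) {
                    // Documented layout: #RULE FAMILY RID REGEX
                    String[] parts = line.split("\\s+", 4);
                    if (parts.length < 4) {
                        throw new IOException("Malformed RULE: " + line);
                    }
                    String regex = parts[3];
                    // Substitute each <name> slot (assumed syntax) with a capturing group.
                    for (Map.Entry<String, String> d : defines.entrySet()) {
                        regex = regex.replace("<" + d.getKey() + ">", "(" + d.getValue() + ")");
                    }
                    patterns.put(parts[1] + "-" + parts[2], Pattern.compile(regex));
                }
                // Any other line is ignored in this sketch.
            }
        }
        return patterns;
    }

    public static void main(String[] args) throws IOException {
        String file = "#DEFINE year \\d{4}\n"
                    + "#DEFINE month \\d{2}\n"
                    + "#RULE DATE 01 <year>-<month>\n";
        Map<String, Pattern> rules =
                load(new ByteArrayInputStream(file.getBytes(StandardCharsets.UTF_8)));
        System.out.println(rules.get("DATE-01").pattern());
    }
}
```

In this sketch DEFINEs must appear before the RULEs that use them, since substitution happens as each RULE line is read.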
    • getConfigurationDebug

      public String getConfigurationDebug()
      Instead of relying on a logging API, we now throw exceptions for real configuration errors and capture configuration details in a buffer if debug is on.
      Returns:
      the configuration debug
    • group_map

      public Map<String,String> group_map(RegexPattern p, Matcher matched)
      NOTE: We're dealing with Java 6's inability to use named groups, so we have to track FlexPat slots in line with the Matcher fields matched. Essentially this comes down to a simple Name:Offset pairing; our limitation here is no nesting.
      Parameters:
      p - pattern
      matched - matcher
      Returns:
      map containing the matched groups, as deciphered by FlexPat and the definitions in the patterns file
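The Name:Offset pairing can be pictured with a small sketch that pairs an ordered list of slot names with the Matcher's positional groups. The class GroupMapSketch and the groupMap helper are hypothetical. The real method takes a RegexPattern, which is presumably where the slot names come from.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupMapSketch {

    /**
     * Pairs the i-th slot name with capturing group i+1 of a successful match.
     * Assumption: slotNames lists the slots in the order their groups open in the regex.
     */
    static Map<String, String> groupMap(List<String> slotNames, Matcher matched) {
        Map<String, String> fields = new LinkedHashMap<>();
        int count = Math.min(slotNames.size(), matched.groupCount());
        for (int i = 0; i < count; i++) {
            String value = matched.group(i + 1); // group 0 is the whole match
            if (value != null) {
                fields.put(slotNames.get(i), value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d{4})-(\\d{2})");
        Matcher m = p.matcher("filed on 2014-06 in Boston");
        if (m.find()) {
            // prints {year=2014, month=06}
            System.out.println(groupMap(Arrays.asList("year", "month"), m));
        }
    }
}
```

The no-nesting limitation is visible here: a capturing group nested inside a slot's own expression would shift every later group number, and the positional pairing would no longer line up.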
    • group_matches

      public Map<String,TextEntity> group_matches(RegexPattern p, Matcher matched)
      Matched fields as TextEntities
      Parameters:
      p - pattern
      matched - Java regex Matcher
      Returns:
      TextEntity objects keyed by group name