Class RegexPatternManager

  • Direct Known Subclasses:
    PatternManager, PatternManager, PoliPatternManager

    public abstract class RegexPatternManager
    extends java.lang.Object

    This is the culmination of various date/time extraction efforts in Python and Java. This API makes no assumptions about input data or execution. Features of the REGEX patterns file:

    • DEFINE - a component of a pattern to match
    • RULE - a complete pattern to match
    This work started in Java 6 and inherits the limitations of Java 6 regular expressions, mainly that named groups are not available in matching.

    See XCoord PatternManager for a good example implementation.

    Author:
    dlutz (lutzdavp), ubaldino
    • Constructor Summary

      Constructors 
      Constructor Description
      RegexPatternManager​(java.io.InputStream s, java.lang.String n)  
    • Method Summary

      Modifier and Type Method Description
      protected abstract RegexPattern create_pattern​(java.lang.String fam, java.lang.String rule, java.lang.String desc)
      Implementation must create a RegexPattern given the basic RULE definition, #RULE FAMILY RID REGEX. The PatternManager here adds the compiled pattern and DEFINEs.
      protected abstract PatternTestCase create_testcase​(java.lang.String id, java.lang.String fam, java.lang.String text)
      Implementation must create TestCases given the #TEST directive, #TEST RID TID TEXT
      void disableAll()
      Disable all patterns.
      abstract void enable_pattern​(RegexPattern p)
      Enable an instance of a pattern based on the global settings.
      void enable_patterns​(java.lang.String name)
      Default adapter; you must override.
      void enableAll()  
      RegexPattern get_pattern​(java.lang.String id)
      Access the patterns by ID
      java.util.Collection<RegexPattern> get_patterns()  
      java.lang.String getConfigurationDebug()
      Instead of relying on a logging API, we now throw exceptions for real configuration errors, and capture configuration details in a buffer if debug is on.
      java.util.Map<java.lang.String,​java.lang.String> group_map​(RegexPattern p, java.util.regex.Matcher matched)
      NOTE: We're dealing with Java 6's inability to use named groups.
      java.util.Map<java.lang.String,​TextEntity> group_matches​(RegexPattern p, java.util.regex.Matcher matched)
      Matched fields as TextEntities
      void initialize​(java.io.InputStream io)
      Initializes the pattern manager implementations.
      protected abstract boolean validate_pattern​(RegexPattern pat)
      Implementation has the option to check a pattern; for now, invalid patterns are only logged.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • log

        protected org.slf4j.Logger log
      • patterns

        protected java.util.Map<java.lang.String,​RegexPattern> patterns
      • patterns_list

        protected java.util.List<RegexPattern> patterns_list
      • patternFile

        protected java.lang.String patternFile
      • debug

        public boolean debug
      • testing

        public boolean testing
    • Constructor Detail

      • RegexPatternManager

        public RegexPatternManager​(java.io.InputStream s,
                                   java.lang.String n)
                            throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • get_patterns

        public java.util.Collection<RegexPattern> get_patterns()
        Returns:
        collection of patterns
      • get_pattern

        public RegexPattern get_pattern​(java.lang.String id)
        Access the patterns by ID
        Parameters:
        id - pattern id
        Returns:
        found pattern or null
      • create_pattern

        protected abstract RegexPattern create_pattern​(java.lang.String fam,
                                                       java.lang.String rule,
                                                       java.lang.String desc)
        Implementation must create a RegexPattern given the basic RULE definition, #RULE FAMILY RID REGEX. The PatternManager here adds the compiled pattern and DEFINEs.
        Parameters:
        fam - family
        rule - rule ID within the family
        desc - optional description
        Returns:
        pattern object
      • validate_pattern

        protected abstract boolean validate_pattern​(RegexPattern pat)
        Implementation has the option to check a pattern; for now, invalid patterns are only logged.
        Parameters:
        pat - pattern object
        Returns:
        true if pattern is valid
      • create_testcase

        protected abstract PatternTestCase create_testcase​(java.lang.String id,
                                                           java.lang.String fam,
                                                           java.lang.String text)
        Implementation must create TestCases given the #TEST directive, #TEST RID TID TEXT
        Parameters:
        id - pattern id
        fam - pattern family
        text - text for test case
        Returns:
        test case object
      • enable_pattern

        public abstract void enable_pattern​(RegexPattern p)
        Enable an instance of a pattern based on the global settings.
        Parameters:
        p - the pattern obj to enable
      • enable_patterns

        public void enable_patterns​(java.lang.String name)
        Default adapter; you must override. This should be abstract, but not all pattern managers are required to support this.
        Parameters:
        name - pattern name to enable.
      • disableAll

        public void disableAll()
        Disable all patterns.
      • enableAll

        public void enableAll()
      • initialize

        public void initialize​(java.io.InputStream io)
                        throws java.io.IOException
        Initializes the pattern manager implementations. Reads the DEFINEs and RULEs from the pattern file and performs the requisite substitutions. After initialization, the patterns map is populated.
        Parameters:
        io - stream
        Throws:
        java.io.IOException - if patterns file can not be loaded and parsed
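
The DEFINE-into-RULE substitution described for initialize can be illustrated with a minimal sketch. It is not the RegexPatternManager implementation. The `#DEFINE NAME REGEX` line layout, the `<NAME>` reference syntax, and the class and method names are assumptions made for illustration; only the `#RULE FAMILY RID REGEX` layout comes from this page.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the RegexPatternManager implementation. */
class DefineSubstitutionSketch {

    /**
     * Expands <NAME> references in each RULE using the DEFINE lines seen so far.
     * Returns a map of "FAMILY-RID" to the compiled pattern.
     */
    static Map<String, Pattern> load(List<String> lines) {
        Map<String, String> defines = new LinkedHashMap<>();
        Map<String, Pattern> rules = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("#DEFINE")) {
                // Assumed layout: #DEFINE NAME REGEX
                String[] f = line.split("\\s+", 3);
                if (f.length == 3) {
                    defines.put(f[1], f[2]);
                }
            } else if (line.startsWith("#RULE")) {
                // Layout given in the documentation: #RULE FAMILY RID REGEX
                String[] f = line.split("\\s+", 4);
                if (f.length < 4) {
                    continue;
                }
                String regex = f[3];
                for (Map.Entry<String, String> d : defines.entrySet()) {
                    // Each DEFINE becomes one capture group in the final regex.
                    regex = regex.replace("<" + d.getKey() + ">", "(" + d.getValue() + ")");
                }
                rules.put(f[1] + "-" + f[2], Pattern.compile(regex));
            }
        }
        return rules;
    }
}
```

Wrapping each substituted DEFINE in its own capture group is what lets a later step pair slot names with group positions.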
      • getConfigurationDebug

        public java.lang.String getConfigurationDebug()
        Instead of relying on a logging API, we now throw exceptions for real configuration errors, and capture configuration details in a buffer if debug is on.
        Returns:
        the buffered configuration debug text
      • group_map

        public java.util.Map<java.lang.String,​java.lang.String> group_map​(RegexPattern p,
                                                                                java.util.regex.Matcher matched)
        NOTE: We're dealing with Java 6's inability to use named groups, so we have to track FlexPat slots in line with the Matcher fields matched. Essentially this comes down to a simple Name:Offset pairing; the limitation is that nested groups are not supported.
        Parameters:
        p - pattern
        matched - matcher
        Returns:
        map containing the matched groups, as deciphered by FlexPat and the definitions in the patterns file
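
The Name:Offset pairing can be sketched as follows, assuming the slot names are kept in the same order as the capture groups in the compiled regex. The class name, method name, and the `List<String>` of slot names are hypothetical stand-ins; the real method takes a RegexPattern.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;

/** Hypothetical sketch of positional group naming; not the actual group_map code. */
class GroupMapSketch {

    /** Pairs the i-th slot name with capture group i + 1 of a successful match. */
    static Map<String, String> groupMap(List<String> slotNames, Matcher matched) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Never read past the groups the regex actually defines.
        int count = Math.min(slotNames.size(), matched.groupCount());
        for (int i = 0; i < count; i++) {
            // group(0) is the whole match, so named slots start at group 1.
            fields.put(slotNames.get(i), matched.group(i + 1));
        }
        return fields;
    }
}
```

A nested group inside a DEFINE would shift every later group index by one, which is why this kind of pairing cannot support nesting.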
      • group_matches

        public java.util.Map<java.lang.String,​TextEntity> group_matches​(RegexPattern p,
                                                                              java.util.regex.Matcher matched)
        Matched fields as TextEntities
        Parameters:
        p - pattern
        matched - java RE Matcher
        Returns:
        keyed TextEntity
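
A rough sketch of returning matched fields with their positions, assuming a TextEntity carries the matched text plus start and end offsets. The `Span` class below is a hypothetical stand-in for TextEntity, and the slot-name list is the same assumption as in any positional pairing of names to groups.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;

/** Hypothetical sketch; Span stands in for TextEntity. */
class GroupSpanSketch {

    /** Matched text with its character offsets in the input. */
    static final class Span {
        final String text;
        final int start;
        final int end;

        Span(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Keys each slot name to the text and offsets of its capture group. */
    static Map<String, Span> groupSpans(List<String> slotNames, Matcher matched) {
        Map<String, Span> spans = new LinkedHashMap<>();
        int count = Math.min(slotNames.size(), matched.groupCount());
        for (int i = 0; i < count; i++) {
            int g = i + 1;
            // Matcher.start(g) is -1 when the group did not take part in the match.
            if (matched.start(g) < 0) {
                continue;
            }
            spans.put(slotNames.get(i), new Span(matched.group(g), matched.start(g), matched.end(g)));
        }
        return spans;
    }
}
```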