Class AbstractFlexPat

java.lang.Object
org.opensextant.extractors.flexpat.AbstractFlexPat
All Implemented Interfaces:
Extractor
Direct Known Subclasses:
PatternsOfLife, XCoord, XTemporal

public abstract class AbstractFlexPat extends Object implements Extractor
FlexPat Extractor -- given a set of pattern families, extract, filter and normalize matches.
Author:
ubaldino
  • Field Details

    • match_width

      protected int match_width
      Match width, in characters. The SHP DBF field limit is 255 bytes, so SHP file outputters should decide at output time how and when to curtail the match width. The pre/post text found useful has typically been about 200-250 characters at most.
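The note above distinguishes characters from bytes: a context of 250 characters can exceed 255 bytes once encoded. The following is a minimal sketch of how an outputter might trim context to a byte budget. It assumes the field is written as UTF-8, which may not hold for a given DBF writer, and the class and method names (DbfFieldSketch, truncateUtf8) are hypothetical, not part of this API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of OpenSextant: trims text to a byte budget.
final class DbfFieldSketch {
    // Field limit quoted in the match_width description above.
    static final int DBF_FIELD_LIMIT_BYTES = 255;

    /** Returns the longest prefix of text whose UTF-8 encoding fits in maxBytes. */
    static String truncateUtf8(String text, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // UTF-8 size of this code point (assumption: UTF-8 output encoding).
            int size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + size > maxBytes) {
                break; // the next code point would not fit
            }
            bytes += size;
            i += Character.charCount(cp); // never split a surrogate pair
        }
        return text.substring(0, i);
    }

    public static void main(String[] args) {
        String context = "caf\u00e9 near 42N 071W";
        String trimmed = truncateUtf8(context, DBF_FIELD_LIMIT_BYTES);
        System.out.println(trimmed.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

Trimming by code point rather than by raw byte keeps multi-byte characters intact at the cut.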
    • log

      protected org.slf4j.Logger log
    • debug

      protected boolean debug
    • patterns

      protected RegexPatternManager patterns
    • patterns_file

      protected String patterns_file
  • Constructor Details

    • AbstractFlexPat

      public AbstractFlexPat()
    • AbstractFlexPat

      public AbstractFlexPat(boolean b)
  • Method Details

    • createPatternManager

      protected abstract RegexPatternManager createPatternManager(InputStream s, String name) throws IOException
      Create a pattern manager given the input stream and the file name.
      Parameters:
      s - stream of patterns config file
      name - app name
      Returns:
      the regex pattern manager
      Throws:
      IOException - Signals that an I/O exception has occurred.
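Because createPatternManager is abstract, each subclass supplies its own way of turning a patterns stream into a pattern manager. The sketch below illustrates that factory-method arrangement with stand-in types only: a List of strings replaces RegexPatternManager, an unchecked exception replaces ConfigException, and the one-rule-per-line format is a toy, not the real FlexPat patterns syntax. That configure delegates to the factory method is an assumption made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the abstract extractor; not the real class.
abstract class FlexPatLikeSketch {
    protected List<String> patterns; // stand-in for RegexPatternManager

    /** Subclass hook: build the pattern store from a stream and a name. */
    protected abstract List<String> createPatternManager(InputStream s, String name) throws IOException;

    public List<String> getPatternManager() {
        return patterns;
    }

    /** Assumed wiring: configuration delegates to the subclass hook. */
    public void configure(InputStream strm, String name) {
        try {
            patterns = createPatternManager(strm, name);
        } catch (IOException e) {
            // Stand-in for ConfigException in the real API.
            throw new IllegalStateException("Could not load patterns: " + name, e);
        }
    }
}

// Toy subclass: treats every non-blank line as one rule.
final class LineBasedSketch extends FlexPatLikeSketch {
    @Override
    protected List<String> createPatternManager(InputStream s, String name) throws IOException {
        List<String> rules = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(s, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    rules.add(line);
                }
            }
        }
        return rules;
    }
}
```

The design keeps stream handling and error translation in one place while leaving the parsing specifics to each concrete extractor.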
    • getPatternManager

      public RegexPatternManager getPatternManager()
    • configure

      public void configure() throws ConfigException
      Configure using whichever patterns file is named as the default.
      Specified by:
      configure in interface Extractor
      Throws:
      ConfigException - config error, pattern file not found
    • configure

      public void configure(String patfile) throws ConfigException
      Configure using a particular pattern file.
      Specified by:
      configure in interface Extractor
      Parameters:
      patfile - a pattern file.
      Throws:
      ConfigException - if pattern file not found
    • configure

      public void configure(URL patfile) throws ConfigException
      Configure using a URL pointer to the pattern file.
      Specified by:
      configure in interface Extractor
      Parameters:
      patfile - patterns file URL
      Throws:
      ConfigException - if pattern file not found
    • configure

      public void configure(InputStream strm, String name) throws ConfigException
      Configure using an input stream of the patterns file.
      Throws:
      ConfigException
    • setMatchWidth

      public void setMatchWidth(int w)
      Match width is the size of the text buffer kept before and after a TextMatch. Match buffers are used to create a match ID.
      Parameters:
      w - width
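To make the idea of a match buffer concrete, here is a minimal sketch of taking up to w characters on either side of a match and clamping at the edges of the text. It assumes a start offset that is inclusive and an end offset that is exclusive; the class and method names are hypothetical and do not come from this API.

```java
// Hypothetical helper, not part of OpenSextant: slices context around a match.
final class MatchContextSketch {
    /** Up to width characters immediately before the match start. */
    static String before(String buffer, int start, int width) {
        return buffer.substring(Math.max(0, start - width), start);
    }

    /** Up to width characters immediately after the match end (end is exclusive). */
    static String after(String buffer, int end, int width) {
        return buffer.substring(end, Math.min(buffer.length(), end + width));
    }

    public static void main(String[] args) {
        String text = "Meet at 42N 071W tomorrow.";
        int start = text.indexOf("42N 071W");
        int end = start + "42N 071W".length();
        System.out.println("[" + before(text, start, 5) + "]");
        System.out.println("[" + after(text, end, 5) + "]");
    }
}
```

A larger width gives more surrounding text per match at the cost of longer stored context.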
    • set_match_id

      protected void set_match_id(TextMatch m, int count)
      Optional. Assign an identifier to each TextMatch found. This is an MD5 of the match in situ. If context is provided, it is used to generate the identity. If a count is provided, it is used; otherwise only the pattern ID + text value are used.
      Parameters:
      m - a TextMatch
      count - incrementor used for uniqueness
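The identifier scheme described above can be sketched with the JDK's MessageDigest. This is an illustration under assumptions, not the library's implementation: the exact string that gets hashed, the precedence between context and count, and the convention that a negative count means "not provided" are all hypothetical, as are the names MatchIdSketch, md5Hex and matchId.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of OpenSextant: derives an MD5-based match ID.
final class MatchIdSketch {
    /** Lowercase hex MD5 of the UTF-8 bytes of s. */
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Assumed key choice: context if present, else pattern ID + text (+ count). */
    static String matchId(String patternId, String text, String context, int count) {
        String key;
        if (context != null && !context.isEmpty()) {
            key = context;
        } else if (count >= 0) { // assumption: negative count means "not provided"
            key = patternId + text + count;
        } else {
            key = patternId + text;
        }
        return md5Hex(key);
    }
}
```

Here MD5 serves as a compact fingerprint for identifying matches, not as a security measure.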
    • cleanup

      public void cleanup()
      Extractor interface: extractors are responsible for cleaning up after themselves.
      Specified by:
      cleanup in interface Extractor
    • enableAll

      public void enableAll()
    • disableAll

      public void disableAll()