Class MatcherUtils


  • public class MatcherUtils
    extends java.lang.Object
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String CLOSE_CHARS  
      static java.lang.String START_CHARS  
    • Constructor Summary

      Constructors 
      Constructor Description
      MatcherUtils()  
    • Method Summary

      Modifier and Type Method Description
      static void filterMatchesBySpans(java.lang.String buffer, java.util.List<TextMatch> matches)
      A simple demonstration of how to sift through matches identifying which matches appear within tags.
      static java.util.List<TextEntity> findTagSpans(java.lang.String text)
      Trivial attempt at locating edges of tags in data.
      static void reduceMatches(java.util.List<TextMatch> matches)
      Reduce actual valid matches by identifying duplicates or sub-matches.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • MatcherUtils

        public MatcherUtils()
    • Method Detail

      • reduceMatches

        public static void reduceMatches(java.util.List<TextMatch> matches)
        Reduce actual valid matches by identifying duplicates or sub-matches. Matches that merely overlap another match are not considered filtered out.
        Parameters:
        matches - list of matches to sift through to find the items to filter out.
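A minimal sketch of the reduction rule described above, not the library's implementation. The `Span` class is a hypothetical stand-in for TextMatch, and the `filteredOut` flag, exclusive end offsets, and the choice to keep the first of several duplicates are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ReduceMatchesSketch {
    /** Hypothetical stand-in for TextMatch: offsets plus a filtered-out flag. */
    static class Span {
        final int start;
        final int end; // exclusive (assumption)
        boolean filteredOut = false;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Marks duplicates and sub-matches as filtered out; overlaps are left alone. */
    static void reduce(List<Span> spans) {
        for (int i = 0; i < spans.size(); i++) {
            Span a = spans.get(i);
            for (int j = 0; j < spans.size(); j++) {
                if (i == j) {
                    continue;
                }
                Span b = spans.get(j);
                if (b.filteredOut) {
                    continue; // whatever filtered b will also cover a
                }
                boolean same = a.start == b.start && a.end == b.end;
                boolean inside = !same && b.start <= a.start && a.end <= b.end;
                // Keep the first of several identical spans; drop later copies.
                if ((same && j < i) || inside) {
                    a.filteredOut = true;
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(0, 10));
        spans.add(new Span(0, 10)); // duplicate of the first
        spans.add(new Span(2, 5));  // sub-match of the first
        spans.add(new Span(8, 14)); // overlaps the first, but is not contained
        reduce(spans);
        for (Span s : spans) {
            System.out.println(s.start + "-" + s.end + " filteredOut=" + s.filteredOut);
        }
    }
}
```

In this sketch only the duplicate and the contained span are flagged; the overlapping span at 8-14 is kept, matching the note that overlaps are not filtered out.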
      • findTagSpans

        public static java.util.List<TextEntity> findTagSpans(java.lang.String text)
        Trivial attempt at locating edges of tags in data. This allows us to tag any data, then post-filter any items that match within tags. That is, if you have
         [A]text[A] [A]text[/A] [A data]text
         
        where A is a tag, the bracket (angle, paren, square, curly) marks the start of a tag area. We find the start and end of each tag area, not the text span enclosed by the matching tags. Supported opening characters are < and [ for now.
          Tags are:
            CHAR TEXT ? CHAR     # <a href=''>
        
          Tags are not:
            CHAR SPACE TEXT .....# an open tag, followed by non-alpha and/or not closed.
        
          Tag names are always ASCII, as these are simple tag detection tools.
          Unicode tags are allowable.
        
          To properly detect end tags such as [/a] or </a>, "/" is the only non-alphabetic
          character allowed immediately after the opening char of a tag.
         
        Parameters:
        text - the text to scan for tag areas
        Returns:
        list of TextEntity with no text, just span offsets
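The tag rules above can be sketched roughly as follows. This is an illustrative approximation, not the actual findTagSpans code: the `OPEN` and `CLOSE` strings are assumed values (the real START_CHARS and CLOSE_CHARS are not shown on this page), spans are returned as hypothetical `int[]{start, end}` pairs instead of TextEntity objects, and only ASCII tag names are checked.

```java
import java.util.ArrayList;
import java.util.List;

class TagSpanSketch {
    // Assumed opening/closing characters, paired by index.
    static final String OPEN = "<[";
    static final String CLOSE = ">]";

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Returns {start, end} offsets (end exclusive) for each tag area found. */
    static List<int[]> findSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int kind = OPEN.indexOf(text.charAt(i));
            if (kind < 0) {
                i++;
                continue;
            }
            // "/" is the only non-letter allowed right after the opening char.
            int next = i + 1;
            if (next < text.length() && text.charAt(next) == '/') {
                next++;
            }
            if (next >= text.length() || !isAsciiLetter(text.charAt(next))) {
                i++; // opening char followed by non-alpha: not a tag
                continue;
            }
            int close = text.indexOf(CLOSE.charAt(kind), next);
            if (close < 0) {
                i++; // never closed: not a tag
                continue;
            }
            spans.add(new int[] {i, close + 1});
            i = close + 1;
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "<a href=''>link</a> x < y [note]";
        for (int[] s : findSpans(text)) {
            System.out.println(s[0] + "-" + s[1] + " " + text.substring(s[0], s[1]));
        }
    }
}
```

Note that the lone "<" in "x < y" is skipped because it is followed by a space, which is the "Tags are not" case listed above.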
      • filterMatchesBySpans

        public static void filterMatchesBySpans(java.lang.String buffer,
                                                java.util.List<TextMatch> matches)
        A simple demonstration of how to sift through matches, identifying which matches appear within tags. For example: in [A]match[/A], match is good; in [A]match, match is good; in [A match]text other_match, match is not good, but other_match is fine. The result is that matches inside tags are marked "filteredOut".
        Parameters:
        buffer - the raw signal text where the matches were found.
        matches - list of TextMatch
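A minimal sketch of the filtering idea, under stated assumptions: the `Match` class is a hypothetical stand-in for TextMatch, tag areas are passed in as `int[]{start, end}` pairs (end exclusive) instead of being computed from the buffer, and a match counts as "inside a tag" only when it is fully contained in a tag area.

```java
import java.util.ArrayList;
import java.util.List;

class SpanFilterSketch {
    /** Hypothetical stand-in for TextMatch. */
    static class Match {
        final int start;
        final int end; // exclusive (assumption)
        boolean filteredOut = false;

        Match(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Marks every match that lies entirely within a tag area. */
    static void filter(List<int[]> tagSpans, List<Match> matches) {
        for (Match m : matches) {
            for (int[] tag : tagSpans) {
                if (tag[0] <= m.start && m.end <= tag[1]) {
                    m.filteredOut = true;
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        String buffer = "[A match]text other_match";

        // Tag area for "[A match]", located by hand for this example.
        List<int[]> tagSpans = new ArrayList<>();
        tagSpans.add(new int[] {0, buffer.indexOf(']') + 1});

        int in = buffer.indexOf("match");
        int out = buffer.indexOf("other_match");
        List<Match> matches = new ArrayList<>();
        matches.add(new Match(in, in + "match".length()));
        matches.add(new Match(out, out + "other_match".length()));

        filter(tagSpans, matches);
        for (Match m : matches) {
            System.out.println(buffer.substring(m.start, m.end) + " filteredOut=" + m.filteredOut);
        }
    }
}
```

Here the match inside "[A match]" is flagged, while other_match, which sits in plain text after the tag, is left untouched.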