java.lang.Object

org.opensextant.extraction.MatcherUtils

public class MatcherUtils extends Object

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final String | CLOSE_CHARS | |
| static final String | START_CHARS | |

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| MatcherUtils() | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static void | filterMatchesBySpans(String buffer, List<TextMatch> matches) | A simple demonstration of how to sift through matches identifying which matches appear within tags. |
| static List<TextEntity> | findTagSpans(String text) | Trivial attempt at locating edges of tags in data. |
| static void | reduceMatches(List<TextMatch> matches) | Reduce actual valid matches by identifying duplicates or sub-matches. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- START_CHARS
  
  public static final String START_CHARS
  See Also:
  
  Constant Field Values
- CLOSE_CHARS
  
  public static final String CLOSE_CHARS
  See Also:
  
  Constant Field Values
Constructor Details
- MatcherUtils
  
  public MatcherUtils()
Method Details
- reduceMatches
  
  public static void reduceMatches(List<TextMatch> matches)
  
  Reduce actual valid matches by identifying duplicates or sub-matches. Matches whose spans only partially overlap are not marked as filtered out.
  
  Parameters:
  
  matches - set of matches you need to sift through to find filtered out items.
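
  The duplicate and sub-match rule described above can be illustrated with a minimal sketch. It does not use the OpenSextant classes: SketchMatch and ReduceSketch are hypothetical stand-ins, offsets are assumed to be start-inclusive and end-exclusive, and keeping the first copy of a duplicate is an assumption rather than documented behavior.

  ```java
  import java.util.List;

  /** Hypothetical stand-in for a match; not the OpenSextant TextMatch class. */
  class SketchMatch {
      final int start;              // inclusive offset (assumed convention)
      final int end;                // exclusive offset (assumed convention)
      boolean filteredOut = false;  // set when the match is a duplicate or sub-match

      SketchMatch(int start, int end) {
          this.start = start;
          this.end = end;
      }
  }

  class ReduceSketch {
      /** Marks duplicates and sub-matches as filtered out; partial overlaps are left alone. */
      static void reduce(List<SketchMatch> matches) {
          for (int i = 0; i < matches.size(); i++) {
              SketchMatch a = matches.get(i);
              if (a.filteredOut) {
                  continue;
              }
              for (int j = 0; j < matches.size(); j++) {
                  if (i == j) {
                      continue;
                  }
                  SketchMatch b = matches.get(j);
                  if (b.filteredOut) {
                      continue;
                  }
                  boolean same = a.start == b.start && a.end == b.end;
                  boolean bInsideA = !same && a.start <= b.start && b.end <= a.end;
                  if (same && j > i) {
                      b.filteredOut = true;   // duplicate: keep the first copy (assumption)
                  } else if (bInsideA) {
                      b.filteredOut = true;   // sub-match: b lies entirely within a
                  }
              }
          }
      }
  }
  ```

  With the spans (0,10), (0,10), (2,5) and (8,15), this sketch marks the second and third as filtered out. The last one only partially overlaps the first, so it is kept.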
- findTagSpans
  
  public static List<TextEntity> findTagSpans(String text)
  Trivial attempt at locating edges of tags in data. This allows us to tag any data, but post-filter any items that match within tags. That is, if you have
  [A]text[A]
  [A]text[/A]
  [A data]text
  where A is a tag, the bracket (angle, paren, square, or curly) marks the start of a tag area. We are finding the starts and ends of the tag areas, not the text span enclosed by the matching tags. Supported brackets are angle (< >) and square ([ ]) for now.
  Tags are: CHAR TEXT ? CHAR # for example, <a href=''>
  Tags are not: CHAR SPACE TEXT ... # an opening char followed by non-alpha and/or not closed.
  Tag names are always ASCII, as these are simple tag detection tools. Unicode tags are allowable. To properly detect end tags such as [/a] or </a>, "/" is the only non-alphabetic character allowed immediately after the opening char of a tag.
  Parameters:
  
  text - the text to scan for tag spans
  
  Returns:
  
  list of TextEntity with no text, just span offsets
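
  The tag rules above can be approximated in a short, self-contained sketch. It is not the library's implementation: TagSpanSketch is a hypothetical class, the bracket sets "<[" and ">]" are assumed values (this page does not show the actual values of START_CHARS and CLOSE_CHARS), tag names are assumed to start with an ASCII letter, and spans are returned as plain {start, end} arrays rather than TextEntity objects.

  ```java
  import java.util.ArrayList;
  import java.util.List;

  class TagSpanSketch {
      // Assumed bracket sets; the real constant values are not shown on this page.
      static final String OPEN = "<[";
      static final String CLOSE = ">]";

      /** Returns {start, end} offsets (end exclusive) for each tag area found. */
      static List<int[]> findTagSpans(String text) {
          List<int[]> spans = new ArrayList<>();
          int i = 0;
          while (i < text.length()) {
              int kind = OPEN.indexOf(text.charAt(i));
              if (kind < 0) {
                  i++;
                  continue;
              }
              // The tag name must follow the opener directly; only '/' may come between.
              int nameAt = i + 1;
              if (nameAt < text.length() && text.charAt(nameAt) == '/') {
                  nameAt++;
              }
              boolean named = nameAt < text.length() && isAsciiLetter(text.charAt(nameAt));
              // Look for the closing bracket that pairs with this kind of opener.
              int close = named ? text.indexOf(CLOSE.charAt(kind), nameAt) : -1;
              if (close < 0) {
                  i++;   // not a tag: no name after the opener, or never closed
                  continue;
              }
              spans.add(new int[] {i, close + 1});
              i = close + 1;
          }
          return spans;
      }

      static boolean isAsciiLetter(char c) {
          return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
      }
  }
  ```

  In this sketch, "<a href=''>x" yields one span covering offsets 0 to 11, while "[ a]" and "<a unclosed" yield none. Like the documented method, it is deliberately simple: any opener followed by a letter and a later closing bracket is treated as a tag.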
- filterMatchesBySpans
  
  public static void filterMatchesBySpans(String buffer, List<TextMatch> matches)
  
  A simple demonstration of how to sift through matches, identifying which matches appear within tags. For example:
  [A]match[/A] -- match is good.
  [A]match -- match is good.
  [A match]text other_match -- match is not good; other_match is fine.
  The result is that matches inside tags are marked "filteredOut".
  
  Parameters:
  
  buffer - the raw signal text where the matches were found.
  
  matches - list of TextMatch items to check
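
  The filtering rule can be sketched as follows. The sketch takes precomputed tag spans as {start, end} arrays instead of a raw buffer, and SpanMatch and SpanFilterSketch are hypothetical stand-ins for the OpenSextant types. It also assumes that "inside a tag" means the match lies entirely within one tag area.

  ```java
  import java.util.List;

  /** Hypothetical stand-in for a match; not the OpenSextant TextMatch class. */
  class SpanMatch {
      final int start;              // inclusive offset (assumed convention)
      final int end;                // exclusive offset (assumed convention)
      boolean filteredOut = false;

      SpanMatch(int start, int end) {
          this.start = start;
          this.end = end;
      }
  }

  class SpanFilterSketch {
      /** Marks every match that lies entirely within a tag area as filtered out. */
      static void filterBySpans(List<int[]> tagSpans, List<SpanMatch> matches) {
          for (SpanMatch m : matches) {
              for (int[] tag : tagSpans) {
                  if (tag[0] <= m.start && m.end <= tag[1]) {
                      m.filteredOut = true;
                      break;   // one enclosing tag area is enough
                  }
              }
          }
      }
  }
  ```

  For the buffer "[A match]text other_match", the tag area covers offsets 0 to 9. A match on "match" at 3 to 8 is marked as filtered out, and a match on "other_match" at 14 to 25 is left untouched.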
