Class TextEntity

java.lang.Object
org.opensextant.extraction.TextEntity
Direct Known Subclasses:
TextMatch

public class TextEntity extends Object
A very simple struct to hold data useful for post-processing entities once found.
Author:
Marc C. Ubaldino, MITRE, ubaldino at mitre dot org
  • Field Details

    • text

      protected String text
    • start

      public int start
      char offset of entity; location in document where entity starts.
    • end

      public int end
      char offset of entity; location in document where entity ends.
    • postChar

      public char postChar
      char immediately after span
    • preChar

      public char preChar
      char immediately before span
    • match_id

      public String match_id
    • is_submatch

      public boolean is_submatch
      True if this entity is contained completely within some other entity.
    • is_overlap

      public boolean is_overlap
      True if this entity overlaps with some other entity.
    • is_duplicate

      public boolean is_duplicate
      True if this entity is a duplicate of some other entity.
  • Constructor Details

    • TextEntity

      public TextEntity(int x1, int x2)
      Simple Span representation.
      Parameters:
      x1 - start offset
      x2 - end offset
  • Method Details

    • setText

      public void setText(String t)
      Sets the text value of the TextEntity.
      Parameters:
      t - text
    • setTextOnly

      public void setTextOnly(String t)
      Set just the value, without incurring the cost of other metrics or flags about the text that likely are unchanged.
      Parameters:
      t - the text
    • isASCII

      public boolean isASCII()
      Tests whether the non-punctuation content is purely ASCII, as opposed to Latin-1 or other Unicode.
      Returns:
      true if text value is purely ASCII
    • isLower

      public boolean isLower()
      Tests whether text (that has a case sense) is all lower case.
      Returns:
      true if all lower.
    • isUpper

      public boolean isUpper()
      Tests whether text (that has a case sense) is all upper case.
      Returns:
      true if all upper.
    • isMixedCase

      public boolean isMixedCase()
      test if text is mixed case.
      Returns:
      true if neither all lower nor all upper.
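      The relationship between the three case tests can be sketched as follows. This is a minimal stand-in, not the TextEntity implementation: the class name CaseSketch and its static helpers are hypothetical, and how the real class treats text with no cased letters (digits, for example) is not stated here. In this sketch such text is neither lower nor upper, so it reports as mixed.

```java
public class CaseSketch {
    /** True if the text has at least one cased letter and none of them is upper case. */
    static boolean isLower(String text) {
        boolean sawCased = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isUpperCase(c)) {
                return false;
            }
            if (Character.isLowerCase(c)) {
                sawCased = true;
            }
        }
        return sawCased;
    }

    /** True if the text has at least one cased letter and none of them is lower case. */
    static boolean isUpper(String text) {
        boolean sawCased = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLowerCase(c)) {
                return false;
            }
            if (Character.isUpperCase(c)) {
                sawCased = true;
            }
        }
        return sawCased;
    }

    /** Assumption: "mixed" simply means neither all lower nor all upper. */
    static boolean isMixedCase(String text) {
        return !isLower(text) && !isUpper(text);
    }

    public static void main(String[] args) {
        System.out.println(isLower("new york"));
        System.out.println(isUpper("NEW YORK"));
        System.out.println(isMixedCase("New York"));
    }
}
```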
    • getText

      public String getText()
      Returns:
      text, value of a TextEntity
    • getLength

      public int getLength()
      get the length of the matched text
      Returns:
      int, length
    • setContext

      public void setContext(String before, String after)
      Set the context with before and after windows
      Parameters:
      before - text before match
      after - text after match
    • setContext

      public void setContext(String window)
      Set the context buffer from a single window
      Parameters:
      window - textual window
    • getContext

      public String getContext()
      Returns:
      the context buffer, whether it was set as a single window or as separate before/after windows
    • getContextBefore

      public String getContextBefore()
      Returns:
      text before match
    • getContextAfter

      public String getContextAfter()
      Returns:
      text after match
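      One way a caller might derive the before and after windows for setContext(String before, String after) is sketched below. The helper name windows, the width parameter, and the treatment of the end offset as exclusive are all assumptions made for illustration; none of them come from this class.

```java
public class ContextSketch {
    /**
     * Returns { before, after } windows of at most `width` characters around
     * the span [start, end) in `doc`. The end offset is assumed to be exclusive.
     */
    static String[] windows(String doc, int start, int end, int width) {
        int from = Math.max(0, start - width);          // clamp at document start
        int to = Math.min(doc.length(), end + width);   // clamp at document end
        String before = doc.substring(from, start);
        String after = doc.substring(end, to);
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String doc = "He flew to New York last week.";
        // "New York" occupies offsets 11 (inclusive) to 19 (exclusive).
        String[] w = windows(doc, 11, 19, 5);
        System.out.println("[" + w[0] + "]");
        System.out.println("[" + w[1] + "]");
    }
}
```

      The two strings returned could then be passed as the before and after arguments of setContext.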
    • toString

      public String toString()
      Overrides:
      toString in class Object
      Returns:
      string representation of entity
    • contains

      public boolean contains(int x)
      Assess if an offset is within this span
      Parameters:
      x - offset to test
      Returns:
      if this entity contains the offset
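      A containment test over a start/end span can be sketched as below. SpanSketch is a hypothetical stand-in, and the sketch assumes the end offset is exclusive; whether TextEntity counts the end offset itself as contained is not stated in this documentation.

```java
public class SpanSketch {
    final int start;
    final int end;

    /** Mirrors the (x1, x2) constructor shape: start offset, end offset. */
    SpanSketch(int x1, int x2) {
        this.start = x1;
        this.end = x2;
    }

    /** Assumption: the span is half-open, so `end` itself is not contained. */
    boolean contains(int x) {
        return start <= x && x < end;
    }

    public static void main(String[] args) {
        SpanSketch span = new SpanSketch(10, 15);
        System.out.println(span.contains(12));
        System.out.println(span.contains(9));
    }
}
```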
    • copy

      public void copy(TextEntity m)
      Parameters:
      m - match/entity object to copy
    • isWithin

      public boolean isWithin(TextEntity t)
    • isAfter

      public boolean isAfter(TextEntity t)
      Assuming simple whitespace separation or other simple delimiters, is this term following the argument entity?
      Parameters:
      t - other entity
      Returns:
      true if the current entity occurs after t
    • isBefore

      public boolean isBefore(TextEntity t)
      Assuming simple whitespace separation or other simple delimiters, is this term preceding the argument entity?
      Parameters:
      t - other TextEntity
      Returns:
      true if the current entity occurs before t
    • isSameMatch

      public boolean isSameMatch(TextEntity t)
    • isRightMatch

      public boolean isRightMatch(TextEntity t)
    • isLeftMatch

      public boolean isLeftMatch(TextEntity t)
    • isOverlap

      public boolean isOverlap(TextEntity t)
    • isWithinChars

      public boolean isWithinChars(TextEntity t, int nchars)
      Proximity test between this text span and another. This span is A; B is the input. The examples below use nchars=2.
          AaaaaaaB               // B next to A
                 BbbbbbA         // B before A
                 Bbbbb     A     // B far from A
          Aaaaaa B               // B within nchars of A
              AaaBbbaaa          // B is inside A, so they are "within"
      Parameters:
      t - TextEntity span
      nchars - number of characters
      Returns:
      True if given entity span is within nchars, left or right
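      The cases in the diagram can be reproduced with a small gap calculation. This is a sketch under stated assumptions rather than the library's code: ProximitySketch and its offset-based signature are hypothetical, spans are treated as half-open [start, end), and a gap equal to nchars is assumed to count as "within".

```java
public class ProximitySketch {
    /**
     * Span A is [aStart, aEnd), span B is [bStart, bEnd).
     * Returns true if the spans overlap or nest, or if the gap between
     * them is at most nchars on either side.
     */
    static boolean isWithinChars(int aStart, int aEnd, int bStart, int bEnd, int nchars) {
        // Overlapping or nested spans count as "within".
        if (aStart < bEnd && bStart < aEnd) {
            return true;
        }
        // Otherwise measure the gap: B is either to the right or to the left of A.
        int gap = (bStart >= aEnd) ? bStart - aEnd : aStart - bEnd;
        return gap <= nchars;
    }

    public static void main(String[] args) {
        int n = 2;
        System.out.println(isWithinChars(0, 7, 7, 8, n));    // AaaaaaaB
        System.out.println(isWithinChars(6, 7, 0, 6, n));    // BbbbbbA
        System.out.println(isWithinChars(10, 11, 0, 5, n));  // Bbbbb     A
        System.out.println(isWithinChars(0, 6, 7, 8, n));    // Aaaaaa B
        System.out.println(isWithinChars(0, 9, 3, 6, n));    // AaaBbbaaa
    }
}
```

      Under these assumptions only the third case, where five characters separate B from A, falls outside the two-character window.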