Class TextEntity

  • Direct Known Subclasses:
    TextMatch

    public class TextEntity
    extends java.lang.Object
    A very simple struct to hold data useful for post-processing entities once found.
    Author:
    Marc C. Ubaldino, MITRE, ubaldino at mitre dot org
    • Field Summary

      Fields 
      Modifier and Type Field Description
      int end
      char offset of entity; location in document where entity ends.
      boolean is_duplicate
      True if this entity is a duplicate of some other entity.
      boolean is_overlap
      True if this entity overlaps with some other entity.
      boolean is_submatch
      True if this entity is contained completely within some other entity.
      java.lang.String match_id  
      char postChar
      char immediately after span
      char preChar
      char immediately before span
      int start
      char offset of entity; location in document where entity starts.
      protected java.lang.String text  
    • Constructor Summary

      Constructors 
      Constructor Description
      TextEntity​(int x1, int x2)
      Simple Span representation.
    • Method Summary

      Modifier and Type Method Description
      boolean contains​(int x)
      Assess if an offset is within this span
      void copy​(TextEntity m)  
      java.lang.String getContext()  
      java.lang.String getContextAfter()  
      java.lang.String getContextBefore()  
      int getLength()
      get the length of the matched text
      java.lang.String getText()  
      boolean isAfter​(TextEntity t)
      Assuming simple whitespace separation or other simple delimiters, is this term following the argument entity?
      boolean isASCII()
      Tests whether non-punctuation content is purely ASCII, as opposed to Latin-1 or other Unicode.
      boolean isBefore​(TextEntity t)
      Assuming simple whitespace separation or other simple delimiters, is this term preceding the argument entity?
      boolean isLeftMatch​(TextEntity t)  
      boolean isLower()
      Test if text (that has a case sense) is all lower case.
      boolean isMixedCase()
      test if text is mixed case.
      boolean isOverlap​(TextEntity t)  
      boolean isRightMatch​(TextEntity t)  
      boolean isSameMatch​(TextEntity t)  
      boolean isUpper()
      Test if text (that has a case sense) is all upper case.
      boolean isWithin​(TextEntity t)  
      boolean isWithinChars​(TextEntity t, int nchars)
      Proximity test between this text span and another.
      void setContext​(java.lang.String window)
      Set the context buffer from a single window
      void setContext​(java.lang.String before, java.lang.String after)
      Set the context with before and after windows
      void setText​(java.lang.String t)
      sets the value of the TextEntity
      void setTextOnly​(java.lang.String t)
      Set just the value, without incurring the cost of other metrics or flags about the text that likely are unchanged.
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • text

        protected java.lang.String text
      • start

        public int start
        char offset of entity; location in document where entity starts.
      • end

        public int end
        char offset of entity; location in document where entity ends.
      • postChar

        public char postChar
        char immediately after span
      • preChar

        public char preChar
        char immediately before span
      • match_id

        public java.lang.String match_id
      • is_submatch

        public boolean is_submatch
        True if this entity is contained completely within some other entity.
      • is_overlap

        public boolean is_overlap
        True if this entity overlaps with some other entity.
      • is_duplicate

        public boolean is_duplicate
        True if this entity is a duplicate of some other entity.
    • Constructor Detail

      • TextEntity

        public TextEntity​(int x1,
                          int x2)
        Simple Span representation.
        Parameters:
        x1 - start offset
        x2 - end offset
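        As an illustration of how the span offsets relate to the public fields above, the following is a minimal sketch and not the library's implementation. SpanFieldsSketch and fromDocument are hypothetical names, and the sketch assumes that end is an exclusive offset (as in String.substring) and that a missing neighbor character is stored as '\0'; the source does not confirm either detail.

```java
// Hypothetical stand-in with TextEntity-like fields; not the real class.
final class SpanFieldsSketch {
    int start;      // char offset where the span starts
    int end;        // char offset where the span ends (assumed exclusive here)
    char preChar;   // char immediately before the span, '\0' if none (assumption)
    char postChar;  // char immediately after the span, '\0' if none (assumption)
    String text;    // matched text

    SpanFieldsSketch(int x1, int x2) {
        this.start = x1;
        this.end = x2;
    }

    // Fill text, preChar and postChar from the source document.
    static SpanFieldsSketch fromDocument(String doc, int x1, int x2) {
        SpanFieldsSketch s = new SpanFieldsSketch(x1, x2);
        s.text = doc.substring(x1, x2);
        s.preChar = x1 > 0 ? doc.charAt(x1 - 1) : '\0';
        s.postChar = x2 < doc.length() ? doc.charAt(x2) : '\0';
        return s;
    }

    public static void main(String[] args) {
        String doc = "Flights to (Boston) resume.";
        SpanFieldsSketch s = fromDocument(doc, 12, 18);
        System.out.println(s.text + " pre=" + s.preChar + " post=" + s.postChar);
    }
}
```

        Recording the neighboring characters at match time lets later post-processing check delimiters without needing the full document again.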
    • Method Detail

      • setText

        public void setText​(java.lang.String t)
        sets the value of the TextEntity
        Parameters:
        t - text
      • setTextOnly

        public void setTextOnly​(java.lang.String t)
        Set just the value, without incurring the cost of other metrics or flags about the text that likely are unchanged.
        Parameters:
        t - the text
      • isASCII

        public boolean isASCII()
        Tests whether non-punctuation content is purely ASCII, as opposed to Latin-1 or other Unicode.
        Returns:
        true if text value is purely ASCII
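        One way to read "non-punctuation content is purely ASCII" is sketched below. This is an assumption about the intent, not the library's code: AsciiSketch is a hypothetical helper that skips every character that is not a letter or digit, then requires the remaining characters to fall in the 7-bit ASCII range.

```java
// Hypothetical helper; the real isASCII() may classify characters differently.
final class AsciiSketch {
    static boolean isASCII(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                continue; // ignore punctuation, symbols and whitespace
            }
            if (c > 0x7F) {
                return false; // letter or digit outside 7-bit ASCII
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isASCII("Boston"));
        System.out.println(isASCII("caf\u00E9"));
        System.out.println(isASCII("\u201CBoston\u201D"));
    }
}
```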
      • isLower

        public boolean isLower()
        Test if text (that has a case sense) is all lower case.
        Returns:
        true if all lower.
      • isUpper

        public boolean isUpper()
        Test if text (that has a case sense) is all upper case.
        Returns:
        true if all upper.
      • isMixedCase

        public boolean isMixedCase()
        test if text is mixed case.
        Returns:
        true if neither all lower nor all upper.
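        The three case tests can be pictured with the sketch below. CaseSketch is a hypothetical helper, not the library's code; it assumes "text that has a case sense" means that characters without case (digits, punctuation, spaces) are ignored, and that at least one cased letter must be present for isLower or isUpper to hold.

```java
// Hypothetical helper illustrating the three case tests.
final class CaseSketch {
    // True if at least one cased letter exists and none is lower case.
    static boolean isUpper(String text) {
        boolean sawCased = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLowerCase(c)) {
                return false;
            }
            if (Character.isUpperCase(c)) {
                sawCased = true;
            }
        }
        return sawCased;
    }

    // True if at least one cased letter exists and none is upper case.
    static boolean isLower(String text) {
        boolean sawCased = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isUpperCase(c)) {
                return false;
            }
            if (Character.isLowerCase(c)) {
                sawCased = true;
            }
        }
        return sawCased;
    }

    // Neither all lower nor all upper. Under this definition, text with
    // no cased letters at all (for example "123") also counts as mixed.
    static boolean isMixedCase(String text) {
        return !isLower(text) && !isUpper(text);
    }

    public static void main(String[] args) {
        System.out.println(isUpper("NATO-2"));
        System.out.println(isLower("e.g. 42"));
        System.out.println(isMixedCase("McLean"));
    }
}
```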
      • getText

        public java.lang.String getText()
        Returns:
        text, value of a TextEntity
      • getLength

        public int getLength()
        get the length of the matched text
        Returns:
        int, length
      • setContext

        public void setContext​(java.lang.String before,
                               java.lang.String after)
        Set the context with before and after windows
        Parameters:
        before - text before match
        after - text after match
      • setContext

        public void setContext​(java.lang.String window)
        Set the context buffer from a single window
        Parameters:
        window - textual window
      • getContext

        public java.lang.String getContext()
        Returns:
        context buffer, regardless of whether it was set as a single window or as separate before/after windows
      • getContextBefore

        public java.lang.String getContextBefore()
        Returns:
        text before match
      • getContextAfter

        public java.lang.String getContextAfter()
        Returns:
        text after match
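        A caller typically has to cut the before and after windows out of the source document before passing them to setContext(before, after). The sketch below shows one way to do that; ContextWindowSketch and the width parameter are hypothetical, and the sketch assumes end is an exclusive offset.

```java
// Hypothetical caller-side helper for building context windows.
final class ContextWindowSketch {
    // Up to `width` characters immediately before the span.
    static String before(String doc, int start, int width) {
        return doc.substring(Math.max(0, start - width), start);
    }

    // Up to `width` characters immediately after the span.
    static String after(String doc, int end, int width) {
        return doc.substring(end, Math.min(doc.length(), end + width));
    }

    public static void main(String[] args) {
        String doc = "Flights to (Boston) resume.";
        System.out.println("[" + before(doc, 12, 5) + "]");
        System.out.println("[" + after(doc, 18, 5) + "]");
    }
}
```

        Clamping with Math.max and Math.min keeps the windows valid for matches near the start or end of the document.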
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
        Returns:
        string representation of entity
      • contains

        public boolean contains​(int x)
        Assess if an offset is within this span
        Parameters:
        x - offset to test
        Returns:
        true if this entity contains the offset
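        The offset test can be sketched as a simple range check. ContainsSketch is a hypothetical stand-in, and treating both start and end as inclusive bounds is an assumption; the source does not say whether the end offset itself counts as inside the span.

```java
// Hypothetical span with an inclusive range check (an assumption).
final class ContainsSketch {
    final int start;
    final int end;

    ContainsSketch(int x1, int x2) {
        this.start = x1;
        this.end = x2;
    }

    // True if x lies between start and end, both bounds included.
    boolean contains(int x) {
        return start <= x && x <= end;
    }

    public static void main(String[] args) {
        ContainsSketch s = new ContainsSketch(12, 18);
        System.out.println(s.contains(15));
        System.out.println(s.contains(19));
    }
}
```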
      • copy

        public void copy​(TextEntity m)
        Parameters:
        m - match/entity object to copy
      • isWithin

        public boolean isWithin​(TextEntity t)
      • isAfter

        public boolean isAfter​(TextEntity t)
        Assuming simple whitespace separation or other simple delimiters, is this term following the argument entity?
        Parameters:
        t - other entity
        Returns:
        true if the current entity occurs after t
      • isBefore

        public boolean isBefore​(TextEntity t)
        Assuming simple whitespace separation or other simple delimiters, is this term preceding the argument entity?
        Parameters:
        t - other TextEntity
        Returns:
        true if the current entity is before t
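        Following the method descriptions (is this term following, or preceding, the argument entity?), the ordering tests can be sketched as strict offset comparisons. OrderSketch is a hypothetical stand-in; the use of strict inequalities, so that touching or overlapping spans are neither before nor after each other, is an assumption.

```java
// Hypothetical span with ordering tests based on offsets only.
final class OrderSketch {
    final int start;
    final int end;

    OrderSketch(int x1, int x2) {
        this.start = x1;
        this.end = x2;
    }

    // This span starts strictly after t ends.
    boolean isAfter(OrderSketch t) {
        return start > t.end;
    }

    // This span ends strictly before t starts.
    boolean isBefore(OrderSketch t) {
        return end < t.start;
    }

    public static void main(String[] args) {
        OrderSketch city = new OrderSketch(12, 18);
        OrderSketch verb = new OrderSketch(20, 26);
        System.out.println(verb.isAfter(city));
        System.out.println(city.isBefore(verb));
    }
}
```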
      • isSameMatch

        public boolean isSameMatch​(TextEntity t)
      • isRightMatch

        public boolean isRightMatch​(TextEntity t)
      • isLeftMatch

        public boolean isLeftMatch​(TextEntity t)
      • isOverlap

        public boolean isOverlap​(TextEntity t)
      • isWithinChars

        public boolean isWithinChars​(TextEntity t,
                                     int nchars)
        Proximity test between this text span and another. This span is A; the input span is B. The examples below use nchars=2.
            AaaaaaaB               // B next to A
                   BbbbbbA         // B before A
                   Bbbbb     A     // B far from A
            Aaaaaa B               // B within nchars of A
                AaaBbbaaa          // B is inside A, so they are "within"
        
         
        Parameters:
        t - TextEntity span
        nchars - number of characters
        Returns:
        True if given entity span is within nchars, left or right
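        The cases in the diagram above can be reproduced with a gap computation between the two spans. The sketch below is one possible reading, not the library's implementation: ProximitySketch and gap are hypothetical names, and treating overlapping or nested spans as having a gap of zero is an assumption drawn from the "B is inside A" case.

```java
// Hypothetical proximity check between span A and span B.
final class ProximitySketch {
    // Number of characters separating the spans; 0 if they overlap or nest.
    static int gap(int aStart, int aEnd, int bStart, int bEnd) {
        if (bStart > aEnd) {
            return bStart - aEnd; // B lies to the right of A
        }
        if (aStart > bEnd) {
            return aStart - bEnd; // B lies to the left of A
        }
        return 0; // overlap or containment
    }

    // True if B is within nchars of A, on either side.
    static boolean isWithinChars(int aStart, int aEnd, int bStart, int bEnd, int nchars) {
        return gap(aStart, aEnd, bStart, bEnd) <= nchars;
    }

    public static void main(String[] args) {
        System.out.println(isWithinChars(0, 6, 7, 8, 2));   // B next to A
        System.out.println(isWithinChars(0, 6, 20, 21, 2)); // B far from A
        System.out.println(isWithinChars(0, 9, 3, 6, 2));   // B inside A
    }
}
```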