Package org.opensextant.extraction
Class TextEntity
- java.lang.Object
-
- org.opensextant.extraction.TextEntity
-
- Direct Known Subclasses:
TextMatch
public class TextEntity extends java.lang.Object
A very simple struct to hold data useful for post-processing entities once found.
- Author:
- Marc C. Ubaldino, MITRE, ubaldino at mitre dot org
-
-
Field Summary
Fields
Modifier and Type | Field | Description
int | end | char offset of entity; location in document where entity ends.
boolean | is_duplicate | If this entity is a duplicate of some other
boolean | is_overlap | If this entity overlaps with some other
boolean | is_submatch | If this entity is contained completely within some other
java.lang.String | match_id |
char | postChar | char immediately after span
char | preChar | char immediately before span
int | start | char offset of entity; location in document where entity starts.
protected java.lang.String | text |
-
Constructor Summary
Constructors
Constructor | Description
TextEntity(int x1, int x2) | Simple Span representation.
-
Method Summary
Modifier and Type | Method | Description
boolean | contains(int x) | Assess if an offset is within this span
void | copy(TextEntity m) |
java.lang.String | getContext() |
java.lang.String | getContextAfter() |
java.lang.String | getContextBefore() |
int | getLength() | Get the length of the matched text
java.lang.String | getText() |
boolean | isAfter(TextEntity t) | Assuming simple whitespace separation or other simple delimiters, is this term following the argument entity?
boolean | isASCII() | If non-punctuation content is purely ASCII vs. Latin1 vs. unicode.
boolean | isBefore(TextEntity t) | Assuming simple whitespace separation or other simple delimiters, is this term preceding the argument entity?
boolean | isLeftMatch(TextEntity t) |
boolean | isLower() | Test if text (that has a case sense) is ALL lower case
boolean | isMixedCase() | Test if text is mixed case.
boolean | isOverlap(TextEntity t) |
boolean | isRightMatch(TextEntity t) |
boolean | isSameMatch(TextEntity t) |
boolean | isUpper() | Test if text (that has a case sense) is ALL upper case
boolean | isWithin(TextEntity t) |
boolean | isWithinChars(TextEntity t, int nchars) | Proximity test between this text span and another. This is A; B is input.
void | setContext(java.lang.String window) | Set the context buffer from a single window
void | setContext(java.lang.String before, java.lang.String after) | Set the context with before and after windows
void | setText(java.lang.String t) | Sets the value of the TextEntity
void | setTextOnly(java.lang.String t) | Set just the value, without incurring the cost of other metrics or flags about the text that likely are unchanged.
java.lang.String | toString() |
-
-
-
Field Detail
-
text
protected java.lang.String text
-
start
public int start
char offset of entity; location in document where entity starts.
-
end
public int end
char offset of entity; location in document where entity ends.
-
postChar
public char postChar
char immediately after span
-
preChar
public char preChar
char immediately before span
-
match_id
public java.lang.String match_id
-
is_submatch
public boolean is_submatch
If this entity is contained completely within some other
-
is_overlap
public boolean is_overlap
If this entity overlaps with some other
-
is_duplicate
public boolean is_duplicate
If this entity is a duplicate of some other
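One way a post-processing pass might populate the three flags above is sketched here. FlagSketch, Span, and markFlags are hypothetical names and are not part of org.opensextant.extraction. The half-open [start, end) offsets, and the rule that only the later of two identical spans is flagged as the duplicate, are assumptions of this sketch rather than documented library behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the library's implementation.
final class FlagSketch {
    /** Minimal stand-in for a span carrying the three flags. */
    static final class Span {
        final int start;
        final int end; // assumed exclusive in this sketch
        boolean isSubmatch;
        boolean isOverlap;
        boolean isDuplicate;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Compare every pair of spans and set the flags on each span. */
    static void markFlags(List<Span> spans) {
        for (int i = 0; i < spans.size(); i++) {
            Span a = spans.get(i);
            for (int j = 0; j < spans.size(); j++) {
                if (i == j) {
                    continue;
                }
                Span b = spans.get(j);
                boolean same = a.start == b.start && a.end == b.end;
                boolean aInsideB = b.start <= a.start && a.end <= b.end;
                boolean bInsideA = a.start <= b.start && b.end <= a.end;
                boolean intersects = a.start < b.end && b.start < a.end;
                if (same) {
                    // Assumption: keep the first copy, flag later copies.
                    if (i > j) {
                        a.isDuplicate = true;
                    }
                } else if (aInsideB) {
                    a.isSubmatch = true;
                } else if (intersects && !bInsideA) {
                    a.isOverlap = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span(0, 10), new Span(2, 5), new Span(8, 12), new Span(0, 10));
        markFlags(spans);
        for (Span s : spans) {
            System.out.println("[" + s.start + "," + s.end + ") submatch=" + s.isSubmatch
                    + " overlap=" + s.isOverlap + " duplicate=" + s.isDuplicate);
        }
    }
}
```

A downstream filter could then drop spans whose duplicate or submatch flag is set and keep the longest candidate at each location.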
-
-
Method Detail
-
setText
public void setText(java.lang.String t)
Sets the value of the TextEntity.
- Parameters:
t
- text
-
setTextOnly
public void setTextOnly(java.lang.String t)
Set just the value, without incurring the cost of other metrics or flags about the text that likely are unchanged.
- Parameters:
t
- the text
-
isASCII
public boolean isASCII()
Tests whether the non-punctuation content is purely ASCII (as opposed to Latin1 or other Unicode).
- Returns:
- true if text value is purely ASCII
-
isLower
public boolean isLower()
Test if text (that has a case sense) is ALL lower case.
- Returns:
- true if all lower.
-
isUpper
public boolean isUpper()
Test if text (that has a case sense) is ALL upper case.
- Returns:
- true if all upper.
-
isMixedCase
public boolean isMixedCase()
Test if text is mixed case.
- Returns:
- true if neither all lower nor all upper.
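The three case tests can be illustrated with a minimal sketch. CaseSketch and its helper are hypothetical and do not reproduce the library's code. The sketch assumes that characters without a case sense (digits, punctuation) are ignored, and that text with no cased letters at all is neither upper, lower, nor mixed. For any text that does contain cased letters, the mixed-case rule here is equivalent to "neither all lower nor all upper".

```java
// Hypothetical helper; not part of org.opensextant.extraction.
final class CaseSketch {
    /** Count the characters of text that are upper case (upper=true) or lower case. */
    private static int count(String text, boolean upper) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (upper ? Character.isUpperCase(c) : Character.isLowerCase(c)) {
                n++;
            }
        }
        return n;
    }

    /** True if there is at least one cased letter and none is lower case. */
    static boolean isUpper(String text) {
        return count(text, true) > 0 && count(text, false) == 0;
    }

    /** True if there is at least one cased letter and none is upper case. */
    static boolean isLower(String text) {
        return count(text, false) > 0 && count(text, true) == 0;
    }

    /** True if both upper-case and lower-case letters are present. */
    static boolean isMixedCase(String text) {
        return count(text, true) > 0 && count(text, false) > 0;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"NATO-2", "e.g.", "McLean", "1234"}) {
            System.out.println(s + " upper=" + isUpper(s)
                    + " lower=" + isLower(s) + " mixed=" + isMixedCase(s));
        }
    }
}
```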
-
getText
public java.lang.String getText()
- Returns:
- text, value of a TextEntity
-
getLength
public int getLength()
Get the length of the matched text.
- Returns:
- int, length
-
setContext
public void setContext(java.lang.String before, java.lang.String after)
Set the context with before and after windows.
- Parameters:
before
- text before match
after
- text after match
-
setContext
public void setContext(java.lang.String window)
Set the context buffer from a single window.
- Parameters:
window
- textual window
-
getContext
public java.lang.String getContext()
- Returns:
- context buffer, regardless of whether it was set as a single window or as separate before/after windows
-
getContextBefore
public java.lang.String getContextBefore()
- Returns:
- text before match
-
getContextAfter
public java.lang.String getContextAfter()
- Returns:
- text after match
-
toString
public java.lang.String toString()
- Overrides:
toString
in class java.lang.Object
- Returns:
- string representation of entity
-
contains
public boolean contains(int x)
Assess if an offset is within this span.
- Parameters:
x
- offset to test
- Returns:
- true if this entity contains the offset
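A minimal sketch of the span arithmetic behind getLength() and contains(int) follows. SpanSketch is a hypothetical stand-in, not the TextEntity class itself. This page does not say whether the end offset is inclusive, so the sketch assumes the half-open convention [start, end) used by String.substring; the library may differ.

```java
// Hypothetical stand-in for a text span; not the library class.
final class SpanSketch {
    final int start; // offset where the span starts
    final int end;   // offset where the span ends (assumed exclusive here)

    SpanSketch(int x1, int x2) {
        this.start = x1;
        this.end = x2;
    }

    /** Length of the span under the half-open assumption. */
    int getLength() {
        return end - start;
    }

    /** True if offset x falls inside [start, end). */
    boolean contains(int x) {
        return start <= x && x < end;
    }

    public static void main(String[] args) {
        String doc = "Flights to Boston resume Monday.";
        int s = doc.indexOf("Boston");
        SpanSketch span = new SpanSketch(s, s + "Boston".length());
        System.out.println(span.getLength());       // prints 6
        System.out.println(span.contains(s + 2));   // prints true
        System.out.println(span.contains(span.end)); // prints false
    }
}
```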
-
copy
public void copy(TextEntity m)
- Parameters:
m
- match/entity object to copy
-
isWithin
public boolean isWithin(TextEntity t)
-
isAfter
public boolean isAfter(TextEntity t)
Assuming simple whitespace separation or other simple delimiters, is this term following the argument entity?
- Parameters:
t
- other entity
- Returns:
- true if the current entity occurs after t
-
isBefore
public boolean isBefore(TextEntity t)
Assuming simple whitespace separation or other simple delimiters, is this term preceding the argument entity?
- Parameters:
t
- other TextEntity
- Returns:
- true if the current entity occurs before t
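The ordering tests can be sketched as plain offset comparisons, reading isAfter(t) as "this entity follows t" and isBefore(t) as "this entity precedes t", as the method descriptions state. OrderSketch is a hypothetical stand-in, and the strict comparisons against the other span's offsets are an assumption of the sketch, not confirmed library behavior.

```java
// Hypothetical stand-in; not the library class.
final class OrderSketch {
    final int start;
    final int end;

    OrderSketch(int x1, int x2) {
        this.start = x1;
        this.end = x2;
    }

    /** True if this span starts after the other span ends. */
    boolean isAfter(OrderSketch t) {
        return start > t.end;
    }

    /** True if this span ends before the other span starts. */
    boolean isBefore(OrderSketch t) {
        return end < t.start;
    }

    public static void main(String[] args) {
        // Offsets within the text "New York City".
        OrderSketch first = new OrderSketch(0, 3);    // "New"
        OrderSketch last = new OrderSketch(9, 13);    // "City"
        OrderSketch overlap = new OrderSketch(4, 13); // "York City"
        System.out.println(first.isBefore(last));    // prints true
        System.out.println(last.isAfter(first));     // prints true
        System.out.println(last.isBefore(overlap));  // prints false
        System.out.println(last.isAfter(overlap));   // prints false
    }
}
```

Under this reading, two spans that overlap are neither before nor after one another.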
-
isSameMatch
public boolean isSameMatch(TextEntity t)
-
isRightMatch
public boolean isRightMatch(TextEntity t)
-
isLeftMatch
public boolean isLeftMatch(TextEntity t)
-
isOverlap
public boolean isOverlap(TextEntity t)
-
isWithinChars
public boolean isWithinChars(TextEntity t, int nchars)
Proximity test between this text span and another. This is A; B is input. The examples below use nchars=2:
AaaaaaaB     // B next to A
BbbbbbA      // B before A
Bbbbb   A    // B far from A
Aaaaaa B     // B within nchars of A
AaaBbbaaa    // B is inside A, so they are "within"
- Parameters:
t
- TextEntity span
nchars
- number of characters
- Returns:
- true if given entity span is within nchars, left or right
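The proximity cases listed for isWithinChars can be reproduced with a small gap computation. ProximitySketch and its static method are hypothetical and do not reproduce the library's code. The sketch assumes half-open [start, end) offsets, treats touching, overlapping, or nested spans as a gap of zero, and counts a gap of exactly nchars as "within".

```java
// Hypothetical illustration of a proximity test; not the library's code.
final class ProximitySketch {
    /**
     * True if span B lies within nchars of span A on either side.
     * Touching, overlapping, or nested spans have a gap of zero.
     */
    static boolean isWithinChars(int aStart, int aEnd, int bStart, int bEnd, int nchars) {
        // Distance between the end of the earlier span and the start of the later one.
        int gap = Math.max(aStart, bStart) - Math.min(aEnd, bEnd);
        return Math.max(0, gap) <= nchars;
    }

    public static void main(String[] args) {
        int n = 2;
        System.out.println(isWithinChars(0, 7, 7, 8, n)); // AaaaaaaB   -> true
        System.out.println(isWithinChars(6, 7, 0, 6, n)); // BbbbbbA    -> true
        System.out.println(isWithinChars(8, 9, 0, 5, n)); // Bbbbb   A  -> false
        System.out.println(isWithinChars(0, 6, 7, 8, n)); // Aaaaaa B   -> true
        System.out.println(isWithinChars(0, 9, 3, 6, n)); // AaaBbbaaa  -> true
    }
}
```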
-
-