Package org.opensextant.extraction
Class TextEntity
java.lang.Object
org.opensextant.extraction.TextEntity
- Direct Known Subclasses:
TextMatch
A very simple struct to hold data useful for post-processing entities once
found.
- Author:
- Marc C. Ubaldino, MITRE, ubaldino at mitre dot org
-
Field Summary
Modifier and Type | Field | Description
int | end | char offset of entity; location in document where entity ends.
boolean | is_duplicate | If this entity is a duplicate of some other
boolean | is_overlap | If this entity overlaps with some other
boolean | is_submatch | If this entity is contained completely within some other
char | postChar | char immediately after span
char | preChar | char immediately before span
int | start | char offset of entity; location in document where entity starts.
protected String | text |
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
boolean | contains(int x) | Assess if an offset is within this span
void | copy(TextEntity m) |
int | getLength() | get the length of the matched text
String | getText() |
boolean | isAfter | Assuming simple whitespace separation or other simple delimiters, is this term following the argument entity?
boolean | isASCII() | If non-punctuation content is purely ASCII vs. Latin1 vs. unicode.
boolean | isBefore | Assuming simple whitespace separation or other simple delimiters, is this term preceding the argument entity?
boolean | isLeftMatch |
boolean | isLower() | test if text (that has a case sense) is ALL lower case
boolean | isMixedCase() | test if text is mixed case.
boolean | isOverlap |
boolean | isRightMatch |
boolean | isSameMatch |
boolean | isUpper() | test if text (that has a case sense) is ALL upper case
boolean | isWithin |
boolean | isWithinChars(TextEntity t, int nchars) | Proximity test between this text span and another. This is A; B is input.
void | setContext(String window) | Set the context buffer from a single window
void | setContext(String before, String after) | Set the context with before and after windows
void | setText | sets the value of the TextEntity
void | setTextOnly | Set just the value, without incurring the cost of other metrics or flags about the text that likely are unchanged.
String | toString() |
-
Field Details
-
text
-
start
public int start
char offset of entity; location in document where entity starts.
-
end
public int end
char offset of entity; location in document where entity ends.
-
postChar
public char postChar
char immediately after span
-
preChar
public char preChar
char immediately before span
-
match_id
-
is_submatch
public boolean is_submatch
If this entity is contained completely within some other
-
is_overlap
public boolean is_overlap
If this entity overlaps with some other
-
is_duplicate
public boolean is_duplicate
If this entity is a duplicate of some other
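The three flags above describe how one span relates to another. The page does not say how or where they are set, so the following is only a minimal sketch under assumed rules: identical offsets mean duplicate, full containment means submatch, and any other intersection means overlap. `SpanFlags` and `markAgainst` are hypothetical names, not part of OpenSextant, and the end offset is treated as exclusive.

```java
// Hypothetical stand-in for illustration only; not the OpenSextant TextEntity class.
final class SpanFlags {
    final int start;
    final int end;          // assumed exclusive in this sketch
    boolean isDuplicate;
    boolean isSubmatch;
    boolean isOverlap;

    SpanFlags(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Set this span's flags relative to another span, using the assumed rules. */
    void markAgainst(SpanFlags other) {
        if (start == other.start && end == other.end) {
            isDuplicate = true;   // identical offsets
        } else if (start >= other.start && end <= other.end) {
            isSubmatch = true;    // completely inside the other span
        } else if (start < other.end && other.start < end) {
            isOverlap = true;     // partial intersection only
        }
    }
}

class SpanFlagsDemo {
    public static void main(String[] args) {
        SpanFlags outer = new SpanFlags(10, 18);
        SpanFlags inner = new SpanFlags(12, 15);
        inner.markAgainst(outer);
        System.out.println("submatch=" + inner.isSubmatch + " overlap=" + inner.isOverlap);
    }
}
```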
-
-
Constructor Details
-
TextEntity
public TextEntity(int x1, int x2)
Simple span representation.
- Parameters:
x1 - start offset
x2 - end offset
-
-
Method Details
-
setText
Sets the value of the TextEntity.
- Parameters:
t - text
-
setTextOnly
Set just the value, without incurring the cost of other metrics or flags about the text that likely are unchanged.
- Parameters:
t - the text
-
isASCII
public boolean isASCII()
Tests whether the non-punctuation content is purely ASCII, as opposed to Latin1 or other Unicode.
- Returns:
- true if text value is purely ASCII
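One way to read "non-punctuation content is purely ASCII" is that only letters and digits count, so typographic quotes or dashes do not disqualify a value. The sketch below follows that reading. It is an assumption about the intent, not the library's implementation, and `AsciiSketch` is a hypothetical helper.

```java
// Hypothetical helper for illustration only; not part of OpenSextant.
final class AsciiSketch {
    /**
     * Returns true if every letter or digit in the text is ASCII.
     * Punctuation, symbols and whitespace are ignored (assumed reading).
     */
    static boolean isAsciiIgnoringPunctuation(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetterOrDigit(cp) && cp > 0x7F) {
                return false;   // a non-ASCII letter or digit was found
            }
        }
        return true;
    }
}

class AsciiSketchDemo {
    public static void main(String[] args) {
        System.out.println(AsciiSketch.isAsciiIgnoringPunctuation("Boston"));
        System.out.println(AsciiSketch.isAsciiIgnoringPunctuation("S\u00e3o Paulo"));
        System.out.println(AsciiSketch.isAsciiIgnoringPunctuation("\u201cBoston\u201d"));
    }
}
```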
-
isLower
public boolean isLower()
Test if text (that has a case sense) is ALL lower case.
- Returns:
- true if all lower.
-
isUpper
public boolean isUpper()
Test if text (that has a case sense) is ALL upper case.
- Returns:
- true if all upper.
-
isMixedCase
public boolean isMixedCase()
Test if text is mixed case.
- Returns:
- true if neither all lower nor all upper.
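The three case tests can be sketched as follows. This is an illustration of the idea, not the library's code. `CaseSketch` is a hypothetical helper, and treating text with no cased characters (for example digits only) as neither upper, lower, nor mixed is an assumption.

```java
// Hypothetical helper for illustration only; not part of OpenSextant.
final class CaseSketch {
    /** True if at least one character in the text changes under case conversion. */
    static boolean hasCase(String text) {
        return !text.toUpperCase(java.util.Locale.ROOT)
                    .equals(text.toLowerCase(java.util.Locale.ROOT));
    }

    static boolean isUpper(String text) {
        return hasCase(text) && text.equals(text.toUpperCase(java.util.Locale.ROOT));
    }

    static boolean isLower(String text) {
        return hasCase(text) && text.equals(text.toLowerCase(java.util.Locale.ROOT));
    }

    /** Mixed case: has cased characters but is neither all upper nor all lower. */
    static boolean isMixedCase(String text) {
        return hasCase(text) && !isUpper(text) && !isLower(text);
    }
}

class CaseSketchDemo {
    public static void main(String[] args) {
        for (String s : new String[] {"USA", "usa 123", "Boston", "123"}) {
            System.out.println(s + ": upper=" + CaseSketch.isUpper(s)
                    + " lower=" + CaseSketch.isLower(s)
                    + " mixed=" + CaseSketch.isMixedCase(s));
        }
    }
}
```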
-
getText
- Returns:
- text, value of a TextEntity
-
getLength
public int getLength()
Get the length of the matched text.
- Returns:
- int, length
-
setContext
Set the context with before and after windows.
- Parameters:
before - text before match
after - text after match
-
setContext
Set the context buffer from a single window.
- Parameters:
window - textual window
-
getContext
- Returns:
- context buffer, regardless of whether it was set as a single window or as separate before/after windows
-
getContextBefore
- Returns:
- text before match
-
getContextAfter
- Returns:
- text after match
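The before and after windows would typically be cut from the source document around the span's offsets and then passed to setContext(String before, String after). The sketch below shows only the slicing step. `ContextSketch`, the `width` parameter, and the exclusive end offset are assumptions for illustration.

```java
// Hypothetical helper for illustration only; not part of OpenSextant.
final class ContextSketch {
    /** Up to width characters immediately before the span start. */
    static String before(String doc, int start, int width) {
        return doc.substring(Math.max(0, start - width), start);
    }

    /** Up to width characters immediately after the span end (end assumed exclusive). */
    static String after(String doc, int end, int width) {
        return doc.substring(end, Math.min(doc.length(), end + width));
    }
}

class ContextSketchDemo {
    public static void main(String[] args) {
        String doc = "We flew to Boston last week.";
        int start = 11;  // offset of 'B'
        int end = 17;    // offset just past 'n'
        System.out.println("[" + ContextSketch.before(doc, start, 8) + "]");
        System.out.println("[" + ContextSketch.after(doc, end, 5) + "]");
    }
}
```

The windows are clamped at the document boundaries, so a match near the start or end of the text simply gets a shorter context.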
-
toString
-
contains
public boolean contains(int x)
Assess if an offset is within this span.
- Parameters:
x - offset to test
- Returns:
- true if this entity contains the offset
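A containment test on a start/end span can be sketched as below. The page does not state whether the end offset itself counts as inside the span. This sketch treats it as exclusive, which is an assumption, and the library may differ. `SpanSketch` is a hypothetical stand-in class.

```java
// Hypothetical stand-in for illustration only; not the OpenSextant TextEntity class.
final class SpanSketch {
    final int start;
    final int end;   // assumed exclusive in this sketch

    SpanSketch(int x1, int x2) {
        this.start = x1;
        this.end = x2;
    }

    /** True if offset x falls inside [start, end). */
    boolean contains(int x) {
        return start <= x && x < end;
    }

    /** Span length in characters under the exclusive-end assumption. */
    int length() {
        return end - start;
    }
}

class SpanSketchDemo {
    public static void main(String[] args) {
        SpanSketch span = new SpanSketch(11, 17);
        System.out.println(span.contains(11));
        System.out.println(span.contains(17));
        System.out.println(span.length());
    }
}
```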
-
copy
- Parameters:
m
- match/entity object to copy
-
isWithin
-
isAfter
Assuming simple whitespace separation or other simple delimiters, is this term following the argument entity?
- Parameters:
t - other entity
- Returns:
- true if this entity occurs after t
-
isBefore
Assuming simple whitespace separation or other simple delimiters, is this term preceding the argument entity?
- Parameters:
t - other TextEntity
- Returns:
- true if this entity occurs before t
-
isSameMatch
-
isRightMatch
-
isLeftMatch
-
isOverlap
-
isWithinChars
Proximity test between this text span and another. This is A; B is input.
use nchars=2
AaaaaaaB // B next to A
BbbbbbA // B before A
Bbbbb      A // B far from A
Aaaaaa B // B within nchars of A
AaaBbbaaa // B is inside A, so they are "within"
- Parameters:
t - TextEntity span
nchars - number of characters
- Returns:
- true if given entity span is within nchars, left or right
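The cases listed above can be reproduced with a small gap calculation. This is a sketch of the idea and not the library's implementation: `ProximitySketch` is a hypothetical helper, end offsets are assumed exclusive, and overlapping or nested spans are treated as "within".

```java
// Hypothetical helper for illustration only; not part of OpenSextant.
final class ProximitySketch {
    /**
     * True if span B lies within nchars of span A on either side,
     * or if the two spans intersect. End offsets are assumed exclusive.
     */
    static boolean withinChars(int aStart, int aEnd, int bStart, int bEnd, int nchars) {
        if (aStart < bEnd && bStart < aEnd) {
            return true;   // B overlaps A or is inside A
        }
        // Number of characters between the two spans.
        int gap = (bStart >= aEnd) ? bStart - aEnd : aStart - bEnd;
        return gap <= nchars;
    }
}

class ProximitySketchDemo {
    public static void main(String[] args) {
        int nchars = 2;
        // "AaaaaaaB": A = [0,7), B = [7,8)
        System.out.println(ProximitySketch.withinChars(0, 7, 7, 8, nchars));
        // "Bbbbb     A": B = [0,5), A = [10,11)
        System.out.println(ProximitySketch.withinChars(10, 11, 0, 5, nchars));
        // "AaaBbbaaa": A = [0,9), B = [3,6)
        System.out.println(ProximitySketch.withinChars(0, 9, 3, 6, nchars));
    }
}
```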
-