Package org.opensextant.extraction
Class TextEntity
java.lang.Object
org.opensextant.extraction.TextEntity
- Direct Known Subclasses:
TextMatch
A very simple struct to hold data useful for post-processing entities once
found.
- Author:
- Marc C. Ubaldino, MITRE, ubaldino at mitre dot org
-
Field Summary
Fields
Modifier and Type, Field: Description
int end: char offset of entity; location in document where entity ends.
boolean is_duplicate: If this entity is a duplicate of some other
boolean is_overlap: If this entity overlaps with some other
boolean is_submatch: If this entity is contained completely within some other
char postChar: char immediately after span
char preChar: char immediately before span
int start: char offset of entity; location in document where entity starts.
protected String text
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method: Description
boolean contains(int x): Assess if an offset is within this span
void copy(TextEntity m)
int getLength(): get the length of the matched text
getText()
boolean isAfter: Assuming simple whitespace separation or other simple delimiters, is this term following the argument entity?
boolean isASCII(): If non-punctuation content is purely ASCII vs. Latin1 vs. unicode.
boolean isBefore: Assuming simple whitespace separation or other simple delimiters, is this term preceding the argument entity?
boolean isLeftMatch
boolean isLower(): test if text (that has a case sense) is ALL lower case
boolean isMixedCase(): test if text is mixed case.
boolean isOverlap
boolean isRightMatch
boolean isSameMatch
boolean isUpper(): test if text (that has a case sense) is ALL upper case
boolean isWithin
boolean isWithinChars(TextEntity t, int nchars): Proximity test between this text span and another.
void setContext(String window): Set the context buffer from a single window
void setContext(String before, String after): Set the context with before and after windows
void setText: sets the value of the TextEntity
void setTextOnly: Set just the value, without incurring the cost of other metrics or flags about the text that likely are unchanged.
toString()
-
Field Details
-
text
-
start
public int start
char offset of entity; location in document where entity starts.
-
end
public int end
char offset of entity; location in document where entity ends.
-
postChar
public char postChar
char immediately after span
-
preChar
public char preChar
char immediately before span
-
match_id
-
is_submatch
public boolean is_submatch
If this entity is contained completely within some other
-
is_overlap
public boolean is_overlap
If this entity overlaps with some other
-
is_duplicate
public boolean is_duplicate
If this entity is a duplicate of some other
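The three flags above describe how one span relates to another. A minimal sketch of how a post-processor might decide which flag applies, assuming end offsets are exclusive and using a hypothetical helper class `SpanRelationSketch` (neither the class nor the exact rules come from the library):

```java
// Hypothetical helper; not part of org.opensextant.extraction.
class SpanRelationSketch {
    enum Relation { DUPLICATE, SUBMATCH, OVERLAP, NONE }

    // Relation of span A = [aStart, aEnd) to span B = [bStart, bEnd).
    // Assumption: end offsets are exclusive.
    static Relation classify(int aStart, int aEnd, int bStart, int bEnd) {
        if (aStart == bStart && aEnd == bEnd) {
            return Relation.DUPLICATE;   // identical offsets
        }
        if (aStart >= bStart && aEnd <= bEnd) {
            return Relation.SUBMATCH;    // A lies completely inside B
        }
        if (aStart < bEnd && bStart < aEnd) {
            return Relation.OVERLAP;     // spans intersect, but A is not inside B
        }
        return Relation.NONE;            // no shared offsets
    }

    public static void main(String[] args) {
        System.out.println(classify(6, 9, 5, 10));  // SUBMATCH
        System.out.println(classify(8, 12, 5, 10)); // OVERLAP
    }
}
```

In this sketch a span that fully covers the other one is also reported as OVERLAP; a real post-processor may want to treat that case separately.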
-
-
Constructor Details
-
TextEntity
public TextEntity(int x1, int x2)
Simple Span representation.
- Parameters:
x1 - start offset
x2 - end offset
-
-
Method Details
-
setText
Sets the value of the TextEntity.
- Parameters:
t - text
-
setTextOnly
Set just the value, without incurring the cost of other metrics or flags about the text that likely are unchanged.
- Parameters:
t - the text
-
isASCII
public boolean isASCII()
Tests whether the non-punctuation content is purely ASCII (as opposed to Latin1 or other Unicode).
- Returns:
- true if text value is purely ASCII
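One way such a check could work is sketched below. It is an illustration only: the class `AsciiSketch` is hypothetical, and "punctuation" is approximated here as any code point that is neither a letter nor a digit, which may differ from what TextEntity actually does.

```java
// Hypothetical sketch; not the TextEntity implementation.
class AsciiSketch {
    // True if every letter or digit in the text is in the ASCII range (< 128).
    // Assumption: anything that is not a letter or digit counts as punctuation
    // and is ignored.
    static boolean isAscii(String text) {
        return text.codePoints()
                .filter(Character::isLetterOrDigit)
                .allMatch(cp -> cp < 128);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Zurich, CH"));   // true
        System.out.println(isAscii("Z\u00fcrich"));  // false
    }
}
```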
-
isLower
public boolean isLower()
Tests whether text (that has a case sense) is ALL lower case.
- Returns:
- true if all lower.
-
isUpper
public boolean isUpper()
Tests whether text (that has a case sense) is ALL upper case.
- Returns:
- true if all upper.
-
isMixedCase
public boolean isMixedCase()
Tests whether text is mixed case.
- Returns:
- true if neither all lower nor all upper.
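The three case tests can be illustrated with a short sketch. The class `CaseSketch` is hypothetical, and it assumes that "text that has a case sense" means the text contains at least one cased letter; the library may define this differently.

```java
// Hypothetical sketch of the three case tests; not the TextEntity implementation.
class CaseSketch {
    // True if the text has at least one upper case letter and no lower case letters.
    static boolean isUpper(String text) {
        boolean sawUpper = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLowerCase(c)) return false;
            if (Character.isUpperCase(c)) sawUpper = true;
        }
        return sawUpper;
    }

    // True if the text has at least one lower case letter and no upper case letters.
    static boolean isLower(String text) {
        boolean sawLower = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isUpperCase(c)) return false;
            if (Character.isLowerCase(c)) sawLower = true;
        }
        return sawLower;
    }

    // Follows the documented rule: neither all lower nor all upper.
    static boolean isMixedCase(String text) {
        return !isLower(text) && !isUpper(text);
    }

    public static void main(String[] args) {
        System.out.println(isUpper("NEW YORK"));     // true
        System.out.println(isLower("new york"));     // true
        System.out.println(isMixedCase("New York")); // true
    }
}
```

Note that under this literal reading, text with no cased letters at all (for example "123") is reported as mixed case, since it is neither all lower nor all upper.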
-
getText
- Returns:
- text, value of a TextEntity
-
getLength
public int getLength()
Get the length of the matched text.
- Returns:
- int, length
-
setContext
Set the context with before and after windows.
- Parameters:
before - text before match
after - text after match
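A caller has to produce the before and after windows itself. The sketch below shows one way to cut them from the source document, assuming an exclusive end offset and a caller-chosen window width; `ContextSketch` and its methods are hypothetical helpers, not library API.

```java
// Hypothetical helper for building context windows; not part of the library.
class ContextSketch {
    // Up to `width` chars immediately before offset `start`.
    static String before(String doc, int start, int width) {
        int from = Math.max(0, start - width);
        return doc.substring(from, start);
    }

    // Up to `width` chars beginning at offset `end`.
    // Assumption: `end` is exclusive (one past the last char of the match).
    static String after(String doc, int end, int width) {
        int to = Math.min(doc.length(), end + width);
        return doc.substring(end, to);
    }

    public static void main(String[] args) {
        String doc = "Flights to Boston depart daily";
        int start = doc.indexOf("Boston");
        int end = start + "Boston".length();
        System.out.println("[" + before(doc, start, 5) + "]"); // [s to ]
        System.out.println("[" + after(doc, end, 7) + "]");    // [ depart]
    }
}
```

The two resulting strings are the kind of values one would pass as `before` and `after`.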
-
setContext
Set the context buffer from a single window.
- Parameters:
window - textual window
-
getContext
- Returns:
- context buffer, regardless of whether it was set as a single window or as separate pre/post-match windows
-
getContextBefore
- Returns:
- text before match
-
getContextAfter
- Returns:
- text after match
-
toString
-
contains
public boolean contains(int x)
Assess if an offset is within this span.
- Parameters:
x - offset to test
- Returns:
- true if this entity contains the offset
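A minimal sketch of an offset test on a span. `SpanSketch` is a hypothetical stand-in, and it assumes the end offset is exclusive; how TextEntity treats the end boundary is not stated above.

```java
// Hypothetical stand-in for a text span; not the TextEntity implementation.
class SpanSketch {
    final int start;
    final int end;

    SpanSketch(int x1, int x2) {
        this.start = x1;
        this.end = x2;
    }

    // Assumption: start is inclusive and end is exclusive.
    boolean contains(int x) {
        return start <= x && x < end;
    }

    public static void main(String[] args) {
        SpanSketch s = new SpanSketch(10, 15);
        System.out.println(s.contains(12)); // true
        System.out.println(s.contains(15)); // false
    }
}
```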
-
copy
- Parameters:
m- match/entity object to copy
-
isWithin
-
isAfter
Assuming simple whitespace separation or other simple delimiters, is this term following the argument entity?
- Parameters:
t - other entity
- Returns:
- true if the current entity occurs after t
-
isBefore
Assuming simple whitespace separation or other simple delimiters, is this term preceding the argument entity?
- Parameters:
t - other TextEntity
- Returns:
- true if the current entity occurs before t
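The ordering tests can be sketched as simple offset comparisons. This reads isAfter as "this span follows t" and isBefore as "this span precedes t", and assumes exclusive end offsets; `OrderedSpan` is a hypothetical class, not the library's.

```java
// Hypothetical stand-in used to illustrate ordering tests between spans.
class OrderedSpan {
    final int start;
    final int end; // assumption: exclusive end offset

    OrderedSpan(int x1, int x2) {
        this.start = x1;
        this.end = x2;
    }

    // True if this span begins at or beyond the end of t.
    boolean isAfter(OrderedSpan t) {
        return start >= t.end;
    }

    // True if this span ends at or before the start of t.
    boolean isBefore(OrderedSpan t) {
        return end <= t.start;
    }

    public static void main(String[] args) {
        // Offsets in "New York City": "New" = [0, 3), "City" = [9, 13)
        OrderedSpan first = new OrderedSpan(0, 3);
        OrderedSpan last = new OrderedSpan(9, 13);
        System.out.println(last.isAfter(first));  // true
        System.out.println(last.isBefore(first)); // false
    }
}
```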
-
isSameMatch
-
isRightMatch
-
isLeftMatch
-
isOverlap
-
isWithinChars
Proximity test between this text span and another. This is A; B is input. The examples use nchars=2:
AaaaaaaB       // B next to A
BbbbbbA        // B before A
Bbbbb    A     // A far from B
Aaaaaa  B      // B within nchars of A
AaaBbbaaa      // B is inside A, so they are "within"
- Parameters:
t - TextEntity span
nchars - number of characters
- Returns:
- True if given entity span is within nchars, left or right
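The proximity cases listed above can be reproduced with a small gap calculation. This is a sketch under assumptions: end offsets are exclusive, the gap is the number of characters strictly between the two spans, and overlapping or nested spans have a gap of zero. `ProximitySketch` is a hypothetical helper, not the library's implementation.

```java
// Hypothetical sketch of a proximity test between spans A and B.
class ProximitySketch {
    // Number of characters strictly between A = [aStart, aEnd) and B = [bStart, bEnd).
    // Overlapping or nested spans have a gap of 0.
    static int gap(int aStart, int aEnd, int bStart, int bEnd) {
        if (bStart >= aEnd) return bStart - aEnd; // B to the right of A
        if (aStart >= bEnd) return aStart - bEnd; // B to the left of A
        return 0;                                 // spans share offsets
    }

    static boolean isWithinChars(int aStart, int aEnd, int bStart, int bEnd, int nchars) {
        return gap(aStart, aEnd, bStart, bEnd) <= nchars;
    }

    public static void main(String[] args) {
        // "Aaaaaa  B": A = [0, 6), B = [8, 9), two characters apart
        System.out.println(isWithinChars(0, 6, 8, 9, 2));  // true
        // "Bbbbb    A": B = [0, 5), A = [9, 10), four characters apart
        System.out.println(isWithinChars(9, 10, 0, 5, 2)); // false
    }
}
```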
-