opensextant.extractors.poli
./python/opensextant/extractors/poli.py

These "Patterns of Life" are strictly examples that illustrate more general regex patterns,
that is, patterns more general than the coordinate and date/time patterns.

The main objective here is to show how, beyond basic regex matches, we can add important business logic
to the entity extraction process.
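The two-stage idea above (regex first, business logic second) can be sketched without the library. This is a minimal illustration under stated assumptions: `Candidate`, `not_reserved`, `extract` and `MAC_RE` are hypothetical names, not part of the opensextant API, and the "reserved address" rule is only an example of business logic that a regex alone does not express.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Stage 1 (plain regex): six hex pairs separated by ":" or "-".
MAC_RE = re.compile(r"\b[0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5}\b")


@dataclass
class Candidate:
    """Hypothetical stand-in for a TextMatch: matched text, offsets, and a filter flag."""
    text: str
    start: int
    end: int
    filtered_out: bool = False


def not_reserved(text: str) -> bool:
    """Hypothetical business rule: reject the all-zero and broadcast addresses."""
    digits = re.sub(r"[:-]", "", text).upper()
    return digits not in ("000000000000", "FFFFFFFFFFFF")


def extract(text: str, pattern: "re.Pattern[str]", validate: Callable[[str], bool]) -> List[Candidate]:
    """Stage 2 (business logic): flag regex hits that fail the rule instead of silently dropping them."""
    results: List[Candidate] = []
    for m in pattern.finditer(text):
        cand = Candidate(m.group(0), m.start(), m.end())
        cand.filtered_out = not validate(cand.text)
        results.append(cand)
    return results
```

For example, `extract("nic 00:1a:2b:3c:4d:5e and ff:ff:ff:ff:ff:ff", MAC_RE, not_reserved)` yields two candidates, and only the second (the broadcast address) is flagged as filtered out. Keeping the rejected hit with a flag, rather than discarding it, leaves a trace of why the regex match was not reported.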

 
Classes
        
opensextant.FlexPat.PatternMatch(opensextant.TextMatch)
    MACAddress
    Money
    TelephoneNumber
opensextant.FlexPat.RegexPatternManager(builtins.object)
    PatternsOfLifeManager

 
class MACAddress(opensextant.FlexPat.PatternMatch)
    MACAddress(*args, **kwargs)
 
A general pattern-based TextMatch.
This Python variation consolidates the PoliMatch (patterns of life, "poli") ideas from the Java API.
 
 
Method resolution order:
MACAddress
opensextant.FlexPat.PatternMatch
opensextant.TextMatch
opensextant.TextEntity
builtins.object

Methods defined here:
__init__(self, *args, **kwargs)
Initialize self.  See help(type(self)) for accurate signature.
normalize(self)
Optional but recommended routine to normalize the matched data,
that is, to parse fields, uppercase, streamline punctuation, etc.
The normalized result is also the opportunity to further
validate the match.
:return:
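For a MAC address, `normalize` might reduce the many written forms to one canonical form and reject malformed ones. The sketch below is an assumption about what such a routine could do, not the actual `MACAddress.normalize`; `normalize_mac` is a hypothetical name, and the choice of upper-case, colon-separated output is illustrative.

```python
import re
from typing import Optional

# Six hex pairs; the back-reference (?P=sep) forces one consistent separator throughout.
_MAC = re.compile(r"^[0-9A-Fa-f]{2}(?P<sep>[:-])[0-9A-Fa-f]{2}(?:(?P=sep)[0-9A-Fa-f]{2}){4}$")


def normalize_mac(raw: str) -> Optional[str]:
    """Return the canonical upper-case, colon-separated form, or None if the text is not a valid MAC."""
    text = raw.strip()
    if not _MAC.match(text):
        return None  # mixed separators or wrong number of pairs: treat the match as invalid
    digits = re.sub(r"[:-]", "", text).upper()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))
```

Here `normalize_mac("00-1a-2b-3c-4d-5e")` gives `"00:1A:2B:3C:4D:5E"`, while `"00:1a-2b:3c:4d:5e"` (mixed separators) returns `None`, which is the "additionally validate the match" step in practice.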

Methods inherited from opensextant.FlexPat.PatternMatch:
attributes(self)
Render domain details to meaningful exported view of the data.
:return:
get_value(self, k)
Get the value of slot k; if the slot matched more than once, return the first value.
:param k:
:return:
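The "first value wins" behavior of `get_value`, and an `attributes` export built on it, can be sketched with a hypothetical stand-in class. The storage layout here (ordered `(slot, value)` pairs) is an assumption for illustration; the real `PatternMatch` may store its regex groups differently.

```python
from typing import Dict, List, Optional, Tuple


class SlotMatch:
    """Hypothetical stand-in: regex groups ("slots") kept as ordered (name, value) pairs."""

    def __init__(self, text: str, groups: List[Tuple[str, str]]):
        self.text = text
        self.groups = groups

    def get_value(self, k: str) -> Optional[str]:
        """Return the first value captured for slot k, or None if it never matched."""
        for slot, value in self.groups:
            if slot == k:
                return value
        return None

    def attributes(self) -> Dict[str, str]:
        """Export one value per slot, consistent with get_value (the first value wins)."""
        attrs: Dict[str, str] = {}
        for slot, value in self.groups:
            attrs.setdefault(slot, value)
        return attrs
```

With groups `[("hh", "12"), ("mm", "30"), ("mm", "45")]`, `get_value("mm")` returns `"30"` and `attributes()` returns `{"hh": "12", "mm": "30"}`.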

Data and other attributes inherited from opensextant.FlexPat.PatternMatch:
FOUND_CASE = 0
LOWER_CASE = 2
UPPER_CASE = 1
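The constant names suggest a case mode for rendering matched text: as found, upper-cased, or lower-cased. That reading is an assumption; the helper `apply_case` below is hypothetical and only shows how such flags are typically consumed, reusing the values listed above.

```python
FOUND_CASE = 0
UPPER_CASE = 1
LOWER_CASE = 2


def apply_case(text: str, case: int) -> str:
    """Hypothetical helper: render matched text according to one of the case-mode flags."""
    if case == UPPER_CASE:
        return text.upper()
    if case == LOWER_CASE:
        return text.lower()
    return text  # FOUND_CASE (or anything unrecognized): keep the text as it appeared
```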

Methods inherited from opensextant.TextMatch:
__str__(self)
Return str(self).
populate(self, attrs)
Populate a TextMatch from attrs, normalizing the set of attributes: class fields on TextMatch are kept
separate from additional optional attributes.
:param attrs:
:return:
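The separation that `populate` describes (known class fields versus a bag of optional attributes) can be sketched as follows. `MatchRecord` and its field names are hypothetical and chosen only for illustration; they are not the actual `TextMatch` fields.

```python
from typing import Any, Dict


class MatchRecord:
    """Hypothetical stand-in: a few fixed fields plus a dictionary of optional attributes."""

    FIELDS = ("text", "start", "end", "label")  # assumed field names, illustration only

    def __init__(self) -> None:
        self.text = None
        self.start = -1
        self.end = -1
        self.label = None
        self.attrs: Dict[str, Any] = {}

    def populate(self, attrs: Dict[str, Any]) -> None:
        for key, value in attrs.items():
            if key in self.FIELDS:
                setattr(self, key, value)  # known field: set it directly on the object
            else:
                self.attrs[key] = value  # anything else goes into the optional attributes
```

After `populate({"text": "$5", "start": 3, "end": 5, "currency": "USD"})`, the offsets and text live on the object while `currency` lands in `attrs`.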

Methods inherited from opensextant.TextEntity:
contains(self, x1)
Test whether this span contains the offset x1.
:param x1:
exact_match(self, t)
is_after(self, t)
is_before(self, t)
is_within(self, t)
Test whether the given annotation, t, contains this one.
:param t:
:return:
overlaps(self, t)
Determine whether t overlaps self. If the left or right edges match, t overlaps if it is longer.
If t is contained entirely within self, it is not considered an overlap; it is contained within self.
:param t:
:return:
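The span relations above, especially the `overlaps` rules, are easier to check against concrete offsets. This is a sketch of one reading of those docstrings, assuming half-open spans `[start, end)`; `Span` is a hypothetical stand-in, and the real `TextEntity` may treat boundary offsets differently.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical half-open span [start, end) over character offsets."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start

    def contains(self, x1: int) -> bool:
        return self.start <= x1 < self.end

    def exact_match(self, t: "Span") -> bool:
        return self.start == t.start and self.end == t.end

    def is_before(self, t: "Span") -> bool:
        return self.end <= t.start

    def is_after(self, t: "Span") -> bool:
        return self.start >= t.end

    def is_within(self, t: "Span") -> bool:
        # True if t contains this span
        return t.start <= self.start and self.end <= t.end

    def overlaps(self, t: "Span") -> bool:
        if t.start == self.start or t.end == self.end:
            # Shared left or right edge: t overlaps only if it is longer.
            return t.length > self.length
        if t.is_within(self):
            return False  # t is contained within self, which is not counted as overlap
        return t.start < self.end and self.start < t.end
```

With `a = Span(10, 20)`: `Span(10, 25)` overlaps (shared left edge, longer), `Span(12, 18)` does not (contained), `Span(15, 30)` does (partial), and `Span(20, 30)` does not (they only touch).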

Data descriptors inherited from opensextant.TextEntity:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)

 
class Money(opensextant.FlexPat.PatternMatch)
    Money(*args, **kwargs)
 
A general pattern-based TextMatch.
This Python variation consolidates the PoliMatch (patterns of life, "poli") ideas from the Java API.
 
 
Method resolution order:
Money
opensextant.FlexPat.PatternMatch
opensextant.TextMatch
opensextant.TextEntity
builtins.object

Methods defined here:
__init__(self, *args, **kwargs)
Initialize self.  See help(type(self)) for accurate signature.
normalize(self)
Optional but recommended routine to normalize the matched data,
that is, to parse fields, uppercase, streamline punctuation, etc.
The normalized result is also the opportunity to further
validate the match.
:return:
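For money, normalization might turn the matched text into a currency code and an exact amount, rejecting malformed digit grouping. This is a minimal sketch, not the actual `Money.normalize`: `normalize_money` and the three-entry `SYMBOLS` table are hypothetical, and `Decimal` is used so that cents are not subject to floating-point rounding.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Hypothetical symbol table; a real implementation would need a much fuller mapping.
SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}

# Symbol, optional space, either comma-grouped thousands or a plain digit run, optional 2-digit cents.
_AMOUNT = re.compile(r"^(?P<sym>[$€£])\s?(?P<num>\d{1,3}(?:,\d{3})*|\d+)(?:\.(?P<frac>\d{2}))?$")


def normalize_money(raw: str) -> Optional[Tuple[str, Decimal]]:
    """Return (currency code, amount), or None if the text fails validation."""
    m = _AMOUNT.match(raw.strip())
    if not m:
        return None  # e.g. "$12,34": the comma grouping is not in threes
    whole = m.group("num").replace(",", "")
    frac = m.group("frac") or "00"
    return SYMBOLS[m.group("sym")], Decimal(f"{whole}.{frac}")
```

Here `normalize_money("$1,234.50")` gives `("USD", Decimal("1234.50"))` and `normalize_money("$12,34")` gives `None`.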

Methods inherited from opensextant.FlexPat.PatternMatch:
attributes(self)
Render domain details to meaningful exported view of the data.
:return:
get_value(self, k)
Get the value of slot k; if the slot matched more than once, return the first value.
:param k:
:return:

Data and other attributes inherited from opensextant.FlexPat.PatternMatch:
FOUND_CASE = 0
LOWER_CASE = 2
UPPER_CASE = 1

Methods inherited from opensextant.TextMatch:
__str__(self)
Return str(self).
populate(self, attrs)
Populate a TextMatch from attrs, normalizing the set of attributes: class fields on TextMatch are kept
separate from additional optional attributes.
:param attrs:
:return:

Methods inherited from opensextant.TextEntity:
contains(self, x1)
Test whether this span contains the offset x1.
:param x1:
exact_match(self, t)
is_after(self, t)
is_before(self, t)
is_within(self, t)
Test whether the given annotation, t, contains this one.
:param t:
:return:
overlaps(self, t)
Determine whether t overlaps self. If the left or right edges match, t overlaps if it is longer.
If t is contained entirely within self, it is not considered an overlap; it is contained within self.
:param t:
:return:

Data descriptors inherited from opensextant.TextEntity:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)

 
class PatternsOfLifeManager(opensextant.FlexPat.RegexPatternManager)
    PatternsOfLifeManager(cfg)
 
RegexPatternManager is the patterns configuration file parser.
See documentation: https://opensextant.github.io/Xponents/doc/Patterns.md
 
 
Method resolution order:
PatternsOfLifeManager
opensextant.FlexPat.RegexPatternManager
builtins.object

Methods defined here:
__init__(self, cfg)
Call as
    from opensextant.FlexPat import PatternExtractor
    from opensextant.extractors.poli import PatternsOfLifeManager

    mgr = PatternsOfLifeManager("poli_patterns.cfg")
    patterns_app = PatternExtractor(mgr)

    test_results = patterns_app.default_tests()
    real_results = patterns_app.extract(".... text blob...")

:param cfg: patterns config file.

Methods inherited from opensextant.FlexPat.RegexPatternManager:
create_pattern(self, fam, rule, desc)
Override pattern class creation as needed.
create_testcase(self, tid, fam, text)
disable_all(self)
enable_all(self)
get_pattern(self, pid)
validate_pattern(self, repat)
Default validation returns True.
Override this if necessary, e.g., when the pattern implementation carries additional metadata to check.
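The override hook described by `validate_pattern` can be sketched with hypothetical stand-ins. `ToyPattern`, `ToyPatternManager`, `register` and `StrictManager` are illustrative names only; they mirror the hook names listed above but are not the FlexPat implementation, and the "must have a named group" rule is an assumed example of extra validation.

```python
import re
from typing import Dict, Optional


class ToyPattern:
    """Hypothetical pattern record; the real FlexPat pattern classes carry more metadata."""

    def __init__(self, pid: str, family: str, regex: str):
        self.id = pid
        self.family = family
        self.regex = regex
        self.enabled = True


class ToyPatternManager:
    """Hypothetical stand-in mirroring the hook names above; not the FlexPat implementation."""

    def __init__(self) -> None:
        self.patterns: Dict[str, ToyPattern] = {}

    def validate_pattern(self, repat: ToyPattern) -> bool:
        return True  # default validation: accept everything

    def register(self, repat: ToyPattern) -> None:
        """Hypothetical registration step: only validated patterns are kept."""
        if self.validate_pattern(repat):
            self.patterns[repat.id] = repat

    def get_pattern(self, pid: str) -> Optional[ToyPattern]:
        return self.patterns.get(pid)

    def enable_all(self) -> None:
        for p in self.patterns.values():
            p.enabled = True

    def disable_all(self) -> None:
        for p in self.patterns.values():
            p.enabled = False


class StrictManager(ToyPatternManager):
    def validate_pattern(self, repat: ToyPattern) -> bool:
        # Override: the regex must compile and define at least one named group (slot).
        try:
            compiled = re.compile(repat.regex)
        except re.error:
            return False
        return bool(compiled.groupindex)
```

Subclassing only `validate_pattern` keeps the registration flow unchanged while tightening what is accepted, which matches the "override this if necessary" guidance.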

Data descriptors inherited from opensextant.FlexPat.RegexPatternManager:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)

 
class TelephoneNumber(opensextant.FlexPat.PatternMatch)
    TelephoneNumber(*args, **kwargs)
 
A general pattern-based TextMatch.
This Python variation consolidates the PoliMatch (patterns of life, "poli") ideas from the Java API.
 
 
Method resolution order:
TelephoneNumber
opensextant.FlexPat.PatternMatch
opensextant.TextMatch
opensextant.TextEntity
builtins.object

Methods defined here:
__init__(self, *args, **kwargs)
Initialize self.  See help(type(self)) for accurate signature.
normalize(self)
Optional but recommended routine to normalize the matched data,
that is, to parse fields, uppercase, streamline punctuation, etc.
The normalized result is also the opportunity to further
validate the match.
:return:
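For telephone numbers, normalization typically strips punctuation and checks the digit count. This is a sketch under assumptions, not the actual `TelephoneNumber.normalize`: `normalize_phone` is hypothetical, the 15-digit ceiling comes from the ITU-T E.164 numbering plan, and the 7-digit floor is an assumed heuristic.

```python
import re
from typing import Optional


def normalize_phone(raw: str) -> Optional[str]:
    """Return digits only (keeping a leading '+'), or None if the digit count is implausible."""
    text = raw.strip()
    plus = text.startswith("+")
    digits = re.sub(r"\D", "", text)  # drop spaces, dashes, dots, parentheses
    # E.164 caps international numbers at 15 digits; the 7-digit minimum is an assumption.
    if not 7 <= len(digits) <= 15:
        return None
    return ("+" if plus else "") + digits
```

Here `normalize_phone("+1 (703) 555-0100")` gives `"+17035550100"`, while a short token such as `"12-34"` is rejected.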

Methods inherited from opensextant.FlexPat.PatternMatch:
attributes(self)
Render domain details to meaningful exported view of the data.
:return:
get_value(self, k)
Get the value of slot k; if the slot matched more than once, return the first value.
:param k:
:return:

Data and other attributes inherited from opensextant.FlexPat.PatternMatch:
FOUND_CASE = 0
LOWER_CASE = 2
UPPER_CASE = 1

Methods inherited from opensextant.TextMatch:
__str__(self)
Return str(self).
populate(self, attrs)
Populate a TextMatch from attrs, normalizing the set of attributes: class fields on TextMatch are kept
separate from additional optional attributes.
:param attrs:
:return:

Methods inherited from opensextant.TextEntity:
contains(self, x1)
Test whether this span contains the offset x1.
:param x1:
exact_match(self, t)
is_after(self, t)
is_before(self, t)
is_within(self, t)
Test whether the given annotation, t, contains this one.
:param t:
:return:
overlaps(self, t)
Determine whether t overlaps self. If the left or right edges match, t overlaps if it is longer.
If t is contained entirely within self, it is not considered an overlap; it is contained within self.
:param t:
:return:

Data descriptors inherited from opensextant.TextEntity:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)