opensextant
/Users/ubaldino/workspace/opensource/Xponents/python/opensextant/__init__.py


 
Package Contents
       
FlexPat
TaxCat
advas_phonetics
extractors (package)
gazetteer
phonetics
unicode
utility
wordstats
xlayer

 
Classes
       
abc.ABC(builtins.object)
Extractor
builtins.object
Coordinate
Country
Place
TextEntity
TextMatch

 
class Coordinate(builtins.object)
    Coordinate(row, lat=None, lon=None)
 
Convenience class for a lat/lon pair.
Expects a row dict with 'lat' and 'lon' keys,
or the keyword arguments 'lat' and 'lon'.
@param row: dictionary supplying 'lat' and 'lon'
 
  Methods defined here:
__init__(self, row, lat=None, lon=None)
Initialize self.  See help(type(self)) for accurate signature.
__str__(self)
Return str(self).
format_coord(self)
set(self, lat, lon)
Set the location lat, lon
string_coord(self)
validate(self)

Data descriptors defined here:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)
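
The two ways of supplying a location described above (a row dict, or lat/lon keyword arguments) can be sketched as follows. This is a minimal stand-in, not the package's code: the class name CoordinateSketch, the float conversion of row values, and the range check in validate() are all assumptions made for illustration.

```python
class CoordinateSketch:
    """Hypothetical stand-in for Coordinate; not the package's implementation."""

    def __init__(self, row, lat=None, lon=None):
        self.lat = None
        self.lon = None
        if row:
            # Assumption: row values may arrive as strings, e.g. from a CSV reader.
            self.set(float(row["lat"]), float(row["lon"]))
        elif lat is not None and lon is not None:
            self.set(lat, lon)

    def set(self, lat, lon):
        """Set the location lat, lon."""
        self.lat = lat
        self.lon = lon

    def validate(self):
        # Assumption: "valid" means both values are present and in geodetic range.
        if self.lat is None or self.lon is None:
            return False
        return -90.0 <= self.lat <= 90.0 and -180.0 <= self.lon <= 180.0


from_row = CoordinateSketch({"lat": "38.9", "lon": "-77.03"})
from_kwargs = CoordinateSketch(None, lat=38.9, lon=-77.03)
empty = CoordinateSketch(None)
```

Both from_row and from_kwargs end up with the same lat/lon values, while empty carries no location and fails validate() in this sketch.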

 
class Country(Coordinate)
    Country metadata
 
 
Method resolution order:
Country
Coordinate
builtins.object

Methods defined here:
__init__(self)
Initialize self.  See help(type(self)) for accurate signature.
__str__(self)
Return str(self).

Methods inherited from Coordinate:
format_coord(self)
set(self, lat, lon)
Set the location lat, lon
string_coord(self)
validate(self)

Data descriptors inherited from Coordinate:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)

 
class Extractor(abc.ABC)
    
Method resolution order:
Extractor
abc.ABC
builtins.object

Methods defined here:
__init__(self)
Initialize self.  See help(type(self)) for accurate signature.
extract(self, text, **kwargs)
:param text: Unicode text input
:keyword features: an array of features to extract, e.g., "coordinate", "place", "MONEY"
:return: array of TextMatch

Data descriptors defined here:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)

Data and other attributes defined here:
__abstractmethods__ = frozenset({'extract'})
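
Because extract is the only abstract method, a concrete extractor needs to implement just that one routine. The sketch below mirrors the pattern with a locally defined base class so that it runs on its own. ExtractorSketch, YearExtractor, the "year" feature name, and the dict-shaped matches are hypothetical; a real subclass would derive from Extractor and return TextMatch objects.

```python
import re
from abc import ABC, abstractmethod


class ExtractorSketch(ABC):
    """Local stand-in for the abstract Extractor base class."""

    @abstractmethod
    def extract(self, text, **kwargs):
        """Return a list of matches found in text."""


class YearExtractor(ExtractorSketch):
    """Hypothetical extractor that tags four-digit years."""

    PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")

    def extract(self, text, **kwargs):
        # Honor the optional 'features' keyword: skip work if "year" was not requested.
        features = kwargs.get("features")
        if features and "year" not in features:
            return []
        # Plain dicts stand in for TextMatch here (an assumption for this sketch).
        return [
            {"text": m.group(), "start": m.start(), "end": m.end(), "label": "year"}
            for m in self.PATTERN.finditer(text)
        ]


matches = YearExtractor().extract("Founded in 1998, rebuilt in 2015.")
```

Instantiating ExtractorSketch directly raises TypeError, which is the behavior the __abstractmethods__ entry above implies for Extractor itself.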

 
class Place(Coordinate)
    Place(pid, name, lat=None, lon=None)
 
Location or GeoBase
   + Coordinate
      + Place
      + Country
 
or
Location
   + Coordinate
      + Place
 
etc.  Not sure of the best data model for inheritance.
This Python API hopes to simplify the concepts in the Java API.
 
 
Method resolution order:
Place
Coordinate
builtins.object

Methods defined here:
__init__(self, pid, name, lat=None, lon=None)
Initialize self.  See help(type(self)) for accurate signature.
__str__(self)
Return str(self).
get_location(self)
Returns (LAT, LON) tuple
@return: tuple, (lat,lon)
has_coordinate(self)
set_location(self, lat, lon)

Methods inherited from Coordinate:
format_coord(self)
set(self, lat, lon)
Set the location lat, lon
string_coord(self)
validate(self)

Data descriptors inherited from Coordinate:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)

 
class TextEntity(builtins.object)
    TextEntity(text, start, end)
 
A Text span.
 
Classes and routines that align with the Java packages org.opensextant.data and org.opensextant.extraction
 
TextEntity: represents a span of text
TextMatch: a TextEntity matched by a particular routine.  This is the basis for almost all
extractors and annotators in OpenSextant.
 
  Methods defined here:
__init__(self, text, start, end)
Initialize self.  See help(type(self)) for accurate signature.
__str__(self)
Return str(self).
contains(self, x1)
True if this span contains the offset x1
:param x1: offset to test
exact_match(self, t)
is_after(self, t)
is_before(self, t)
is_within(self, t)
True if the given annotation, t, contains this span
:param t: the other annotation
:return: bool
overlaps(self, t)
Determine whether t overlaps self.  If the two spans share a left or right boundary, t overlaps only if it is the longer span.
If t lies entirely within self, that is containment, not overlap.
:param t: the other annotation
:return:
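
The span tests described above can be sketched on a bare (start, end) pair. This is one reading of the docstrings, not the package's code: SpanSketch is a hypothetical name, end offsets are assumed to be exclusive, and the overlaps() rules are an interpretation of the "shared boundary" and "contained within" wording.

```python
from dataclasses import dataclass


@dataclass
class SpanSketch:
    """Hypothetical stand-in for TextEntity; end is treated as exclusive."""
    start: int
    end: int

    def contains(self, x1):
        # True if offset x1 falls inside this span.
        return self.start <= x1 < self.end

    def is_within(self, t):
        # True if span t covers this span entirely.
        return t.start <= self.start and self.end <= t.end

    def overlaps(self, t):
        # Shared left boundary: t overlaps only if it reaches further right.
        if self.start == t.start:
            return t.end > self.end
        # Shared right boundary: t overlaps only if it starts earlier.
        if self.end == t.end:
            return t.start < self.start
        # t strictly inside self is containment, not overlap.
        if self.start < t.start and t.end < self.end:
            return False
        # Otherwise, any shared offsets count as overlap.
        return t.start < self.end and self.start < t.end


word = SpanSketch(10, 20)
inner = SpanSketch(12, 15)
straddling = SpanSketch(15, 25)
```

With these definitions, inner is within word but does not overlap it, while straddling overlaps word without being contained by it.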

Data descriptors defined here:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)

 
class TextMatch(TextEntity)
    TextMatch(*args, label=None)
 
An entity matched by some tagger; it is a text span with lots of metadata.
 
 
Method resolution order:
TextMatch
TextEntity
builtins.object

Methods defined here:
__init__(self, *args, label=None)
Initialize self.  See help(type(self)) for accurate signature.
__str__(self)
Return str(self).
normalize(self)
Optional but recommended routine to normalize the matched data:
parse fields, uppercase, streamline punctuation, etc.
The normalization result also gives an opportunity to
validate the match.
:return:
populate(self, attrs)
Populate a TextMatch to normalize the set of attributes -- separate class fields on TextMatch from additional
optional attributes.
:param attrs:
:return:
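
The separation that populate describes, class fields versus additional optional attributes, might look like the sketch below. The class name MatchSketch, the FIELDS tuple, and the attrs dict used for the leftovers are assumptions for illustration; the actual field names on TextMatch are not listed in this documentation.

```python
class MatchSketch:
    """Hypothetical stand-in for TextMatch; field names are assumed."""

    # Keys treated as first-class fields in this sketch.
    FIELDS = ("text", "start", "end", "label")

    def __init__(self):
        self.text = None
        self.start = None
        self.end = None
        self.label = None
        # Everything that is not a class field lands here.
        self.attrs = {}

    def populate(self, attrs):
        for key, value in attrs.items():
            if key in self.FIELDS:
                setattr(self, key, value)
            else:
                self.attrs[key] = value


m = MatchSketch()
m.populate({"text": "Boston", "start": 4, "end": 10, "label": "place", "cc": "US"})
```

After the call, the span fields are set directly on m and the unrecognized key "cc" is kept in m.attrs.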

Methods inherited from TextEntity:
contains(self, x1)
True if this span contains the offset x1
:param x1: offset to test
exact_match(self, t)
is_after(self, t)
is_before(self, t)
is_within(self, t)
True if the given annotation, t, contains this span
:param t: the other annotation
:return: bool
overlaps(self, t)
Determine whether t overlaps self.  If the two spans share a left or right boundary, t overlaps only if it is the longer span.
If t lies entirely within self, that is containment, not overlap.
:param t: the other annotation
:return:

Data descriptors inherited from TextEntity:
__dict__
dictionary for instance variables (if defined)
__weakref__
list of weak references to the object (if defined)

 
Functions
       
as_place(ctry: opensextant.Country, name: str, name_type='N', oid=None)
Convert to Place.
:param ctry: Country object
:param name: the name to use
:param name_type:
:param oid: row ID
:return: Place
atan2(y, x, /)
Return the arc tangent (measured in radians) of y/x.
 
Unlike atan(y/x), the signs of both x and y are considered.
cos(x, /)
Return the cosine of x (measured in radians).
distance_cartesian(x1, y1, x2, y2)
Given X1, Y1 and X2, Y2 provide the 2-D Cartesian distance between two points.
distance_haversine(ddlon1, ddlat1, ddlon2, ddlat2)
Returns the distance in meters between two decimal-degree Lon/Lat (X,Y) pairs
 
http://www.movable-type.co.uk/scripts/latlong.html
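
For reference, the standard haversine formula can be sketched with the same math functions this module imports and the EARTH_RADIUS_WGS84 value listed under Data. The function name haversine_m is hypothetical, and whether distance_haversine uses exactly this form is an assumption; only the argument order (lon before lat) and the meters result are taken from the description above.

```python
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_WGS84 = 6378137.0  # meters, as listed in the module's Data section


def haversine_m(ddlon1, ddlat1, ddlon2, ddlat2):
    """Great-circle distance in meters between two decimal-degree lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (ddlon1, ddlat1, ddlon2, ddlat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine of the central angle between the two points.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return EARTH_RADIUS_WGS84 * c


# One degree of longitude along the equator.
one_degree = haversine_m(0.0, 0.0, 1.0, 0.0)
```

With this radius, one degree of longitude at the equator comes out at roughly 111.3 km.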
format_coord(lat, lon)
Format latitude and longitude using 2.6 and 3.6 float patterns, respectively.
:param lat: latitude
:param lon: longitude
:return: string
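
One plausible reading of "2.6, 3.6" is a pair of Python float format specs, sketched below. The function name format_coord_sketch and the ", " separator are assumptions; the documentation does not show the exact output string.

```python
def format_coord_sketch(lat, lon):
    # Width.precision specs: six decimal places for both values.
    # The minimum widths (2 and 3) never pad here, since six decimals
    # already make the text longer than that.
    return "{:2.6f}, {:3.6f}".format(lat, lon)


formatted = format_coord_sketch(38.9, -77.03)
```

Six decimal places of a degree is a resolution on the order of 0.1 m, which is why that precision is a common choice for coordinate text.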
get_country(namecode, standard='ISO')
Get a Country object given a name, ISO code, or FIPS code.  For codes, you must be
clear about which standard the code is based on, because some code collisions exist.
"ZZ" will NOT be returned for the empty code -- if you pass in a NULL or empty
country code you may have a data quality issue.
:param namecode: country name, or 2- or 3-alpha code.
:param standard: 'ISO', 'FIPS', or 'name'
:return:  Country object
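
The code collisions mentioned above are easy to demonstrate: "AU" means Australia in ISO 3166 but Austria in FIPS 10-4. The sketch below uses three tiny hand-written tables and a hypothetical lookup_country function that returns plain names; the real function returns Country objects built from the loaded CSV data. Returning None for an empty code is an assumption consistent with the note that "ZZ" is not substituted.

```python
# Tiny illustrative tables, not the package's data.
BY_ISO = {"AU": "Australia", "AT": "Austria"}
BY_FIPS = {"AS": "Australia", "AU": "Austria"}
BY_NAME = {"australia": "Australia", "austria": "Austria"}


def lookup_country(namecode, standard="ISO"):
    """Resolve a country by name or code under an explicit standard."""
    if not namecode:
        # Assumption: an empty code yields nothing rather than a placeholder.
        return None
    if standard == "name":
        return BY_NAME.get(namecode.strip().lower())
    if standard == "ISO":
        return BY_ISO.get(namecode.strip().upper())
    if standard == "FIPS":
        return BY_FIPS.get(namecode.strip().upper())
    raise ValueError("standard must be 'ISO', 'FIPS', or 'name'")


iso_au = lookup_country("AU")                    # ISO reading
fips_au = lookup_country("AU", standard="FIPS")  # FIPS reading
```

The same two letters resolve to different countries depending on the standard, which is why the standard argument should always match the source of the code.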
get_province(cc, adm1)
Requires that you call load_provinces() first.
get_us_province(adm1: str)
:param adm1:  ADM1 code or for territories,
:return:
is_administrative(feat: str)
is_country(feat_code: str)
Test a feature code
is_populated(feat: str)
load_countries(csvpath=None)
Parses the Xponents Core/src/main/resource CSV file country-names-2015.csv,
returning an array of Country objects.
:return: array of Country
load_major_cities()
Loads City geo/demographic information -- this does not try to parse all name variants.
:return:
load_provinces()
Load, store and return a dictionary of ADM1 boundary names - provinces, states, republics, etc.
NOTE: Location information is not included in this province listing.  Just Country, ADM1, Name tuples.
:return:  dict
load_us_provinces()
Load, store internally and return the LIST of US states.
NOTE: Place objects for US States have a location (unlike list of world provinces).
To get location and feature information in full, you must use the SQLITE DB or Xponents Solr.
:return: array of Place objects
load_world_adm1()
Load, store and return a dictionary of ADM1 boundary names - provinces, states, republics, etc.
:return:  dict
make_HASC(cc, adm1, adm2=None)
Create a simple hierarchical path for a boundary
:param cc:
:param adm1:
:param adm2:
:return:
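
A hierarchical boundary path of this kind is commonly written as dot-separated codes, for example "US.CA". The sketch below assumes that convention; the function name make_hasc_sketch, the "." separator, and the sample codes are illustrative and not confirmed by this documentation.

```python
def make_hasc_sketch(cc, adm1, adm2=None):
    """Join country, ADM1 and optional ADM2 codes into one path string."""
    parts = [cc, adm1]
    if adm2:
        # ADM2 is optional; leave it out of the path when not given.
        parts.append(adm2)
    return ".".join(parts)


state_path = make_hasc_sketch("US", "CA")
county_path = make_hasc_sketch("US", "CA", "037")
```

A path built this way can serve as a dictionary key for boundary lookups, since each level narrows the one before it.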
mathlog = log(...)
log(x, [base=math.e])
Return the logarithm of x to the given base.
 
If the base is not specified, returns the natural logarithm (base e) of x.
popscale(population, feature='city')
Given a population in context of the feature -- provide an
approximation of the size of the feature on a 10 point scale.
 
Approximations for 10 points:
Largest city is ~15 million
// Few cities top 30 million, e.g., 2^25.  popscale = 25 - 13 = 12.
Largest province is ~135 million
 
:param population:
:param feature:  city, district, or province allowed.
:return: index on 0..10 scale.
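
The worked example above (2^25 gives 25 - 13 = 12) suggests a base-2 logarithm minus a per-feature offset. The sketch below follows that reading. The city offset of 13 comes from the docstring; the province offset of 17 is inferred from "largest province is ~135 million" landing near 10; the district offset is not stated, so it is left out. The name popscale_sketch and the floor at zero are assumptions.

```python
from math import log2

# Offsets per feature: 13 is from the docstring, 17 is inferred (see lead-in).
OFFSETS = {"city": 13, "province": 17}


def popscale_sketch(population, feature="city"):
    """Approximate size index: floor(log2(population)) minus a feature offset."""
    if population < 1:
        return 0
    index = int(log2(population)) - OFFSETS[feature]
    # Small populations would go negative; floor them at zero.
    return max(index, 0)


largest_city = popscale_sketch(15_000_000)
megacity = popscale_sketch(2 ** 25)
largest_province = popscale_sketch(135_000_000, feature="province")
```

Under this reading the result is not capped, so a city above roughly 2^24 people scores past 10, which matches the docstring's own example value of 12.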
radians(x, /)
Convert angle x from degrees to radians.
render_match(m)
:param m: TextMatch
:return: dict
sin(x, /)
Return the sine of x (measured in radians).
sqrt(x, /)
Return the square root of x.
validate_lat(f)
validate_lon(f)

 
Data
EARTH_RADIUS_WGS84 = 6378137.0
PY3 = True
adm1_by_hasc = {}
countries = []
countries_by_fips = {}
countries_by_iso = {}
countries_by_name = {}
usstates = {}