Python: package opensextant

opensextant



Package Contents

FlexPat
TaxCat
advas_phonetics
extractors (package)
gazetteer
phonetics
unicode
utility
wordstats
xlayer

Classes



abc.ABC(builtins.object)

Extractor

builtins.object

Coordinate

Country
Place

Language
TextEntity

TextMatch

PlaceCandidate

class Coordinate(builtins.object)

    Coordinate(row, lat=None, lon=None)
    Convenience class for a lat/lon pair. Expects a row dict with 'lat' and 'lon' keys, or the keyword arguments 'lat' and 'lon'.
    @param row default dictionary

Methods defined here:

__init__(self, row, lat=None, lon=None)
Initialize self.  See help(type(self)) for accurate signature.

__str__(self)
Return str(self).

format_coord(self)

set(self, lat, lon)
Set the location lat, lon

string_coord(self)

validate(self)

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class Country(Coordinate)

    Country metadata

Method resolution order:

Country

Coordinate

builtins.object

Methods defined here:

__init__(self)
Initialize self.  See help(type(self)) for accurate signature.

__str__(self)
Return str(self).

Methods inherited from Coordinate:

format_coord(self)

set(self, lat, lon)
Set the location lat, lon

string_coord(self)

validate(self)

Data descriptors inherited from Coordinate:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class Extractor(abc.ABC)


Method resolution order:

Extractor

abc.ABC

builtins.object

Methods defined here:

__init__(self)
Initialize self.  See help(type(self)) for accurate signature.

extract(self, text, **kwargs)
:param text: Unicode text input :keyword features: an array of features to extract, e.g., "coordinate", "place", "MONEY" :return: array of TextMatch

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

Data and other attributes defined here:

__abstractmethods__ = frozenset({'extract'})
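
Because `extract` is the only abstract method, a concrete extractor only has to implement that one call. The following is a minimal sketch under stated assumptions, not the library's own code: the `Extractor` class here is a local stand-in that mirrors the documented interface, `YearExtractor` and its "year" feature name are hypothetical, and plain tuples are returned in place of `TextMatch` objects so the example runs without the package installed.

```python
from abc import ABC, abstractmethod
import re


class Extractor(ABC):
    """Local stand-in that mirrors the documented abstract interface."""

    @abstractmethod
    def extract(self, text, **kwargs):
        """Return a list of matches found in text."""


class YearExtractor(Extractor):
    """Hypothetical extractor: finds 4-digit years from 1900 to 2099."""

    FEATURE = "year"  # hypothetical feature name
    PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")

    def extract(self, text, **kwargs):
        # Honor the documented 'features' keyword: skip work if this
        # extractor's feature was not requested.
        features = kwargs.get("features")
        if features is not None and self.FEATURE not in features:
            return []
        # (text, start, end) tuples stand in for TextMatch objects.
        return [(m.group(), m.start(), m.end())
                for m in self.PATTERN.finditer(text)]
```

Instantiating the abstract base directly raises `TypeError`, which is the behavior `__abstractmethods__` above implies.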

class Language(builtins.object)

    Language(iso3, iso2, nmlist: list)
    Language: represents a single code/name pair. Coding is 3-char or 2-char; either is optional. In some situations there are competing 2-char codes in code books, such as the Library of Congress (LOC).

Methods defined here:

__init__(self, iso3, iso2, nmlist: list)
Initialize self.  See help(type(self)) for accurate signature.

__str__(self)
Return str(self).

get_name(self)

name_code(self)

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class Place(Coordinate)

    Place(pid, name, lat=None, lon=None)
    Location or GeoBase
    + Coordinate
    + Place
    + Country
    or
    Location
    + Coordinate
       + Place
       etc.
    Not sure of the best data model for inheritance. This Python API hopes to simplify the concepts in the Java API.

Method resolution order:

Place

Coordinate

builtins.object

Methods defined here:

__init__(self, pid, name, lat=None, lon=None)
Initialize self.  See help(type(self)) for accurate signature.

__str__(self)
Return str(self).

format_feature(self)
Yield a consolidated feature coding. :return:  X/xxxx  format

get_location(self)
Returns (LAT, LON) tuple @return: tuple, (lat,lon)

has_coordinate(self)

set_location(self, lat, lon)

Methods inherited from Coordinate:

format_coord(self)

set(self, lat, lon)
Set the location lat, lon

string_coord(self)

validate(self)

Data descriptors inherited from Coordinate:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class PlaceCandidate(TextMatch)

    PlaceCandidate(*args, **kwargs)
    A TextMatch representing any geographic mention -- a Place object will represent the additional attributes for the chosen place. See also the Java class org.opensextant.extractors.geo.PlaceCandidate, which is a more in-depth version of this. This Python class represents the response from the REST API, for example.

Method resolution order:

PlaceCandidate

TextMatch

TextEntity

builtins.object

Methods defined here:

__init__(self, *args, **kwargs)
Initialize self.  See help(type(self)) for accurate signature.

populate(self, attrs: dict)
Deserialize the attributes dict from either TextMatch schema or Place schema :param attrs: :return:

Methods inherited from TextMatch:

__str__(self)
Return str(self).

normalize(self)
Optional, but recommended routine to normalize the matched data. That is, parse fields, uppercase, streamline punctuation, etc. As well, given such normalization result, this is the opportunity to additionally validate the match. :return:

Methods inherited from TextEntity:

contains(self, x1)
if this span contains an offset x1 :param x1:

exact_match(self, t)

is_after(self, t)

is_before(self, t)

is_within(self, t)
if the given annotation, t, contains this :param t: :return:

overlaps(self, t)
Determine if t overlaps self.  If Right or Left match, t overlaps if it is longer. If t is contained entirely within self, then it is not considered overlap -- it is Contained within. :param t: :return:

Data descriptors inherited from TextEntity:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

class TextEntity(builtins.object)

    TextEntity(text, start, end)
    A text span. Classes and routines that align with Java org.opensextant.data and org.opensextant.extraction:
    * TextEntity: represents a span of text
    * TextMatch: a TextEntity matched by a particular routine. This is the basis for almost all extractors and annotators in OpenSextant.

Methods defined here:

__init__(self, text, start, end)
Initialize self.  See help(type(self)) for accurate signature.

__str__(self)
Return str(self).

contains(self, x1)
if this span contains an offset x1 :param x1:

exact_match(self, t)

is_after(self, t)

is_before(self, t)

is_within(self, t)
if the given annotation, t, contains this :param t: :return:

overlaps(self, t)
Determine if t overlaps self.  If Right or Left match, t overlaps if it is longer. If t is contained entirely within self, then it is not considered overlap -- it is Contained within. :param t: :return:

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)
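
The span predicates documented for TextEntity can be illustrated with a small stand-alone class. This is a sketch of one reading of the docstrings, not the package's implementation: `SpanSketch` is a hypothetical name, offsets are assumed to be half-open (`start` inclusive, `end` exclusive), and `overlaps` is simplified to "the spans intersect and t is not wholly contained in self".

```python
class SpanSketch:
    """Hypothetical text span with half-open offsets [start, end)."""

    def __init__(self, text, start, end):
        self.text = text
        self.start = start
        self.end = end

    def contains(self, x1):
        # True if the offset x1 falls inside this span.
        return self.start <= x1 < self.end

    def is_within(self, t):
        # True if the other span t wholly contains this span.
        return t.start <= self.start and self.end <= t.end

    def is_before(self, t):
        return self.end <= t.start

    def is_after(self, t):
        return self.start >= t.end

    def overlaps(self, t):
        # Spans must intersect, and t must not sit entirely inside self
        # (that case is "contained within", not an overlap).
        intersects = self.start < t.end and t.start < self.end
        return intersects and not t.is_within(self)
```

For example, with spans covering offsets 0-10, 5-15 and 2-4, the second overlaps the first, while the third is contained within the first and so is not reported as an overlap.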

class TextMatch(TextEntity)

    TextMatch(*args, label=None) An entity matched by some tagger; it is a text span with lots of metadata.

Method resolution order:

TextMatch

TextEntity

builtins.object

Methods defined here:

__init__(self, *args, label=None)
Initialize self.  See help(type(self)) for accurate signature.

__str__(self)
Return str(self).

normalize(self)
Optional, but recommended routine to normalize the matched data. That is, parse fields, uppercase, streamline punctuation, etc. As well, given such normalization result, this is the opportunity to additionally validate the match. :return:

populate(self, attrs: dict)
Populate a TextMatch to normalize the set of attributes -- separate class fields on TextMatch from additional optional attributes. :param attrs: dict of standard Xponents API outputs. :return:

Methods inherited from TextEntity:

contains(self, x1)
if this span contains an offset x1 :param x1:

exact_match(self, t)

is_after(self, t)

is_before(self, t)

is_within(self, t)
if the given annotation, t, contains this :param t: :return:

overlaps(self, t)
Determine if t overlaps self.  If Right or Left match, t overlaps if it is longer. If t is contained entirely within self, then it is not considered overlap -- it is Contained within. :param t: :return:

Data descriptors inherited from TextEntity:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

Functions


add_language(lg: opensextant.Language, override=False)

atan2(y, x, /)
Return the arc tangent (measured in radians) of y/x. Unlike atan(y/x), the signs of both x and y are considered.

bbox(lat: float, lon: float, radius: int)
   Calculate coordinates for SW and NE corners of a SQUARE bounding box of edge length 2 x radius :param lat: decimal degree latitude :param lon: decimal degree longitude :param radius: meters from center point
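
One common way to derive such a box is to convert the radius into angular offsets in latitude and longitude. The sketch below is an assumption about the general technique, not the package's code: `bbox_sketch` is a hypothetical name, the Earth is treated as a sphere of radius `EARTH_RADIUS_WGS84`, and no handling of the poles or the antimeridian is attempted.

```python
import math

EARTH_RADIUS_WGS84 = 6378137.0  # meters, as listed under Data


def bbox_sketch(lat, lon, radius):
    """Return ((sw_lat, sw_lon), (ne_lat, ne_lon)) for a square box
    extending `radius` meters from the center in each direction."""
    # Angular size of `radius` along a meridian.
    dlat = math.degrees(radius / EARTH_RADIUS_WGS84)
    # Parallels shrink with cos(latitude), so longitude offsets grow.
    dlon = math.degrees(
        radius / (EARTH_RADIUS_WGS84 * math.cos(math.radians(lat))))
    return (lat - dlat, lon - dlon), (lat + dlat, lon + dlon)
```

At the equator a 1000 m radius works out to roughly 0.009 degrees in each direction.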

centroid(arr: list)
:param arr:  a list of numeric coordinates (y,x) :return: Coordinate -- the average of sum(y), sum(x)
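
The averaging described here can be sketched in a few lines. This is an illustration only: `centroid_sketch` is a hypothetical name, it returns a plain `(y, x)` tuple rather than a Coordinate, and a simple arithmetic mean is assumed (which does not account for points that straddle the antimeridian).

```python
def centroid_sketch(arr):
    """Average a list of (y, x) pairs, i.e. (lat, lon)."""
    if not arr:
        raise ValueError("need at least one coordinate")
    n = len(arr)
    y = sum(pt[0] for pt in arr) / n
    x = sum(pt[1] for pt in arr) / n
    return y, x
```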

characterize_location(place: opensextant.Place, label: str)
Experimental: not a comprehensive characterization. This is intended to summarize PlaceCandidates extracted from text. Describes a Place in terms of a plain-language feature type and the geographic scope or resolution.
E.g., Place object "P/PPL", "city"
E.g., Place object "A/ADM4", "admin"
E.g., Place object "S/COORD", "site"
:param place: Place object
:param label: text match label, e.g., 'country', 'place', 'coord', etc.
:return: feature string, resolution string

cos(x, /)
Return the cosine of x (measured in radians).

country_as_place(ctry: opensextant.Country, name: str, name_type='N', oid=None)
Convert to Place. :param ctry: Country object :param name: the name to use :param name_type: :param oid: row ID :return:

distance_cartesian(x1, y1, x2, y2)
Given X1, Y1 and X2, Y2 provide the 2-D Cartesian distance between two points.
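
For reference, the 2-D Cartesian distance is the Euclidean norm of the coordinate differences. A minimal sketch using the standard library (the name `distance_cartesian_sketch` is hypothetical, and this is not necessarily how the package computes it):

```python
import math


def distance_cartesian_sketch(x1, y1, x2, y2):
    """Euclidean distance between (x1, y1) and (x2, y2)."""
    return math.hypot(x2 - x1, y2 - y1)
```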

distance_haversine(ddlon1, ddlat1, ddlon2, ddlat2)
Returns distance in meters for given decimal degree Lon/Lat (X,Y) pair http://www.movable-type.co.uk/scripts/latlong.html
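
The haversine formula referenced above can be sketched as follows. The argument order (longitude first, then latitude) follows the documented signature; the function name `haversine_sketch` is hypothetical, and using `EARTH_RADIUS_WGS84` as the sphere radius is an assumption about the package's choice.

```python
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_WGS84 = 6378137.0  # meters, as listed under Data


def haversine_sketch(ddlon1, ddlat1, ddlon2, ddlat2):
    """Great-circle distance in meters between two lon/lat points
    given in decimal degrees."""
    phi1, phi2 = radians(ddlat1), radians(ddlat2)
    dphi = phi2 - phi1
    dlam = radians(ddlon2 - ddlon1)
    # Haversine of the central angle between the two points.
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_WGS84 * atan2(sqrt(a), sqrt(1 - a))
```

With this radius, one degree of longitude along the equator comes to about 111.3 km.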

format_coord(lat, lon)
2.6, 3.6 format. :param lat: latitude :param lon: longitude :return: string

geohash2point(gh)

geohash_cells(gh: str, radius: int)
For a radius in meters, generate the cells contained within or touched by that radius. This is approximate precision based on https://en.wikipedia.org/wiki/Geohash, which suggests this approximation could be done mathematically.
:return: Dict of 8 directionals ~ E, N, S, W; NE, SE, SW, NW. If the desired radius fits entirely within a lesser-precision geohash grid, the only cell returned is "CENTROID", e.g., radius=2000 (meters) for a geohash such as `9q5t`

geohash_cells_radially(lat: float, lon: float, radius: int)
Create a set of geohashes that contain the given area defined by lat,lon + radius

get_country(namecode, standard='ISO')
Get a Country object given a name, ISO or FIPS code. For codes, you must be clear about which standard the code is based on; some code collisions exist. "ZZ" will NOT be returned for the empty code -- if you pass in a NULL or empty country code you may have a data quality issue.
:param namecode: 2- or 3-alpha code.
:param standard: 'ISO', 'FIPS', or 'name'
:return: Country object

get_lang_code(txt: str)

get_lang_name(code: str)

get_language(code: str) -> opensextant.Language
:param code: language ID or name :return: Language or None

get_province(cc, adm1)
REQUIRES you load_provinces() first.

get_us_province(adm1: str)
:param adm1:  ADM1 code or for territories, :return:

is_academic(feat_class: str, feat_code: str) -> bool
:param feat_class: geonames class :param feat_code:  geonames designation code :return:

is_administrative(feat: str)

is_country(feat_code: str)
Test a feature code

is_lang_chinese(lg: str)

is_lang_cjk(lg: str)

is_lang_english(lg: str)

is_lang_euro(lg: str)
true if lang is European -- romance, german, english, etc :param lg: :return:

is_lang_romance(lg: str)
If spanish, portuguese, italian, french, romanian

is_political(feat_code: str)
Test a feature code

is_populated(feat: str)

list_languages()
List out a flattened list of languages, de-duplicated by ISO2 language ID. TODO: alternatively list out every language :return:

load_countries(csvpath=None)
Parses the CSV file country-names-2015.csv from Xponents Core/src/main/resource and returns an array of Country objects. :return: array of Country

load_languages()

load_major_cities()
Loads City geo/demographic information -- this does not try to parse all name variants. This produces Geonames use of FIPS codes. :return:

load_provinces()
Load, store and return a dictionary of ADM1 boundary names - provinces, states, republics, etc. NOTE: Location information is not included in this province listing.  Just Country, ADM1, Name tuples. NOTE: This reflects only GEONAMES ADMIN1 CODES ASCII -- which portrays most of the world (except US) as FIPS, not ISO. :return:  dict

load_us_provinces()
Load, store internally and return the LIST of US states. NOTE: Place objects for US States have a location (unlike list of world provinces). To get location and feature information in full, you must use the SQLITE DB or Xponents Solr. :return: array of Place objects

load_world_adm1()
Load, store and return a dictionary of ADM1 boundary names - provinces, states, republics, etc. Coding for ADM1 is FIPS based mostly :return:  dict

location_accuracy(conf, prec_err)
Both confidence and precision error are required to be non-zero and positive. Scales ACCURACY by confidence and inversely by log10(R^2): accuracy decreases with increasing radius, while the scale stays on the order of visible things, e.g., 0.01 to 1.00. This is only one definition of accuracy.
Consider confidence = 100 (i.e., a 100% chance we have the right location):
* Country precision ~ +/- 100 km: accuracy = 0.091
* GPS precision ~ 10 m: accuracy = 0.33
* 1 m precision: accuracy = 1.0, since 1 / (1 + log10(1*1)) = 1/1. In other words, a 1 m error is basically "perfect".
:param conf: confidence on a 100-point scale (0-100)
:param prec_err: error in location precision, in meters
:return:
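
The three worked values in this docstring are all consistent with the expression (conf / 100) * 1 / (1 + log10(prec_err^2)). The sketch below implements that expression as inferred from the examples; it is not taken from the package source, the name `location_accuracy_sketch` is hypothetical, and raising `ValueError` on non-positive input is an assumption. Sub-meter errors are not handled specially here.

```python
from math import log10


def location_accuracy_sketch(conf, prec_err):
    """Accuracy score inferred from the documented examples.

    conf: confidence on a 0-100 scale; prec_err: error radius in meters.
    """
    if conf <= 0 or prec_err <= 0:
        raise ValueError("conf and prec_err must be positive")
    # Scale by confidence, and inversely by log10 of the squared radius.
    return (conf / 100.0) / (1.0 + log10(prec_err * prec_err))
```

Halving the confidence halves the score, so conf=50 at 10 m precision gives about 0.17 under this reading.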

log10(x, /)
Return the base 10 logarithm of x.

logger_config(logger_level: str, pkg: str)
LOGGING :param logger_level: :param pkg: Name of package :return:

make_HASC(cc, adm1, adm2=None)
Create a simple hierarchical path for a boundary :param cc: :param adm1: :param adm2: :return:
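
A hierarchical boundary path of this kind is typically a delimiter-joined string of codes such as "US.CA". The sketch below assumes a "." delimiter (matching the default `delim` of `parse_admin_code`) and simply omits a missing `adm2`; both choices, and the name `make_hasc_sketch`, are assumptions rather than the package's documented behavior.

```python
def make_hasc_sketch(cc, adm1, adm2=None, delim="."):
    """Join country, ADM1 and optional ADM2 codes into one path."""
    parts = [cc, adm1]
    if adm2:
        parts.append(adm2)
    return delim.join(parts)
```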

mathlog = log(...)
log(x, [base=math.e]) Return the logarithm of x to the given base. If the base not specified, returns the natural logarithm (base e) of x.

parse_admin_code(adm1, delim='.')
:param delim: :param adm1: admin level 1 code :return: ADM1 code if possible.

pkg_resource_path(rsrc)

point2geohash(lat: float, lon: float, precision=6)
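
No docstring is given for `point2geohash`. For readers unfamiliar with the encoding, the standard geohash algorithm alternately bisects the longitude and latitude ranges and packs every five bits into one base-32 character. The following is a sketch of that public algorithm under the assumption that the package follows it; `geohash_encode_sketch` is a hypothetical name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_encode_sketch(lat, lon, precision=6):
    """Encode a lat/lon pair as a geohash string of `precision` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0       # bits accumulated for the current character
    nbits = 0
    use_lon = True  # bit order starts with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)
```

Each additional character narrows the cell, which is why a radius in meters maps onto a geohash precision in the cell functions above.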

popscale(population, feature='city')
Given a population in the context of the feature, provide an approximation of the size of the feature on a 10-point scale.
Approximations for 10 points:
Largest city is ~15 million // Few cities top 30 million, e.g., 2^25.  popscale = 25 - 13 = 12.
Largest province is ~135 million
:param population:
:param feature: city, district, or province allowed.
:return: index on 0..10 scale.

radial_geohash(lat, lon, radius)
Propose geohash cells for a given radius from a given point

radians(x, /)
Convert angle x from degrees to radians.

reduce_matches(matches)
Mark each match if it is a submatch or overlap or exact duplicate of other. :param matches: array of TextMatch (or TextEntity). This is the more object oriented version of reduce_matches_dict :return:

reduce_matches_dict(matches)
Accepts an array of annotations (dict). Inserts the "submatch" flag in a dict if it is a submatch (that is, if one TextEntity A wholly contains another, B, then B is a submatch). We just have to loop through half of the array ~ comparing each item to each other item once. :param matches: array of dicts.
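
The "compare each pair once" loop described here can be sketched as below. This is an illustration under assumptions, not the package's code: the dict keys "start" and "end" are hypothetical, `mark_submatches_sketch` is a hypothetical name, and identical spans are left unflagged rather than treated as duplicates.

```python
def mark_submatches_sketch(matches):
    """Flag dicts whose span lies wholly inside another dict's span."""

    def wholly_contains(a, b):
        # a contains b, and the two spans are not identical.
        same = a["start"] == b["start"] and a["end"] == b["end"]
        return (not same) and a["start"] <= b["start"] and b["end"] <= a["end"]

    n = len(matches)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is visited once
            a, b = matches[i], matches[j]
            if wholly_contains(a, b):
                b["submatch"] = True
            elif wholly_contains(b, a):
                a["submatch"] = True
    return matches
```

For instance, given spans for "New York City" (0-13) and "York" (4-8), only the "York" entry receives the flag.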

render_match(m)
:param m: TextMatch :return: dict

sin(x, /)
Return the sine of x (measured in radians).

sqrt(x, /)
Return the square root of x.

validate_lat(f)

validate_lon(f)

Data

EARTH_RADIUS_WGS84 = 6378137.0
IGNORE_LANGUAGES = {'gaa'}
IS_DUPLICATE = 2
IS_SUBMATCH = 1
NOT_SUBMATCH = 0
PY3 = True
adm1_by_hasc = {}
countries = []
countries_by_fips = {}
countries_by_iso = {}
countries_by_name = {}
language_map = {}
usstates = {}
