Xponents

Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.

Xlayer: Xponents REST service

Xlayer (pronounced “X Layer”) is an older name for the Xponents geotagger web service; we now simply call it the “Xponents API”. Under the hood, the service is implemented in Java using the Restlet framework and provides the functionality described in REST Docker. The Docker README focuses on the Docker instance and the web service specification. The remainder of this page describes the Python client and gives more detail on server development.

Contents here will help you:

Execution

In the Xponents project or distribution you’ll find the Xponents API server script:

    ./script/xlayer-server.sh  start 8080
    .... 
    ./script/xlayer-server.sh  stop 8080 

Alternatively, using Docker Compose:

   # In source tree you'll find docker-compose.yml 
   # You do not need to check out the project to use this.
   # Copy ../Examples/Docker/docker-compose.yml
   
   docker-compose up -d xponents

With the server running you will be able to test out the functions to process text and control the server (ping and stop).

GET  http://localhost:8080/xlayer/rest/process ? docid= & text= & features =
POST http://localhost:8080/xlayer/rest/process  JSON body { docid =, text =, features = }  
GET  http://localhost:8080/xlayer/rest/control/ping
GET  http://localhost:8080/xlayer/rest/control/stop
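
As a rough illustration, the process endpoint can be exercised with only the Python standard library. This is a sketch under assumptions: the helper names (`process_get_url`, `process_post_request`, `send`) are hypothetical, and it assumes `features` is passed as a single comma-separated value. Check the web service specification in the Docker README for the exact parameter format.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8080/xlayer/rest"   # host and port used when starting the server


def process_get_url(docid, text, features=("geo",), base=BASE):
    """Build the GET form of the process call (hypothetical helper)."""
    # Assumption: features is sent as one comma-separated value.
    query = urllib.parse.urlencode(
        {"docid": docid, "text": text, "features": ",".join(features)})
    return f"{base}/process?{query}"


def process_post_request(docid, text, features=("geo",), base=BASE):
    """Build the POST form of the process call with a JSON body (hypothetical helper)."""
    body = json.dumps(
        {"docid": docid, "text": text, "features": ",".join(features)})
    return urllib.request.Request(
        f"{base}/process",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST")


def send(request_or_url, timeout=30):
    """Send a request to a running server and decode the JSON reply."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; call send(...) once a server is running.
    print(process_get_url("doc-1", "Is it near Lisbon?", ["geo", "dates"]))
```

Building the request separately from sending it keeps the URL and body easy to inspect before a server is involved.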

However you have started the API server, here are some test scripts to interact with it – you’ll need copies of the scripts from source tree here or from the distribution. The docker image has copies of these scripts for testing.

Xponents API Python Client

Please consider using this Python client to interact with the API server – as a reference implementation of the data model and processing/extraction services it is as complete as it needs to be. If you need or want to work in another language and want to contribute your implementation please file an issue in the GitHub project for Xponents. This page focuses on using the Python API to demonstrate the value of the Xponents geotagging and extraction approach.

[ Latest Release ] [ Python API reference ] [ Python Source ]

Install it with pip3 install opensextant-1.4.*.tar.gz. You can now make use of the opensextant.xlayer module. Here is a synopsis of the XlayerClient in action. You can also refer to the client test code, where play_rest_api.py gets playful with the basics.


# Setup
from opensextant.xlayer import XlayerClient
client = XlayerClient(url)   # url = 'host:port'  or the full 'http://host:port/xlayer/rest/process'  URL.

# For each block of text:
textbuffer = " ..... "
result  = client.process("ab8ef7c...",             # document ID 
                         textbuffer,               # your raw text input
                         features=["geo",          # default is only "geo". If you want extracted features, add as you wish. 
                                   "postal", 
                                   "dates", 
                                   "taxons", 
                                   "filtered_out"])

# result is a simple array of opensextant.TextMatch
# where geotags are PlaceCandidate classes, which are subclasses of TextMatch

Let’s take this step by step. First, if you like looking at source code (see Source link above), the opensextant.xlayer module is a full main program that provides additional test capabilities for command line use or for crafting your own post-processing script.

Here is an exposé of the key data classes, TextMatch and PlaceCandidate:

Below, “Part A” loops through all TextMatch objects generically. “Part B” is specific to the logic for geotags, that is, PlaceCandidate objects. For Part B, look at the advanced geoinferencing topics that follow. More to come. Why? Your objective with geotagging is usually to present these results in a spatial visualization or capture them in a spatial database. All this metadata work will help you do that effectively.


from opensextant import TextMatch, PlaceCandidate, Place

# Let's say in the result array a TextMatch item had been created as such:
#
#   t = TextMatch("Sana'a Cafe", 55, 67)
#
# Name of a business mentioned at character span (55, 67)
# Additional metadata would be assigned internally.  See the loop example below:

# Part A -- Generic Text span interpretation
for t in result:
   # 
   if t.filtered_out:
     print("We'll ignore this tag.")
   else:     
     print(f"Match {t.id} of type {t.label}:  {t.text} at span ({t.start}, {t.end})")
     # 
     # NOTE:  For all TextMatch and subclasses `.attrs` field will contain additional metadata, including:
     #    - patterns -- ID of regex pattern
     #    - place    -- gazetteer metadata and inferred precision, related geography, etc.
     #    - taxon    -- name of catalog and other metadata from taxonomic key phrases
     # 
     print("\tATTRS", t.attrs)


# Part B -- Geo-inferencing interpretation. Looking at Countries, Placenames, Coordinates, and Postal entries
#
# Combine the loop logic as you need. This variation focuses on PlaceCandidate specifically 
for t in result:
  if not t.filtered_out and isinstance(t, PlaceCandidate):
    
    # PlaceCandidate is either a Country (or non-Coordinate location) or a Coordinate-bearing location.
    # Be sure to know the distinction. 
    if t.is_country:
      print("COUNTRY", t.text, t.place.country_code)
    else:
      print("GEOTAG", t.text)
      geo = t.place
      feature = f"{geo.feature_class}/{geo.feature_code}" 
      print(f"\tFEAT={feature} LL=({geo.lat}, {geo.lon}) in country {geo.country_code}")
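
Since the usual goal is a map or a spatial database, one possible post-processing step is to convert the coordinate-bearing geotags into GeoJSON. The sketch below is illustrative only: `to_geojson` is a hypothetical helper, and the stand-in objects built with `SimpleNamespace` merely mimic the attributes used in Part B (`filtered_out`, `is_country`, `text`, `place.lat`, `place.lon`, `place.country_code`, `place.feature_class`, `place.feature_code`) so the example runs without a server. With real results you would also keep the `isinstance(t, PlaceCandidate)` check from Part B.

```python
import json
from types import SimpleNamespace


def to_geojson(matches):
    """Convert coordinate-bearing geotags to a GeoJSON FeatureCollection (hypothetical helper)."""
    features = []
    for t in matches:
        place = getattr(t, "place", None)
        # Skip filtered-out tags, non-geographic matches, and country mentions.
        if t.filtered_out or place is None or getattr(t, "is_country", False):
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [place.lon, place.lat]},
            "properties": {
                "text": t.text,
                "country_code": place.country_code,
                "feature": f"{place.feature_class}/{place.feature_code}",
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Stand-in data shaped like the attributes used in Part B (not real client output).
lisbon = SimpleNamespace(
    text="Lisbon", filtered_out=False, is_country=False,
    place=SimpleNamespace(lat=38.71667, lon=-9.13333, country_code="PT",
                          feature_class="P", feature_code="PPLC"))
canada = SimpleNamespace(
    text="Canada", filtered_out=False, is_country=True,
    place=SimpleNamespace(lat=None, lon=None, country_code="CA",
                          feature_class=None, feature_code=None))

collection = to_geojson([lisbon, canada])
print(json.dumps(collection, indent=2))
```

Country mentions are dropped here because they carry no point location; a real pipeline might instead join them to country polygons.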

Expert Topics in Xponents Geoinferencing

These topics are addressed here because you, as the consumer of the Xponents API output, need to interpret what is found in text. This is the inferencing aspect of all this: if you don’t take some action to interpret the output intelligently, the output has little value or credibility downstream. The topics that follow give a flavor of that next level of inference.

Feature Class Use

Be aware that all sorts of geographic references are returned by the API service, along a spectrum from about 10 m to 100 km resolution: coordinates, postal, landmark, city, district, province, country and even region or body of water. Feature metadata (feature_class, feature_code) helps distinguish types of features.
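
One way to put the feature metadata to work is to bucket annotations by feature class and code before deciding how to display them. A minimal sketch, assuming annotations are plain dictionaries shaped like the JSON in the REST Interface section (`feat_class`, `feat_code`); `group_by_feature` is a hypothetical helper, not part of the client library.

```python
from collections import defaultdict


def group_by_feature(annotations):
    """Group annotation text by its 'feat_class/feat_code' key (hypothetical helper)."""
    groups = defaultdict(list)
    for anno in annotations:
        # Country and non-geographic annotations carry no feature fields in the example output.
        if "feat_class" not in anno:
            continue
        key = f"{anno['feat_class']}/{anno.get('feat_code', '')}"
        groups[key].append(anno["text"])
    return dict(groups)


# Sample annotations trimmed from the example JSON output on this page.
sample = [
    {"text": "Lisbon", "type": "place", "feat_class": "P", "feat_code": "PPLC", "prec": 10000},
    {"text": "Saskatchewan", "type": "place", "feat_class": "A", "feat_code": "ADM1", "prec": 50000},
    {"text": "Northern Border", "type": "place", "feat_class": "A", "feat_code": "ADM1", "prec": 50000},
    {"text": "Canada", "type": "country", "cc": "CA"},
]
print(group_by_feature(sample))
```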

Gazetteer Metadata Use

Geographic metadata for features is encoded to ISO-3166 standards, so make use of that on the fields noted in the schema below in REST Interface.

Precision Error and Location Uncertainty

Consider these aspects of inferred locations that have lat, lon tuple or a valid PlaceCandidate.place object:

Some examples in this test script help nail down the idea of location accuracy, which is relatively important when inferring spatial entities:


If you have these situations in free text, the API currently manages how location accuracy is calculated,
but you can override that and calculate your own.  Examples:

  - a coordinate with one decimal precision, that is very confident
  - a coordinate with six decimals of precision, that is very confident
  - a coordinate with twelve decimals of precision that was formatted with default floating point precision 
  - a small city with a common name, where we are not very confident it is right
  - a landmark or park
  - mention of a large country or region

## Run: python3 test_location_accuracy.py

Important -- Look at the comments in code for each example. In the output see that 
the ACCURACY column is a result that typically lands between 0.01 and 0.30 (on a 0.0 to 1.0 scale). 
This makes it easy to compare and visualize any geographic entity that has been inferred.

Accuracy 1.0 = 100% confident with 1 meter of error.


EXAMPLES ....................                   ACCURACY  CONF  PREC_ERR
Coord - 31.1N x 117.0W.                         0.100     90    10000
Coord - 31.123456N x 117.098765W.               0.300     90    10
Coord - 31.123456789012N x 117.098765432101W.   0.180     90    100
City Euguene, Oregon  ..........                0.101     85    5000
Poblacion, ... Philippines  ....                0.036     25    1000
Workshop of H. Wilson, Natick                   0.317     95    10
.....Khammouane.....in Laos.....                0.058     60    50000
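
The sample rows above are consistent with a simple relationship: accuracy equals confidence (as a fraction of 100) divided by 1 + 2 x log10(precision error in meters). That formula is an inference fitted to this table, not a statement of how the library computes the value, and `estimate_accuracy` is a hypothetical name. It may still serve as a starting point if you decide to override the calculation with your own.

```python
import math


def estimate_accuracy(conf, prec_err):
    """Estimate location accuracy on a 0.0 to 1.0 scale.

    conf     -- confidence, 0 to 100
    prec_err -- precision error in meters, 1 or greater
    Assumption: formula fitted to the sample table, not taken from the library.
    """
    if conf <= 0 or prec_err < 1:
        return 0.0
    return (conf / 100.0) / (1.0 + 2.0 * math.log10(prec_err))


# (confidence, precision error in meters, ACCURACY value from the table)
rows = [
    (90, 10000, 0.100),
    (90, 10, 0.300),
    (90, 100, 0.180),
    (85, 5000, 0.101),
    (25, 1000, 0.036),
    (95, 10, 0.317),
    (60, 50000, 0.058),
]
for conf, prec_err, expected in rows:
    estimate = estimate_accuracy(conf, prec_err)
    print(f"conf={conf:3d} prec_err={prec_err:6d} estimate={estimate:.3f} table={expected:.3f}")
```

Under this fit, 100% confidence with 1 meter of error gives exactly 1.0, which agrees with the definition stated above the table.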

Xponents API Java Client

Use the opensextant-xponents Maven artifact; org.opensextant.xlayer.XlayerClient(url) gives you a starting point for invoking the .process() method.

/* Note - org.opensextant.output.Transforms handles the JSON-to-Java
 * object deserialization. If for whatever reason that is wrong, you can adapt it as needed.
 */
....

XlayerClient client = new XlayerClient(url);
var results = client.process(....);
/* results is an array of TextMatch.
   PlaceCandidate objects are a subclass of TextMatch and carry the geotagging details of geography, etc.
 */

Additional classes include:

Health Check

curl "http://localhost:8080/xlayer/rest/control/ping"

Stopping Cleanly

curl "http://localhost:8080/xlayer/rest/control/stop"

REST Interface

For example, run the Python client to see how easy it is to call the service above. Please note that a Java version, XlayerClient, also exists in the src/main folder, with test code in src/test.

INPUT:

OUTPUT:

Annotation schema

Geographic annotations additionally have:

Derived Postal annotations additionally have:

{
      "comment-only": "For an input 'Wellfleet, MA 02663' the individual matches will be given as normal, 
         but a composed match for the entire span will carry `related` section with 
         specific slots indicating the components of the postal match:
         
         'city', 'admin', 'country', 'postal'
       
         Each slot has the relevant `matchtext` and `match-id`. 
         Use the match-id to retrieve the full geocoding for that portion.
         The composed match here will usually carry the geocoding of the postal code.",
              
      "related": {
        "city": {
          "matchtext": "Wellfleet",
          "match-id": "place@0"
        },
        "admin": {
          "matchtext": "MA",
          "match-id": "place@11"
        },
        "postal": {
          "matchtext": "02663",
          "match-id": "postal@14"
        }
      },...
}
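
Resolving the `related` slots back to their full annotations might look like the sketch below. It assumes each annotation in the response carries its own `match-id` field to join on, which this page does not show explicitly. `resolve_related` is a hypothetical helper, and the sample annotations (including the `type` values) are illustrative and trimmed to the fields needed.

```python
def resolve_related(composed, annotations):
    """Map each slot of a composed postal match to its full annotation (hypothetical helper)."""
    # Assumption: every annotation exposes a "match-id" key.
    by_id = {a["match-id"]: a for a in annotations if "match-id" in a}
    resolved = {}
    for slot, ref in composed.get("related", {}).items():
        # A slot may point at a match missing from the response; keep None in that case.
        resolved[slot] = by_id.get(ref["match-id"])
    return resolved


# Illustrative annotations for the input 'Wellfleet, MA 02663'.
annotations = [
    {"match-id": "place@0", "text": "Wellfleet", "type": "place"},
    {"match-id": "place@11", "text": "MA", "type": "place"},
    {"match-id": "postal@14", "text": "02663", "type": "postal"},
]
composed = {
    "text": "Wellfleet, MA 02663",
    "related": {
        "city": {"matchtext": "Wellfleet", "match-id": "place@0"},
        "admin": {"matchtext": "MA", "match-id": "place@11"},
        "postal": {"matchtext": "02663", "match-id": "postal@14"},
    },
}
resolved = resolve_related(composed, annotations)
for slot, anno in resolved.items():
    print(slot, "->", anno["text"] if anno else None)
```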

Non-Geographic annotations have:

Example JSON Output:

   from opensextant.xlayer import XlayerClient

   xtractor = XlayerClient(serverURL)

   # Python call -- Send the text, process the text, print the JSON response to console.
   xtractor.process("test doc#1",
     "Where is 56:08:45N, 117:33:12W?  Is it near Lisbon or closer to Saskatchewan?"
     + "Seriously, what part of Canada would you visit to see the new prime minister discus our border?"
     + "Do you think Hillary Clinton or former President Clinton have opinions on our Northern Border?")
 {
  "response": {
    "status": "ok",
    "numfound": 6
  },
  "annotations": [
    {
      /* A COORD 
       * ~~~~~~~~~~~~~~~~~~~
       */
      /* common annotation items */
      "text": " 56:08:45N, 117:33:12W",
      "type": "coordinate",
      "method": "DMS-01a",
      "length": 22,
      "offset": 8,      
      /*  annotation-specific items: */
      "cc": "CA",
      "lon": -117.55333,
      "prec": 15,
      "feat_code": "COORD",
      "lat": 56.145835,
      "adm1": "01",
      "feat_class": "S", 
      "filtered-out":false	      
    },
    {
      /* A COORD, with an indication of neighboring locales.
       *   option "revgeo" or "resolve_localities" will evoke this output for Coordinates.
       * ~~~~~~~~~~~~~~~~~~~
       */
      /* common annotation items */
      "text": " 56:08:45N, 117:33:12W",
      "type": "coordinate",
      "method": "DMS-01a",
      "length": 22,
      "offset": 8,      
      /*  annotation-specific items: */
      "cc": "CA",
      "lat": 56.145835,
      "lon": -117.55333,
      "prec": 15,
      "feat_class": "S", 
      "feat_code": "COORD",
      "adm1": "01",
      "filtered-out":false, 
      "related_place_name": "Grimshaw",
      "nearest_places": [
         {
           "name": "Provincial Park of Alberta",
           "cc": "CA",
           "adm1": "01",
           "lat": 56.16,
           "lon": -117.54,
           "feat_class": "S",
           "feat_code": "PARK",
           "prec": 5000,       /* Precision (in meters) is an approximate radius around the point
                                * that represents the total area of the feature.
                                */
           "distance": 3400    /* Distance (in meters) from this nearby place to the found coordinate
                                */
         }
         /* Up to 5 different locations */
      ]
    },
    
    {
      /* A PERSON 
       * ~~~~~~~~~~~~~~~~~~~
       */
      "text": "Hillary Clinton",
      "type": "taxon",
      "offset": 185,
      "length": 15,
      /*  annotation-specific items: */
      "taxon": "Person.Hillary Rodham Clinton",
      "catalog": "JRC",
      "filtered-out":false
    },
    {
      /* A PLACE 
       * ~~~~~~~~~~~~~~~~~~~
       */
      "confidence": 60,
      "cc": "PT",
      "text": "Lisbon",
      "lon": -9.13333,
      "prec": 10000,
      "length": 6,
      "feat_code": "PPLC",
      "offset": 44,
      "lat": 38.71667,
      "type": "place",
      "adm1": "14",
      "feat_class": "P",
      "filtered-out":false
    },
    {
      "confidence": 73,
      "cc": "CA",
      "text": "Saskatchewan",
      "lon": -106,
      "prec": 50000,
      "length": 12,
      "feat_code": "ADM1",
      "offset": 64,
      "lat": 54,
      "type": "place",
      "adm1": "11",
      "feat_class": "A",
      "filtered-out":false
    },
    {
      /* A COUNTRY 
       * ~~~~~~~~~~~~~~~~~~~
       */
      "cc": "CA",
      "text": "Canada",
      "length": 6,
      "type": "country",
      "offset": 101,
      "filtered-out":false
    },
    {
      "confidence": 93,
      "cc": "SA",
      "text": "Northern Border",
      "lon": 42.41667,
      "prec": 50000,
      "length": 15,
      "feat_code": "ADM1",
      "offset": 252,
      "lat": 30.25,
      "type": "place",
      "adm1": "15",
      "feat_class": "A",
      "filtered-out":false
    }
  ]
 }
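
If you call the service without the Python client, the raw JSON can be walked directly. The sketch below assumes the response has already been parsed into a dictionary (the `/* ... */` comments in the listing are notes for the reader and are not valid JSON). `summarize` is a hypothetical helper. It skips filtered-out annotations and derives the end of each span as `offset + length`.

```python
def summarize(response):
    """Return (type, text, start, end) tuples for annotations that were not filtered out."""
    rows = []
    for anno in response.get("annotations", []):
        if anno.get("filtered-out", False):
            continue
        start = anno["offset"]
        rows.append((anno["type"], anno["text"], start, start + anno["length"]))
    return rows


# Trimmed copy of the example output.
response = {
    "response": {"status": "ok", "numfound": 6},
    "annotations": [
        {"type": "place", "text": "Lisbon", "offset": 44, "length": 6, "filtered-out": False},
        {"type": "country", "text": "Canada", "offset": 101, "length": 6, "filtered-out": False},
        {"type": "taxon", "text": "Hillary Clinton", "offset": 185, "length": 15, "filtered-out": False},
    ],
}
# The same input text that was sent in the example call.
text = ("Where is 56:08:45N, 117:33:12W?  Is it near Lisbon or closer to Saskatchewan?"
        "Seriously, what part of Canada would you visit to see the new prime minister discus our border?"
        "Do you think Hillary Clinton or former President Clinton have opinions on our Northern Border?")

for kind, matched, start, end in summarize(response):
    # Each span should slice the original text back to the matched string.
    print(kind, matched, text[start:end] == matched)
```

Checking each span against the input text is a cheap way to catch offset drift, for example when the text was re-encoded or trimmed between the call and the post-processing step.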

Implementation

The key geocoders implemented in Xponents REST API are as follows:

INSTALLATION

Essentials:

  # Install the Python library using Pip. Pip handles installing OS-specific python resources as needed. 
  cd Xponents/
  mkdir piplib 
  pip3 install --target piplib python/opensextant-1.x.x.tar.gz
  OR 
  pip3 install --user python/opensextant-1.x.x.tar.gz

  # Run server
  ./script/xlayer-server.sh 3535 start

  # In another window, Run test client using Python.
  ./test/test-xlayer-python.sh 3535 ./test/data/randomness.txt

  # Once done, run the Java client 
  ./test/test-xlayer-java.sh 3535 ./test/data/randomness.txt


These are limited examples.  If you want to demonstrate running client and server on 
different hosts, which is more realistic, by all means adapt the shell scripts as needed.

Rather than use shell scripting, we have used Groovy and Ant to simplify these tests for Java.
As these are for demonstration only, we do not intend to generalize the scripting beyond this.

History