Xponents

Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.

View the Project on GitHub

XCOORD PATTERNS

XCoord is a geographic coordinate extractor. It finds the most common coordinate patterns in free text. That is, if you want to geocode documents, chat messages, bulletins, etc that contain degrees/minute/seconds, decimal degrees or military grids (MGRS) you will want to use something like XCoord.

Try the XCoord demo in Examples source folder or in the distribution, as below.
The XCoord function is integrated fully with Xponents SDK and REST API by default.


    Input files that contain various coordinate patterns to see how the extraction behaves.
    
    ./xponents-demo.sh xcoord -h    

        TestXCoord  -f       -- system tests
        TestXCoord  -i TEXT  -- user test 
        TestXCoord  -t FILE  -- user test with file; one test per line
        TestXCoord  -t FILE  -- user test with file
        TestXCoord  -a       -- adhoc tests, e.g., recompiling code and testing
        TestXCoord  -h       -- help. 

Coordinate Rule Library

The main program and class XCoord is at XCoord while the accompanying patterns and rules file is geocoord_patterns.cfg .

For reference, review the XCoord DEFINES as you review RULES. There are subtle variations in field definitions.

For brevity sake, only true positive tests are included. “FAIL” tests or true negatives are omitted.
One test case per RULE is provided to illustrate each pattern. Sources of patterns are derived from federal research projects performed by the MITRE Corporation.

These five families of patterns are supported:

Conventions in pattern IDs. Each pattern is enumerated with the its family; Additional nomenclature includes:

Table 1. Sample Listing of XCoord Patterns and Example Targets for Extraction

Family Pattern ID Example
MGRS pattern    
MGRS MGRS-01 38SMB4611036560


   
UTM pattern    
UTM UTM-01 17N 699990 3333335
// Zone/Latitude band + northing + easting; Optionally with units “m”
// for meters and or N/E marker


   
Degree-Minute-Second patterns    
DMS DMS-01fs-a, DMS-01fs-b 01°44'55.5"N 101°22'33.0"E
N01°44'55.5" E101°22'33.0"
// fractional second resolution, w/hash marks, with hemisphere
DMS DMS-01fs-deg 01°44'55.5" 101°22'33.0"
// fractional second resolution, w/hash marks, NO hemisphere
DMS DMS-01dot-a, DMS-01dot-b 01.44.55N 055.44.33E
N01.44.55 E055.44.33
// explicit dot separator
DMS DMS-02 N42 18' 00" W102 24' 00" // variable length fields with separators and hemisphere
DMS DMS-01a, DMS-02a 421800N 1022400W
N421800 W1022400
// no field separators, D/M/S
DMS DMS-03a, DMS-03b 4218001234N 10224001234W
N4218001234 W10224001234
// no field separators; D/M/S.ss assummed


   
Degree-Minute patterns    
DM DM-00 4218N-009 10224W-003
// obscure fractional minute notation
DM DM-01a, DM-01a-dash, DM-01a-dot;
DM-01b, DM-01b-dash, DM-01b-dot
42 18-009N 102 24-003W
42-18-009N; 102-24-003W
42.18.009N 102.24.003W
// Ambiguous fractional minute separator
// is handled with distinct patterns

N4218.009W10224.003
N42 18-005 x W102 24-008
N42.18.005 x W102.24.008
DM DM-02a, DM-02b, DM-02b-dash 4218.009N 10224.003W
N4218.0 W10224.0
N4218-0018 W10224-0444
// 02a/b allows for fixed-width D/M without separators.
DM DM-03a, DM-03b 4218009N10224003W
N4218009W10224003
// Fixed-width patten for D/M.mmm
DM DM-03-av, DM-03-av-deg, DM-03-av-decdm N42 18' W102 24'
42° 18' 102° 24'
42° 18.44' 102° 24.11'
// D/M pattern with explicit hashmarks and separators
// 03-av-decdm is pattern with NO hemisphere
DM DM-03-bv 42° 18'N 102° 24'W
// trailing hemisphere, minute resolution
DM DM-04a, DM-04b N4218 W10224
4218N 10224W
// trivial DMH or HDM pattern.
DM DM-05 /4218N4/10224W5/
// Rare military format with checksum value.
DM DM-06 OBE
DM DM-07 42 DEG 18.0N 102 DEG 24.0W
// ‘DEG’ spelled out. fractional minute resolution
DM DM-08 +42 18.0 x -102 24.0


   
Decimal Degree patterns    
DD DD-01 N42.3, W102.4
DD DD-02 ` 42.3N; 102.4W `
DD DD-03 +42.3°;-102.4°
// explicit degree notation required, otherwise it is just a pair
// of floating point numbers.
DD DD-04 Latitude: N42.3° x Longitude: W102.3°
// Lat/Lon fields in text, decimal degree resolution
DD DD-05 N42°, W102°
DD DD-06 42° N, 102° W
DD DD-07 N42, W102
END    

XCOORD RELEASE NOTES

XCoord 2.1 through 2.5 June 2014

XCoord 2.0 2013-July Independence Release

XCoord 1.6 2013-MAR St Patrick’s release

XCoord v1.5 2013-JAN MLK release

Major improvements

XCoord v1.3, 2012-10-27 THANKSGIVING release

Minor improvements:

XCoord v1.2, 2012-10-31 HALLOWEEN release

Major improvements:

Details: