Geocoding Coordinates in Text with XCoord

Author: Marc C. Ubaldino, MITRE Corporation
Date: 2014-June;  updated 2017-August
Copyright MITRE Corporation, 2012-2017

XCoord is a geographic coordinate extractor.  It finds the most common coordinate patterns in free text.  That is, if you want to geocode documents, chat messages, bulletins, etc. that contain degrees/minutes/seconds, decimal degrees, or military grids (MGRS), you will want to use something like XCoord.  The latest major version of XCoord is in Xponents 2.9.

Synopsis:

# compile from source, if needed.
mvn install

# With a full Xponents release, run the system tests
ant -f ./script/testing.xml  test-xcoord

# Run XCoord on your own file
ant -f ./script/testing.xml  xcoord
> file?   mytestdoc.txt

In any case, find your results as a CSV in the ./results folder.

Coordinate Rule Library

The main XCoord patterns and rules file is geocoord_patterns.cfg.

For reference, review the XCoord DEFINES as you review RULES.  There are subtle variations in field definitions.
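
The relationship between DEFINES and RULES can be illustrated with a small Python sketch.  It is not XCoord code and does not reproduce the syntax of geocoord_patterns.cfg: the `<name>` placeholder notation, the rule template, and the function names are assumptions made for illustration.  The decimal degree and separator patterns are copied from Table 2; the hemisphere classes are narrowed per axis as a further assumption.

```python
import re

# Field patterns copied from Table 2 of this document.
DEFINES = {
    "decDegLat": r"\d?\d\.\d{1,20}",
    "decDegLon": r"[0-1]?\d?\d\.\d{1,20}",
    "latlonSep": r"[;,/|=x\s]",
    # Assumption: hemisphere letters narrowed per axis for this sketch.
    "hemiLat": r"[NS]",
    "hemiLon": r"[EW]",
}

# Hypothetical rule template; the <name> placeholder syntax is illustrative only.
RULE_TEMPLATE = r"<hemiLat><decDegLat>\s*<latlonSep>\s*<hemiLon><decDegLon>"


def compile_rule(template, defines):
    """Expand each <name> placeholder into a named regex group and compile it."""
    def expand(match):
        name = match.group(1)
        return f"(?P<{name}>{defines[name]})"

    return re.compile(re.sub(r"<(\w+)>", expand, template))


def extract_dd(text, rule):
    """Return (lat, lon) as signed decimal degrees, or None if nothing matches."""
    m = rule.search(text)
    if m is None:
        return None
    lat = float(m["decDegLat"])
    lon = float(m["decDegLon"])
    if m["hemiLat"] == "S":
        lat = -lat
    if m["hemiLon"] == "W":
        lon = -lon
    return lat, lon


rule = compile_rule(RULE_TEMPLATE, DEFINES)
print(extract_dd("Reported at N42.3, W102.4 this morning", rule))
```

Keeping the field definitions separate from the rule template means that a change to one field, such as the allowed number of decimal places, carries over to every rule that uses it.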

For brevity, only true positive tests are included; "FAIL" tests or true negatives are omitted.  One test case per RULE is provided to illustrate each pattern.  The patterns are derived from federal research projects performed by the MITRE Corporation.

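These five families of patterns are supported:

MGRS  Military Grid Reference System
UTM   Universal Transverse Mercator
DMS   Degree-Minute-Second
DM    Degree-Minute
DD    Decimal Degree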


Appendix A.  Sample Coordinate Patterns

Conventions in pattern IDs.  Each pattern is enumerated with its family; additional nomenclature includes:

Table 1.  Sample Listing of XCoord v1.3 Patterns and Example Targets for Extraction

Family  Pattern ID        Example


MGRS pattern
MGRS    MGRS-01           38SMB4611036560


UTM pattern
UTM     UTM-01            17N 699990 3333335
                          // Zone/latitude band + easting + northing; optionally with units "m"
                          // for meters and/or N/E marker


Degree-Minute-Second patterns
DMS     DMS-01fs-a,       01°44'55.5"N 101°22'33.0"E
        DMS-01fs-b        N01°44'55.5" E101°22'33.0"
                          // fractional second resolution, w/hash marks, with hemisphere
DMS     DMS-01fs-deg      01°44'55.5" 101°22'33.0"
                          // fractional second resolution, w/hash marks, NO hemisphere
DMS     DMS-01dot-a,      01.44.55N 055.44.33E
        DMS-01dot-b       N01.44.55 E055.44.33
                          // explicit dot separator
DMS     DMS-02            N42 18' 00" W102 24' 00"
                          // variable length fields with separators and hemisphere
DMS     DMS-01a,          421800N 1022400W
        DMS-02a           N421800 W1022400
                          // no field separators, D/M/S
DMS     DMS-03a,          4218001234N 10224001234W
        DMS-03b           N4218001234 W10224001234
                          // no field separators; D/M/S.ss assumed


Degree-Minute patterns
DM      DM-00             4218N-009 10224W-003
                          // obscure fractional minute notation
DM      DM-01a,           42 18-009N 102 24-003W
        DM-01a-dash,      42-18-009N; 102-24-003W
        DM-01a-dot,       42.18.009N 102.24.003W
                          // Ambiguous fractional minute separator
                          // is handled with distinct patterns
        DM-01b,           N4218.009W10224.003
        DM-01b-dash,      N42 18-005 x W102 24-008
        DM-01b-dot        N42.18.005 x W102.24.008
DM      DM-02a,           4218.009N 10224.003W
        DM-02b,           N4218.0 W10224.0
        DM-02b-dash       N4218-0018 W10224-0444
                          // 02a/b allows for fixed-width D/M without separators.
DM      DM-03a,           4218009N10224003W
        DM-03b            N4218009W10224003
                          // Fixed-width pattern for D/M.mmm
DM      DM-03-av,         N42 18' W102 24'
        DM-03-av-deg,     42° 18' 102° 24'
        DM-03-av-decdm    42° 18.44' 102° 24.11'
                          // D/M pattern with explicit hashmarks and separators
                          // 03-av-decdm is pattern with NO hemisphere
DM      DM-03-bv          42° 18'N 102° 24'W
                          // trailing hemisphere, minute resolution
DM      DM-04a,           N4218 W10224
        DM-04b            4218N 10224W
                          // trivial DMH or HDM pattern.
DM      DM-05             /4218N4/10224W5/
                          // Rare military format with checksum value.
DM      DM-06             OBE
DM      DM-07             42 DEG 18.0N 102 DEG 24.0W
                          // 'DEG' spelled out. fractional minute resolution
DM      DM-08             +42 18.0 x -102 24.0


Decimal Degree patterns
DD      DD-01             N42.3, W102.4
DD      DD-02             42.3N; 102.4W
DD      DD-03             +42.3°;-102.4°
                          // explicit degree notation required, otherwise it is just a pair
                          // of floating point numbers.
DD      DD-04             Latitude: N42.3° x Longitude: W102.3°
                          // Lat/Lon fields in text, decimal degree resolution
DD      DD-05             N42°, W102°
DD      DD-06             42° N, 102° W
DD      DD-07             N42, W102
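
Once a DMS pattern has matched, its fields still need to be converted to decimal degrees before the result can be used as a geocode.  The Python sketch below shows that arithmetic for the DMS-01fs-a example in Table 1.  The regular expression is a simplified stand-in written for this example only, not the rule from geocoord_patterns.cfg, and the function names are hypothetical.

```python
import re

# Simplified stand-in for the DMS-01fs-a style (symbols plus trailing hemisphere).
# Assumption: this is NOT the XCoord rule, only enough to parse the Table 1 example.
DMS_FIELD = re.compile(
    r"""(?P<deg>\d{1,3})°(?P<min>\d{1,2})'(?P<sec>\d{1,2}(?:\.\d+)?)"(?P<hemi>[NSEW])"""
)


def dms_to_decimal(deg, minutes, seconds, hemi):
    """Degrees + minutes/60 + seconds/3600, negated for the S and W hemispheres."""
    value = deg + minutes / 60.0 + seconds / 3600.0
    return -value if hemi in "SW" else value


def parse_dms_pair(text):
    """Return (lat, lon) in decimal degrees, or None unless one of each is found."""
    values = {}
    for m in DMS_FIELD.finditer(text):
        axis = "lat" if m["hemi"] in "NS" else "lon"
        values[axis] = dms_to_decimal(
            int(m["deg"]), int(m["min"]), float(m["sec"]), m["hemi"]
        )
    if set(values) != {"lat", "lon"}:
        return None
    return values["lat"], values["lon"]


print(parse_dms_pair("""01°44'55.5"N 101°22'33.0"E"""))
```

The same arithmetic applies to the DM family with the seconds term dropped: degrees plus minutes divided by 60.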



DEFINITIONS REFERENCE

Table 2. Defined Fields in XCoord Patterns


Field Name                          Pattern, Description

Hemispheres
hemiLat, hemiLon                    ENSW
hemiLatSign, hemiLonSign            -, +
hemiLatPre, hemiLonPre              -, +, ENSW


dmLatSep, dmLonSep                  Degree symbol or separator
                                    [-°:\s]\s?
msLatSep, msLonSep                  Min/Sec separator or symbol
                                    [-'´’′:.\s]
secLatSep, secLonSep                Sec symbol or separator, can be double hash mark
                                    ["'´’”″′\s]{0,2}


latlonSep                           Lat/Lon separator
                                    [;,/|=x\s]
dmsSep                              Standard DMS field separator
                                    [-.:\s]


decDegLat, decDegLon                Decimal Degree field DD.ddd... up to 20 decimal places
                                    \d?\d\.\d{1,20},
                                    [0-1]?\d?\d\.\d{1,20}


Variable length decimal minutes and degrees

decMinLat, decMinLon                Decimal minutes
decMinLat3, decMinLon3              Decimal minutes, 3 decimal places or more
degLat, degLon                      Variable length degrees


Variable length fractional units, that is, the decimal part of minutes or seconds only: the .mmm in M.mmm
fractMinLat, fractMinLon            decimal part of minutes
fractSecLat, fracSecLon             decimal part of seconds
fractMinLat3, fractMinLon3          decimal part of minutes, 3 decimal places or more


minLat, minLon                      Minutes
secLat, secLon                      Seconds


Fixed length patterns
dmsDegLat, dmsDegLon                2-digit lat, 3-digit lon
dmsMinLat, dmsMinLon                2-digit minute
dmsSecLat, dmsSecLon                2-digit second


UTM components
UTMBand                             [A-HJ-NP-Z]
UTMZone                             [0-5]?\d
MGRSQuad                            [A-HJ-NP-Z][A-HJ-NP-V]
MGRSZone                            [0-6]?\d\s?[C-HJ-NP-X]
UTMEasting
UTMNorthing
Easting_Northing, EastingNorthing   offsets with and without space separator
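
As an illustration of how the MGRS components fit together, the Python sketch below splits the MGRS-01 example from Table 1 into zone, band, 100 km quad, and offsets.  The zone, band, and quad character classes are taken from the MGRSZone and MGRSQuad entries above.  The offset digit pattern, the group names, and the function name are assumptions for this sketch, not XCoord definitions.  The even split of the digits into easting and northing follows the general MGRS convention.

```python
import re

# Zone/band and quad classes come from MGRSZone and MGRSQuad in Table 2
# (MGRSZone is split into two groups here).  The offsets group is an
# assumption: either "eeeee nnnnn" with a space, or one run of 2-10 digits.
MGRS = re.compile(
    r"(?P<zone>[0-6]?\d)\s?(?P<band>[C-HJ-NP-X])"
    r"\s?(?P<quad>[A-HJ-NP-Z][A-HJ-NP-V])"
    r"\s?(?P<offsets>\d{1,5}\s\d{1,5}|\d{2,10})"
)


def split_mgrs(text):
    """Split an MGRS string into its components, or return None if malformed."""
    m = MGRS.fullmatch(text.strip())
    if m is None:
        return None
    parts = m["offsets"].split()
    if len(parts) == 2:
        east, north = parts
    else:
        digits = parts[0]
        if len(digits) % 2:
            return None  # an odd digit count cannot be split evenly
        half = len(digits) // 2
        east, north = digits[:half], digits[half:]
    if len(east) != len(north):
        return None
    # Five digits per axis means 1 m precision; each digit dropped is 10x coarser.
    precision_m = 10 ** (5 - len(east))
    return {
        "zone": int(m["zone"]),
        "band": m["band"],
        "quad": m["quad"],
        "easting_m": int(east) * precision_m,
        "northing_m": int(north) * precision_m,
        "precision_m": precision_m,
    }


print(split_mgrs("38SMB4611036560"))
```

The easting and northing values here are offsets in meters within the 100 km square named by the quad, not full UTM coordinates.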