Package org.opensextant.extractors.geo.social



Social Geoinferencing

When OpenSextant was first released in 2013, we had a demo of how to parse tweets. That was fun. Since then, we have used Xponents to develop a methodology for georeferencing any variety of data. The terms geoinferencing, geocoding and geotagging tend to blur together, but for all intents and purposes what we mean is:

  • detect all reasonable geographic cues in data, whether social media or other sources
  • extract and report known entities supporting geographic intelligence: language, culture, nationality, etc.
  • provide a sense of confidence in the codings and of location accuracy
  • provide a trace of the rules that fired in arriving at the geocoding
  • provide reasonable means for "tagging", meaning serializing data or persisting it in durable forms, e.g., injecting a geocode into a JSON subdocument or exporting a database schema (CSV headings) that is easy to interpret
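To make the last three goals concrete, here is a minimal sketch of a geocoding result that carries a confidence value and a rules trace and serializes itself as a CSV row. The class `GeocodeSketch`, its fields, the 0..100 confidence scale and the rule names are all assumptions for illustration; they are not the Xponents classes or the stock OpenSextant output schema.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical geocoding result; NOT an Xponents class. */
class GeocodeSketch {
    final String placeName;
    final String countryCode;
    final double lat;
    final double lon;
    final int confidence;      // assumed scale: 0 (no confidence) to 100
    final List<String> rules;  // names of the rules that fired, in order

    GeocodeSketch(String placeName, String countryCode, double lat, double lon,
                  int confidence, List<String> rules) {
        this.placeName = placeName;
        this.countryCode = countryCode;
        this.lat = lat;
        this.lon = lon;
        this.confidence = confidence;
        this.rules = rules;
    }

    /** CSV headings chosen to be easy to interpret (illustrative only). */
    static String csvHeader() {
        return "place,cc,lat,lon,confidence,rules";
    }

    /** One CSV row; the rules trace is flattened with ';' separators. */
    String toCsvRow() {
        return String.format(Locale.ROOT, "%s,%s,%.4f,%.4f,%d,%s",
                quote(placeName), countryCode, lat, lon, confidence,
                quote(String.join(";", rules)));
    }

    /** Wrap a value in double quotes, doubling any embedded quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }
}
```

Keeping the rules trace next to the confidence value lets a reviewer see why a coding was made, not only how strongly it is believed.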

This package mainly applies SolrGazetteer, XCoord and PlaceGeocoder in these classes:

  • XponentsTextGeotagger: demonstrates tagging social media text where additional metadata about the user, location or profile is available. It derives the geographic topics or locations in a post.
  • XponentsGeocoder: demonstrates the heuristic metadata rules for dealing with sparse inputs, as is the case with most social media. Here Xponents looks for the highest-fidelity location associated with a user profile, based on the given country, location description, device GPS location, timezone and UTC offset. The language of the post is compared to the primary language of any inferred countries, as is the timezone of the post. Such comparisons help disambiguate free-text descriptions of one's home location.
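The heuristic described for XponentsGeocoder can be illustrated with a small sketch: rank each piece of profile evidence by how precise its source is, then let agreement between the post language and a country's primary language break ties. Everything here is an assumption made for illustration: the class `ProfileLocationSketch`, the fidelity ordering (GPS over place description over country over timezone), and the scoring formula are not taken from the Xponents implementation.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical ranking of profile evidence; NOT the XponentsGeocoder logic. */
class ProfileLocationSketch {
    /** Evidence sources with an assumed fidelity rank (higher is more precise). */
    enum Source {
        TIMEZONE(1), COUNTRY(2), PLACE_DESCRIPTION(3), GPS(4);
        final int fidelity;
        Source(int fidelity) { this.fidelity = fidelity; }
    }

    static class Candidate {
        final Source source;
        final String label;
        final String countryCode;
        Candidate(Source source, String label, String countryCode) {
            this.source = source;
            this.label = label;
            this.countryCode = countryCode;
        }
    }

    /**
     * Source fidelity dominates the score; language agreement adds one point,
     * so it can only decide between candidates from the same kind of source.
     */
    static int score(Candidate c, String postLang, Map<String, String> primaryLang) {
        boolean langAgrees = postLang != null
                && postLang.equals(primaryLang.get(c.countryCode));
        return 2 * c.source.fidelity + (langAgrees ? 1 : 0);
    }

    /** Returns the highest-scoring candidate, or null for an empty list. */
    static Candidate best(List<Candidate> candidates, String postLang,
                          Map<String, String> primaryLang) {
        Candidate top = null;
        int topScore = -1;
        for (Candidate c : candidates) {
            int s = score(c, postLang, primaryLang);
            if (s > topScore) {
                top = c;
                topScore = s;
            }
        }
        return top;
    }
}
```

For example, a profile location of "Paris" yields both a French and a Texan candidate from the same source; under these assumptions a post written in French tips the choice to France, while a GPS fix would outrank either.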

These Geoinferencers provide a base illustration of how other algorithms could extract and report discrete location intelligence.

SocialGeoDemo demonstrates using the TweetLoader to load JSON-formatted data, process it, and output it as CSV or KML (using KMLDemoWriter and the stock OpenSextant output schema).


SocialGeoDemo usage:

   
      mkdir -p output
      java -cp "lib/*" -Dopensextant.solr=./solr/solr7 \
            -Xmx2g -Xms2g \
            SocialGeoDemo \
            --phase xponents-text --in src/test/resources/sample-tweets.json \
            --out ./output/sample-tweets+geocoded.json \
            --kml  ./output/sample-tweets.kml
    


The output is then a geocoded JSON file as well as a KML file.

Classes

  • A light wrapper around the TextMatch and Geocoding interfaces.
  • A geoinferencer infers the location of users and their messages.
  • A cleaner approach to outputting geocoding data to KML using GISCore.
  • A base class that has the various hooks for logging, dev/test/evaluation, common dictionaries/resources, and helpful connectivity items.
  • Pipeline focused on improving the location metadata for Tweets, Weibo or other social media that has metadata about user or messaging location.
  • Variant. TODO: ideally, we would chain the two steps: first inferredLoc = geocode(Tweet, User, etc), then use that output in mentionLocs = geocode(text, given=inferredLoc).
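
The chaining idea in the TODO above can be sketched as two steps that share one resolver: first infer the user's location from profile metadata, then use it as a prior when resolving place names mentioned in the text. This is only a sketch under stated assumptions: `ChainedGeocodingSketch`, its tiny in-memory gazetteer and the country-preference rule are hypothetical and stand in for the real gazetteer and geocoder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical two-step chain; NOT the Xponents pipeline. */
class ChainedGeocodingSketch {
    static class Loc {
        final String name;
        final String countryCode;
        Loc(String name, String countryCode) {
            this.name = name;
            this.countryCode = countryCode;
        }
        @Override public String toString() { return name + "/" + countryCode; }
    }

    /** Toy gazetteer standing in for a real one; entries are illustrative. */
    static final Map<String, List<Loc>> GAZETTEER = new HashMap<>();
    static {
        GAZETTEER.put("paris", Arrays.asList(new Loc("Paris", "US"), new Loc("Paris", "FR")));
        GAZETTEER.put("lyon", Arrays.asList(new Loc("Lyon", "FR")));
    }

    /** Prefer a candidate in the given country, else fall back to the first one. */
    static Loc resolve(String name, String preferredCountry) {
        List<Loc> candidates = GAZETTEER.get(name.toLowerCase(Locale.ROOT));
        if (candidates == null) {
            return null;
        }
        for (Loc c : candidates) {
            if (c.countryCode.equals(preferredCountry)) {
                return c;
            }
        }
        return candidates.get(0);
    }

    /** Step 1: inferredLoc = geocode(profile metadata). */
    static Loc inferUserLocation(String profileLocation, String profileCountry) {
        return resolve(profileLocation, profileCountry);
    }

    /** Step 2: mentionLocs = geocode(text, given = inferredLoc). */
    static List<Loc> geocodeMentions(String text, Loc given) {
        String prior = (given == null) ? null : given.countryCode;
        List<Loc> found = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            Loc loc = resolve(token, prior);
            if (loc != null) {
                found.add(loc);
            }
        }
        return found;
    }
}
```

With a user inferred to be in Lyon, France, a mention of "Paris" resolves to the French city; without that prior, the toy resolver simply takes the first gazetteer entry.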