Package org.opensextant.extractors.geo.social
Social Geoinferencing
When OpenSextant was first released in 2013, we had a demo of how
to parse tweets. That was fun. Since then, what we have
done with Xponents here is develop a methodology for
georeferencing any variety of data. The terms
geoinferencing, geocoding, and geotagging tend to blur, but for
all intents and purposes what we mean is:
- detect all reasonable geographic cues in data, social media and other data alike
- extract and report known entities supporting geographic intelligence: language, culture, nationality, etc.
- provide a sense of confidence in the codings and of location accuracy
- provide a trace of the rules that fired in arriving at the
geocoding
- provide reasonable means for "tagging", meaning serializing data or persisting it in durable forms, e.g., injecting a geocode into a JSON subdocument or exporting a database schema (CSV headings) that is easy to interpret
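The "tagging" goal above can be illustrated with a minimal sketch of a flat, easy-to-read CSV schema that carries a confidence score and a rules trace alongside the coordinates. The class GeoTagRow, its column names, and the rule names used with it are hypothetical and are not part of the Xponents API or the stock OpenSextant output schema.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical flat record of one geocoding; not an Xponents class. */
final class GeoTagRow {
    /** Assumed CSV headings, chosen only for illustration. */
    static final String HEADER = "id,place,country,lat,lon,confidence,rules";

    final String id;
    final String place;
    final String countryCode;
    final double lat;
    final double lon;
    final int confidence;      // assumed scale: 0..100, higher is more certain
    final List<String> rules;  // names of the rules that fired, in order

    GeoTagRow(String id, String place, String countryCode,
              double lat, double lon, int confidence, List<String> rules) {
        this.id = id;
        this.place = place;
        this.countryCode = countryCode;
        this.lat = lat;
        this.lon = lon;
        this.confidence = confidence;
        this.rules = rules;
    }

    /** Quote a field only if it contains a comma, a double quote, or a newline. */
    static String csv(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    /** Serialize as one CSV line; the rules trace is joined with ';'. */
    String toCsv() {
        return String.join(",",
                csv(id), csv(place), csv(countryCode),
                String.format(Locale.ROOT, "%.4f", lat),
                String.format(Locale.ROOT, "%.4f", lon),
                Integer.toString(confidence),
                csv(String.join(";", rules)));
    }
}
```

Keeping the rules trace in a single delimited column is one design choice among several; a JSON subdocument would more naturally hold it as an array.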
This package mainly applies SolrGazetteer, XCoord
and PlaceGeocoder in these classes:
- XponentsTextGeotagger: demonstrates tagging social media text where there is additional metadata about the user, location, or profile. It derives the geographic topics or locations in a post.
- XponentsGeocoder: demonstrates the heuristic metadata rules for dealing with sparse inputs, as is the case with most social media. The Xponents application here looks for the highest-fidelity location associated with a user profile, based on the given country, location description, device GPS location, timezone, and UTC offset. The language of the post is compared to the primary language of any inferred countries, as is the timezone of the post. Such comparisons help disambiguate free-text descriptions of one's home location.
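The language and timezone comparison described above can be sketched as a simple scoring of candidate countries. This is only an illustration of the idea under stated assumptions, not the rule set XponentsGeocoder actually applies: CountryCandidate, ProfileDisambiguator, the one-point-per-match scoring, and the sample language and offset values are all hypothetical.

```java
import java.util.List;

/** Hypothetical candidate country; the fields are illustrative, not the Xponents model. */
final class CountryCandidate {
    final String code;          // ISO 3166-1 alpha-2 country code
    final String primaryLang;   // ISO 639-1 language code
    final int minUtcOffsetHours;
    final int maxUtcOffsetHours;

    CountryCandidate(String code, String primaryLang, int minUtcOffsetHours, int maxUtcOffsetHours) {
        this.code = code;
        this.primaryLang = primaryLang;
        this.minUtcOffsetHours = minUtcOffsetHours;
        this.maxUtcOffsetHours = maxUtcOffsetHours;
    }
}

final class ProfileDisambiguator {
    /** One point if the post language matches, one point if the UTC offset is in range. */
    static int score(CountryCandidate c, String postLang, int utcOffsetHours) {
        int score = 0;
        if (c.primaryLang.equalsIgnoreCase(postLang)) {
            score++;
        }
        if (utcOffsetHours >= c.minUtcOffsetHours && utcOffsetHours <= c.maxUtcOffsetHours) {
            score++;
        }
        return score;
    }

    /**
     * Return the highest-scoring candidate, or null if the list is empty.
     * Ties keep the earlier candidate; a candidate is returned even if its score is 0.
     */
    static CountryCandidate best(List<CountryCandidate> candidates, String postLang, int utcOffsetHours) {
        CountryCandidate best = null;
        int bestScore = -1;
        for (CountryCandidate c : candidates) {
            int s = score(c, postLang, utcOffsetHours);
            if (s > bestScore) {
                best = c;
                bestScore = s;
            }
        }
        return best;
    }
}
```

For example, a profile location of "Georgia" could yield candidates such as new CountryCandidate("US", "en", -10, -4) and new CountryCandidate("GE", "ka", 4, 4); a post written in Georgian with a UTC offset of +4 then favors GE.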
These Geoinferencers provide a base illustration of how
other algorithms could extract and report discrete location
intelligence.
SocialGeoDemo demonstrates using the TweetLoader
to load JSON-formatted data, process it, and output it as CSV or
KML (using KMLDemoWriter and the stock OpenSextant output
schema).
SocialGeoDemo usage:
    mkdir -p output
    java -cp "lib/*" -Dopensextant.solr=./solr/solr7 -Xmx2g -Xms2g \
        SocialGeoDemo \
        --phase xponents-text --in src/test/resources/sample-tweets.json \
        --out ./output/sample-tweets+geocoded.json \
        --kml ./output/sample-tweets.kml
The output is a geocoded JSON file as well as a KML file.
Class descriptions:
- This is a light wrapper around TextMatch + Geocoding interfaces.
- A geoinferencer infers location on users and their messages.
- A cleaner approach to outputting geocoding data to KML using GISCore.
- A base-class that has the various hooks for logging, dev/test/evaluation, common dictionaries/resources, and helpful connectivity items.
- Pipeline focused on improving the location metadata for Tweets or Weibo or other social media that has metadata about user or messaging location.
- Variant TODO: Ideally, we would chain something like inferredLoc = geocode(Tweet, User, etc) then use the outputs from that to then mentionLocs = geocode(text, given=inferredLoc).
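The chaining idea in the TODO can be sketched in two stages: first infer a location from profile metadata, then use that result to bias the resolution of place mentions in the text. This is a toy illustration only. Place, ChainedGeocoder, the in-memory gazetteer, and both method names are hypothetical stand-ins; the real pipeline would rely on the gazetteer and geocoder classes named in this package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical place record; not an Xponents class. */
final class Place {
    final String name;
    final String countryCode;

    Place(String name, String countryCode) {
        this.name = name;
        this.countryCode = countryCode;
    }
}

final class ChainedGeocoder {
    /** Toy gazetteer standing in for a real lookup; order defines the default choice. */
    static final List<Place> GAZETTEER = Arrays.asList(
            new Place("London", "GB"), new Place("London", "CA"),
            new Place("Paris", "FR"), new Place("Paris", "US"));

    /** Stage 1: infer a country from profile metadata; here just a normalized, user-given code. */
    static String inferCountry(String profileCountry) {
        return profileCountry == null ? null : profileCountry.trim().toUpperCase(Locale.ROOT);
    }

    /** Stage 2: resolve mentions in the text, preferring candidates in the inferred country. */
    static List<Place> geocodeText(String text, String inferredCountry) {
        List<Place> result = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            Place chosen = null;
            for (Place p : GAZETTEER) {
                if (!p.name.equalsIgnoreCase(token)) {
                    continue;
                }
                if (chosen == null) {
                    chosen = p; // default: first matching gazetteer entry
                }
                if (p.countryCode.equals(inferredCountry)) {
                    chosen = p; // the inferred country overrides the default
                    break;
                }
            }
            if (chosen != null) {
                result.add(chosen);
            }
        }
        return result;
    }
}
```

With an inferred country of "CA", the mention "London" resolves to the Canadian entry; with no inferred country it falls back to the first gazetteer entry.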