Class XponentGeocoder

All Implemented Interfaces:
org.opensextant.data.MatchSchema
Direct Known Subclasses:
XponentTextGeotagger

public class XponentGeocoder extends GeoInferencer
A pipeline for improving the location metadata of Tweets, Weibo posts, or other social media that carries metadata about the user or message location. Assumption: the microblog message has a User Profile or some subset of the DeepEye social media fields ('ugeo*', 'geo*', etc.). See the DeepEye social API for Tweet, e.g. Tweet tw = DataUtility.fromDeepeye(R);
Author:
ubaldino
  • Field Details

    • gazetteer

      protected SolrGazetteer gazetteer
    • userlocX

      protected org.opensextant.extractors.xcoord.XCoord userlocX
    • tagger

      protected PlaceGeocoder tagger
    • recordsWithCoord

      protected long recordsWithCoord
    • recordsWithTZ

      protected long recordsWithTZ
    • recordsWithPlace

      protected long recordsWithPlace
    • profilePlaceFilter

      protected org.opensextant.extraction.MatchFilter profilePlaceFilter
      Xponents user "match filter" for PlaceGeocoder: quickly filters out ad hoc social media noise. Items it matches in the tagger are ignored as early as possible in the pipeline hierarchy.
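      As a rough illustration of the filtering idea described above, the sketch below drops obvious noise values before any geocoding is attempted. It is a self-contained stand-in, not the Xponents MatchFilter class: the class name, the method names, and the noise terms are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a profile "match filter"; not the Xponents API.
final class ProfileNoiseFilterSketch {
    // Illustrative noise terms only; a real list would be curated from data.
    private static final Set<String> NOISE =
            Set.of("earth", "everywhere", "worldwide", "home");

    /** Returns true when the candidate text should be dropped before geocoding. */
    static boolean filterOut(String text) {
        if (text == null) {
            return true;
        }
        String norm = text.trim().toLowerCase(Locale.ROOT);
        return norm.isEmpty() || NOISE.contains(norm);
    }

    /** Keeps only candidates that survive the filter, preserving input order. */
    static List<String> keep(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (!filterOut(c)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(keep(List.of("Boston", "Earth", "  ", "Lyon")));
    }
}
```

      Filtering before tagging keeps cheap rejections away from the more expensive gazetteer lookups.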
    • profileRule

      protected org.opensextant.extractors.geo.social.XponentGeocoder.UserProfileLocationRule profileRule
      Xponents user "geocoding rule" for PlaceGeocoder: custom metadata is fed to the tagger through this rule. Match/geo candidates are evaluated here because this class controls the tweet metadata evidence, such as TZ, UTC offset, obscure country evidence, and language possibilities.
    • DEFAULT_COUNTRY_CONF

      public static final int DEFAULT_COUNTRY_CONF
  • Constructor Details

    • XponentGeocoder

      public XponentGeocoder()
      For now, "XpMeta" denotes geo processing of tweets for province normalization: any possible geo indication is resolved down to a Province code. "XpGeotag" denotes full-text geotagging/geocoding.
  • Method Details

    • geoinferencePlaceMentions

      public Collection<GeoInference> geoinferencePlaceMentions(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
      Does not infer place mentions from free text.
      Specified by:
      geoinferencePlaceMentions in class GeoInferencer
      Returns:
      Throws:
      org.opensextant.extraction.ExtractionException
    • report

      public String report()
      Renders a string buffer with a final report -- provided you set or increment the totalRecords value.
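      The dependence on totalRecords can be illustrated with a minimal sketch. The counter names mirror the protected fields listed above, but the class itself, the output format, and the percentage calculation are assumptions for illustration, not the actual report layout.

```java
import java.util.Locale;

// Hypothetical counter holder; field names follow the documented fields,
// everything else is an assumption.
final class GeoStatsSketch {
    long totalRecords;
    long recordsWithCoord;
    long recordsWithTZ;
    long recordsWithPlace;

    /** Renders a plain-text summary of the counters. */
    String report() {
        StringBuilder buf = new StringBuilder();
        buf.append("Total records: ").append(totalRecords).append('\n');
        appendLine(buf, "With coordinates", recordsWithCoord);
        appendLine(buf, "With timezone", recordsWithTZ);
        appendLine(buf, "With place", recordsWithPlace);
        return buf.toString();
    }

    private void appendLine(StringBuilder buf, String label, long count) {
        // Guard against division by zero when totalRecords was never incremented.
        double pct = totalRecords > 0 ? 100.0 * count / totalRecords : 0.0;
        buf.append(String.format(Locale.ROOT, "%s: %d (%.1f%%)\n", label, count, pct));
    }

    public static void main(String[] args) {
        GeoStatsSketch stats = new GeoStatsSketch();
        stats.totalRecords = 4;
        stats.recordsWithCoord = 1;
        System.out.print(stats.report());
    }
}
```

      If totalRecords is left at zero, the percentages in this sketch fall back to 0.0, which is why the caller is expected to maintain that counter.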
      Specified by:
      report in class GeoInferencer
      Returns:
    • configure

      public void configure() throws org.opensextant.ConfigException
      Makes use of a number of APIs:
      • XCoord to parse out additional coordinates not normalized
      • Gazetteer/GazetteerMatcher to resolve place identities by querying them directly
      • PlaceGeocoder to parse longer phrases of multiple words, tagging places so advanced rules could be applied to them.
      • LangID identifies language of text, although for short texts it does not work reliably.
      • GeonamesUtility and TextUtils provide metadata lookup for countries, timezone, language codes, etc.
      Specified by:
      configure in class SocialGeo
      Throws:
      org.opensextant.ConfigException
    • close

      public void close()
      Description copied from class: SocialGeo
      Release resources quietly.
      Specified by:
      close in class SocialGeo
    • geoinferenceTweetAuthor

      public GeoInference geoinferenceTweetAuthor(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
      Geoinference of the user/author profile. The standard 'deepeye' annotation is "ugeo" or "country".
      Specified by:
      geoinferenceTweetAuthor in class GeoInferencer
      Parameters:
      tw - DeepEye Social Tweet
      Returns:
      annot DeepEye Annotation
      Throws:
      org.opensextant.extraction.ExtractionException
    • geoinferenceTweetStatus

      public GeoInference geoinferenceTweetStatus(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
      Geoinference the location of the message, e.g., where the message was sent from. Standard 'deepeye' annotation is "geo"; most message locations are coordinates or hard locations.
      Specified by:
      geoinferenceTweetStatus in class GeoInferencer
      Parameters:
      tw - tweet as parsed by DeepEye
      Returns:
      Geo or Country annotation
      Throws:
      org.opensextant.extraction.ExtractionException - on running geolocation routines
    • parseFreeTextCoordinates

      public void parseFreeTextCoordinates(org.opensextant.data.Place g)
      Not common, but useful. Improves location resolution via various tricks.
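      One such trick is spotting coordinates that a user typed into a free-text location field. The sketch below handles only signed decimal degrees in "lat, lon" order and is far narrower than what XCoord supports; the class name, the pattern, and the range checks are assumptions for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a simplified illustration, not the XCoord parser.
final class FreeTextCoordSketch {
    // Signed decimal degrees such as "42.3601, -71.0589". A fractional part is
    // required so plain integers are not mistaken for coordinates, and the
    // lookbehind prevents matching from the middle of a longer number.
    private static final Pattern LAT_LON = Pattern.compile(
            "(?<![\\d.])(-?\\d{1,2}\\.\\d+)\\s*[,;]\\s*(-?\\d{1,3}\\.\\d+)");

    /** Returns {lat, lon} for the first in-range pair found, if any. */
    static Optional<double[]> parse(String text) {
        if (text == null) {
            return Optional.empty();
        }
        Matcher m = LAT_LON.matcher(text);
        while (m.find()) {
            double lat = Double.parseDouble(m.group(1));
            double lon = Double.parseDouble(m.group(2));
            if (Math.abs(lat) <= 90.0 && Math.abs(lon) <= 180.0) {
                return Optional.of(new double[] {lat, lon});
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        parse("iPhone: 42.3601, -71.0589")
                .ifPresent(c -> System.out.println(c[0] + " " + c[1]));
    }
}
```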
      Parameters:
      g -
    • provinceID

      public boolean provinceID(org.opensextant.data.Place g)
      Derive the Province ID if given a hard location.
      Parameters:
      g -
      Returns:
      true if the Place object was imbued with a Province ID and, if relevant, a Country ID.
    • inferCountryTimezone

      public int inferCountryTimezone(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g) throws org.opensextant.extraction.ExtractionException
      Throws:
      org.opensextant.extraction.ExtractionException
    • getInferredCountry

      public Map<String,org.opensextant.extractors.geo.social.XponentGeocoder.InferredCountry> getInferredCountry(org.opensextant.data.social.Tweet t)
      Determine a starting set of countries. If TZ/UTC is set, use that to pick the candidates, then improve the scores of countries where the tweet language is spoken. Otherwise, fall back to the countries where the tweet language is spoken.
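      The two-stage selection can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the lookup tables are toy data, the score values are arbitrary, and plain integers stand in for the InferredCountry values the real method returns.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of TZ-first, language-second country scoring.
final class InferredCountrySketch {
    // Toy lookup tables, illustrative only.
    static final Map<Integer, List<String>> COUNTRIES_BY_UTC_OFFSET = Map.of(
            1, List.of("FR", "DE", "ES"),
            -5, List.of("US", "CA", "PE"));
    static final Map<String, List<String>> COUNTRIES_BY_LANG = Map.of(
            "fr", List.of("FR", "CA"),
            "es", List.of("ES", "PE", "MX"));

    static final int BASE_SCORE = 1;  // assumed value
    static final int LANG_BONUS = 2;  // assumed value

    /** Either argument may be null when the tweet lacks that evidence. */
    static Map<String, Integer> infer(Integer utcOffsetHours, String lang) {
        List<String> none = List.of();
        List<String> byTz = utcOffsetHours == null
                ? none : COUNTRIES_BY_UTC_OFFSET.getOrDefault(utcOffsetHours, none);
        List<String> byLang = lang == null
                ? none : COUNTRIES_BY_LANG.getOrDefault(lang, none);

        Map<String, Integer> scores = new LinkedHashMap<>();
        if (!byTz.isEmpty()) {
            // Start from the timezone candidates; boost those that speak the language.
            for (String cc : byTz) {
                scores.put(cc, byLang.contains(cc) ? BASE_SCORE + LANG_BONUS : BASE_SCORE);
            }
        } else {
            // No usable timezone evidence: fall back to language alone.
            for (String cc : byLang) {
                scores.put(cc, BASE_SCORE);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(infer(1, "fr"));
        System.out.println(infer(null, "es"));
    }
}
```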
      Parameters:
      t -
      Returns:
    • inferProvinceByHierarchy

      public int inferProvinceByHierarchy(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g)
      Use the geographic hierarchy to find the province related to this place. When the standard hierarchy lookup fails, try tagging the name value as free text.
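      The lookup-then-fallback order can be sketched as below. The tables, the "CC.ADM1" key format, the Result type, and the confidence values are all assumptions for illustration; the real method works against the gazetteer and tagger.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of hierarchy lookup with a free-text fallback.
final class ProvinceFallbackSketch {
    // Toy data keyed as "CC.ADM1"; illustrative only.
    static final Map<String, String> PROVINCE_BY_HIERARCHY = Map.of(
            "US.MA", "Massachusetts",
            "CA.ON", "Ontario");
    // Toy name index used by the free-text fallback.
    static final Map<String, String> PROVINCE_BY_NAME = Map.of(
            "boston", "US.MA",
            "toronto", "CA.ON");

    static final class Result {
        final String provinceId;  // null when nothing was found
        final int confidence;     // greater than 0 means something was found

        Result(String provinceId, int confidence) {
            this.provinceId = provinceId;
            this.confidence = confidence;
        }
    }

    static Result infer(String name, String adm1, String cc) {
        // 1. Standard hierarchy lookup by country code and ADM1 code.
        if (cc != null && adm1 != null) {
            String key = cc + "." + adm1;
            if (PROVINCE_BY_HIERARCHY.containsKey(key)) {
                return new Result(key, 80);  // assumed confidence
            }
        }
        // 2. Fallback: treat the name as free text and look up each token.
        if (name != null) {
            for (String token : name.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                String hit = PROVINCE_BY_NAME.get(token);
                if (hit != null) {
                    return new Result(hit, 40);  // assumed lower confidence
                }
            }
        }
        return new Result(null, 0);
    }

    public static void main(String[] args) {
        System.out.println(infer("Boston, baby", null, null).provinceId);
    }
}
```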
      Parameters:
      g - place that has some name/prov/country or name/ADM1/CC hierarchy
      Returns:
      Confidence; a value greater than 0 means something was found.
    • inferCountryName

      public int inferCountryName(org.opensextant.data.Geocoding g)
      Trivial test to see if provided place description is as simple as a country name, rather than a description of a place or non-place. This is just a lookup, not a tagger. Place g is coded with the found country.
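      The difference between a lookup and a tagger can be shown with a small sketch: the whole normalized string must equal a known country name, so "Paris, France" does not match. The table, the normalization steps, and the class name are assumptions for illustration; the real method codes the given Geocoding object rather than returning a code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical whole-string country lookup; not the Xponents implementation.
final class CountryNameLookupSketch {
    // Toy table of normalized names to ISO country codes; illustrative only.
    static final Map<String, String> COUNTRY_CODES = Map.of(
            "france", "FR",
            "japan", "JP",
            "united states", "US",
            "usa", "US");

    /** Returns a country code only when the entire text is a country name. */
    static Optional<String> lookup(String text) {
        if (text == null) {
            return Optional.empty();
        }
        // Lowercase, drop punctuation, collapse whitespace.
        String norm = text.toLowerCase(Locale.ROOT)
                .replaceAll("\\p{Punct}+", "")
                .replaceAll("\\s+", " ")
                .trim();
        return Optional.ofNullable(COUNTRY_CODES.get(norm));
    }

    public static void main(String[] args) {
        System.out.println(lookup(" France! "));
        System.out.println(lookup("Paris, France"));
    }
}
```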
      Parameters:
      g - given geo text
      Returns:
      if g could be geocoded with country.
    • removePunct

      public static String removePunct(String s)
    • processLocation

      public GeoInference processLocation(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g, String rid, String annotName) throws org.opensextant.extraction.ExtractionException
      Detailed routine to uncover additional location information in tweet noise. Since SocGeo 1.13.8, this also tries to set a Province name in addition to the ADM1 ID.
      Parameters:
      tw - tweet
      g - a location on tweet, geo or user geo (ugeo)
      rid - Record ID from deepeye or other data ID.
      annotName - annotation type to store.
      Throws:
      org.opensextant.extraction.ExtractionException
    • getAdditionalMatches

      public Collection<org.opensextant.extraction.TextMatch> getAdditionalMatches()
      This geocoder does not return additional matches.
      Specified by:
      getAdditionalMatches in class GeoInferencer
      Returns: