Class XponentTextGeotagger

All Implemented Interfaces:
org.opensextant.data.MatchSchema

public class XponentTextGeotagger extends XponentGeocoder
Variant TODO: Ideally, we would chain the calls: first inferredLoc = geocode(Tweet, User, etc.), then use that output as context for mentionLocs = geocode(text, given=inferredLoc). That pipeline becomes intricate very quickly, though, and you lose the generality of applying this to other data; it is already heavily Tweet-dependent.
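
The two-stage chaining idea in the TODO can be illustrated with a minimal sketch. Everything in it is hypothetical and is not part of the OpenSextant API: the Loc type, the toy GAZETTEER map, and the inferCountry and geocodeMentions helpers are stand-ins. It only shows how a location inferred from metadata could be passed as a "given" hint to disambiguate mentions in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types come from OpenSextant.
class ChainedGeocodeSketch {

    /** Stand-in for a resolved place: a name plus a country code. */
    static final class Loc {
        final String name;
        final String countryCode;

        Loc(String name, String countryCode) {
            this.name = name;
            this.countryCode = countryCode;
        }
    }

    /** Toy gazetteer: each name maps to one or more candidate places. */
    static final Map<String, List<Loc>> GAZETTEER = new HashMap<>();
    static {
        GAZETTEER.put("Paris", Arrays.asList(new Loc("Paris", "FR"), new Loc("Paris", "US")));
        GAZETTEER.put("Texas", Arrays.asList(new Loc("Texas", "US")));
    }

    /** Stage 1: infer a country from metadata (here, a user-profile location string). */
    static String inferCountry(String profileLocation) {
        if (profileLocation == null) {
            return null;
        }
        List<Loc> candidates = GAZETTEER.get(profileLocation.trim());
        return (candidates == null || candidates.isEmpty()) ? null : candidates.get(0).countryCode;
    }

    /** Stage 2: geocode mentions in text, preferring candidates in the given country. */
    static List<Loc> geocodeMentions(String text, String givenCountry) {
        List<Loc> resolved = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            List<Loc> candidates = GAZETTEER.get(token);
            if (candidates == null) {
                continue; // token is not a known place name
            }
            Loc best = candidates.get(0); // default: first candidate listed
            for (Loc c : candidates) {
                if (c.countryCode.equals(givenCountry)) {
                    best = c; // the inferred country breaks the tie
                    break;
                }
            }
            resolved.add(best);
        }
        return resolved;
    }

    public static void main(String[] args) {
        String inferred = inferCountry("Texas");
        for (Loc loc : geocodeMentions("Driving to Paris tonight", inferred)) {
            System.out.println(loc.name + " -> " + loc.countryCode); // prints "Paris -> US"
        }
    }
}
```

Without the inferred hint (givenCountry is null) the sketch falls back to the first candidate, which hints at why the chained pipeline gets intricate: each stage needs a sensible default when the previous stage yields nothing.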
  • Field Details

    • MATCHCONF_MINIMUM_SOCMEDIA

      protected int MATCHCONF_MINIMUM_SOCMEDIA
  • Constructor Details

    • XponentTextGeotagger

      public XponentTextGeotagger()
  • Method Details

    • configure

      public void configure() throws org.opensextant.ConfigException
      Description copied from class: XponentGeocoder
      Makes use of a number of APIs:
      • XCoord to parse out additional coordinates not normalized
      • Gazetteer/GazetteerMatcher to resolve place identities by querying them directly
      • PlaceGeocoder to parse longer phrases of multiple words, tagging places so advanced rules could be applied to them.
      • LangID identifies language of text, although for short texts it does not work reliably.
      • GeonamesUtility and TextUtils provide metadata lookup for countries, timezone, language codes, etc.
      Overrides:
      configure in class XponentGeocoder
      Throws:
      org.opensextant.ConfigException
    • geoinferenceTweetAuthor

      public GeoInference geoinferenceTweetAuthor(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
      This routine does not look at Author.
      Overrides:
      geoinferenceTweetAuthor in class XponentGeocoder
      Parameters:
      tw - Tweet API object
      Returns:
      the annotation with geo-inference
      Throws:
      org.opensextant.extraction.ExtractionException - on tagging error
    • geoinferenceTweetStatus

      public GeoInference geoinferenceTweetStatus(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
      Geotag and Geocode mentions in the message of a tweet. This may make use of Tweet metadata for disambiguation, not just the message content.
      Overrides:
      geoinferenceTweetStatus in class XponentGeocoder
      Parameters:
      tw - tweet rendered by Core API TweetUtility. This will use Tweet.lang to direct tagging/tokenization.
      Returns:
      Geo or Country annotation
      Throws:
      org.opensextant.extraction.ExtractionException - on running geolocation routines
    • geoinferencePlaceMentions

      public Collection<GeoInference> geoinferencePlaceMentions(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
      Content-based geotagging. Given the tweet status message, find all place mentions and geocode those that are meaningful. Trivial finds are likely omitted or marked with low confidence.
      Overrides:
      geoinferencePlaceMentions in class XponentGeocoder
      Returns:
      Throws:
      org.opensextant.extraction.ExtractionException
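
A minimal sketch of the "omit trivial finds" idea described for geoinferencePlaceMentions, assuming a simple integer confidence score per mention. The Mention type, the keepMeaningful helper, and the threshold value 20 are hypothetical. This page does not state the actual value of MATCHCONF_MINIMUM_SOCMEDIA or how the class applies it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the OpenSextant implementation.
class ConfidenceFilterSketch {

    /** Assumed threshold for illustration; the real field value is not documented here. */
    static final int MIN_CONFIDENCE = 20;

    /** Stand-in for a place mention with a confidence score. */
    static final class Mention {
        final String text;
        final int confidence;

        Mention(String text, int confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    /** Keep only mentions whose confidence is at or above the minimum. */
    static List<Mention> keepMeaningful(List<Mention> mentions, int minConfidence) {
        List<Mention> kept = new ArrayList<>();
        for (Mention m : mentions) {
            if (m.confidence >= minConfidence) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Mention> found = new ArrayList<>();
        found.add(new Mention("Houston", 65));
        found.add(new Mention("Of", 5)); // a trivial find with low confidence
        for (Mention m : keepMeaningful(found, MIN_CONFIDENCE)) {
            System.out.println(m.text + " (" + m.confidence + ")"); // prints "Houston (65)"
        }
    }
}
```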
    • filterOut

      public static boolean filterOut(PlaceCandidate m)
      WARNING: Copy of Deepeye Pipes default geotag filter. Adapt this rule/filter to items found in tweets.
      Parameters:
      m -
      Returns:
    • processLocationMentions

      public Collection<GeoInference> processLocationMentions(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g, String rid, String annotName) throws org.opensextant.extraction.ExtractionException
      This works best if your tweet provides natural language text; see Tweet.setTextNatural().
      Throws:
      org.opensextant.extraction.ExtractionException
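
One way a caller might prepare natural language text before handing it to Tweet.setTextNatural() is sketched below. The toNaturalText helper and its regular expressions are assumptions for illustration; they are not part of OpenSextant, which may apply its own normalization.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of OpenSextant.
class NaturalTextSketch {

    private static final Pattern URL = Pattern.compile("https?://\\S+");
    private static final Pattern MENTION = Pattern.compile("@\\w+");
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Strip URLs and @mentions, keep hashtag words without '#', collapse whitespace. */
    static String toNaturalText(String raw) {
        String t = URL.matcher(raw).replaceAll(" ");
        t = MENTION.matcher(t).replaceAll(" ");
        t = HASHTAG.matcher(t).replaceAll("$1"); // "#Houston" becomes "Houston"
        return SPACES.matcher(t).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String raw = "Flooding in #Houston tonight @newsdesk https://t.co/abc";
        System.out.println(toNaturalText(raw)); // prints "Flooding in Houston tonight"
    }
}
```

Keeping the hashtag word rather than dropping it is a deliberate choice in this sketch, since hashtags in tweets often carry the place name itself.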
    • getAdditionalMatches

      public Collection<org.opensextant.extraction.TextMatch> getAdditionalMatches()
      Geotagger does return additional matches.
      Overrides:
      getAdditionalMatches in class XponentGeocoder
      Returns: