Class GeoInferencer

java.lang.Object
org.opensextant.extractors.geo.social.SocialGeo
org.opensextant.extractors.geo.social.GeoInferencer
All Implemented Interfaces:
org.opensextant.data.MatchSchema
Direct Known Subclasses:
XponentGeocoder

public abstract class GeoInferencer extends SocialGeo implements org.opensextant.data.MatchSchema
A geoinferencer infers the location of users and their messages. This is a DeepEye-based API in which Tweets, Records, and Annotations are the main inputs and outputs.
Author:
ubaldino
  • Field Details

    • AVERAGE_TEXT_SIZE

      public static int AVERAGE_TEXT_SIZE
      Average text size of tweets, in characters. In 2014 this was measured at about 90 characters. Tweets containing URLs appeared to dominate that measurement, so the average size of the natural-language text alone may be smaller.
    • langidTool

      protected org.opensextant.extractors.langid.LangDetect langidTool
    • totalRecords

      public long totalRecords
    • infersAuthors

      protected boolean infersAuthors
    • infersStatus

      protected boolean infersStatus
    • infersPlaces

      protected boolean infersPlaces
  • Constructor Details

    • GeoInferencer

      public GeoInferencer()
  • Method Details

    • setLanguageID

      public void setLanguageID(org.opensextant.extractors.langid.LangDetect lid)
      NOTE: the langID tool from Cybozu can only be loaded once per JVM, so it is initialized once by the data ingester and then passed in here for use by the processor.
      Parameters:
      lid - language ID tool, already initialized by the caller
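
The once-per-JVM constraint noted for setLanguageID can be handled by creating the detector in a single place and handing the same instance to each processor. The following is a minimal sketch of that pattern only. StubLangDetect, StubProcessor, and sharedDetector are hypothetical stand-ins introduced for illustration; they are not OpenSextant classes, and the real LangDetect construction is not shown here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: every type here is a hypothetical stand-in,
// not part of the org.opensextant API.
class SharedLangIdSketch {

    /** Stand-in for a detector whose models may be loaded only once per JVM. */
    static final class StubLangDetect {
        static final AtomicInteger LOADS = new AtomicInteger();

        StubLangDetect() {
            LOADS.incrementAndGet(); // count how many times loading happens
        }
    }

    /** Stand-in for a processor that receives the detector instead of creating one. */
    static final class StubProcessor {
        private StubLangDetect langidTool;

        void setLanguageID(StubLangDetect lid) {
            this.langidTool = lid;
        }

        StubLangDetect getLanguageID() {
            return langidTool;
        }
    }

    /** Holder idiom: the JVM initializes INSTANCE lazily and at most once. */
    private static final class Holder {
        static final StubLangDetect INSTANCE = new StubLangDetect();
    }

    static StubLangDetect sharedDetector() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        StubProcessor authorProcessor = new StubProcessor();
        StubProcessor statusProcessor = new StubProcessor();

        // Both processors are given the same, already-initialized detector.
        authorProcessor.setLanguageID(sharedDetector());
        statusProcessor.setLanguageID(sharedDetector());

        System.out.println("same instance: "
                + (authorProcessor.getLanguageID() == statusProcessor.getLanguageID()));
        System.out.println("load count: " + StubLangDetect.LOADS.get());
    }
}
```

The holder class is one way to guarantee a single initialization without explicit locking; an ingester that builds the detector at startup and passes it along would serve equally well.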
    • geoinferenceTweetAuthor

      public abstract GeoInference geoinferenceTweetAuthor(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
      Infer author's location. Result is a geocoding annotation that contains lat, lon, Country and other gazetteer metadata.
      Parameters:
      tw - DeepEye Social Tweet
      Returns:
      DeepEye Annotation holding the inferred author location
      Throws:
      org.opensextant.extraction.ExtractionException
    • geoinferenceTweetStatus

      public abstract GeoInference geoinferenceTweetStatus(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
      Infer location of message, if any such metadata is present. Result is a geocoding annotation that contains lat, lon, Country and other gazetteer metadata.
      Parameters:
      tw - DeepEye Social Tweet
      Returns:
      inference
      Throws:
      org.opensextant.extraction.ExtractionException
    • geoinferencePlaceMentions

      public abstract Collection<GeoInference> geoinferencePlaceMentions(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
      Extract and geocode any places, countries, or coordinates mentioned in social media text. For now, this takes a Tweet and uses the author's profile location to help disambiguate ambiguous place tags.
      Parameters:
      tw - DeepEye Social Tweet
      Returns:
      Throws:
      org.opensextant.extraction.ExtractionException
    • getAdditionalMatches

      public abstract Collection<org.opensextant.extraction.TextMatch> getAdditionalMatches()
      If there are by-products of geotagging or inferencing that are worth retrieving, they can be retrieved as "additional matches".
      Returns:
    • infersAuthorGeo

      public boolean infersAuthorGeo()
      True if your implementation infers anything from the author profile.
    • infersStatusGeo

      public boolean infersStatusGeo()
      True if your implementation infers anything from the status/message.
    • infersPlaces

      public boolean infersPlaces()
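
A caller could consult the three capability accessors before deciding which inference methods to invoke. The sketch below illustrates that idea with a hypothetical stand-in class, StubInferencer, rather than the real GeoInferencer. It assumes each accessor simply returns the protected field of the matching name, which this page does not confirm, and the plan helper and AuthorOnly subclass are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: StubInferencer, AuthorOnly and plan() are
// hypothetical and do not exist in the org.opensextant API.
class CapabilityFlagsSketch {

    /** Stand-in exposing the three documented flags and accessors. */
    static abstract class StubInferencer {
        protected boolean infersAuthors = false;
        protected boolean infersStatus = false;
        protected boolean infersPlaces = false;

        // Assumption: each accessor just returns the corresponding field.
        boolean infersAuthorGeo() { return infersAuthors; }
        boolean infersStatusGeo() { return infersStatus; }
        boolean infersPlaces() { return infersPlaces; }
    }

    /** Example subclass that only supports author-profile inference. */
    static class AuthorOnly extends StubInferencer {
        AuthorOnly() {
            infersAuthors = true;
        }
    }

    /** Lists the inference methods worth calling for this implementation. */
    static List<String> plan(StubInferencer g) {
        List<String> steps = new ArrayList<>();
        if (g.infersAuthorGeo()) {
            steps.add("geoinferenceTweetAuthor");
        }
        if (g.infersStatusGeo()) {
            steps.add("geoinferenceTweetStatus");
        }
        if (g.infersPlaces()) {
            steps.add("geoinferencePlaceMentions");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(plan(new AuthorOnly()));
    }
}
```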
    • report

      public abstract String report()
      Processing report. This could be more structured, along the lines of ExtractionMetrics; for now it is just a final message from the implementation about general performance.
      Returns:
    • pct

      protected double pct(long tot, long count)
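
This page gives no description for pct, so its behavior is not confirmed here. As a sketch under assumptions, the example below treats it as a percentage helper (count as a share of tot, scaled to 100) and shows how a report-style summary string might use it. The zero-total guard, the message wording, and the located parameter are all hypothetical choices made for illustration.

```java
import java.util.Locale;

// Illustrative sketch only: the behavior of pct() and the report text are
// assumptions, not taken from the GeoInferencer implementation.
class PctSketch {

    /** Assumed meaning: count as a percentage of tot; 0 when tot is 0. */
    static double pct(long tot, long count) {
        if (tot == 0) {
            return 0.0; // avoid dividing by zero on an empty run
        }
        return 100.0 * count / tot;
    }

    /** Hypothetical one-line summary built from a record total and a hit count. */
    static String report(long totalRecords, long located) {
        return String.format(Locale.ROOT,
                "Located %d of %d records (%.1f%%)",
                located, totalRecords, pct(totalRecords, located));
    }

    public static void main(String[] args) {
        System.out.println(report(200, 50));
    }
}
```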