Class GeoInferencer
java.lang.Object
org.opensextant.extractors.geo.social.SocialGeo
org.opensextant.extractors.geo.social.GeoInferencer
- All Implemented Interfaces:
org.opensextant.data.MatchSchema
- Direct Known Subclasses:
XponentGeocoder
A GeoInferencer infers the location of users and of their messages.
This is a DeepEye-based API in which Tweets, Records, and Annotations are the
main inputs and outputs.
- Author:
- ubaldino
-
Field Summary
Modifier and Type | Field | Description
static int | AVERAGE_TEXT_SIZE | Avg text size (in chars) of tweets -- in 2014, I measured this to be about 90 chars.
protected boolean | infersAuthors
protected boolean | infersPlaces
protected boolean | infersStatus
protected org.opensextant.extractors.langid.LangDetect | langidTool
long | totalRecords
Fields inherited from class org.opensextant.extractors.geo.social.SocialGeo
allCountries, basicCountryNames, countries, evalMode, inferencerDescription, inferencerID, log, US_STATES
Fields inherited from interface org.opensextant.data.MatchSchema
VAL_COORD, VAL_COUNTRY, VAL_PLACE, VAL_POSTAL, VAL_TAXON
-
Constructor Summary
GeoInferencer()
Method Summary
Modifier and Type | Method | Description
abstract Collection<GeoInference> | geoinferencePlaceMentions(org.opensextant.data.social.Tweet tw) | Extract and geocode any mentioned places, countries, coordinates in social media text.
abstract GeoInference | geoinferenceTweetAuthor(org.opensextant.data.social.Tweet tw) | Infer author's location.
abstract GeoInference | geoinferenceTweetStatus(org.opensextant.data.social.Tweet tw) | Infer location of message, if any such metadata is present.
abstract Collection<org.opensextant.extraction.TextMatch> | getAdditionalMatches() | If there are by-products of geotagging or inferencing that are worth retrieving, they can be retrieved as "additional matches".
boolean | infersAuthorGeo() | True if your implementation infers anything from the author profile.
boolean | infersPlaces()
boolean | infersStatusGeo() | True if your implementation infers anything from the status/message.
protected double | pct(long tot, long count)
abstract String | report() | Processing report. This could be more structured, ala ExtractionMetrics; for now this is just a final message from the implementation about general performance.
void | setLanguageID(org.opensextant.extractors.langid.LangDetect lid) | NOTE: the langID tool from Cybozu can only be loaded once per JVM.
Methods inherited from class org.opensextant.extractors.geo.social.SocialGeo
close, configure, flattenPrecision, getConfidence, getCountryNamed, getUSStateByCode, getUSStateByName, inferPlaceRecursively, inferPlaceRecursively, isValue, loadProvinceNames, loadUSStates, populateAllCountries, populateBasicCountryNames, scoreCountryPrediction, setProvinceName
-
Field Details
-
AVERAGE_TEXT_SIZE
public static int AVERAGE_TEXT_SIZE
Avg text size (in chars) of tweets -- in 2014, I measured this to be about 90 chars. Even at that, tweets with URLs seem to dominate, so the average size of actual natural language text may be less.
-
langidTool
protected org.opensextant.extractors.langid.LangDetect langidTool
-
totalRecords
public long totalRecords
-
infersAuthors
protected boolean infersAuthors
-
infersStatus
protected boolean infersStatus
-
infersPlaces
protected boolean infersPlaces
-
-
Constructor Details
-
GeoInferencer
public GeoInferencer()
-
-
Method Details
-
setLanguageID
public void setLanguageID(org.opensextant.extractors.langid.LangDetect lid)
NOTE: the langID tool from Cybozu can only be loaded once per JVM. So it is initialized once by the data ingester, and then passed in here for use by the processor.
- Parameters:
lid
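One way to honor the once-per-JVM constraint is to create the detector in a single place and hand the same instance to every inferencer. The sketch below is illustrative only: Detector, Inferencer, and DetectorHolder are hypothetical stand-ins (not OpenSextant classes), and the holder idiom is a substitute for whatever the data ingester actually does.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LangIdSharingSketch {
    /** Hypothetical stand-in for org.opensextant.extractors.langid.LangDetect. */
    static class Detector {
        static final AtomicInteger LOADS = new AtomicInteger();

        Detector() {
            // Pretend this is the expensive, once-per-JVM model load.
            LOADS.incrementAndGet();
        }
    }

    /** Hypothetical stand-in for a GeoInferencer subclass. */
    static class Inferencer {
        protected Detector langidTool;

        public void setLanguageID(Detector lid) {
            this.langidTool = lid;
        }
    }

    /** Holder idiom: the JVM initializes INSTANCE at most once, on first use. */
    static class DetectorHolder {
        static final Detector INSTANCE = new Detector();
    }

    public static void main(String[] args) {
        Inferencer authorGeo = new Inferencer();
        Inferencer placeGeo = new Inferencer();

        // Both processors receive the same, already-initialized detector.
        authorGeo.setLanguageID(DetectorHolder.INSTANCE);
        placeGeo.setLanguageID(DetectorHolder.INSTANCE);

        System.out.println("Detector loads: " + Detector.LOADS.get());
        System.out.println("Shared: " + (authorGeo.langidTool == placeGeo.langidTool));
    }
}
```

The point of the design is that no inferencer constructs its own detector; each one only receives a reference.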
-
-
geoinferenceTweetAuthor
public abstract GeoInference geoinferenceTweetAuthor(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
Infer author's location. Result is a geocoding annotation that contains lat, lon, Country and other gazetteer metadata.
- Parameters:
tw - DeepEye Social Tweet
- Returns:
annot DeepEye Annotation
- Throws:
org.opensextant.extraction.ExtractionException
-
geoinferenceTweetStatus
public abstract GeoInference geoinferenceTweetStatus(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
Infer location of message, if any such metadata is present. Result is a geocoding annotation that contains lat, lon, Country and other gazetteer metadata.
- Parameters:
tw - DeepEye Social Tweet
- Returns:
inference
- Throws:
org.opensextant.extraction.ExtractionException
-
geoinferencePlaceMentions
public abstract Collection<GeoInference> geoinferencePlaceMentions(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
Extract and geocode any places, countries, or coordinates mentioned in social media text. For now, this takes a Tweet and uses the AUTHOR profile location to help disambiguate any ambiguous tags that are found.
- Parameters:
tw
- Returns:
- Throws:
org.opensextant.extraction.ExtractionException
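The idea of using the author's profile location to disambiguate a place mention can be sketched as follows. This is a toy illustration under stated assumptions, not the library's algorithm: Candidate, GAZETTEER, and choose are hypothetical names, the gazetteer holds a single ambiguous name, and falling back to the first candidate when the hint does not match is an arbitrary choice made for the example.

```java
import java.util.List;
import java.util.Map;

class AuthorHintDisambiguationSketch {
    /** Hypothetical gazetteer candidate for a place name. */
    static class Candidate {
        final String name;
        final String countryCode;

        Candidate(String name, String countryCode) {
            this.name = name;
            this.countryCode = countryCode;
        }
    }

    // Toy gazetteer: "Paris" is ambiguous between France and the United States.
    static final Map<String, List<Candidate>> GAZETTEER = Map.of(
            "Paris", List.of(new Candidate("Paris", "FR"), new Candidate("Paris", "US")));

    /**
     * Picks the candidate whose country matches the author's country.
     * Falls back to the first candidate when there is no match (an assumption).
     * Returns null when the mention is not in the gazetteer.
     */
    static Candidate choose(String mention, String authorCountry) {
        List<Candidate> candidates = GAZETTEER.getOrDefault(mention, List.of());
        if (candidates.isEmpty()) {
            return null;
        }
        for (Candidate c : candidates) {
            if (c.countryCode.equals(authorCountry)) {
                return c;
            }
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        Candidate hinted = choose("Paris", "US");
        Candidate unhinted = choose("Paris", null);
        System.out.println(hinted.name + " (" + hinted.countryCode + ")");
        System.out.println(unhinted.name + " (" + unhinted.countryCode + ")");
    }
}
```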
-
getAdditionalMatches
public abstract Collection<org.opensextant.extraction.TextMatch> getAdditionalMatches()
If there are by-products of geotagging or inferencing that are worth retrieving, they can be retrieved as "additional matches".
- Returns:
-
infersAuthorGeo
public boolean infersAuthorGeo()
True if your implementation infers anything from the author profile.
-
infersStatusGeo
public boolean infersStatusGeo()
True if your implementation infers anything from the status/message.
-
infersPlaces
public boolean infersPlaces() -
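A caller might consult the three capability checks before invoking the matching inference method. The sketch below assumes that infersPlaces() signals support for place-mention inference, which the source does not state; Inferencer and plan are hypothetical stand-ins, and the method names are recorded as strings rather than actually invoked.

```java
import java.util.ArrayList;
import java.util.List;

class CapabilityFlagsSketch {
    /** Hypothetical stand-in exposing the same three capability checks. */
    static class Inferencer {
        private final boolean authors;
        private final boolean status;
        private final boolean places;

        Inferencer(boolean authors, boolean status, boolean places) {
            this.authors = authors;
            this.status = status;
            this.places = places;
        }

        boolean infersAuthorGeo() { return authors; }
        boolean infersStatusGeo() { return status; }
        boolean infersPlaces() { return places; }
    }

    /** Lists the inference calls a driver would make for this implementation. */
    static List<String> plan(Inferencer geo) {
        List<String> steps = new ArrayList<>();
        if (geo.infersAuthorGeo()) {
            steps.add("geoinferenceTweetAuthor");
        }
        if (geo.infersStatusGeo()) {
            steps.add("geoinferenceTweetStatus");
        }
        if (geo.infersPlaces()) {
            steps.add("geoinferencePlaceMentions");
        }
        return steps;
    }

    public static void main(String[] args) {
        // An implementation that supports author and place inference only.
        // prints [geoinferenceTweetAuthor, geoinferencePlaceMentions]
        System.out.println(plan(new Inferencer(true, false, true)));
    }
}
```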
report
public abstract String report()
Processing report. This could be more structured, ala ExtractionMetrics; for now this is just a final message from the implementation about general performance.
- Returns:
-
pct
protected double pct(long tot, long count)
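The source does not document pct. A plausible reading, assumed here and not confirmed, is that it returns count as a percentage of tot, which would be handy when composing the report() message. The sketch below shows that reading with a guard for an empty run; ReportSketch and the message format are hypothetical.

```java
import java.util.Locale;

class ReportSketch {
    /** Assumed behaviour: count as a percentage of tot (0.0 when tot is not positive). */
    static double pct(long tot, long count) {
        if (tot <= 0) {
            return 0.0; // nothing processed, so avoid dividing by zero
        }
        return 100.0 * count / tot;
    }

    /** Hypothetical report text; the real format is left to each implementation. */
    static String report(long totalRecords, long geocoded) {
        return String.format(Locale.ROOT, "Processed %d records; geocoded %d (%.1f%%)",
                totalRecords, geocoded, pct(totalRecords, geocoded));
    }

    public static void main(String[] args) {
        // prints "Processed 200 records; geocoded 50 (25.0%)"
        System.out.println(report(200, 50));
    }
}
```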
-