Class XponentTextGeotagger
java.lang.Object
org.opensextant.extractors.geo.social.SocialGeo
org.opensextant.extractors.geo.social.GeoInferencer
org.opensextant.extractors.geo.social.XponentGeocoder
org.opensextant.extractors.geo.social.XponentTextGeotagger
- All Implemented Interfaces:
org.opensextant.data.MatchSchema
Variant
TODO: Ideally, we would chain something like inferredLoc = geocode(Tweet, User, etc.) and then use those outputs in mentionLocs = geocode(text, given=inferredLoc).
But that becomes a ridiculously intricate pipeline, and you quickly lose the generality of applying this to other data; it is already heavily Tweet-dependent.
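The two-stage idea in the TODO can be illustrated with a toy sketch. Everything below is a hypothetical stand-in: the class, the tiny lookup table, and the method names are assumptions for illustration and are not part of the OpenSextant API. It only shows how a location inferred from the author could bias the resolution of an ambiguous mention.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins only; none of these names come from OpenSextant.
final class ChainedGeocodeSketch {

    // Toy gazetteer: lower-cased place name -> candidate country codes.
    // The first candidate is treated as the default when nothing else is known.
    static final Map<String, List<String>> CANDIDATES = Map.of(
            "paris", List.of("FR", "US"),
            "austin", List.of("US"));

    /** Stage 1: infer a country code from the author's free-text profile location. */
    static Optional<String> inferAuthorCountry(String profileLocation) {
        if (profileLocation == null) {
            return Optional.empty();
        }
        String p = profileLocation.toLowerCase();
        if (p.contains("texas") || p.contains("usa")) {
            return Optional.of("US");
        }
        if (p.contains("france")) {
            return Optional.of("FR");
        }
        return Optional.empty();
    }

    /** Stage 2: resolve a mention, preferring a candidate in the given country. */
    static Optional<String> geocodeMention(String mention, Optional<String> given) {
        List<String> codes = CANDIDATES.get(mention.toLowerCase());
        if (codes == null) {
            return Optional.empty();
        }
        // Use the inferred country only if it is one of the candidates.
        String chosen = given.filter(codes::contains).orElse(codes.get(0));
        return Optional.of(mention + ", " + chosen);
    }

    public static void main(String[] args) {
        Optional<String> inferredLoc = inferAuthorCountry("Dallas, Texas");
        System.out.println(geocodeMention("Paris", inferredLoc));
        System.out.println(geocodeMention("Paris", Optional.empty()));
    }
}
```

Even in this toy form, stage 2 depends on stage 1 having run, which is the coupling the TODO warns about.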
-
Field Summary
Fields inherited from class org.opensextant.extractors.geo.social.XponentGeocoder
DEFAULT_COUNTRY_CONF, gazetteer, profilePlaceFilter, profileRule, recordsWithCoord, recordsWithPlace, recordsWithTZ, tagger, userlocX
Fields inherited from class org.opensextant.extractors.geo.social.GeoInferencer
AVERAGE_TEXT_SIZE, infersAuthors, infersPlaces, infersStatus, langidTool, totalRecords
Fields inherited from class org.opensextant.extractors.geo.social.SocialGeo
allCountries, basicCountryNames, countries, evalMode, inferencerDescription, inferencerID, log, US_STATES
Fields inherited from interface org.opensextant.data.MatchSchema
VAL_COORD, VAL_COUNTRY, VAL_PLACE, VAL_POSTAL, VAL_TAXON
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
void configure()
Makes use of a number of APIs: XCoord to parse out additional coordinates not normalized; Gazetteer/GazetteerMatcher to resolve place identities by querying them directly; PlaceGeocoder to parse longer phrases of multiple words, tagging places so advanced rules could be applied to them; LangID to identify the language of text, although it does not work reliably for short texts; GeonamesUtility and TextUtils to provide metadata lookup for countries, timezone, language codes, etc.
static boolean filterOut
WARNING: Copy of Deepeye Pipes default geotag filter.
Collection<GeoInference> geoinferencePlaceMentions(org.opensextant.data.social.Tweet tw)
Content-based geotagging.
GeoInference geoinferenceTweetAuthor(org.opensextant.data.social.Tweet tw)
This routine does not look at Author.
GeoInference geoinferenceTweetStatus(org.opensextant.data.social.Tweet tw)
Geotag and Geocode mentions in the message of a tweet.
Collection<org.opensextant.extraction.TextMatch> getAdditionalMatches
Geotagger does return additional matches.
Collection<GeoInference> processLocationMentions(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g, String rid, String annotName)
This works best if your tweet provides natural-language text; see Tweet.setTextNatural().
Methods inherited from class org.opensextant.extractors.geo.social.XponentGeocoder
close, getInferredCountry, inferCountryName, inferCountryTimezone, inferProvinceByHierarchy, parseFreeTextCoordinates, processLocation, provinceID, removePunct, report
Methods inherited from class org.opensextant.extractors.geo.social.GeoInferencer
infersAuthorGeo, infersPlaces, infersStatusGeo, pct, setLanguageID
Methods inherited from class org.opensextant.extractors.geo.social.SocialGeo
flattenPrecision, getConfidence, getCountryNamed, getUSStateByCode, getUSStateByName, inferPlaceRecursively, inferPlaceRecursively, isValue, loadProvinceNames, loadUSStates, populateAllCountries, populateBasicCountryNames, scoreCountryPrediction, setProvinceName
-
Field Details
-
MATCHCONF_MINIMUM_SOCMEDIA
protected int MATCHCONF_MINIMUM_SOCMEDIA
-
-
Constructor Details
-
XponentTextGeotagger
public XponentTextGeotagger()
-
-
Method Details
-
configure
public void configure() throws org.opensextant.ConfigException
Description copied from class: XponentGeocoder
Makes use of a number of APIs:
- XCoord to parse out additional coordinates not normalized
- Gazetteer/GazetteerMatcher to resolve place identities by querying them directly
- PlaceGeocoder to parse longer phrases of multiple words, tagging places so advanced rules could be applied to them.
- LangID identifies language of text, although for short texts it does not work reliably.
- GeonamesUtility and TextUtils provide metadata lookup for countries, timezone, language codes, etc.
- Overrides:
configure in class XponentGeocoder
- Throws:
org.opensextant.ConfigException
-
geoinferenceTweetAuthor
public GeoInference geoinferenceTweetAuthor(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
This routine does not look at Author.
- Overrides:
geoinferenceTweetAuthor in class XponentGeocoder
- Parameters:
tw - Tweet API object
- Returns:
the annotation with geo-inference
- Throws:
org.opensextant.extraction.ExtractionException - on tagging error
-
geoinferenceTweetStatus
public GeoInference geoinferenceTweetStatus(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
Geotag and Geocode mentions in the message of a tweet. This may make use of Tweet metadata for disambiguation, not just the message content.
- Overrides:
geoinferenceTweetStatus in class XponentGeocoder
- Parameters:
tw - tweet rendered by Core API TweetUtility. This will use Tweet.lang to direct tagging/tokenization.
- Returns:
Geo or Country annotation
- Throws:
org.opensextant.extraction.ExtractionException - on running geolocation routines
-
geoinferencePlaceMentions
public Collection<GeoInference> geoinferencePlaceMentions(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
Content-based geotagging. Given the tweet status message, find all place mentions and geocode those that are meaningful. Trivial finds are likely omitted or marked with low confidence.
- Overrides:
geoinferencePlaceMentions in class XponentGeocoder
- Returns:
- Throws:
org.opensextant.extraction.ExtractionException
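As a rough illustration of "trivial finds are likely omitted", the sketch below drops mentions whose confidence falls under a minimum. The Mention class and the threshold of 10 are assumptions made for this example; this page does not state the value of MATCHCONF_MINIMUM_SOCMEDIA or how the class applies it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; not OpenSextant types.
final class MentionConfidenceSketch {

    // Assumed threshold for illustration; the real MATCHCONF_MINIMUM_SOCMEDIA
    // value is not given in this documentation.
    static final int ASSUMED_MINIMUM = 10;

    /** Minimal stand-in for a tagged place mention with a confidence score. */
    static final class Mention {
        final String text;
        final int confidence;

        Mention(String text, int confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    /** Keep only mentions whose confidence meets the minimum. */
    static List<Mention> keepMeaningful(List<Mention> found, int minimum) {
        List<Mention> kept = new ArrayList<>();
        for (Mention m : found) {
            if (m.confidence >= minimum) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Mention> found = List.of(new Mention("Houston", 40), new Mention("Of", 3));
        for (Mention m : keepMeaningful(found, ASSUMED_MINIMUM)) {
            System.out.println(m.text + " (" + m.confidence + ")");
        }
    }
}
```

A caller who prefers to keep low-confidence finds could pass a minimum of 0 and rank by confidence instead of discarding.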
-
filterOut
WARNING: Copy of Deepeye Pipes default geotag filter. Adapt this rule/filter to items found in tweets.
- Parameters:
m
- Returns:
-
processLocationMentions
public Collection<GeoInference> processLocationMentions(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g, String rid, String annotName) throws org.opensextant.extraction.ExtractionException
This works best if your tweet provides natural-language text; see Tweet.setTextNatural().
- Throws:
org.opensextant.extraction.ExtractionException
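To give a sense of what "natural language text" might mean for a tweet, here is a minimal sketch that strips URLs and @mentions and keeps hashtag words without the # sign. This is an assumption about the general idea, not a description of what Tweet.setTextNatural() actually does; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not the OpenSextant implementation.
final class NaturalTextSketch {

    private static final Pattern URL = Pattern.compile("https?://\\S+");
    private static final Pattern MENTION = Pattern.compile("@\\w+");
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Reduce raw tweet text to something closer to plain prose. */
    static String toNaturalText(String raw) {
        String t = URL.matcher(raw).replaceAll(" ");     // drop links
        t = MENTION.matcher(t).replaceAll(" ");          // drop @user handles
        t = HASHTAG.matcher(t).replaceAll("$1");         // keep the hashtag word
        return SPACES.matcher(t).replaceAll(" ").trim(); // collapse whitespace
    }

    public static void main(String[] args) {
        System.out.println(toNaturalText("@bob Flooding in #Houston today https://t.co/abc"));
    }
}
```

Keeping the hashtag word matters for geotagging, since place names often appear as hashtags.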
-
getAdditionalMatches
Geotagger does return additional matches.
- Overrides:
getAdditionalMatches in class XponentGeocoder
- Returns:
-