Class XponentGeocoder
java.lang.Object
org.opensextant.extractors.geo.social.SocialGeo
org.opensextant.extractors.geo.social.GeoInferencer
org.opensextant.extractors.geo.social.XponentGeocoder
- All Implemented Interfaces:
org.opensextant.data.MatchSchema
- Direct Known Subclasses:
XponentTextGeotagger
Pipeline focused on improving the location metadata for tweets, Weibo posts, or other social media messages that carry metadata about the user's or the message's location.
Assumptions:
- The microblog message has a user profile or some subset of the DeepEye social media fields ('ugeo*', 'geo*', etc.). See the DeepEye social API for Tweet:
    Tweet tw = DataUtility.fromDeepeye(R);
- Author:
- ubaldino
Field Summary
Modifier and Type | Field | Description
static final int | DEFAULT_COUNTRY_CONF
protected SolrGazetteer | gazetteer
protected org.opensextant.extraction.MatchFilter | profilePlaceFilter | Xponents user "match filter" for PlaceGeocoder: quickly filters out ad hoc social media noise.
protected org.opensextant.extractors.geo.social.XponentGeocoder.UserProfileLocationRule | profileRule | Xponents user "geocoding rule" for PlaceGeocoder: custom metadata is fed to the tagger using this rule.
protected long | recordsWithCoord
protected long | recordsWithPlace
protected long | recordsWithTZ
protected PlaceGeocoder | tagger
protected org.opensextant.extractors.xcoord.XCoord | userlocX
Fields inherited from class org.opensextant.extractors.geo.social.GeoInferencer
AVERAGE_TEXT_SIZE, infersAuthors, infersPlaces, infersStatus, langidTool, totalRecords
Fields inherited from class org.opensextant.extractors.geo.social.SocialGeo
allCountries, basicCountryNames, countries, evalMode, inferencerDescription, inferencerID, log, US_STATES
Fields inherited from interface org.opensextant.data.MatchSchema
VAL_COORD, VAL_COUNTRY, VAL_PLACE, VAL_POSTAL, VAL_TAXON
Constructor Summary
Constructor | Description
XponentGeocoder() | For now, "XpMeta" = geo-processing tweets for province normalization.
Method Summary
Modifier and Type | Method | Description
void | close() | Release resources quietly.
void | configure() | Makes use of a number of APIs: XCoord to parse out additional coordinates that are not normalized; Gazetteer/GazetteerMatcher to resolve place identities by querying them directly; PlaceGeocoder to parse longer multi-word phrases, tagging places so that advanced rules can be applied to them; LangID to identify the language of the text, although it does not work reliably on short texts; GeonamesUtility and TextUtils to look up metadata for countries, time zones, language codes, etc.
Collection<GeoInference> | geoinferencePlaceMentions(org.opensextant.data.social.Tweet tw) | Does not infer place mentions from free text.
GeoInference | geoinferenceTweetAuthor(org.opensextant.data.social.Tweet tw) | Geoinfers the user/author profile.
GeoInference | geoinferenceTweetStatus(org.opensextant.data.social.Tweet tw) | Geoinfers the location of the message, e.g., where the message was sent from.
Collection<org.opensextant.extraction.TextMatch> | getAdditionalMatches() | Geocoder does not return additional matches.
Map<String,org.opensextant.extractors.geo.social.XponentGeocoder.InferredCountry> | getInferredCountry(org.opensextant.data.social.Tweet t) | Determines a starting set of countries: if TZ/UTC is set, then use that,...
int | inferCountryName(org.opensextant.data.Geocoding g) | Trivial test to see whether the provided place description is simply a country name, rather than a description of a place or non-place.
int | inferCountryTimezone(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g)
int | inferProvinceByHierarchy(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g) | Uses the geographic hierarchy to find the province related to this place.
void | parseFreeTextCoordinates(org.opensextant.data.Place g) | Not common, but useful.
GeoInference | processLocation(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g, String rid, String annotName) | Detailed routine to uncover additional location information in tweet noise.
boolean | provinceID(org.opensextant.data.Place g) | Derives the province ID if given a hard location.
static String | removePunct
 | report() | Renders a string buffer with a final report, provided you set or increment the totalRecords value.
Methods inherited from class org.opensextant.extractors.geo.social.GeoInferencer
infersAuthorGeo, infersPlaces, infersStatusGeo, pct, setLanguageID
Methods inherited from class org.opensextant.extractors.geo.social.SocialGeo
flattenPrecision, getConfidence, getCountryNamed, getUSStateByCode, getUSStateByName, inferPlaceRecursively, inferPlaceRecursively, isValue, loadProvinceNames, loadUSStates, populateAllCountries, populateBasicCountryNames, scoreCountryPrediction, setProvinceName
Field Details
gazetteer
protected SolrGazetteer gazetteer
userlocX
protected org.opensextant.extractors.xcoord.XCoord userlocX
tagger
protected PlaceGeocoder tagger
recordsWithCoord
protected long recordsWithCoord
recordsWithTZ
protected long recordsWithTZ
recordsWithPlace
protected long recordsWithPlace
profilePlaceFilter
protected org.opensextant.extraction.MatchFilter profilePlaceFilter
Xponents user "match filter" for PlaceGeocoder: quickly filters out ad hoc social media noise. Items matched by the tagger are ignored as early as possible in the pipeline hierarchy.
profileRule
protected org.opensextant.extractors.geo.social.XponentGeocoder.UserProfileLocationRule profileRule
Xponents user "geocoding rule" for PlaceGeocoder: custom metadata is fed to the tagger using this rule. Match/geo candidates are evaluated here because we control tweet metadata evidence such as TZ, UTC offset, obscure country evidence, language possibilities, etc.
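The profilePlaceFilter description is about discarding obvious non-places as early as possible, before any geocoding rules run. The following is a minimal JDK-only sketch of that idea. The class name, the noise-term list, and the length threshold are hypothetical illustrations, not the Xponents MatchFilter API.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of an early "noise" filter for profile location text.
 * This is NOT org.opensextant.extraction.MatchFilter; the term list and the
 * length threshold are illustrative assumptions.
 */
class ProfileNoiseFilter {
    // Illustrative non-place values commonly typed into profile location fields.
    private static final Set<String> NOISE_TERMS = Set.of(
            "earth", "everywhere", "worldwide", "here", "home", "somewhere", "online");

    /** Returns true if the candidate match should be dropped before geocoding. */
    static boolean filterOut(String matchText) {
        if (matchText == null) {
            return true;
        }
        String norm = matchText.trim().toLowerCase(Locale.ROOT);
        // Drop empty or single-character matches and known noise terms.
        return norm.length() < 2 || NOISE_TERMS.contains(norm);
    }
}
```

Filtering by exact normalized value, rather than by substring, keeps legitimate names such as "Home Hill" from being discarded.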
DEFAULT_COUNTRY_CONF
public static final int DEFAULT_COUNTRY_CONF
- See Also:
- Constant Field Values
Constructor Details
XponentGeocoder
public XponentGeocoder()
For now, "XpMeta" means geo-processing tweets for province normalization: any possible geo indication is resolved down to a province code. "XpGeotag" means full-text geotagging/geocoding.
Method Details
geoinferencePlaceMentions
public Collection<GeoInference> geoinferencePlaceMentions(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
Does not infer place mentions from free text.
- Specified by:
geoinferencePlaceMentions in class GeoInferencer
- Returns:
- Throws:
org.opensextant.extraction.ExtractionException
report
Renders a string buffer with a final report, provided you set or increment the totalRecords value.
- Specified by:
report in class GeoInferencer
- Returns:
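The report depends on totalRecords being maintained by the caller. The following sketch shows why: every percentage is computed against that total. The counter names mirror the documented fields. The class, the output wording, and the local pct() helper are assumptions; pct() is not the inherited GeoInferencer.pct(), whose signature is not shown here.

```java
import java.util.Locale;

/**
 * Hypothetical report renderer. Counter names mirror the documented fields;
 * pct() is a local helper, not GeoInferencer.pct().
 */
class GeocoderReport {
    long totalRecords;      // caller must set or increment this
    long recordsWithCoord;
    long recordsWithTZ;
    long recordsWithPlace;

    /** Percentage of total; returns 0 when no records were counted. */
    static double pct(long count, long total) {
        return total == 0 ? 0.0 : 100.0 * count / total;
    }

    String report() {
        StringBuilder buf = new StringBuilder();
        buf.append(String.format(Locale.ROOT, "Total records: %d%n", totalRecords));
        appendLine(buf, "With coordinates", recordsWithCoord);
        appendLine(buf, "With time zone", recordsWithTZ);
        appendLine(buf, "With place", recordsWithPlace);
        return buf.toString();
    }

    // One "label: count (pct%)" line per counter.
    private void appendLine(StringBuilder buf, String label, long count) {
        buf.append(String.format(Locale.ROOT, "%s: %d (%.1f%%)%n",
                label, count, pct(count, totalRecords)));
    }
}
```

If totalRecords is never set, the guard makes every percentage read 0.0 rather than failing.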
configure
public void configure() throws org.opensextant.ConfigException
Makes use of a number of APIs:
- XCoord to parse out additional coordinates that are not normalized.
- Gazetteer/GazetteerMatcher to resolve place identities by querying them directly.
- PlaceGeocoder to parse longer multi-word phrases, tagging places so that advanced rules can be applied to them.
- LangID to identify the language of the text, although it does not work reliably on short texts.
- GeonamesUtility and TextUtils to look up metadata for countries, time zones, language codes, etc.
close
public void close()
Description copied from class: SocialGeo
Release resources quietly.
geoinferenceTweetAuthor
public GeoInference geoinferenceTweetAuthor(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
Geoinfers the user/author profile. The standard 'deepeye' annotation is "ugeo" or "country".
- Specified by:
geoinferenceTweetAuthor in class GeoInferencer
- Parameters:
tw - DeepEye social Tweet
- Returns:
annot, a DeepEye annotation
- Throws:
org.opensextant.extraction.ExtractionException
geoinferenceTweetStatus
public GeoInference geoinferenceTweetStatus(org.opensextant.data.social.Tweet tw) throws org.opensextant.extraction.ExtractionException
Geoinfers the location of the message, e.g., where the message was sent from. The standard 'deepeye' annotation is "geo"; most message locations are coordinates or hard locations.
- Specified by:
geoinferenceTweetStatus in class GeoInferencer
- Parameters:
tw - tweet as parsed by DeepEye
- Returns:
Geo or Country annotation
- Throws:
org.opensextant.extraction.ExtractionException - if an error occurs while running geolocation routines
parseFreeTextCoordinates
public void parseFreeTextCoordinates(org.opensextant.data.Place g)
Not common, but useful. Improves location resolution via various tricks.
- Parameters:
g
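As an illustration of pulling coordinates out of free text, the following sketch extracts a single decimal-degree "lat,lon" pair with a regular expression and rejects out-of-range values. This is a deliberately narrow, hypothetical stand-in. The documented class uses XCoord for coordinate parsing, and this sketch neither reproduces that API nor covers its formats.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: find a "lat,lon" decimal-degree pair in free text,
 * e.g. a profile location such as "loc: 40.7128,-74.0060".
 * Handles only this one pattern; it is not a substitute for XCoord.
 */
class FreeTextCoordinates {
    // Lookbehind stops a match from starting in the middle of a longer number.
    private static final Pattern LAT_LON = Pattern.compile(
            "(?<![\\d.])([-+]?\\d{1,2}\\.\\d+)\\s*,\\s*([-+]?\\d{1,3}\\.\\d+)");

    /** Returns {lat, lon} for the first in-range pair, or empty if none is found. */
    static Optional<double[]> parse(String text) {
        if (text == null) {
            return Optional.empty();
        }
        Matcher m = LAT_LON.matcher(text);
        while (m.find()) {
            double lat = Double.parseDouble(m.group(1));
            double lon = Double.parseDouble(m.group(2));
            // Reject out-of-range pairs such as "95.0,10.0".
            if (Math.abs(lat) <= 90.0 && Math.abs(lon) <= 180.0) {
                return Optional.of(new double[] {lat, lon});
            }
        }
        return Optional.empty();
    }
}
```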
provinceID
public boolean provinceID(org.opensextant.data.Place g)
Derives the province ID if given a hard location.
- Parameters:
g
- Returns:
true if the Place object was imbued with a province ID (and a country ID, if relevant).
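To make the boolean contract concrete, the following minimal sketch sets a province key only when both a country code and an ADM1 code are known, and reports whether it did so. The SimplePlace type is a hypothetical stand-in for org.opensextant.data.Place. The "CC.ADM1" key format is also an assumption: it resembles the GeoNames admin1 code convention (e.g. "US.CA"), but this document does not say what provinceID actually produces.

```java
import java.util.Locale;

/** Hypothetical minimal place holder; not org.opensextant.data.Place. */
class SimplePlace {
    String countryCode; // e.g. "US"
    String adm1;        // first-level administrative code, e.g. "CA"
    String provinceId;  // filled in by ProvinceIds.assign()
}

class ProvinceIds {
    /**
     * Sets an assumed "CC.ADM1" province key when both codes are present.
     * Returns true if the place was given a province ID.
     */
    static boolean assign(SimplePlace p) {
        if (p == null || isBlank(p.countryCode) || isBlank(p.adm1)) {
            return false;
        }
        p.provinceId = p.countryCode.trim().toUpperCase(Locale.ROOT)
                + "." + p.adm1.trim().toUpperCase(Locale.ROOT);
        return true;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```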
inferCountryTimezone
public int inferCountryTimezone(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g) throws org.opensextant.extraction.ExtractionException
- Throws:
org.opensextant.extraction.ExtractionException
getInferredCountry
public Map<String,org.opensextant.extractors.geo.social.XponentGeocoder.InferredCountry> getInferredCountry(org.opensextant.data.social.Tweet t)
Determines a starting set of countries. If the TZ/UTC offset is set, use that, then improve the scores of countries where the tweet language is spoken. Otherwise, start from the countries where the tweet language is spoken.
- Parameters:
t
- Returns:
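The two-stage strategy described above (seed from the time zone, then boost by language, else fall back to language alone) can be sketched as follows. The lookup tables, the score values 10 and 5, and the plain String/Integer result type are all hypothetical. The real method returns XponentGeocoder.InferredCountry values, whose contents are not documented here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of time-zone-then-language country scoring. */
class CountryInference {
    // Illustrative lookup tables; a real system would derive these from gazetteer metadata.
    static final Map<String, List<String>> COUNTRIES_BY_TZ = Map.of(
            "Europe/Paris", List.of("FR"),
            "America/New_York", List.of("US"),
            "America/Toronto", List.of("CA"));
    static final Map<String, List<String>> COUNTRIES_BY_LANG = Map.of(
            "fr", List.of("FR", "CA", "BE"),
            "en", List.of("US", "CA", "GB"));

    /** Returns country code -> score for the given time zone and language (either may be null). */
    static Map<String, Integer> inferCountries(String tz, String lang) {
        Map<String, Integer> scores = new HashMap<>();
        // 1. Seed candidates from the time zone, if one is set.
        if (tz != null) {
            for (String cc : COUNTRIES_BY_TZ.getOrDefault(tz, List.of())) {
                scores.merge(cc, 10, Integer::sum);
            }
        }
        List<String> langCountries =
                lang == null ? List.of() : COUNTRIES_BY_LANG.getOrDefault(lang, List.of());
        if (!scores.isEmpty()) {
            // 2a. Boost only the seeded countries where the language is spoken.
            for (String cc : langCountries) {
                scores.computeIfPresent(cc, (k, v) -> v + 5);
            }
        } else {
            // 2b. No time-zone evidence: start from countries where the language is spoken.
            for (String cc : langCountries) {
                scores.merge(cc, 5, Integer::sum);
            }
        }
        return scores;
    }
}
```

Boosting only the already-seeded countries keeps a widely spoken language from outweighing the more specific time-zone evidence.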
inferProvinceByHierarchy
public int inferProvinceByHierarchy(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g)
Uses the geographic hierarchy to find the province related to this place. When the standard hierarchy lookup fails, tries tagging the name value as free text.
- Parameters:
g - place that has some name/province/country or name/ADM1/CC hierarchy
- Returns:
confidence; a value greater than 0 means something was found.
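The following sketch shows the shape of a "hierarchy lookup first, free-text fallback second" routine that returns a confidence, where 0 means nothing was found. It is a hypothetical illustration. The index, the confidence values 90 and 50, and the Guess type are invented. A plain substring scan stands in for the documented fallback, which tags the name value as free text (presumably via the tagger) rather than scanning substrings.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a two-step province lookup with a confidence score. */
class HierarchyLookup {
    /** Lookup result: ADM1 code (or null) and a confidence, 0 if nothing was found. */
    static final class Guess {
        final String adm1;
        final int confidence;

        Guess(String adm1, int confidence) {
            this.adm1 = adm1;
            this.confidence = confidence;
        }
    }

    // Illustrative index keyed by "CC|PROVINCE NAME" -> ADM1 code.
    private static final Map<String, String> PROVINCES = Map.of(
            "US|CALIFORNIA", "CA",
            "US|TEXAS", "TX",
            "CA|ONTARIO", "ON");

    private static String norm(String s) {
        return s.trim().toUpperCase(Locale.ROOT);
    }

    static Guess inferProvince(String countryCode, String provinceName, String freeText) {
        if (countryCode == null) {
            return new Guess(null, 0);
        }
        String cc = norm(countryCode);
        // 1. Standard hierarchy lookup: country + province name.
        if (provinceName != null) {
            String adm1 = PROVINCES.get(cc + "|" + norm(provinceName));
            if (adm1 != null) {
                return new Guess(adm1, 90);
            }
        }
        // 2. Fallback (crude stand-in for tagging): a known province of that country inside the text.
        if (freeText != null) {
            String text = norm(freeText);
            for (Map.Entry<String, String> e : PROVINCES.entrySet()) {
                String[] parts = e.getKey().split("\\|");
                if (parts[0].equals(cc) && text.contains(parts[1])) {
                    return new Guess(e.getValue(), 50);
                }
            }
        }
        return new Guess(null, 0);
    }
}
```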
inferCountryName
public int inferCountryName(org.opensextant.data.Geocoding g)
Trivial test to see whether the provided place description is simply a country name, rather than a description of a place or non-place. This is only a lookup, not a tagger. If a country is found, g is coded with it.
- Parameters:
g - given geo text
- Returns:
a value indicating whether g could be geocoded with a country.
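A "lookup, not a tagger" check means the whole normalized text must equal a known country name. Text that merely contains a country name does not qualify. In the following sketch, the name table and the normalization (punctuation to spaces, collapsed whitespace, lower case) are assumptions. The sketch also returns a country code rather than the documented int.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of an exact-match country-name lookup. */
class CountryNameLookup {
    // Illustrative name -> ISO 3166-1 alpha-2 table.
    static final Map<String, String> COUNTRY_NAMES = Map.of(
            "france", "FR",
            "united states", "US",
            "usa", "US",
            "japan", "JP");

    /** Replaces punctuation with spaces, collapses whitespace, and lower-cases. */
    static String normalize(String text) {
        return text.replaceAll("\\p{Punct}+", " ")
                .trim()
                .replaceAll("\\s+", " ")
                .toLowerCase(Locale.ROOT);
    }

    /** Returns the country code if the whole text is just a country name, else null. */
    static String countryCodeFor(String text) {
        if (text == null) {
            return null;
        }
        return COUNTRY_NAMES.get(normalize(text));
    }
}
```

For example, "Paris, France" returns null here because it describes a place, not just a country.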
removePunct
processLocation
public GeoInference processLocation(org.opensextant.data.social.Tweet tw, org.opensextant.data.Place g, String rid, String annotName) throws org.opensextant.extraction.ExtractionException
Detailed routine to uncover additional location information in tweet noise. Since SocGeo 1.13.8, we try to set a province name in addition to the ADM1 ID.
- Parameters:
tw - tweet
g - a location on the tweet: geo or user geo (ugeo)
rid - record ID from DeepEye or another data ID
annotName - annotation type to store
- Throws:
org.opensextant.extraction.ExtractionException
getAdditionalMatches
Geocoder does not return additional matches.
- Specified by:
getAdditionalMatches in class GeoInferencer
- Returns: