Class SocialGeo
java.lang.Object
org.opensextant.extractors.geo.social.SocialGeo
- Direct Known Subclasses:
GeoInferencer
A base-class that has the various hooks for logging, dev/test/evaluation,
common dictionaries/resources, and helpful connectivity items.
- Author:
- ubaldino
-
Field Summary
Modifier and Type, Field, Description
allCountries: If you populate allCountries with
basicCountryNames: A particular hashing of the list of country names.
protected org.opensextant.util.GeonamesUtility countries
protected boolean evalMode
protected final org.slf4j.Logger log
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
abstract void close(): Release resources quietly.
abstract void configure(): Configure your implementation.
protected static void flattenPrecision(org.opensextant.data.Geocoding geo, org.opensextant.extractors.xcoord.GeocoordPrecision prec): Facilitate getting a simple precision metric.
protected double getConfidence(double c)
org.opensextant.data.Country getCountryNamed
org.opensextant.data.Place getUSStateByCode(String code): A dot-separated code, country code + FIPS numeric
org.opensextant.data.Place getUSStateByName(String name): Look up US states.
org.opensextant.data.Place inferPlaceRecursively(SolrGazetteer gaz, org.opensextant.data.Geocoding poi)
org.opensextant.data.Place inferPlaceRecursively(SolrGazetteer gaz, org.opensextant.data.Geocoding poi, boolean requireADM1): Try to find closest P/PPL* (city or village) within 5 km.
static boolean isValue: Generally useful test of string values.
void loadProvinceNames: Geonames Helpers.
void loadUSStates: CAVEAT: For now this only loads US states, despite us loading
void populateAllCountries: Populate the allCountries listing.
void populateBasicCountryNames(): Create a lookup of the most common country names.
int scoreCountryPrediction(org.opensextant.data.Country C, org.opensextant.data.social.Tweet tw): Use this score as a boost to break ties or separate close scores in predictions.
void setProvinceName(org.opensextant.data.Place somePlace): Set the Province name from the given codes on somePlace.
-
Field Details
-
log
protected final org.slf4j.Logger log -
evalMode
protected boolean evalMode -
inferencerID
-
inferencerDescription
-
countries
protected org.opensextant.util.GeonamesUtility countries -
allCountries
If you populate allCountries with -
basicCountryNames
A particular hashing of the list of country names. -
US_STATES
-
-
Constructor Details
-
SocialGeo
public SocialGeo()
-
-
Method Details
-
configure
public abstract void configure() throws org.opensextant.ConfigException
Configure your implementation.
- Throws:
org.opensextant.ConfigException
-
close
public abstract void close()
Release resources quietly.
-
isValue
Generally useful test of string values.
- Parameters:
v
- Returns:
-
scoreCountryPrediction
public int scoreCountryPrediction(org.opensextant.data.Country C, org.opensextant.data.social.Tweet tw)
Use this score as a boost to break ties or separate close scores in predictions.
Points: A Country may score in zero or more of these three categories: TZ, UTC, LANG.
TZ
+3 - Country contains the timezone named by Tweet.timezone
UTC
+3 - Country contains the UTC offset named by Tweet.utcOffset (hours);
+4 - or, if the Tweet is in a period of DST and the Country observes that DST offset. This is slightly less believable because users apparently do not always adjust TZ and time on devices. Just the same, if the country uses DST and so does the user, that is more significant than a match without DST.
LANG
+3 - Language of User and of Text are both a Primary language of Country
+2 - either language is a Primary language of Country
+1 - language of text is spoken in Country
LON
TODO: consider (Country.LatLon ~ Tweet.UTC) ? within 5 degrees. Countries vary in size, so this makes little sense for them, but for Cities and States it makes more sense.
MAX score is 3 + 4 + 3 = 10
- Parameters:
C - a country prediction for the tweet.
tw - the tweet
- Returns:
- score from 0 to 10 under the point scheme described above
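The point scheme can be illustrated with a minimal sketch. It is not the class's implementation: CountryFacts and TweetFacts are hypothetical stand-ins for org.opensextant.data.Country and org.opensextant.data.social.Tweet, and treating the UTC +3 and +4 awards as mutually exclusive is an assumption drawn from the stated maximum of 3 + 4 + 3 = 10.

```java
import java.util.Set;

/** Illustrative only: these types are stand-ins, not the OpenSextant Country or Tweet classes. */
final class CountryScoreSketch {

    /** Hypothetical, simplified view of a country's time zone and language facts. */
    record CountryFacts(Set<String> timezones, Set<Double> utcOffsets, Set<Double> dstOffsets,
                        Set<String> primaryLanguages, Set<String> spokenLanguages) {}

    /** Hypothetical, simplified view of the tweet metadata used for scoring. */
    record TweetFacts(String timezone, double utcOffsetHours, boolean inDstPeriod,
                      String userLanguage, String textLanguage) {}

    static int score(CountryFacts c, TweetFacts t) {
        int points = 0;

        // TZ: +3 if the country contains the tweet's named timezone.
        if (t.timezone() != null && c.timezones().contains(t.timezone())) {
            points += 3;
        }

        // UTC: +4 for a DST-period offset the country observes, otherwise +3 for a plain match.
        // Assumption: the two awards are exclusive, which keeps the maximum at 10.
        if (t.inDstPeriod() && c.dstOffsets().contains(t.utcOffsetHours())) {
            points += 4;
        } else if (c.utcOffsets().contains(t.utcOffsetHours())) {
            points += 3;
        }

        // LANG: +3 both primary, +2 either primary, +1 text language merely spoken.
        boolean userPrimary = t.userLanguage() != null
                && c.primaryLanguages().contains(t.userLanguage());
        boolean textPrimary = t.textLanguage() != null
                && c.primaryLanguages().contains(t.textLanguage());
        if (userPrimary && textPrimary) {
            points += 3;
        } else if (userPrimary || textPrimary) {
            points += 2;
        } else if (t.textLanguage() != null && c.spokenLanguages().contains(t.textLanguage())) {
            points += 1;
        }
        return points;
    }
}
```

Because the score is small and additive, it suits the documented purpose of breaking ties between otherwise similar country predictions and should not be read as a standalone confidence.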
-
populateBasicCountryNames
public void populateBasicCountryNames()
Create a lookup of the most common country names. This is just a pure ASCII listing of ISO country names. To get more country names, use populateAllCountries().
-
populateAllCountries
Populate the allCountries listing. Not all pipeline apps use SolrGazetteer or do geo work, so this is not part of setup.
- Parameters:
gaz
-
-
loadProvinceNames
Geonames Helpers. Attach the Province name if useful. Ideally, keep data coded in databases and render the name at presentation or export time, if needed. There is no need to store superfluous name data that merely reflects what is already coded.
- Throws:
IOException
-
getUSStateByName
Look up US states.
- Parameters:
name
- Returns:
-
getUSStateByCode
A dot-separated code: country code + FIPS numeric.
- Parameters:
code - CC.FF
- Returns:
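As a rough illustration of a CC.FF lookup, the sketch below splits the code at the dot and looks the pair up in a map. The class, its methods, and the in-memory table are hypothetical; the real method returns an org.opensextant.data.Place and loads its entries elsewhere (see loadUSStates).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a hypothetical table keyed by "CC.FF", not the SocialGeo implementation. */
final class StateCodeLookupSketch {

    private final Map<String, String> namesByCode = new HashMap<>();

    /** Register a name under a country code and an ADM1 (FIPS numeric) code. */
    void add(String countryCode, String adm1Code, String name) {
        namesByCode.put(key(countryCode, adm1Code), name);
    }

    private static String key(String countryCode, String adm1Code) {
        return countryCode.toUpperCase(Locale.ROOT) + "." + adm1Code;
    }

    /** Returns the name for a "CC.FF" code, or null when the code is malformed or unknown. */
    String nameForCode(String code) {
        if (code == null) {
            return null;
        }
        int dot = code.indexOf('.');
        // Reject codes with no dot, an empty country part, or an empty ADM1 part.
        if (dot <= 0 || dot == code.length() - 1) {
            return null;
        }
        return namesByCode.get(key(code.substring(0, dot), code.substring(dot + 1)));
    }
}
```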
-
loadUSStates
CAVEAT: For now this only loads US states, despite us loading
- Throws:
IOException
-
getConfidence
protected double getConfidence(double c) -
getCountryNamed
- Parameters:
nm
- Returns:
-
inferPlaceRecursively
public org.opensextant.data.Place inferPlaceRecursively(SolrGazetteer gaz, org.opensextant.data.Geocoding poi) throws org.apache.solr.client.solrj.SolrServerException, IOException
- Throws:
org.apache.solr.client.solrj.SolrServerException
IOException
-
setProvinceName
public void setProvinceName(org.opensextant.data.Place somePlace)
Set the Province name from the given codes on somePlace.
- Parameters:
somePlace - Place object with CC and ADM1 codes set.
-
inferPlaceRecursively
public org.opensextant.data.Place inferPlaceRecursively(SolrGazetteer gaz, org.opensextant.data.Geocoding poi, boolean requireADM1) throws org.apache.solr.client.solrj.SolrServerException, IOException
Try to find the closest P/PPL* (city or village), or a local site or landmark, within 5 km. If none is found, try a radius of 10 km, then 30 km. If there is still nothing, try a region, say an ADM1 or ADM2 place boundary, if one is within 100 km. If not, then maybe you are in a remote, sparse territory or over water.
TODO: Province ID is helpful for many things; missing ADM1 codes is a general problem. Fix missing ADM1 codes in the gazetteer, e.g., use ESRI free data, geonames.org, etc. NOTE: there are not any missing ADM1 codes; USGS is solid.
TODO: Solr 4.x has a major memory problem when trying to find closest points. In theory it should be indexed rather well; however, for geodetic search it tries to load ALL of the index for that 'geo' field into RAM (based on experience). From there it can sort geodetically to find a closest point. This is not helpful, so as a workaround this recursive search outward finds any items close by (SolrGazetteer.placeAt() sorts results outside of Solr). The issue is that the search returns only the first 25 rows of unsorted results, then sorts geodetically. So for this workaround, try to minimize results by selecting feature types or something similar.
- Parameters:
gaz - an initialized SolrGazetteer
poi - point of interest
requireADM1 - true if ADM1 level resolution is desired.
- Returns:
a single place that appears to be closest to POI
- Throws:
org.apache.solr.client.solrj.SolrServerException
IOException
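The outward search described above can be sketched as follows. This is a simplified illustration, not the method's code: Candidate and NearbyLookup are hypothetical stand-ins for gazetteer places and for a radius query, and the radii simply mirror the 5, 10, 30 and 100 km steps in the description. As in the documented workaround, candidates arrive unsorted and the nearest one is chosen on the client side.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Illustrative only: an expanding-radius search over hypothetical lookups. */
final class RadiusSearchSketch {

    /** Hypothetical candidate: a name plus its distance from the point of interest. */
    record Candidate(String name, double distanceKm) {}

    /** Hypothetical stand-in for a gazetteer query returning unsorted matches within a radius. */
    interface NearbyLookup {
        List<Candidate> within(double radiusKm);
    }

    static final double[] CITY_RADII_KM = {5, 10, 30};
    static final double REGION_RADIUS_KM = 100;

    /** Try populated places at growing radii, then fall back to a nearby region. */
    static Optional<Candidate> closest(NearbyLookup cities, NearbyLookup regions) {
        for (double radius : CITY_RADII_KM) {
            Optional<Candidate> hit = nearest(cities.within(radius));
            if (hit.isPresent()) {
                return hit;
            }
        }
        // Nothing local: possibly remote territory or open water unless a region is near.
        return nearest(regions.within(REGION_RADIUS_KM));
    }

    /** Pick the nearest candidate from an unsorted result list. */
    private static Optional<Candidate> nearest(List<Candidate> found) {
        return found.stream().min(Comparator.comparingDouble(Candidate::distanceKm));
    }
}
```

Starting with a small radius keeps each result set short, which matters when only a limited number of unsorted rows comes back from each query.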
-
flattenPrecision
protected static void flattenPrecision(org.opensextant.data.Geocoding geo, org.opensextant.extractors.xcoord.GeocoordPrecision prec)
Facilitate getting a simple precision metric. +/- 1 m is sufficient for tracking points extracted from text.
- Parameters:
geo
prec
-
-