Class GeonamesUtility

java.lang.Object
org.opensextant.util.GeonamesUtility

public class GeonamesUtility extends Object
Author:
ubaldino
  • Field Details

    • UNK_Country

      public static final Country UNK_Country
    • COUNTRY_ADM0

      public static final Set<String> COUNTRY_ADM0
    • COUNTRY_ADM0_NORM

      public static final String COUNTRY_ADM0_NORM
    • KNOWN_NAME_COLLISIONS

      public static final Map<String,String> KNOWN_NAME_COLLISIONS
      Experimental. A trivial way of mapping well-known name collisions to country codes.
    • ABBREV_TYPE

      public static final char ABBREV_TYPE
    • NAME_TYPE

      public static final char NAME_TYPE
    • CODE_TYPE

      public static final char CODE_TYPE
    • unknownLanguages

      public final HashSet<String> unknownLanguages
  • Constructor Details

    • GeonamesUtility

      public GeonamesUtility() throws IOException
      A utility class that offers many static routines. If you instantiate this class, it requires metadata files for country names and feature codes in your classpath.
      Throws:
      IOException - if metadata files are not found or do not load.
  • Method Details

    • normalizeCountryName

      public static String normalizeCountryName(String c)
      This may help revert to a more readable country name, e.g., if you are given an upper case name and want some version of it as a proper name. There is no need to use this if you have good reference data.
      Parameters:
      c - country name
      Returns:
      the country name, capitalized
    • getFeatureDesignation

      public static String getFeatureDesignation(String cls, String code)
    • getFeatureName

      public String getFeatureName(String cls, String code)
      Find a readable name or description of a class/code
      Parameters:
      cls - feature class, e.g., P
      code - feature code, e.g., PPL
      Returns:
      name for a feature/code pair
    • approximateLongitudeForUTCOffset

      public static int approximateLongitudeForUTCOffset(int utc)
      This helps get the general area, +/-5 degrees, for a given UTC offset. UTC offsets range from -12.00 to +14.00. This covers 360 degrees of the planet over a 24-hour day, so each offset hour covers about 15 degrees. Any answer you get here is best used with a range of fuzziness, e.g., +/-5 degrees. REFERENCE: https://en.wikipedia.org/wiki/List_of_UTC_time_offsets
      Parameters:
      utc - UTC offset
      Returns:
      approximated longitude, in degrees
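      The 15-degrees-per-hour rule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the class UtcLongitudeSketch and its methods are hypothetical names, and the real method may round or wrap values differently (for example, +14 hours maps to 210 here, which lies outside the -180..180 range).

```java
public class UtcLongitudeSketch {
    /** Degrees of longitude swept per hour of UTC offset: 360 / 24. */
    public static final int DEGREES_PER_HOUR = 15;

    /** Approximate central longitude for a whole-hour UTC offset (assumed rule: hours * 15). */
    public static int approximateLongitude(int utcOffsetHours) {
        return utcOffsetHours * DEGREES_PER_HOUR;
    }

    /** True if lon lies within fuzzDegrees of the offset's approximate central longitude. */
    public static boolean withinBand(double lon, int utcOffsetHours, double fuzzDegrees) {
        return Math.abs(lon - approximateLongitude(utcOffsetHours)) <= fuzzDegrees;
    }

    public static void main(String[] args) {
        System.out.println(approximateLongitude(-5));       // prints -75
        // Boston sits near longitude -71.06, inside the +/-5 degree band around -75.
        System.out.println(withinBand(-71.06, -5, 5.0));    // prints true
    }
}
```

      The band check reflects the advice above: treat the returned longitude as the center of a fuzzy range rather than as an exact position.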
    • countriesInTimezone

      public Collection<String> countriesInTimezone(String tz)
      List all countries in a particular TZ
      Parameters:
      tz - TZ name
      Returns:
      list of country codes
    • countriesInUTCOffset

      public Collection<String> countriesInUTCOffset(double utc)
      List all countries in a particular UTC offset. Offsets are usually -15.0 to 15.0, in steps of 0.5 or 0.25 hours.
      Parameters:
      utc - offset in decimal hours
      Returns:
      list of country codes found with the offset
    • countriesInDSTOffset

      public Collection<String> countriesInDSTOffset(double dst)
      This check only makes sense if you have a date/time that falls in a period of daylight saving time. The UTC offset for that period can then be used to see which countries adhere to that DST convention. E.g., for Boston and New York (US), standard time is GMT-0500 and DST is GMT-0400.
      Parameters:
      dst - DST offset
      Returns:
      list of country codes observing DST at that offset
    • isValue

      public static boolean isValue(String s)
    • loadMajorCities

      public static List<Place> loadMajorCities(String resourcePath) throws IOException
      Geonames.org data set: citiesN.txt
      Parameters:
      resourcePath - CLASSPATH location of a resource.
      Returns:
      list of places
      Throws:
      IOException - if resource file is not found
    • mapMajorCityIDs

      public static Map<String,Place> mapMajorCityIDs(List<Place> cities)
      Convenience: prepare a map for lookup by ID. If these are geonames.org place objects, then the geonames IDs do not line up with those curated within the OpenSextant Gazetteer. Geonames.org cities data is usually unique by row, so if you provide 1000 cities in a list, your map will have 1000 city place IDs. No duplicates are expected.
      Parameters:
      cities - list of Place objects
      Returns:
      map of place ID to Place object
    • mapPopulationByLocation

      public static Map<String,Integer> mapPopulationByLocation(List<Place> cities)
      See mapPopulationByLocation(list, int). Default geohash prefix length is 5, which yields about 6km grids or so.
      Parameters:
      cities - list of major cities
      Returns:
      map of population summation over geohash grids.
    • mapPopulationByLocation

      public static Map<String,Integer> mapPopulationByLocation(List<Place> cities, int ghResolution)
      This organizes population data by geohash, using N-character geohash prefixes as keys. If multiple cities are located in the same grid cell, then their populations are summed. A geohash prefix length of 4 yields cells of about 30 km, and 5 yields about 6 km.
      Parameters:
      cities - list of major cities
      ghResolution - number of geohash chars in prefix, for keys in map. Higher resolution means finer geohash grid
      Returns:
      map of population summation over geohash grids.
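      The grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the City class is a hypothetical stand-in for Place that carries a precomputed geohash string, and the geohash values in main are illustrative only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PopulationGridSketch {
    /** Hypothetical stand-in for a Place: a precomputed geohash plus a population count. */
    public static final class City {
        final String geohash;
        final int population;

        public City(String geohash, int population) {
            this.geohash = geohash;
            this.population = population;
        }
    }

    /** Sum populations of all cities that share the same geohash prefix of length prefixLen. */
    public static Map<String, Integer> sumByPrefix(List<City> cities, int prefixLen) {
        Map<String, Integer> grid = new HashMap<>();
        for (City c : cities) {
            // Skip entries whose geohash is missing or shorter than the requested prefix.
            if (c.geohash == null || c.geohash.length() < prefixLen) {
                continue;
            }
            grid.merge(c.geohash.substring(0, prefixLen), c.population, Integer::sum);
        }
        return grid;
    }

    public static void main(String[] args) {
        List<City> cities = Arrays.asList(
                new City("drt2zp", 100),   // illustrative geohash values
                new City("drt2yz", 200),
                new City("dr5reg", 500));
        // With a 4-char prefix the first two cities share the cell "drt2".
        System.out.println(sumByPrefix(cities, 4).get("drt2"));  // prints 300
        // With a 5-char prefix they fall into separate cells.
        System.out.println(sumByPrefix(cities, 5).get("drt2z")); // prints 100
    }
}
```

      A longer prefix gives a finer grid, so fewer cities share a cell and fewer populations are merged.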
    • loadMajorCities

      public static List<Place> loadMajorCities(InputStream strm) throws IOException
      Load the Geonames.org majorcities data file. Mainly to acquire population and other metrics.
      Schema: http://download.geonames.org/export/dump/
      Pass in files in geonames.org format, named citiesNNNN.zip (.txt), where NNNN is the population threshold.
      Parameters:
      strm - input stream for geonames.org cities file
      Returns:
      list of Xponents Place objects
      Throws:
      IOException - if parsing goes wrong.
    • getAdmin1Metadata

      @Deprecated public List<Place> getAdmin1Metadata()
      Deprecated.
      Use getUSStateMetadata
      Provides access to a list of ADM1 metadata. This is a mutable list -- if you want to add more admin metadata (entries, postal code mappings, etc.), then have at it. For now this is US + territories only (as of v2.8.17).
      Returns:
      list of Admin Level 1 Place objects
      Since:
      2.8.17
    • getUSStateMetadata

      public List<Place> getUSStateMetadata()
      Provides access to a list of ADM1 metadata. This is a mutable list -- if you want to add more admin metadata (entries, postal code mappings, etc.), then have at it. For now this is US + territories only (as of v2.8.17).
      Returns:
      list of US States (Admin Level 1) Place objects
    • getProvinceMetadata

      public List<Place> getProvinceMetadata()
      Alias for getWorldAdmin1Metadata.
      Returns:
      list of Places.
    • getWorldAdmin1Metadata

      public List<Place> getWorldAdmin1Metadata()
      Get the array of Place objects representing ADM1 level boundaries. This is literally just Names, ADM1 codes and Country code data. No location information, except for US States. To get a province by code, use
      Returns:
      list of Places
    • loadWorldAdmin1Metadata

      public void loadWorldAdmin1Metadata() throws IOException
      Source: geonames.org ADM1 codes/names in anglo/ASCII form. These codes do NOT contain geodetic information (lat/lon, etc.). CAVEAT -- using such Place metadata will provide a coordinate of (0,0).
      Throws:
      IOException - if geonames.org table cannot be found in classpath
    • getAdmin1Place

      public Place getAdmin1Place(String cc, String adm1)
      Retrieve a Place object with the semi-official name (in Latin/Anglo terms) given CC and ADM1 code. You must load World ADM1 data first; use loadWorldAdmin1Metadata()
      Parameters:
      cc - ISO country code
      adm1 - ISO province code
      Returns:
      Place or null.
    • getProvince

      public Place getProvince(String cc, String adm1)
      Lookup by coded path, CC.ADM1. You must load World ADM1 data first; use loadWorldAdmin1Metadata(). These Admin Places do NOT contain geodetic information (lat/lon, etc.). CAVEAT -- using such Place metadata will provide a coordinate of (0,0). Alias for getAdmin1Place(String, String).
      Parameters:
      cc - country code
      adm1 - ADM level 1 code
      Returns:
      Place for province
    • getAdmin1PlaceByHASC

      public Place getAdmin1PlaceByHASC(String path)
      Lookup by coded path, CC.ADM1. You must load World ADM1 data first; use loadWorldAdmin1Metadata()
      Parameters:
      path - hierarchical path
      Returns:
      adm1 place obj
    • loadUSStateMetadata

      public void loadUSStateMetadata() throws IOException
        TODO: This is mildly informed by geonames.org, however even there
       we are still missing a mapping between ADM1 FIPS/ISO codes for a state
       and the Postal codes/abbreviations.
      
       Aliases for the same US province: "US.25" = "MA" = "US.MA" =
       "Massachusetts" = "the Bay State"
      
       Easily mapping the coded data (e.g., 'MA' = '25') worldwide would be
       helpful.
      
       TODO: Make use of geonames.org or other sources for ADM1 postal code
       listings at top level.
       
      Throws:
      IOException - if CSV file not found in classpath
    • getDefaultCountryName

      public String getDefaultCountryName(String cc_iso2)
      Finds a default country name for a CC if one exists.
      Parameters:
      cc_iso2 - country code.
      Returns:
      name of country
    • getISOCountries

      public Map<String,Country> getISOCountries()
      List all country names, official and variant names. This does not key any Territories. Territories that carry another nation's country code are attached to that country. Territories assigned their own ISO code are listed/keyed as Countries here.
      Returns:
      map of countries, keyed by ISO country code
    • getCountries

      public List<Country> getCountries()
    • getCountry

      public Country getCountry(String isocode)
      Get Country by the default ISO digraph. Returns the Unknown country if you are not using an ISO2 code. TODO: throw a GazetteerException of some sort for a null query or invalid code.
      Parameters:
      isocode - ISO code
      Returns:
      Country object
    • getCountryByAnyCode

      public Country getCountryByAnyCode(String cc)
      Find distinct country object by a code. Ambiguous codes will not do anything. This is really useful only if you have no idea what standard your data uses -- FIPS or ISO2/ISO3. If you know then use the API method corresponding to that standard. getCountry() is ISO by default.
      Parameters:
      cc - country code from any standard.
      Returns:
      found country object
    • getCountryByFIPS

      public Country getCountryByFIPS(String fips)
      Parameters:
      fips - FIPS code
      Returns:
      Country object
    • FIPS2ISO

      public String FIPS2ISO(String fips)
      Find an ISO code for a given FIPS entry.
      Parameters:
      fips - FIPS code
      Returns:
      null if key does not exist.
    • normalizeAdminCode

      public static String normalizeAdminCode(String v)
      Convert an ADM1 or ADM2 ID to a normalized form. US.44 or US.00 gives you 44 or 00 for the ID part; in this case the upper case code is returned. If the code is a number alone, "0" is returned for "00", "000", etc., and other numbers are zero-padded to 2 digits.
      Parameters:
      v - admin code
      Returns:
      fixed admin code
    • getHASC

      public static String getHASC(String c, String adm1)
      Get a hierarchical path for a boundary or a place. This presumes you have already normalized these values.
        CC.ADM1.ADM2.ADM3... etc. for example:
      
       'US.48.201' ... some county in Texas.
       
      Parameters:
      c - country code
      adm1 - ADM1 code
      Returns:
      HASC path
    • getHASC

      public static String getHASC(String c, String adm1, String adm2)
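      The CC.ADM1.ADM2 path format shown above can be sketched as follows. This is a minimal illustration that assumes plain dot-joining of already-normalized codes; the class HascPathSketch and its hasc methods are hypothetical names, and the library's getHASC may handle null or empty parts differently.

```java
public class HascPathSketch {
    /** Join a country code and ADM1 code as CC.ADM1 (inputs assumed already normalized). */
    public static String hasc(String cc, String adm1) {
        return cc + "." + adm1;
    }

    /** Join country, ADM1 and ADM2 codes as CC.ADM1.ADM2. */
    public static String hasc(String cc, String adm1, String adm2) {
        return hasc(cc, adm1) + "." + adm2;
    }

    public static void main(String[] args) {
        System.out.println(hasc("US", "48"));        // prints US.48
        System.out.println(hasc("US", "48", "201")); // prints US.48.201
    }
}
```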
    • isCountryNameCollision

      public static boolean isCountryNameCollision(String nm)
      Experimental. Given a normalized name phrase, does it collide with a country name? Usage: "Savannah is a great city. Georgia is lucky it has 10 Chick-fil-A restaurants in that metro area." Here Georgia is not a country, but a US State. So the caller's logic might be: if "savannah" is found, then ignore Georgia ("GG") as a possible country. isCountryNameCollision is intended to be objective; whether to ignore the country is up to the caller. Hence this function is not "ignoreCountry(placenm)". TODO: replace with a simple config file of such rules that are objective and can be generalized.
      Parameters:
      nm - country name
      Returns:
      if country name is ambiguous and collides with other name
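      The caller-side logic described above might look like the following sketch. Everything here is an assumption for illustration: the class NameCollisionSketch, its COLLISIONS table, and the shouldIgnoreCountry helper are hypothetical, and the single "savannah" to "GG" entry is taken from the usage example rather than from the library's actual data.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class NameCollisionSketch {
    /** Hypothetical collision table: normalized place name -> country code it argues against. */
    static final Map<String, String> COLLISIONS = new HashMap<>();
    static {
        COLLISIONS.put("savannah", "GG"); // entry taken from the usage example only
    }

    /** Objective test: is this normalized name a known collision cue? */
    public static boolean isCollision(String normalizedName) {
        return COLLISIONS.containsKey(normalizedName);
    }

    /** Caller's policy: ignore a candidate country if any name in the text collides with it. */
    public static boolean shouldIgnoreCountry(String countryCode, Collection<String> namesInText) {
        for (String nm : namesInText) {
            if (countryCode.equals(COLLISIONS.get(nm))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "savannah" appears alongside "georgia", so Georgia ("GG") is dropped as a country.
        System.out.println(shouldIgnoreCountry("GG", Arrays.asList("savannah", "georgia"))); // prints true
        // No collision cue present, so the country candidate is kept.
        System.out.println(shouldIgnoreCountry("GG", Arrays.asList("tbilisi", "georgia")));  // prints false
    }
}
```

      Keeping the collision test separate from the ignore decision mirrors the point above: the lookup stays objective, and the policy belongs to the caller.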
    • isName

      public static boolean isName(char name_type)
      Check if name type is a Name
      Parameters:
      name_type - code
      Returns:
      true if code is a name
    • isCode

      public static boolean isCode(char name_type)
    • isAbbreviation

      public static boolean isAbbreviation(char name_type)
      Check if name type is an Abbreviation
      Parameters:
      name_type - OpenSextant code
      Returns:
      true if code is abbreviation
    • isCountry

      public static boolean isCountry(String featCode)
      Is this Place a Country?
      Parameters:
      featCode - feat code or designation
      Returns:
      true if this is a country or "country-like" place
    • isPoliticalEntity

      public static boolean isPoliticalEntity(String featCode)
      Test if a feature is a political entity ~ country, territory, sovereign land
      Parameters:
      featCode -
      Returns:
    • isAdmin1

      public static boolean isAdmin1(String featCode)
      Is this Place a State or Province?
      Parameters:
      featCode - feature code
      Returns:
      true if this is a State, Province or other first level admin area
    • isAdmin2

      public static boolean isAdmin2(String featCode)
    • isUpperAdminLevel

      public static boolean isUpperAdminLevel(String featCode)
      Macro for reasoning with upper common levels of boundaries - province, districts.
      Parameters:
      featCode -
      Returns:
    • isNationalCapital

      public static boolean isNationalCapital(String featCode)
      Is this Place a National Capital?
      Parameters:
      featCode - feature code
      Returns:
      true if this is a national capital area
    • isAbbreviation

      public static boolean isAbbreviation(Place p)
      Wrapper for isAbbreviation(name type)
      Parameters:
      p - place
      Returns:
      true if is coded as abbreviation
    • isCountry

      public static boolean isCountry(Place p)
      Wrapper for isCountry(feat code)
      Parameters:
      p - place
      Returns:
      true if is Country, e.g., PCLI
    • isPoliticalEntity

      public static boolean isPoliticalEntity(Place p)
      Test if Place feature is coded as PCL* (PCL, PCLIX, PCLH, PCLD, PCLF, PCLS, etc.)
      Parameters:
      p - Place
      Returns:
      true if place is a political boundary feature
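      The PCL* pattern above amounts to a prefix test on the feature code. A minimal sketch, assuming a simple startsWith check; the class PoliticalEntitySketch and its isPoliticalCode method are hypothetical names, and the library may match codes differently:

```java
public class PoliticalEntitySketch {
    /** True if the feature code begins with "PCL" (PCL, PCLI, PCLIX, PCLH, ...). */
    public static boolean isPoliticalCode(String featCode) {
        return featCode != null && featCode.startsWith("PCL");
    }

    public static void main(String[] args) {
        System.out.println(isPoliticalCode("PCLI")); // prints true
        System.out.println(isPoliticalCode("PPLC")); // prints false
    }
}
```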
    • isNationalCapital

      public static boolean isNationalCapital(Place p)
      Wrapper for isNationalCapital(feat code)
      Parameters:
      p - place
      Returns:
      true if is PPLC or similar
    • isAdmin1

      public static boolean isAdmin1(Place p)
      Parameters:
      p - place
      Returns:
      true if is ADM1
    • isAdministrative

      public static boolean isAdministrative(String featClass)
      Test if a place or feature represents an administrative boundary.
      Parameters:
      featClass - feature type in question
      Returns:
      true if is admin
    • isAdministrative

      public static boolean isAdministrative(String featClass, String featCode)
      Administrative feat class + code test.
      Parameters:
      featClass -
      featCode -
      Returns:
    • isPopulated

      public static boolean isPopulated(String featClass)
      Parameters:
      featClass - geonames feature class, e.g., A, P, H, L, V, T, R
      Returns:
      true if P.
    • isSpot

      public static boolean isSpot(String fc)
    • isLand

      public static boolean isLand(String fc)
    • isPostal

      public static boolean isPostal(Place g)
    • isPostal

      public static boolean isPostal(String fc)
    • countrySpeaks

      public boolean countrySpeaks(String lang, String cc)
      Is language spoken in country ID'd by cc? See TextUtils for list of languages provided by Library of Congress.
      Parameters:
      lang - mixed case langID or langID+Locale.
      cc - UPPERCASE country code.
      Returns:
      false if language is not known or not spoken in that country.
    • isPrimaryLanguage

      public boolean isPrimaryLanguage(String lang, String cc)
      Test if lang is the primary language of a country.
      Parameters:
      lang - Lang ID
      cc - Country code
      Returns:
      true if lang is the primary language of country named by cc
    • primaryLangID

      public String primaryLangID(String cc)
      Use when a lang ID will do. See primaryLanguage() if a Language object is desired.
      Parameters:
      cc - Country code
      Returns:
      Lang ID
    • primaryLanguage

      public Language primaryLanguage(String cc)
      Primary language for a given country. By our convention, this will be the major language family, not the locale. E.g., the primary language of Australia is 'en', not 'en_AU'. The hashmap records only the first entry, which is the language.
      Parameters:
      cc - Country code
      Returns:
      Language object
    • countriesSpeaking

      public Collection<String> countriesSpeaking(String lang)
      Examples: What countries speak French (fr)? What countries speak Rwandan French (fr-RW)?
      Parameters:
      lang - lang ID
      Returns:
      list of country codes speaking lang
    • languagesInCountry

      public Collection<String> languagesInCountry(String cc)
      Parameters:
      cc - UPPERCASE country code.
      Returns:
      ISO Language codes for a country
    • loadCountryLanguages

      public void loadCountryLanguages() throws IOException
      Parse metadata from geonames.org (file in CLASSPATH @ /geonames.org/countryInfo.txt) and populate existing Country objects with language metadata. By the time you call this method Countries have names, codes, regions, aliases, timezones.
      Throws:
      IOException - if geonames.org resource file is not found
    • addLang

      protected void addLang(String langOrLocale, String cc)
      Parameters:
      langOrLocale - lang code
      cc - country code
    • getLang

      protected static String getLang(String langid)
      Parse lang ID from Locale. Internal method; ensure the argument is not null.
      Parameters:
      langid - lang ID
      Returns:
      language family
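      Reducing a locale such as 'en_AU' or 'fr-RW' to its language family can be sketched as follows. This is a minimal illustration, not the library's getLang: the class LangIdSketch and its languageFamily method are hypothetical names, and both the separator set ('-' and '_') and the lower-casing step are assumptions.

```java
import java.util.Locale;

public class LangIdSketch {
    /** Return the language part of a lang ID or locale, e.g. "en_AU" -> "en". Argument must not be null. */
    public static String languageFamily(String langOrLocale) {
        // Split once on the first '-' or '_'; a bare lang ID comes back unchanged.
        String[] parts = langOrLocale.split("[-_]", 2);
        return parts[0].toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(languageFamily("en_AU")); // prints en
        System.out.println(languageFamily("fr-RW")); // prints fr
        System.out.println(languageFamily("FR"));    // prints fr
    }
}
```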