Class SolrGazetteer

java.lang.Object
org.opensextant.extractors.geo.SolrGazetteer

public class SolrGazetteer extends Object
Connects to a Solr server via HTTP and tags place names in a document. The SOLR_HOME environment variable must be set to the location of the Solr server.
Author:
David Smiley - dsmiley@mitre.org, Marc Ubaldino - ubaldino@mitre.org
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    DEFAULT_FIELDS
     
    static final org.opensextant.data.Country
    UNK_Country
    The Constant UNK_Country.
  • Constructor Summary

    Constructors
    Constructor
    Description
    SolrGazetteer()
    Instantiates a new Solr gazetteer.
    SolrGazetteer(String solrHome)
    Instantiates a new Solr gazetteer with the specified Solr Home location.
    SolrGazetteer(SolrProxy currentIndex)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
    Close or release all resources.
    static org.opensextant.data.Place
    closest(org.opensextant.data.LatLon yx, List<org.opensextant.data.Place> places)
    Iterates through a list and chooses the place closest to the given point.
    static org.apache.solr.common.params.ModifiableSolrParams
    createDefaultSearchParams(int rows)
     
    protected static org.apache.solr.common.params.ModifiableSolrParams
    createGeodeticLookupParams()
    Creates a generic spatial query that returns up to the first 25 rows.
    protected static org.apache.solr.common.params.ModifiableSolrParams
    createGeodeticLookupParams(int rows)
    For larger areas, choose a higher number of rows to return.
    List<org.opensextant.data.Place>
    findPlaces(String name, String parametricQuery, int lenTolerance)
    Given a name, find all locations matching that.
    List<org.opensextant.data.Place>
    findPlacesById(String placeID)
    Find all places for a given gazetteer Place ID.
    List<org.opensextant.data.Place>
    findPlacesRomanizedNameOf(String name, String parametricQuery, int lenTolerance)
    NOTE: This yields primarily ASCII transliterations/romanized versions of the given place.
    Map<String,org.opensextant.data.Country>
    getCountries()
    List all country names, official and variant names.
    org.opensextant.data.Country
    getCountry(String isocode)
    Gets a Country by its default ISO digraph (ISO2 code); returns the Unknown country if the argument is not an ISO2 code.
    org.opensextant.data.Country
    getCountryByFIPS(String fips)
    Gets the country by FIPS code.
    SolrProxy
    getSolrProxy()
    Returns the SolrProxy used internally.
    static Map<String,org.opensextant.data.Country>
    loadCountries(org.apache.solr.client.solrj.SolrClient index)
    This only returns Country objects that are names; it does not produce any abbreviation variants.
    static String
    normalizeCountryName(String c)
    Normalizes a country name.
    org.opensextant.data.Place
    placeAt(org.opensextant.data.LatLon yx, int withinKM, String feature)
    Finds the place closest to the given location; the result is a reasonable guess.
    List<org.opensextant.data.Place>
    placesAt(org.opensextant.data.LatLon yx, int withinKM)
    Find places located at a particular location.
    List<org.opensextant.data.Place>
    placesAt(org.opensextant.data.LatLon yx, int withinKM, String feature)
    Variation on placesAt(LatLon, int) that also takes a feature class.
    List<org.opensextant.data.Place>
    search(String place_string)
    Search the gazetteer using a phrase.
    List<org.opensextant.data.Place>
    search(String place, boolean as_solr)
    Instance method that reuses a set of SolrParams for optimized search.
    List<org.opensextant.data.Place>
    search(org.apache.solr.common.params.SolrParams p)
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DEFAULT_FIELDS

      public static final String DEFAULT_FIELDS
    • UNK_Country

      public static final org.opensextant.data.Country UNK_Country
      The Constant UNK_Country.
  • Constructor Details

    • SolrGazetteer

      public SolrGazetteer() throws org.opensextant.ConfigException
      Instantiates a new solr gazetteer.
      Throws:
      org.opensextant.ConfigException - Signals that a configuration exception has occurred.
    • SolrGazetteer

      public SolrGazetteer(String solrHome) throws org.opensextant.ConfigException
      Instantiates a new solr gazetteer with the specified Solr Home location.
      Parameters:
      solrHome - the location of solrHome.
      Throws:
      org.opensextant.ConfigException - Signals that a configuration exception has occurred.
    • SolrGazetteer

      public SolrGazetteer(SolrProxy currentIndex) throws org.opensextant.ConfigException
      Throws:
      org.opensextant.ConfigException
  • Method Details

    • getSolrProxy

      public SolrProxy getSolrProxy()
      Returns the SolrProxy used internally.
      Returns:
      the solr proxy
    • normalizeCountryName

      public static String normalizeCountryName(String c)
      Normalizes a country name.
      Parameters:
      c - the country name to normalize
      Returns:
      the normalized country name
    • createGeodeticLookupParams

      protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams()
      Creates a generic spatial query that returns up to the first 25 rows.
      Returns:
      default params
    • createGeodeticLookupParams

      protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams(int rows)
      For larger areas, choose a higher number of rows to return. Using Solr spatial score-by-distance for sorting appears to make Solr load the entire index into memory, so this sort mechanism is off by default.
      Parameters:
      rows - rows to include in spatial lookups
      Returns:
      solr params
    • createDefaultSearchParams

      public static org.apache.solr.common.params.ModifiableSolrParams createDefaultSearchParams(int rows)
    • close

      public void close()
      Close or release all resources.
    • getCountries

      public Map<String,org.opensextant.data.Country> getCountries()
      List all country names, official and variant names. Distinct territories (whose own ISO codes are unique) are listed as well. Territories owned by other countries (their ISO code is that of the owning nation) are attached as Country.territory; call Country.getTerritories() to list them. Name aliases are available via Country.getAliases(). The hash map returned contains all 260+ country listings keyed by ISO2 and ISO3. Odd, commonly used variant codes are added as well.
      Returns:
      the countries
    • getCountry

      public org.opensextant.data.Country getCountry(String isocode)
      Gets a Country by its default ISO digraph (ISO2 code). Returns the Unknown country if the argument is not an ISO2 code. TODO: throw a GazetteerException of some sort for a null query or an invalid code.
      Parameters:
      isocode - the ISO2 country code
      Returns:
      the matching country, or the Unknown country if there is no match
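The fallback behavior described for getCountry can be illustrated without a running gazetteer. The following is only a minimal sketch under stated assumptions: CountryLookupSketch, its two-entry table, and the UNKNOWN sentinel are hypothetical stand-ins for the real country map and UNK_Country, and the case-insensitive matching is an assumption, not documented behavior.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in; it does not use the OpenSextant Country class.
final class CountryLookupSketch {
    // Sentinel playing the role of an "unknown country" value.
    static final String UNKNOWN = "UNKNOWN";

    // Tiny illustrative table keyed by ISO2 code only.
    private static final Map<String, String> BY_ISO2 = new HashMap<>();
    static {
        BY_ISO2.put("US", "United States");
        BY_ISO2.put("FR", "France");
    }

    private CountryLookupSketch() {}

    /** Returns the country name for an ISO2 code, or UNKNOWN for null or unmatched input. */
    static String lookup(String isocode) {
        if (isocode == null) {
            return UNKNOWN;
        }
        // Assumption: codes are matched case-insensitively after trimming.
        String key = isocode.trim().toUpperCase(Locale.ROOT);
        return BY_ISO2.getOrDefault(key, UNKNOWN);
    }
}
```

Returning a sentinel rather than null lets callers chain calls safely, at the cost of having to compare against the sentinel to detect a miss.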
    • getCountryByFIPS

      public org.opensextant.data.Country getCountryByFIPS(String fips)
      Gets the country by FIPS code.
      Parameters:
      fips - the FIPS country code
      Returns:
      the country for the given FIPS code
    • loadCountries

      public static Map<String,org.opensextant.data.Country> loadCountries(org.apache.solr.client.solrj.SolrClient index) throws org.apache.solr.client.solrj.SolrServerException, IOException
      This only returns Country objects that are names; it does not produce any abbreviation variants. TODO: allow the caller to get all entries, including abbreviations.
      Parameters:
      index - solr instance to query
      Returns:
      country data hash
      Throws:
      org.apache.solr.client.solrj.SolrServerException - the solr server exception
      IOException - on error, e.g. if the country metadata file is not found on the classpath
    • search

      public List<org.opensextant.data.Place> search(String place_string) throws org.apache.solr.client.solrj.SolrServerException, IOException
      Search the gazetteer using a phrase. The phrase will be quoted internally as it searches Solr, e.g.,

      search( "\"Boston City\"" )

      The Solr gazetteer uses OR as the default joiner for clauses. Without quotes, the above search would effectively be "Boston" OR "City".
      Parameters:
      place_string - the phrase to search for
      Returns:
      list of place entries
      Throws:
      org.apache.solr.client.solrj.SolrServerException - the solr server exception
      IOException
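The effect of quoting on a phrase search can be sketched with plain string handling. This is an illustrative helper only: PhraseQuerySketch and quotePhrase are hypothetical names, and the escaping rules shown are an assumption about what a safe phrase query needs, not a description of what SolrGazetteer does internally.

```java
// Hypothetical helper; not part of the SolrGazetteer API.
final class PhraseQuerySketch {
    private PhraseQuerySketch() {}

    /**
     * Wraps text in double quotes so a query parser treats it as one phrase
     * instead of separate clauses joined by the default operator.
     * Text that is already quoted is returned as is (after trimming).
     */
    static String quotePhrase(String text) {
        String t = text.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            return t;
        }
        // Escape backslashes first, then embedded quotes.
        String escaped = t.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }
}
```

With this helper, quotePhrase("Boston City") yields the quoted phrase "Boston City", which a parser with OR as its default joiner would otherwise split into two clauses.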
    • search

      public List<org.opensextant.data.Place> search(String place, boolean as_solr) throws org.apache.solr.client.solrj.SolrServerException, IOException
      Instance method that reuses a set of SolrParams for optimized search. Search the gazetteer using one of the following: a name or keyword; or a Solr-style fielded query, which by default includes bare keyword searches.

      search( "\"Boston City\"" )

      The Solr gazetteer uses OR as the default joiner for clauses.
      Parameters:
      place - the place name, keyword, or Solr-style query
      as_solr - true if place should be treated as a Solr-style fielded query
      Returns:
      places List of place entries
      Throws:
      org.apache.solr.client.solrj.SolrServerException - related connectivity or Solr integrity
      IOException - related connectivity or Solr integrity
    • search

      public List<org.opensextant.data.Place> search(org.apache.solr.common.params.SolrParams p) throws org.apache.solr.client.solrj.SolrServerException, IOException
      Parameters:
      p - the Solr query parameters
      Returns:
      list of place entries
      Throws:
      org.apache.solr.client.solrj.SolrServerException - on connectivity or Solr integrity errors
      IOException - on connectivity or Solr integrity errors
    • placesAt

      public List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM) throws org.apache.solr.client.solrj.SolrServerException, IOException
      Find places located at a particular location.
      Parameters:
      yx - location
      withinKM - search radius in kilometers; a positive value is required
      Returns:
      unsorted list of places near location
      Throws:
      org.apache.solr.client.solrj.SolrServerException - on error
      IOException
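A radius lookup like this is commonly expressed in Solr as a geofilt filter query. The sketch below only builds such a filter string; GeofiltSketch is a hypothetical class, the field name passed in is an assumption about the index schema, and nothing here confirms that SolrGazetteer builds its spatial query this way.

```java
import java.util.Locale;

// Hypothetical helper; the real spatial parameters come from createGeodeticLookupParams().
final class GeofiltSketch {
    private GeofiltSketch() {}

    /**
     * Builds a Solr geofilt filter query string for a point and a radius in kilometers.
     * The spatial field name is supplied by the caller because it depends on the schema.
     */
    static String geofilt(String field, double lat, double lon, int withinKM) {
        if (withinKM <= 0) {
            throw new IllegalArgumentException("withinKM must be positive");
        }
        // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
        return String.format(Locale.ROOT, "{!geofilt sfield=%s pt=%.4f,%.4f d=%d}",
                field, lat, lon, withinKM);
    }
}
```

The resulting string would typically be set as an fq parameter. Because sorting by distance is off by default, as noted for createGeodeticLookupParams, results from such a filter come back unsorted.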
    • placesAt

      public List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM, String feature) throws org.apache.solr.client.solrj.SolrServerException, IOException
      Variation on placesAt(LatLon, int) that also takes a feature class.
      Parameters:
      yx - location
      withinKM - search radius in kilometers; required
      feature - feature class
      Returns:
      unsorted list of places near location
      Throws:
      org.apache.solr.client.solrj.SolrServerException - on error
      IOException
    • closest

      public static org.opensextant.data.Place closest(org.opensextant.data.LatLon yx, List<org.opensextant.data.Place> places)
      Iterates through a list and chooses the place closest to the given point.
      Parameters:
      yx - point of interest
      places - list of places
      Returns:
      closest place
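Choosing the closest entry from an unsorted list amounts to a linear scan with a distance function. The sketch below uses the haversine great-circle distance as one plausible choice; ClosestSketch and Spot are hypothetical stand-ins for the OpenSextant Place and LatLon types, and the distance measure used by the real closest() is not stated in this documentation.

```java
import java.util.List;

// Hypothetical stand-in types; this does not use org.opensextant.data.Place.
final class ClosestSketch {
    static final class Spot {
        final String name;
        final double lat;
        final double lon;

        Spot(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    private ClosestSketch() {}

    /** Great-circle distance in kilometers using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        final double earthRadiusKm = 6371.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to guard against rounding slightly above 1.
        return 2 * earthRadiusKm * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Linear scan for the nearest spot; returns null for an empty list. */
    static Spot closest(double lat, double lon, List<Spot> spots) {
        Spot best = null;
        double bestKm = Double.MAX_VALUE;
        for (Spot s : spots) {
            double km = haversineKm(lat, lon, s.lat, s.lon);
            if (km < bestKm) {
                bestKm = km;
                best = s;
            }
        }
        return best;
    }
}
```

A scan like this is linear in the list size, which is reasonable when the list has already been narrowed by a radius search.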
    • placeAt

      public org.opensextant.data.Place placeAt(org.opensextant.data.LatLon yx, int withinKM, String feature) throws org.apache.solr.client.solrj.SolrServerException, IOException
      Finds the place closest to the given location. The result is a reasonable guess. CAVEAT: this does not use Solr spatial location sorting.
      Parameters:
      yx - location
      withinKM - distance in KM
      feature - feature type
      Returns:
      closest place to given location.
      Throws:
      org.apache.solr.client.solrj.SolrServerException - on error
      IOException
    • findPlaces

      public List<org.opensextant.data.Place> findPlaces(String name, String parametricQuery, int lenTolerance) throws org.opensextant.extraction.ExtractionException
      Given a name, find all locations matching it. Matches may differ by +/- 1 or 2 characters. For example, when searching for "Fafu": if "Fafu'" is acceptable, then the length tolerance is about +1; if "Fafu Airport" is acceptable, then use feat_class:S and a much longer tolerance.
      Parameters:
      name - the place name to find
      parametricQuery - additional fielded query clauses, e.g. feat_class:S
      lenTolerance - your choice for how much longer a valid matching name can be.
      Returns:
      list of matching places
      Throws:
      org.opensextant.extraction.ExtractionException - if search fails
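The role of lenTolerance can be illustrated with the "Fafu" example above. This is a minimal sketch assuming the tolerance simply caps how many characters longer than the query a candidate name may be; LengthToleranceSketch and withinTolerance are hypothetical names, and the real matching in findPlaces may apply further rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it filters plain strings rather than Place objects.
final class LengthToleranceSketch {
    private LengthToleranceSketch() {}

    /** Keeps candidates that are at most lenTolerance characters longer than name. */
    static List<String> withinTolerance(String name, List<String> candidates, int lenTolerance) {
        int maxLen = name.length() + lenTolerance;
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (candidate.length() <= maxLen) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

For the query "Fafu", a tolerance of 1 keeps "Fafu" and "Fafu'" but drops "Fafu Airport", which needs a tolerance of at least 8.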
    • findPlacesById

      public List<org.opensextant.data.Place> findPlacesById(String placeID) throws org.opensextant.extraction.ExtractionException
      Find all places for a given gazetteer Place ID. The variants differ by name only; entries with the same place ID should have the same feature coding, lat/lon, and other metadata.
      Parameters:
      placeID - the gazetteer place ID
      Returns:
      all place entries for the given place ID
      Throws:
      org.opensextant.extraction.ExtractionException - if findPlaces query fails
    • findPlacesRomanizedNameOf

      public List<org.opensextant.data.Place> findPlacesRomanizedNameOf(String name, String parametricQuery, int lenTolerance) throws org.opensextant.extraction.ExtractionException
      NOTE: This yields primarily ASCII transliterations/romanized versions of the given place. You may indeed find multiple locations with the same name. Your parametric query should include feature type (feat_code:P, etc.) and country code (cc:AB) to yield the most relevant locations for a given name.
      Parameters:
      name - the place name to find
      parametricQuery - additional fielded query clauses, e.g. feature type and country code
      lenTolerance - your choice for how much longer a valid matching name can be.
      Returns:
      matched places, with only variants that are Romanized (Anglicized, ASCII, etc.)
      Throws:
      org.opensextant.extraction.ExtractionException - if findPlaces query fails