Package org.opensextant.extractors.geo
Class SolrGazetteer
java.lang.Object
org.opensextant.extractors.geo.SolrGazetteer
Connects to a Solr server via HTTP and tags place names in a document. The SOLR_HOME environment variable must be set to the location of the Solr server.
Author:
David Smiley - dsmiley@mitre.org, Marc Ubaldino - ubaldino@mitre.org
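The configuration requirement above can be illustrated with a minimal sketch that picks a Solr home from an explicit path or, failing that, from the SOLR_HOME environment variable. The class and method names here (SolrHomeResolver, resolve) are hypothetical and not part of OpenSextant; this page does not show how SolrGazetteer itself reads the variable.

```java
import java.util.Map;

final class SolrHomeResolver {
    /**
     * Hypothetical helper: returns the explicit path when one is given,
     * otherwise the SOLR_HOME entry of the supplied environment map.
     * Throws if neither is available.
     */
    static String resolve(String explicitPath, Map<String, String> env) {
        if (explicitPath != null && !explicitPath.isEmpty()) {
            return explicitPath;
        }
        String fromEnv = env.get("SOLR_HOME");
        if (fromEnv == null || fromEnv.isEmpty()) {
            throw new IllegalStateException("SOLR_HOME is not set");
        }
        return fromEnv;
    }

    public static void main(String[] args) {
        // A sample map stands in for System.getenv() so the demo runs anywhere.
        System.out.println(resolve(null, Map.of("SOLR_HOME", "/opt/solr")));
    }
}
```

Passing the environment as a map keeps the check easy to exercise without changing the real process environment; in an application, System.getenv() would be passed instead.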
-
Field Summary
Fields
static final String DEFAULT_FIELDS
static final org.opensextant.data.Country UNK_Country
    The Constant UNK_Country.
-
Constructor Summary
Constructors
SolrGazetteer()
    Instantiates a new Solr gazetteer.
SolrGazetteer(String solrHome)
    Instantiates a new Solr gazetteer with the specified Solr Home location.
SolrGazetteer(SolrProxy currentIndex)
-
Method Summary
void close()
    Close or release all resources.
static org.opensextant.data.Place closest(org.opensextant.data.LatLon yx, List<org.opensextant.data.Place> places)
    Iterate through a list and choose a place closest to the given point.
static org.apache.solr.common.params.ModifiableSolrParams createDefaultSearchParams(int rows)
protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams()
    Creates a generic spatial query for up to the first 25 rows.
protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams(int rows)
    For larger areas, choose a higher number of rows to return.
List<org.opensextant.data.Place> findPlaces(String name, String parametricQuery, int lenTolerance)
    Given a name, find all locations matching it.
List<org.opensextant.data.Place> findPlacesById(String placeID)
    Find all places for a given gazetteer Place ID.
List<org.opensextant.data.Place> findPlacesRomanizedNameOf(String name, String parametricQuery, int lenTolerance)
    NOTE: This yields primarily ASCII transliterations/romanized versions of the given place.
getCountries
    List all country names, official and variant names.
org.opensextant.data.Country getCountry(String isocode)
    Get Country by the default ISO digraph; returns the Unknown country if you are not using an ISO2 code.
org.opensextant.data.Country getCountryByFIPS(String fips)
    Gets the country by FIPS code.
getSolrProxy
    Returns the SolrProxy used internally.
static Map<String,org.opensextant.data.Country> loadCountries(org.apache.solr.client.solrj.SolrClient index)
    This only returns Country objects that are names; it does not produce any abbreviation variants.
static String normalizeCountryName
    Normalize country name.
org.opensextant.data.Place placeAt(org.opensextant.data.LatLon yx, int withinKM, String feature)
    This is a reasonable guess.
List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM)
    Find places located at a particular location.
List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM, String feature)
    Variation on placesAt().
List<org.opensextant.data.Place> search(String place_string)
    Search the gazetteer using a phrase.
List<org.opensextant.data.Place> search(String place, boolean as_solr)
    Instance method that reuses a set of SolrParams for optimized search.
List<org.opensextant.data.Place> search(org.apache.solr.common.params.SolrParams p)
-
Field Details
-
DEFAULT_FIELDS
-
UNK_Country
public static final org.opensextant.data.Country UNK_Country
The Constant UNK_Country.
-
-
Constructor Details
-
SolrGazetteer
public SolrGazetteer() throws org.opensextant.ConfigException
Instantiates a new Solr gazetteer.
Throws:
org.opensextant.ConfigException - Signals that a configuration exception has occurred.
-
SolrGazetteer
Instantiates a new Solr gazetteer with the specified Solr Home location.
Parameters:
solrHome - the location of solrHome.
Throws:
org.opensextant.ConfigException - Signals that a configuration exception has occurred.
-
SolrGazetteer
Throws:
org.opensextant.ConfigException
-
-
Method Details
-
getSolrProxy
Returns the SolrProxy used internally.
Returns:
the Solr proxy
-
normalizeCountryName
Normalizes a country name.
Parameters:
c - the country name to normalize
Returns:
the normalized name
-
createGeodeticLookupParams
protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams()
Creates a generic spatial query for up to the first 25 rows.
Returns:
default params
-
createGeodeticLookupParams
protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams(int rows)
For larger areas, choose a higher number of rows to return. If you use Solr spatial score-by-distance for sorting, Solr appears to want to load the entire index into memory, so this sort mechanism is off by default.
Parameters:
rows - rows to include in spatial lookups
Returns:
Solr params
-
createDefaultSearchParams
public static org.apache.solr.common.params.ModifiableSolrParams createDefaultSearchParams(int rows)
-
close
public void close()
Close or release all resources.
-
getCountries
Lists all country names, both official and variant. Distinct territories (those whose own ISO codes are unique) are listed as well. Territories owned by other countries, whose ISO code is that of the owning nation, are attached as Country.territory (call Country.getTerritories() to list them). Name aliases are available through Country.getAliases(). The returned hash map contains all 260+ country listings, keyed by ISO2 and ISO3. Odd but commonly used variant codes are added as well.
Returns:
the countries
-
getCountry
Gets a Country by the default ISO digraph (ISO2 code). Returns the Unknown country if you are not using an ISO2 code. TODO: throw a GazetteerException of some sort for a null query or invalid code.
Parameters:
isocode - the ISO code
Returns:
the country
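The lookup behaviour described for getCountries and getCountry (a map keyed by ISO2 and ISO3 codes, with an Unknown fallback) can be sketched as follows. This is only an illustration under stated assumptions: CountryIndexSketch, its methods, and the "UNK" placeholder string are hypothetical, and plain strings stand in for Country objects. The case-insensitive matching is also an assumption, not documented behaviour.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CountryIndexSketch {
    /** Hypothetical placeholder returned when a code is null or not found. */
    static final String UNKNOWN = "UNK";

    private final Map<String, String> byCode = new HashMap<>();

    /** Registers one country name under both its ISO2 and ISO3 codes. */
    void add(String iso2, String iso3, String name) {
        byCode.put(iso2.toUpperCase(Locale.ROOT), name);
        byCode.put(iso3.toUpperCase(Locale.ROOT), name);
    }

    /** Looks up a code ignoring case; falls back to UNKNOWN instead of null. */
    String lookup(String code) {
        if (code == null) {
            return UNKNOWN;
        }
        return byCode.getOrDefault(code.trim().toUpperCase(Locale.ROOT), UNKNOWN);
    }

    public static void main(String[] args) {
        CountryIndexSketch index = new CountryIndexSketch();
        index.add("FR", "FRA", "France");
        System.out.println(index.lookup("fr"));
        System.out.println(index.lookup("ZZ"));
    }
}
```

Returning a sentinel rather than null mirrors the documented Unknown-country fallback and spares callers a null check.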
-
getCountryByFIPS
Gets the country by FIPS code.
Parameters:
fips - the FIPS code
Returns:
the country for the given FIPS code
-
loadCountries
public static Map<String,org.opensextant.data.Country> loadCountries(org.apache.solr.client.solrj.SolrClient index) throws org.apache.solr.client.solrj.SolrServerException, IOException
This only returns Country objects that are names; it does not produce any abbreviation variants. TODO: allow caller to get all entries, including abbreviations.
Parameters:
index - Solr instance to query
Returns:
country data hash
Throws:
org.apache.solr.client.solrj.SolrServerException - the Solr server exception
IOException - on error, if the country metadata file is not found in the classpath
-
search
public List<org.opensextant.data.Place> search(String place_string) throws org.apache.solr.client.solrj.SolrServerException, IOException
Search the gazetteer using a phrase. The phrase will be quoted internally as it searches Solr, e.g., search( "\"Boston City\"" ). Solr Gazetteer uses OR as the default joiner for clauses. Without quotes, the above search would effectively be "Boston" OR "City".
Parameters:
place_string - the place string
Returns:
list of place entries
Throws:
org.apache.solr.client.solrj.SolrServerException - the Solr server exception
IOException
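Where a caller needs to build the quoted form of a phrase, as in the "\"Boston City\"" example above, a minimal sketch might look like this. PhraseQuerySketch and quote are hypothetical names; the escaping of embedded quotes and backslashes is an assumption added for safety and is not taken from SolrGazetteer.

```java
final class PhraseQuerySketch {
    /**
     * Wraps a phrase in double quotes so it is matched as a whole phrase
     * rather than as separate OR-joined terms. Backslashes and embedded
     * double quotes are escaped first.
     */
    static String quote(String phrase) {
        String escaped = phrase.trim()
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("Boston City"));
    }
}
```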
-
search
public List<org.opensextant.data.Place> search(String place, boolean as_solr) throws org.apache.solr.client.solrj.SolrServerException, IOException
Instance method that reuses a set of SolrParams for optimized search. Search the gazetteer using one of the following:
- a name or keyword
- a Solr style fielded query, which by default includes bare keyword searches
search( "\"Boston City\"" )
Solr Gazetteer uses OR as the default joiner for clauses.
Parameters:
place - the place
as_solr - the as_solr flag
Returns:
list of place entries
Throws:
org.apache.solr.client.solrj.SolrServerException - related connectivity or Solr integrity
IOException - related connectivity or Solr integrity
-
search
public List<org.opensextant.data.Place> search(org.apache.solr.common.params.SolrParams p) throws org.apache.solr.client.solrj.SolrServerException, IOException
Parameters:
p - the Solr params
Throws:
org.apache.solr.client.solrj.SolrServerException - connectivity or Solr integrity
IOException - connectivity or Solr integrity
-
placesAt
public List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM) throws org.apache.solr.client.solrj.SolrServerException, IOException
Find places located at a particular location.
Parameters:
yx - location
withinKM - positive distance radius is required.
Returns:
unsorted list of places near location
Throws:
org.apache.solr.client.solrj.SolrServerException - on error
IOException
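A radius lookup of this kind is commonly expressed in Solr with the geofilt filter and its sfield, pt, and d (kilometers) parameters. The sketch below assembles such parameters into a plain map. It is an illustration only: GeoFilterSketch, radiusFilter, and the spatial field name "geo" are hypothetical, and nothing here is taken from how placesAt() or createGeodeticLookupParams() actually build their queries.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class GeoFilterSketch {
    /**
     * Builds Solr-style parameters for "everything within withinKM of a point".
     * The caller supplies the name of the indexed spatial field.
     */
    static Map<String, String> radiusFilter(String spatialField, double lat, double lon,
                                            int withinKM, int rows) {
        if (withinKM <= 0) {
            throw new IllegalArgumentException("withinKM must be positive");
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");                 // match all, then filter spatially
        params.put("fq", "{!geofilt}");         // circular distance filter
        params.put("sfield", spatialField);     // spatial field to filter on
        params.put("pt", String.format(Locale.ROOT, "%.6f,%.6f", lat, lon));
        params.put("d", Integer.toString(withinKM)); // radius in kilometers
        params.put("rows", Integer.toString(rows));
        return params;
    }

    public static void main(String[] args) {
        // "geo" is an assumed field name used only for this demo.
        System.out.println(radiusFilter("geo", 42.3601, -71.0589, 10, 25));
    }
}
```

The filter leaves results unsorted by distance, which is consistent with the "unsorted list" return value noted above.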
-
placesAt
public List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM, String feature) throws org.apache.solr.client.solrj.SolrServerException, IOException
Variation on placesAt().
Parameters:
yx - location
withinKM - distance - required.
feature - feature class
Returns:
unsorted list of places near location
Throws:
org.apache.solr.client.solrj.SolrServerException - on error
IOException
-
closest
public static org.opensextant.data.Place closest(org.opensextant.data.LatLon yx, List<org.opensextant.data.Place> places)
Iterate through a list and choose a place closest to the given point.
Parameters:
yx - point of interest
places - list of places
Returns:
closest place
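The idea of scanning a list for the nearest place can be sketched with a great-circle (haversine) distance. This is a stand-in under stated assumptions: ClosestSketch and its methods are hypothetical, points are plain {lat, lon} arrays rather than Place objects, and this page does not say which distance measure closest() uses.

```java
import java.util.Arrays;
import java.util.List;

final class ClosestSketch {
    static final double EARTH_RADIUS_KM = 6371.0088; // mean Earth radius

    /** Great-circle distance in kilometers between two lat/lon points in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns the index of the point nearest to (lat, lon), or -1 for an empty list. */
    static int closestIndex(double lat, double lon, List<double[]> points) {
        int best = -1;
        double bestKm = Double.POSITIVE_INFINITY;
        for (int i = 0; i < points.size(); i++) {
            double km = haversineKm(lat, lon, points.get(i)[0], points.get(i)[1]);
            if (km < bestKm) {
                bestKm = km;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {42.36, -71.06},   // near Boston
                new double[] {40.71, -74.01});  // near New York
        System.out.println(closestIndex(42.37, -71.11, points));
    }
}
```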
-
placeAt
public org.opensextant.data.Place placeAt(org.opensextant.data.LatLon yx, int withinKM, String feature) throws org.apache.solr.client.solrj.SolrServerException, IOException
This is a reasonable guess. CAVEAT: This does not use Solr Spatial location sorting.
Parameters:
yx - location
withinKM - distance in KM
feature - feature type
Returns:
closest place to given location.
Throws:
org.apache.solr.client.solrj.SolrServerException - on error
IOException
-
findPlaces
public List<org.opensextant.data.Place> findPlaces(String name, String parametricQuery, int lenTolerance) throws org.opensextant.extraction.ExtractionException
Given a name, find all locations matching it. Matches may differ by +/- 1 or 2 characters. For example, when searching for "Fafu": if "Fafu'" is acceptable, the length tolerance is about +1; if "Fafu Airport" is acceptable, use feat_class:S and a much longer tolerance.
Parameters:
name -
parametricQuery -
lenTolerance - your choice for how much longer a valid matching name can be.
Returns:
list of matching places
Throws:
org.opensextant.extraction.ExtractionException - if search fails
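The effect of lenTolerance can be illustrated with a small filter that keeps only candidate names at most lenTolerance characters longer than the query. This is a hedged sketch of the idea only: LengthToleranceSketch and withinTolerance are hypothetical names, and the real findPlaces() runs against Solr rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LengthToleranceSketch {
    /** Keeps candidates whose length does not exceed query length + lenTolerance. */
    static List<String> withinTolerance(String query, List<String> candidates, int lenTolerance) {
        List<String> kept = new ArrayList<>();
        int maxLength = query.length() + lenTolerance;
        for (String candidate : candidates) {
            if (candidate.length() <= maxLength) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Fafu", "Fafu'", "Fafu Airport");
        System.out.println(withinTolerance("Fafu", names, 1)); // tight tolerance
        System.out.println(withinTolerance("Fafu", names, 8)); // looser tolerance
    }
}
```

With the names from the example above, a tolerance of 1 keeps "Fafu" and "Fafu'", while "Fafu Airport" (eight characters longer than the query) needs a tolerance of at least 8.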
-
findPlacesById
public List<org.opensextant.data.Place> findPlacesById(String placeID) throws org.opensextant.extraction.ExtractionException
Find all places for a given gazetteer Place ID. You will find that the variants differ by name only; entries with the same place ID should have the same feature coding, lat/lon, and other metadata.
Parameters:
placeID -
Returns:
Throws:
org.opensextant.extraction.ExtractionException - if findPlaces query fails
-
findPlacesRomanizedNameOf
public List<org.opensextant.data.Place> findPlacesRomanizedNameOf(String name, String parametricQuery, int lenTolerance) throws org.opensextant.extraction.ExtractionException
NOTE: This yields primarily ASCII transliterations/romanized versions of the given place. You may indeed find multiple locations with the same name. Your parametric query should include feature type (feat_code:P, etc.) and country code (cc:AB) to yield the most relevant locations for a given name.
Parameters:
name -
parametricQuery -
lenTolerance - your choice for how much longer a valid matching name can be.
Returns:
matched places, with only variants that are Romanized (Anglicized, ASCII, etc.)
Throws:
org.opensextant.extraction.ExtractionException - if findPlaces query fails
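A parametric query combining the feature and country clauses mentioned above could be assembled as in the following sketch. The field names feat_code and cc come from the text; the class ParametricQuerySketch, the allOf method, and the choice of AND as the joiner are assumptions, and how the gazetteer merges this string with the name clause is not described on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class ParametricQuerySketch {
    /** Joins field:value clauses with AND so that every clause must match. */
    static String allOf(Map<String, String> clauses) {
        StringJoiner joined = new StringJoiner(" AND ");
        for (Map.Entry<String, String> clause : clauses.entrySet()) {
            joined.add(clause.getKey() + ":" + clause.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> clauses = new LinkedHashMap<>(); // keeps insertion order
        clauses.put("feat_code", "P");
        clauses.put("cc", "AB");
        System.out.println(allOf(clauses));
    }
}
```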
-