Package org.opensextant.extractors.geo
Class SolrGazetteer
java.lang.Object
org.opensextant.extractors.geo.SolrGazetteer
Connects to a Solr server via HTTP and tags place names in documents. The
SOLR_HOME
environment variable must be set to the location of the Solr server.
- Author:
- David Smiley - dsmiley@mitre.org, Marc Ubaldino - ubaldino@mitre.org
-
Field Summary
Modifier and Type, Field - Description
static final String DEFAULT_FIELDS
static final org.opensextant.data.Country UNK_Country - The Constant UNK_Country.
-
Constructor Summary
Constructor - Description
SolrGazetteer() - Instantiates a new Solr gazetteer.
SolrGazetteer(String solrHome) - Instantiates a new Solr gazetteer with the specified Solr home location.
SolrGazetteer(SolrProxy currentIndex)
-
Method Summary
Modifier and Type, Method - Description
void close() - Close or release all resources.
static org.opensextant.data.Place closest(org.opensextant.data.LatLon yx, List<org.opensextant.data.Place> places) - Iterate through a list and choose a place closest to the given point.
static org.apache.solr.common.params.ModifiableSolrParams createDefaultSearchParams(int rows)
protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams() - Creates a generic spatial query for up to first 25 rows.
protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams(int rows) - For larger areas choose a higher number of rows to return.
List<org.opensextant.data.Place> findPlaces(String name, String parametricQuery, int lenTolerance) - Given a name, find all locations matching that.
List<org.opensextant.data.Place> findPlacesById(String placeID) - Find all places for a given gazetteer Place ID.
List<org.opensextant.data.Place> findPlacesRomanizedNameOf(String name, String parametricQuery, int lenTolerance) - NOTE: This yields primarily ASCII transliterations/romanized versions of the given place.
getCountries - List all country names, official and variant names.
org.opensextant.data.Country getCountry(String isocode) - Get Country by the default ISO digraph; returns the Unknown country if you are not using an ISO2 code.
org.opensextant.data.Country getCountryByFIPS(String fips) - Gets the country by FIPS.
getSolrProxy - Returns the SolrProxy used internally.
static Map<String,org.opensextant.data.Country> loadCountries(org.apache.solr.client.solrj.SolrClient index) - This only returns Country objects that are names; it does not produce any abbreviation variants.
static String normalizeCountryName - Normalize country name.
org.opensextant.data.Place placeAt(org.opensextant.data.LatLon yx, int withinKM, String feature) - This is a reasonable guess.
List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM) - Find places located at a particular location.
List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM, String feature) - Variation on placesAt().
List<org.opensextant.data.Place> search(String place_string) - Search the gazetteer using a phrase.
List<org.opensextant.data.Place> search(String place, boolean as_solr) - Instance method that reuses a set of SolrParams for optimized search.
List<org.opensextant.data.Place> search(org.apache.solr.common.params.SolrParams p)
-
Field Details
-
DEFAULT_FIELDS
-
UNK_Country
public static final org.opensextant.data.Country UNK_Country
The Constant UNK_Country.
-
-
Constructor Details
-
SolrGazetteer
public SolrGazetteer() throws org.opensextant.ConfigException
Instantiates a new Solr gazetteer.
- Throws:
org.opensextant.ConfigException
- Signals that a configuration exception has occurred.
-
SolrGazetteer
Instantiates a new Solr gazetteer with the specified Solr home location.
- Parameters:
solrHome
- the location of the Solr home.
- Throws:
org.opensextant.ConfigException
- Signals that a configuration exception has occurred.
-
SolrGazetteer
- Throws:
org.opensextant.ConfigException
-
-
Method Details
-
getSolrProxy
Returns the SolrProxy used internally.
- Returns:
- the solr proxy
-
normalizeCountryName
Normalizes a country name.
- Parameters:
c
- the country name to normalize
- Returns:
- the normalized country name
-
createGeodeticLookupParams
protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams()
Creates a generic spatial query that returns up to the first 25 rows.
- Returns:
- default params
-
createGeodeticLookupParams
protected static org.apache.solr.common.params.ModifiableSolrParams createGeodeticLookupParams(int rows)
For larger areas, choose a higher number of rows to return. If you use Solr spatial score-by-distance for sorting or anything else, Solr appears to want to load the entire index into memory, so this sort mechanism is off by default.
- Parameters:
rows
- rows to include in spatial lookups
- Returns:
- solr params
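As a rough illustration of the kind of parameters a geodetic lookup might carry, the sketch below builds a standard Solr `geofilt` filter as a plain `Map` so it runs without SolrJ on the classpath. It is not the body of createGeodeticLookupParams, which this page does not show. The class name, the method name, and the spatial field name `geo` are assumptions; `q`, `fq`, `sfield`, `pt`, `d`, and `rows` are standard Solr request parameters. No `sort` parameter is set, in line with the note that distance sorting is off by default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GeodeticParamsSketch {
    /** Hypothetical spatial field name; the real gazetteer schema may differ. */
    static final String SPATIAL_FIELD = "geo";

    /** Builds filter-only spatial parameters: no distance scoring, no sort. */
    static Map<String, String> geodeticLookupParams(double lat, double lon, int withinKM, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");                    // match everything, then filter
        params.put("fq", "{!geofilt}");            // standard Solr radius filter
        params.put("sfield", SPATIAL_FIELD);       // field holding the coordinates
        params.put("pt", lat + "," + lon);         // center point as "lat,lon"
        params.put("d", String.valueOf(withinKM)); // radius in kilometers
        params.put("rows", String.valueOf(rows));  // cap on returned rows
        return params;
    }

    public static void main(String[] args) {
        // 25 rows mirrors the default mentioned for the no-argument variant.
        geodeticLookupParams(42.36, -71.06, 10, 25)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

For a wider search radius, a caller would raise both the distance and the row count, since a larger area can contain many more gazetteer entries.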
-
createDefaultSearchParams
public static org.apache.solr.common.params.ModifiableSolrParams createDefaultSearchParams(int rows)
-
close
public void close()
Close or release all resources.
-
getCountries
List all country names, official and variant names. Distinct territories (whose own ISO codes are unique) are listed as well. Territories owned by other countries (their ISO code is that of their owning nation) are attached as Country.territory; call Country.getTerritories() to list them. Name aliases are available via Country.getAliases(). The hash map returned contains all 260+ country listings keyed by ISO2 and ISO3. Odd, commonly used variant codes are added as well.
- Returns:
- the countries
-
getCountry
Gets a Country by the default ISO digraph (ISO2 code). Returns the Unknown country if you are not using an ISO2 code. TODO: throw a GazetteerException of some sort for a null query or invalid code.
- Parameters:
isocode
- the ISO2 country code
- Returns:
- the country
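The fallback behavior described for getCountry can be sketched with a plain map lookup. This is a minimal, self-contained illustration and not the library's code: the `Country` class here is a stand-in for org.opensextant.data.Country, and the `UNKNOWN` constant, its field values, and the null handling are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class CountryLookupSketch {
    /** Stand-in for org.opensextant.data.Country (hypothetical, simplified). */
    static final class Country {
        final String iso2;
        final String name;

        Country(String iso2, String name) {
            this.iso2 = iso2;
            this.name = name;
        }
    }

    /** Hypothetical placeholder playing the role of UNK_Country. */
    static final Country UNKNOWN = new Country("", "Unknown");

    private final Map<String, Country> byIso2 = new HashMap<>();

    void add(Country c) {
        byIso2.put(c.iso2, c);
    }

    /** Returns the matching country, or UNKNOWN for null or unrecognized codes. */
    Country getCountry(String isocode) {
        if (isocode == null) {
            return UNKNOWN;
        }
        return byIso2.getOrDefault(isocode, UNKNOWN);
    }

    public static void main(String[] args) {
        CountryLookupSketch gaz = new CountryLookupSketch();
        gaz.add(new Country("FR", "France"));
        System.out.println(gaz.getCountry("FR").name);
        System.out.println(gaz.getCountry("FRA").name); // not an ISO2 code
    }
}
```

Because no exception is thrown, callers that need to distinguish a miss should compare the result against the unknown-country constant rather than checking for null.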
-
getCountryByFIPS
Gets the country by FIPS code.
- Parameters:
fips
- the FIPS country code
- Returns:
- the country for the given FIPS code
-
loadCountries
public static Map<String,org.opensextant.data.Country> loadCountries(org.apache.solr.client.solrj.SolrClient index) throws org.apache.solr.client.solrj.SolrServerException, IOException
This only returns Country objects that are names; it does not produce any abbreviation variants. TODO: allow caller to get all entries, including abbreviations.
- Parameters:
index
- Solr instance to query
- Returns:
- country data hash
- Throws:
org.apache.solr.client.solrj.SolrServerException
- the Solr server exception
IOException
- on error, if the country metadata file is not found in the classpath
-
search
public List<org.opensextant.data.Place> search(String place_string) throws org.apache.solr.client.solrj.SolrServerException, IOException
Search the gazetteer using a phrase. The phrase will be quoted internally as it searches Solr, e.g., search( "\"Boston City\"" ). The Solr gazetteer uses OR as the default joiner for clauses; without quotes, the above search would effectively be "Boston" OR "City".
- Parameters:
place_string
- the place name or phrase to search for
- Returns:
- places List of place entries
- Throws:
org.apache.solr.client.solrj.SolrServerException
- the Solr server exception
IOException
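To make the effect of quoting concrete, here is a minimal sketch of turning raw text into a quoted phrase query string. The helper `asPhrase` is hypothetical and is not taken from the library; it only shows how a phrase such as Boston City becomes a single quoted clause rather than two OR-joined terms.

```java
class PhraseQuerySketch {
    /**
     * Wraps text in double quotes so a Lucene/Solr query parser treats it as one phrase.
     * Backslashes and embedded double quotes are escaped first.
     */
    static String asPhrase(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Unquoted, with OR as the default joiner, this would match Boston OR City.
        System.out.println(asPhrase("Boston City"));
    }
}
```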
-
search
public List<org.opensextant.data.Place> search(String place, boolean as_solr) throws org.apache.solr.client.solrj.SolrServerException, IOException
Instance method that reuses a set of SolrParams for optimized search. Search the gazetteer using one of the following:
- a name or keyword
- a Solr-style fielded query, which by default includes bare keyword searches
Example: search( "\"Boston City\"" ). The Solr gazetteer uses OR as the default joiner for clauses.
- Parameters:
place
- the place name, keyword, or Solr-style query
as_solr
- whether place is to be treated as a Solr-style query
- Returns:
- places List of place entries
- Throws:
org.apache.solr.client.solrj.SolrServerException
- related connectivity or Solr integrity
IOException
- related connectivity or Solr integrity
-
search
public List<org.opensextant.data.Place> search(org.apache.solr.common.params.SolrParams p) throws org.apache.solr.client.solrj.SolrServerException, IOException
- Parameters:
p
- the Solr query parameters
- Returns:
- Throws:
org.apache.solr.client.solrj.SolrServerException
- connectivity or Solr integrity
IOException
- connectivity or Solr integrity
-
placesAt
public List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM) throws org.apache.solr.client.solrj.SolrServerException, IOException
Find places located at a particular location.
- Parameters:
yx
- location
withinKM
- distance radius in kilometers; a positive value is required.
- Returns:
- unsorted list of places near the location
- Throws:
org.apache.solr.client.solrj.SolrServerException
- on error
IOException
-
placesAt
public List<org.opensextant.data.Place> placesAt(org.opensextant.data.LatLon yx, int withinKM, String feature) throws org.apache.solr.client.solrj.SolrServerException, IOException
Variation on placesAt().
- Parameters:
yx
- location
withinKM
- distance in kilometers; required.
feature
- feature class
- Returns:
- unsorted list of places near the location
- Throws:
org.apache.solr.client.solrj.SolrServerException
- on error
IOException
-
closest
public static org.opensextant.data.Place closest(org.opensextant.data.LatLon yx, List<org.opensextant.data.Place> places)
Iterates through a list and chooses the place closest to the given point.
- Parameters:
yx
- point of interest
places
- list of places
- Returns:
- closest place
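The linear scan described for closest() can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `Point` is a stand-in for the Place and LatLon types, great-circle (haversine) distance is assumed as the metric, and returning null for an empty list is a choice made here.

```java
import java.util.Arrays;
import java.util.List;

class ClosestPlaceSketch {
    /** Stand-in for a gazetteer place with coordinates (hypothetical, simplified). */
    static final class Point {
        final String name;
        final double lat;
        final double lon;

        Point(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in kilometers using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        final double earthRadiusKm = 6371.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * earthRadiusKm * Math.asin(Math.sqrt(a));
    }

    /** Scans the list once and keeps the nearest entry; returns null if the list is empty. */
    static Point closest(double lat, double lon, List<Point> places) {
        Point best = null;
        double bestKm = Double.MAX_VALUE;
        for (Point p : places) {
            double km = haversineKm(lat, lon, p.lat, p.lon);
            if (km < bestKm) {
                bestKm = km;
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> places = Arrays.asList(
                new Point("Boston", 42.36, -71.06),
                new Point("New York", 40.71, -74.01));
        System.out.println(closest(42.37, -71.11, places).name);
    }
}
```

A client-side scan like this avoids asking Solr to sort by distance, at the cost of examining every candidate returned by the radius query.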
-
placeAt
public org.opensextant.data.Place placeAt(org.opensextant.data.LatLon yx, int withinKM, String feature) throws org.apache.solr.client.solrj.SolrServerException, IOException
Returns the place closest to the given location. This is a reasonable guess. CAVEAT: This does not use Solr spatial location sorting.
- Parameters:
yx
- location
withinKM
- distance in kilometers
feature
- feature type
- Returns:
- closest place to the given location.
- Throws:
org.apache.solr.client.solrj.SolrServerException
- on error
IOException
-
findPlaces
public List<org.opensextant.data.Place> findPlaces(String name, String parametricQuery, int lenTolerance) throws org.opensextant.extraction.ExtractionException
Given a name, find all locations matching that name. Matches may differ by +/- 1 or 2 characters. For example, when searching for "Fafu": if "Fafu'" is acceptable, then a length tolerance of about +1 is enough; if "Fafu Airport" is acceptable, then use feat_class:S and a much longer tolerance.
- Parameters:
name
- the place name to find
parametricQuery
- additional query criteria, e.g., feat_class:S
lenTolerance
- your choice for how much longer a valid matching name can be.
- Returns:
- list of matching places
- Throws:
org.opensextant.extraction.ExtractionException
- if search fails
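The role of lenTolerance can be illustrated with a simple length filter over candidate names. This is a sketch of the idea only; how the library applies the tolerance internally is not shown on this page, and the method and class names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LengthToleranceSketch {
    /** Keeps candidates that are at most lenTolerance characters longer than the query name. */
    static List<String> withinTolerance(String name, List<String> candidates, int lenTolerance) {
        int maxLen = name.length() + lenTolerance;
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (candidate.length() <= maxLen) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("Fafu", "Fafu'", "Fafu Airport");
        // A tolerance of 1 admits "Fafu'" but not "Fafu Airport".
        System.out.println(withinTolerance("Fafu", candidates, 1));
        // A tolerance of 8 also admits "Fafu Airport" (12 characters).
        System.out.println(withinTolerance("Fafu", candidates, 8));
    }
}
```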
-
findPlacesById
public List<org.opensextant.data.Place> findPlacesById(String placeID) throws org.opensextant.extraction.ExtractionException
Find all places for a given gazetteer Place ID. The variants differ by name only; entries with the same place ID should have the same feature coding, lat/lon, and other metadata.
- Parameters:
placeID
- the gazetteer place ID
- Returns:
- Throws:
org.opensextant.extraction.ExtractionException
- if findPlaces query fails
-
findPlacesRomanizedNameOf
public List<org.opensextant.data.Place> findPlacesRomanizedNameOf(String name, String parametricQuery, int lenTolerance) throws org.opensextant.extraction.ExtractionException
NOTE: This yields primarily ASCII transliterations/romanized versions of the given place. You may indeed find multiple locations with the same name. Your parametric query should include feature type (feat_code:P, etc.) and country code (cc:AB) to yield the most relevant locations for a given name.
- Parameters:
name
- the place name to find
parametricQuery
- additional query criteria, e.g., feature type and country code
lenTolerance
- your choice for how much longer a valid matching name can be.
- Returns:
- matched places, with only variants that are Romanized (Anglicized, ASCII, etc.)
- Throws:
org.opensextant.extraction.ExtractionException
- if findPlaces query fails
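A parametric query of the kind recommended above could be assembled as shown below. This is a hedged sketch: the helper `allOf` is hypothetical, the clause values are the placeholders used in the description, and joining with an explicit AND is an assumption made here so that every clause must match even where OR is the default joiner.

```java
class ParametricQuerySketch {
    /** Joins fielded clauses with AND so that all of them must match. */
    static String allOf(String... clauses) {
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        // Clause values are the placeholders from the method description.
        String parametricQuery = allOf("feat_code:P", "cc:AB");
        System.out.println(parametricQuery);
    }
}
```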
-