Xponents Data Model

The key constructs here are GeoBase and Geocoding. GeoBase provides a base class for anything that has an ID, a name or label, and a coordinate. Geocoding provides an interface for any heuristic that grounds some data to a coordinate, while also carrying metadata about the geocoding itself. For example, beyond the coordinate itself, useful geocoding attributes include:
- precision and confidence
- country code and province code or name
- method or source for geocoding, e.g., derivation or rote
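To make the split between "has a coordinate" and "describes how the coordinate was obtained" concrete, here is a minimal sketch. All names below (GeocodingSketch, GeoBaseSketch, GeocodedPlaceSketch) are hypothetical and are not the actual Xponents GeoBase or Geocoding API. The units for precision (meters) and the 0-100 confidence scale are assumptions made only for illustration.

```java
/**
 * Hypothetical geocoding contract, sketched from the attributes listed above.
 * Method names and units are assumptions, not the Xponents Geocoding API.
 */
interface GeocodingSketch {
    double getLatitude();
    double getLongitude();
    int getPrecision();      // assumed: approximate error radius, in meters
    int getConfidence();     // assumed: score from 0 (none) to 100 (certain)
    String getCountryCode(); // assumed: ISO 3166 alpha-2 code
    String getAdmin1();      // province code or name
    String getMethod();      // how the coordinate was obtained
}

/** Hypothetical base: anything with an ID, a name or label, and a coordinate. */
class GeoBaseSketch {
    final String id;
    final String name;
    final double lat;
    final double lon;

    GeoBaseSketch(String id, String name, double lat, double lon) {
        // Reject coordinates outside the valid latitude/longitude degree ranges.
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("coordinate out of range: " + lat + "," + lon);
        }
        this.id = id;
        this.name = name;
        this.lat = lat;
        this.lon = lon;
    }
}

/** A GeoBase that also reports how, and how well, it was geocoded. */
class GeocodedPlaceSketch extends GeoBaseSketch implements GeocodingSketch {
    private final int precision;
    private final int confidence;
    private final String countryCode;
    private final String admin1;
    private final String method;

    GeocodedPlaceSketch(String id, String name, double lat, double lon, int precision,
                        int confidence, String countryCode, String admin1, String method) {
        super(id, name, lat, lon);
        if (confidence < 0 || confidence > 100) {
            throw new IllegalArgumentException("confidence must be 0-100");
        }
        this.precision = precision;
        this.confidence = confidence;
        this.countryCode = countryCode;
        this.admin1 = admin1;
        this.method = method;
    }

    @Override public double getLatitude() { return lat; }
    @Override public double getLongitude() { return lon; }
    @Override public int getPrecision() { return precision; }
    @Override public int getConfidence() { return confidence; }
    @Override public String getCountryCode() { return countryCode; }
    @Override public String getAdmin1() { return admin1; }
    @Override public String getMethod() { return method; }
}
```

Keeping Geocoding as an interface means a place, a coordinate parsed from text, or a reverse-geocoded point can all report the same metadata without sharing one inheritance chain.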
Country and Place objects are extensions of GeoBase. Country is
used extensively in place name extraction, reverse geocoding, and
general country name/code lookups. See GeonamesUtility for
more country metadata tools.
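Country name/code lookups usually reduce to an index from every known form of a country (2-char code, 3-char code, name, aliases) to one shared record. The sketch below shows that idea with hypothetical CountrySketch and CountryIndex classes. It is not the Xponents Country class or GeonamesUtility, and the sample data in the usage is illustrative only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical country record; not the Xponents Country class. */
final class CountrySketch {
    final String iso2;
    final String iso3;
    final String name;
    final Set<String> aliases;

    CountrySketch(String iso2, String iso3, String name, String... aliases) {
        this.iso2 = iso2.toUpperCase(Locale.ROOT);
        this.iso3 = iso3.toUpperCase(Locale.ROOT);
        this.name = name;
        this.aliases = new LinkedHashSet<>(Arrays.asList(aliases));
    }
}

/** Hypothetical case-insensitive index over ISO codes, names, and aliases. */
final class CountryIndex {
    private final Map<String, CountrySketch> byKey = new HashMap<>();

    // Registers every form of the country; on a key collision the later entry wins.
    void add(CountrySketch c) {
        byKey.put(norm(c.iso2), c);
        byKey.put(norm(c.iso3), c);
        byKey.put(norm(c.name), c);
        for (String alias : c.aliases) {
            byKey.put(norm(alias), c);
        }
    }

    Optional<CountrySketch> lookup(String key) {
        if (key == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byKey.get(norm(key)));
    }

    private static String norm(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

A real table would also need to handle collisions deliberately (for example, an alias that is also another country's code) rather than silently overwriting.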
The Language object ties a language code to its name.
LangDetect and LangId provide some tools for language detection. Language ID output does not always
line up literally with a known Language code, because LangID may yield a language plus a locale.
So there is a need to parse and manage both explicit and inferred language/locale codings.
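One way to manage codes that may or may not carry a locale is to split them into a primary language subtag and an optional region subtag. The sketch below uses a hypothetical LangCode class (not an Xponents API). It accepts either "-" or "_" as the separator and, loosely following BCP 47 conventions, treats a 2-letter or 3-digit subtag as the region and skips 4-letter script subtags such as "Hant".

```java
import java.util.Locale;

/**
 * Hypothetical holder for a language code that may carry a locale/region,
 * e.g. "en", "en-US", "pt_BR". Not an Xponents API.
 */
final class LangCode {
    final String language; // lowercase primary subtag, e.g. "en"
    final String region;   // uppercase region subtag, or null if absent

    private LangCode(String language, String region) {
        this.language = language;
        this.region = region;
    }

    static LangCode parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalArgumentException("empty language code");
        }
        String[] parts = raw.trim().split("[-_]");
        String lang = parts[0].toLowerCase(Locale.ROOT);
        // Primary subtag must look like an ISO 639 code: 2 or 3 letters.
        if (!lang.matches("[a-z]{2,3}")) {
            throw new IllegalArgumentException("bad language subtag: " + raw);
        }
        String region = null;
        // Scan later subtags for a 2-letter or 3-digit region; 4-letter scripts are skipped.
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].matches("[A-Za-z]{2}|[0-9]{3}")) {
                region = parts[i].toUpperCase(Locale.ROOT);
                break;
            }
        }
        return new LangCode(lang, region);
    }

    boolean hasLocale() {
        return region != null;
    }

    @Override
    public String toString() {
        return region == null ? language : language + "-" + region;
    }
}
```

Normalizing to a single canonical form (lowercase language, uppercase region, "-" separator) lets explicit and inferred codings be compared directly.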
The Java SDK Locale classes appear to cover only the locales used for software internationalization,
and ICU4J, for example, does not offer a simple, clear API for this.
So I created language lookup tables around ISO 639 codes (sourced from the Library of Congress),
which are available in org.opensextant.util.TextUtility via getLanguage(), getLanguageCode(), and getLanguageMap().
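A lookup table of this kind maps every code form (ISO 639-2 bibliographic, ISO 639-2 terminology, ISO 639-1) to one display name. The sketch below is a hypothetical IsoLanguageTable, not TextUtility's implementation. It assumes pipe-delimited rows of the form "bibliographic|terminology|alpha2|English name", where empty fields mean no code of that kind exists; verify that layout against the actual source file before relying on it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical ISO 639 lookup table built from assumed pipe-delimited rows:
 * "bibliographic|terminology|alpha2|English name". Not TextUtility's implementation.
 */
final class IsoLanguageTable {
    private final Map<String, String> nameByCode = new HashMap<>();
    private final Map<String, String> codeByName = new HashMap<>();

    void loadRow(String row) {
        String[] f = row.split("\\|", -1); // -1 keeps empty fields
        if (f.length < 4) {
            throw new IllegalArgumentException("malformed row: " + row);
        }
        String name = f[3].trim();
        // Register every non-empty code form against the same display name.
        for (int i = 0; i < 3; i++) {
            String code = f[i].trim().toLowerCase(Locale.ROOT);
            if (!code.isEmpty()) {
                nameByCode.put(code, name);
            }
        }
        // Reverse lookup prefers the 2-char code, else the 3-char bibliographic code.
        String preferred = f[2].trim().isEmpty() ? f[0].trim() : f[2].trim();
        codeByName.put(name.toLowerCase(Locale.ROOT), preferred.toLowerCase(Locale.ROOT));
    }

    String nameFor(String code) {
        return code == null ? null : nameByCode.get(code.trim().toLowerCase(Locale.ROOT));
    }

    String codeFor(String name) {
        return name == null ? null : codeByName.get(name.trim().toLowerCase(Locale.ROOT));
    }
}
```

Indexing both 3-char forms matters because some languages have distinct bibliographic and terminology codes (French is "fre" and "fra"), and a detector may emit either.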
Class Summary
- Country: Country metadata provided on this class includes:
  - ISO 3166 country codes in 2-char and 3-char forms, aligned with US standard FIPS 10-4 codes
  - Country aliases: nicknames, variant names, abbreviations
  - Affiliated territories
  - Timezone and UTC offset for temporal calculations
  - Primary and secondary languages
- Country.TZ
- DocInput: Use only for cases where you have document inputs instead of raw records.
- GeoBase: An intermediary between the simple LatLon and other conceptual classes such as Place and Country.
- Language: Simple mapping of ISO 639 IDs to display names for languages.
- Place: Represents all the metadata about a location.
- Taxon: An entry in a taxonomy, which could be as simple as a flat word list or something with lots of structure.
- TextInput: A unit of data, that is, a tuple of the text, its language, and an identifier used for downstream processing, export formatting, databasing results keyed by text identifier, etc.
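The TextInput description is essentially a small value type whose identifier becomes the key for everything downstream. A minimal sketch of that pattern follows; TextInputSketch and ResultStore are hypothetical names, not the Xponents TextInput API, and the choice to allow a null language (meaning "not yet known") is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical TextInput-like tuple: identifier, text, and language code. */
final class TextInputSketch {
    final String id;
    final String text;
    final String langid; // assumed: may be null when the language is unknown

    TextInputSketch(String id, String text, String langid) {
        this.id = Objects.requireNonNull(id, "id");
        this.text = Objects.requireNonNull(text, "text");
        this.langid = langid;
    }
}

/** Hypothetical store of downstream results keyed by the input's identifier. */
final class ResultStore<R> {
    private final Map<String, R> byId = new HashMap<>();

    void put(TextInputSketch input, R result) {
        byId.put(input.id, result);
    }

    R get(String id) {
        return byId.get(id);
    }
}
```

Because results are keyed by the identifier rather than the text, the same store can back export formatting or database writes without carrying the full text around.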