org.opensextant.util (Xponents Core API)

package org.opensextant.util

Utilities for Extraction

Don't get me wrong: there are a lot of good utilities already out there for NLP work. I found Apache Commons StringUtils, File*Utils and other APIs very helpful. However, some oddities in the Java and Unicode world still need handling, as do gaps in the standards world where reasonable metadata is simply missing.

FileUtility -- provides simpler method calls and macro-like calls for common tasks. The most often used call is readFile(path, encoding).
GeodeticUtility -- simple geo math, validation and some geohash utilities.
TextUtils -- text buffer cleanup routines, language metadata and simple language detection:
- isASCII, isEnglish, isLatin, isJapanese...: look up language codes and run simple text detection
- checkCase, measureCase, isUpper, isLower: operations for character/text case metrics
- hasDiacritics, replaceDiacritics, removeDiacritics...: work with diacritics
- removeAny, removeAnyLeft, removeEmoticons, removeSymbols...: non-text removals
- tokens, tokensRight, tokensLeft: split on whitespace and return normalized tokens
- parseHashTags, parseNaturalLanguage: work with jargon text or social media
GeonamesUtility -- a helper for working with country metadata: ISO, FIPS, names and codes.
SolrProxy and SolrUtil -- SolrProxy is a catch-all for interfacing with a Solr index or server. The primary use case here is interfacing Extractors with their underlying SolrTextTagger. SolrUtil supports some general and specific schema interaction for the OpenSextant gazetteer Solr schema.
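
To make the kind of work listed above concrete, here is a minimal sketch using only the JDK. It is not the Xponents code: the class UtilSketch and all of its methods are hypothetical stand-ins that borrow names from the list for readability, and the real TextUtils and GeodeticUtility signatures and behavior may differ. It assumes that "diacritics" means Unicode combining marks exposed by NFD normalization, which misses letters such as "ø" that do not decompose.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative stand-ins only; NOT the Xponents TextUtils or GeodeticUtility implementations. */
final class UtilSketch {

    // Any Unicode combining mark (category M), as exposed by NFD decomposition.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    /** True if every character is in the 7-bit ASCII range. */
    static boolean isASCII(String text) {
        return text.chars().allMatch(c -> c < 128);
    }

    /** True if canonical decomposition exposes at least one combining mark. */
    static boolean hasDiacritics(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).find();
    }

    /** Decomposes the text, then drops the combining marks and keeps the base letters. */
    static String removeDiacritics(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    /** Splits on runs of whitespace; leading and trailing whitespace is ignored. */
    static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Range check for decimal-degree coordinates; NaN fails every comparison and is rejected. */
    static boolean isValidCoordinate(double lat, double lon) {
        return lat >= -90.0 && lat <= 90.0 && lon >= -180.0 && lon <= 180.0;
    }

    public static void main(String[] args) {
        String name = "S\u00e3o Tom\u00e9";                       // "São Tomé"
        System.out.println(isASCII(name));                         // prints "false"
        System.out.println(removeDiacritics(name));                // prints "Sao Tome"
        System.out.println(tokens("  New   York City "));          // prints "[New, York, City]"
        System.out.println(isValidCoordinate(42.36, -71.06));      // prints "true"
    }
}
```

Normalizing to NFD before matching is the design choice that matters here: a precomposed character such as "é" only becomes a base letter plus a removable mark after decomposition.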

Related Packages

Package -- Description

org.opensextant
org.opensextant.annotations -- DeepEye is an approach for simplifying typical NLP annotation exchanges.
org.opensextant.data -- Xponents Data Model
org.opensextant.extraction -- Extraction Fundamentals
org.opensextant.output -- Xponents Output Formatting using GISCore
org.opensextant.processing -- Processing Basics: Parameters, Results Handlers, Pipelining

Classes

Class -- Description

AnyFilenameFilter
FileUtility
GeodeticUtility -- A collection of geodetic routines used within OpenSextant.
GeonamesUtility
TextUtils
Unimap
