Extraction fundamentals include
TextEntity, a span in free text, and
TextMatch, a TextEntity generated by an extractor, matcher, or rule. A span is defined by a character start offset
and end offset. A TextEntity provides basic span logic and math: testing whether one span falls before, after,
within, or overlapping another, etc.
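The span comparisons described above can be illustrated with a small sketch. This is not the TextEntity class itself: the class name Span, its method names, and the choice of an exclusive end offset are all assumptions made for illustration, and the real class may use different names and conventions.

```java
// Hypothetical sketch of span logic; NOT the package's TextEntity API.
// Assumption: start is inclusive and end is exclusive, both character offsets.
final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid span: " + start + ".." + end);
        }
        this.start = start;
        this.end = end;
    }

    /** Number of characters covered by this span. */
    int length() {
        return end - start;
    }

    /** True if this span ends at or before the point where the other begins. */
    boolean isBefore(Span other) {
        return this.end <= other.start;
    }

    /** True if this span begins at or after the point where the other ends. */
    boolean isAfter(Span other) {
        return this.start >= other.end;
    }

    /** True if this span lies entirely inside the other (equal spans count). */
    boolean isWithin(Span other) {
        return other.start <= this.start && this.end <= other.end;
    }

    /** True if the two spans share at least one character position. */
    boolean overlaps(Span other) {
        return this.start < other.end && other.start < this.end;
    }
}
```

With an exclusive end offset, two spans that merely touch (one ends where the next begins) count as before/after each other and do not overlap. An inclusive-end convention would shift each comparison by one.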
Beyond that, the extraction helpers here provide specific Solr tagger support, match filtering, match navigation, and match metrics.
Interface Summary
  Extractor: For now, this interface is closer to an AbstractExtractor, where a clean interface might be output = Extractor.extract(input). This interface specifies more
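The Extractor description mentions a cleaner shape, output = Extractor.extract(input). A minimal sketch of what that shape could look like follows. The names SimpleExtractor and DigitRunExtractor are hypothetical and are not part of this package. Each span is represented as an int pair of start offset and exclusive end offset, purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "clean" extractor shape: one input in, one output out.
// This is NOT the package's Extractor interface.
@FunctionalInterface
interface SimpleExtractor<I, O> {
    O extract(I input);
}

// Toy implementation (an assumption for illustration): reports each run of
// digits as a {start, end} pair, with end exclusive.
final class DigitRunExtractor implements SimpleExtractor<String, List<int[]>> {
    @Override
    public List<int[]> extract(String input) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (Character.isDigit(input.charAt(i))) {
                int start = i;
                // Advance to the first non-digit character or the end of input.
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                spans.add(new int[] {start, i});
            } else {
                i++;
            }
        }
        return spans;
    }
}
```

Keeping extraction to a single stateless call like this makes an extractor easy to test and compose. Configuration and resource loading, which a fuller interface would likely need, are left out of the sketch.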
Class Summary
  ExtractionMetrics: This is a holder for tracking various common measures: No.
  ExtractionResult
  MatcherUtils
  MatchFilter: The Class MatchFilter.
  TextEntity: A very simple struct to hold data useful for post-processing entities once found.
  TextMatch: A variation on TextEntity that also records pattern metadata