Class XtractorGroup


  • public class XtractorGroup
    extends java.lang.Object
    A Group of Xponent Extractors. An Extractor has a simple interface:
     +configure() + extract()
     

    Configure each Extractor, then add it to the stack here. Once you have added Extractors to your XtractorGroup, call XtractorGroup.setup(). Because one processor among several may throw an exception while the others succeed, the API does not throw exceptions that would fail a document completely. If you need access to the exceptions thrown by each processor or formatter, adapt the XtractorGroup by re-implementing the internal loops.
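
The non-throwing behaviour described above can be pictured with a small, self-contained sketch. It does not use the real Xponents classes: `ExtractorSketch`, `EchoExtractor`, `BrokenExtractor`, `GroupSketch` and the `errors()` accessor are hypothetical stand-ins, and the real `Extractor` signatures may differ. The only point illustrated is the loop that records a failure per extractor instead of failing the whole document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the Extractor interface (configure + extract).
interface ExtractorSketch {
    void configure() throws Exception;
    List<String> extract(String text) throws Exception;
}

// Always finds one match: the trimmed input itself.
class EchoExtractor implements ExtractorSketch {
    public void configure() { /* nothing to load in this sketch */ }
    public List<String> extract(String text) {
        return Collections.singletonList(text.trim());
    }
}

// Always fails, to show that one bad extractor does not sink the document.
class BrokenExtractor implements ExtractorSketch {
    public void configure() { /* nothing to load in this sketch */ }
    public List<String> extract(String text) throws Exception {
        throw new Exception("lexicon not loaded");
    }
}

// Assumed shape of the group: a list of extractors plus accumulated errors.
class GroupSketch {
    protected final List<ExtractorSketch> extractors = new ArrayList<>();
    protected final List<String> currErrors = new ArrayList<>();

    void addExtractor(ExtractorSketch x) { extractors.add(x); }

    List<String> errors() { return currErrors; }

    // Run every extractor; record failures rather than propagating them.
    List<String> process(String text) {
        List<String> matches = new ArrayList<>();
        for (ExtractorSketch x : extractors) {
            try {
                matches.addAll(x.extract(text));
            } catch (Exception err) {
                currErrors.add(err.getMessage());
            }
        }
        return matches;
    }
}
```

A caller that needs the original exception objects rather than their messages would change the `catch` branch, which is the kind of adaptation of the internal loop that the description refers to.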

    Author:
    ubaldino
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected java.util.List<java.lang.String> currErrors
      API: child implementations have access to accumulated errors; reset() clears errors and other state.
      protected java.util.List<Extractor> extractors
      API: child implementations have access to the core list of extractors.
      protected java.util.List<ResultsFormatter> formatters
      API: child implementations have access to the core list of formatters.
      protected org.slf4j.Logger log
      API: child implementations should recreate their own logger.
    • Constructor Summary

      Constructors 
      Constructor Description
      XtractorGroup()  
    • Method Summary

      Modifier and Type Method Description
      void addExtractor(Extractor xprocessor)
      void addFormatter(ResultsFormatter formatter)
      void cleanupAll()
      Use only if you intend to shut down.
      int format(ExtractionResult compilation)
      Format each result. Some formatters may pass on (skip) results; for example, the Shapefile formatter accepts only geocoding-capable TextMatch objects.
      java.util.List<TextMatch> process(TextInput input)
      Process one input.
      int processAndFormat(TextInput input)
      Processes input content against all extractors and all formatters. This does not throw exceptions, because some processing may fail while other processing succeeds.
      void reset()
      DRAFT: still figuring out the rules for 'reset' between processing or inputs.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • extractors

        protected java.util.List<Extractor> extractors
        API: child implementations have access to the core list of extractors.
      • formatters

        protected java.util.List<ResultsFormatter> formatters
        API: child implementations have access to the core list of formatters.
      • log

        protected org.slf4j.Logger log
        API: child implementations should recreate their own logger.
      • currErrors

        protected java.util.List<java.lang.String> currErrors
        API: child implementations have access to accumulated errors; reset() clears errors and other state.
    • Constructor Detail

      • XtractorGroup

        public XtractorGroup()
    • Method Detail

      • addExtractor

        public void addExtractor(Extractor xprocessor)
      • process

        public java.util.List<TextMatch> process(TextInput input)
        Process one input. Use this if you have no need to format output at this time. Also use this approach if you have complex ExtractionResults to which you want to add meta attributes.
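
As a rough illustration of the "process first, add meta attributes, format later" approach, the sketch below uses hypothetical stand-ins only: `ResultSketch` plays the role of an ExtractionResult, and the `process` and `format` methods here are toy substitutes, not the XtractorGroup implementations. The source does not say what the `int` returned by `format` means; this sketch assumes a count of formatted matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ExtractionResult: matches plus free-form attributes.
class ResultSketch {
    final String recordId;
    final List<String> matches = new ArrayList<>();
    final Map<String, Object> attributes = new HashMap<>();

    ResultSketch(String recordId) { this.recordId = recordId; }
}

final class ProcessThenFormatSketch {
    // Toy substitute for process(): one "match" per capitalized token.
    static List<String> process(String text) {
        List<String> found = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                found.add(token);
            }
        }
        return found;
    }

    // Toy substitute for format(): assumed to report how many matches were written.
    static int format(ResultSketch result) {
        return result.matches.size();
    }

    // Process, attach caller-specific metadata, and only then hand off to formatting.
    static ResultSketch build(String id, String text, String source) {
        ResultSketch result = new ResultSketch(id);
        result.matches.addAll(process(text));
        result.attributes.put("source", source); // meta attribute added between the two steps
        return result;
    }
}
```

Splitting the two steps this way leaves room to enrich the result before any formatter sees it, which a single combined call would not allow.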
      • format

        public int format(ExtractionResult compilation)
        Format each result. Some formatters may pass on (skip) results; for example, the Shapefile formatter accepts only geocoding-capable TextMatch objects.
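
One way to picture a formatter that passes on some results is the sketch below. All names (`MatchSketch`, `FormatterSketch`, `GeoOnlyFormatter`, `FormatLoopSketch`) are hypothetical, the `accepts`/`write` split is an assumption made for illustration, and the returned count is likewise assumed; the real ResultsFormatter interface may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TextMatch: text plus a flag for geocoding capability.
class MatchSketch {
    final String text;
    final boolean hasCoordinates;

    MatchSketch(String text, boolean hasCoordinates) {
        this.text = text;
        this.hasCoordinates = hasCoordinates;
    }
}

// Hypothetical formatter contract: decide per match, then write.
interface FormatterSketch {
    boolean accepts(MatchSketch m);
    void write(MatchSketch m);
}

// Mimics a shapefile-style formatter that can only emit matches with coordinates.
class GeoOnlyFormatter implements FormatterSketch {
    final List<String> written = new ArrayList<>();

    public boolean accepts(MatchSketch m) { return m.hasCoordinates; }
    public void write(MatchSketch m) { written.add(m.text); }
}

final class FormatLoopSketch {
    // Offer every match to every formatter; count only what was actually written.
    static int format(List<MatchSketch> matches, List<FormatterSketch> formatters) {
        int count = 0;
        for (FormatterSketch f : formatters) {
            for (MatchSketch m : matches) {
                if (f.accepts(m)) {
                    f.write(m);
                    count++;
                }
            }
        }
        return count;
    }
}
```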
      • cleanupAll

        public void cleanupAll()
        Use only if you intend to shut down.
      • reset

        public void reset()
        DRAFT: still figuring out the rules for 'reset' between processing or inputs.
      • processAndFormat

        public int processAndFormat(TextInput input)
        Processes input content against all extractors and all formatters. This does not throw exceptions, because some processing may fail while other processing succeeds.

        TODO: Processing/Formatting details would have to be retrieved by calling some other method that is statefully tracking such things.

        Parameters:
        input -
        Returns:
        status: -1 failure; 0 nothing found; 1 found matches and formatted them; 2 found content but nothing formatted.
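
Because processAndFormat reports its outcome only through the returned status, callers will probably want to branch on it. The helper below is a minimal sketch of that; `StatusSketch` and `describe` are hypothetical names, not part of XtractorGroup, and the wording of each label simply mirrors the return values listed above.

```java
// Hypothetical helper: maps the documented status codes to readable labels.
final class StatusSketch {
    static String describe(int status) {
        switch (status) {
            case -1:
                return "failure";
            case 0:
                return "nothing found";
            case 1:
                return "found matches and formatted them";
            case 2:
                return "found content but nothing formatted";
            default:
                // Codes outside the documented set are reported rather than guessed at.
                return "unknown status " + status;
        }
    }
}
```

A status of -1 or 2 is a reasonable cue to inspect the accumulated errors before calling reset(), since no exception will have surfaced.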