Class Record

java.lang.Object
org.opensextant.annotations.DeepEyeData
org.opensextant.annotations.Record

public class Record extends DeepEyeData
A record is a representation of the raw original data. It records a processing date (aka ingest date), metadata, a source ID, a record ID, and a payload. Conventions:
  • Record identity is very important in the context of a full pipeline. If you can leverage the given identity of the data, maintain it consistently. An MD5 digest or a UUID has often been used to create a compact identifier, for example. If the ID is left null, database systems can assign object IDs, but transactional web services are not typically responsible for generating missing identifiers. The lesson is that identifiers should not be neglected: in record processing, a record ID is, at minimum, practical for debugging and logging.
  • The metadata "attributes" are considered optional, but usually helpful. Record the raw metadata as-is when you can.
  • If metadata attributes can be conditioned or normalized easily, do so; e.g., tag data with an ISO2 country code rather than with a country name or FIPS code.
  • The proc_date is usually determined at ingest time; it makes a good shard key for balancing the load of records across distributed storage or databases.
  • Record "value" vs. "content": content was intended to capture the textual content of files, since trying to store raw binary content quickly leads to performance problems. For file-based sources (file system/folder crawls, web crawls, etc.), content would store a compressed, UTF-8 encoded byte array, and the record value would be the file path of the original. However, for non-file-based records, it may make more sense to use .value to record the most obvious innate value.
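The identity, normalization, and content conventions above can be sketched with the Java standard library alone. This is a minimal illustration, not OpenSextant code: the class RecordConventions and all of its methods are hypothetical, the country lookup assumes English display names from the JDK's Locale data, and GZIP is an assumed choice, since the conventions only say "compressed".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helpers illustrating the Record conventions; not part of OpenSextant. */
final class RecordConventions {

    private RecordConventions() {}

    /** Compact, repeatable record ID: lower-case hex MD5 digest of the raw bytes. */
    static String md5Hex(byte[] raw) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(raw);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // %x treats a negative byte as unsigned
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Alternative ID: a name-based (version 3, MD5-derived) UUID from the same bytes. */
    static String uuidFor(byte[] raw) {
        return UUID.nameUUIDFromBytes(raw).toString();
    }

    /** Encodes text as UTF-8 and GZIP-compresses it, e.g. for the byte[] content field. */
    static byte[] compressText(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Read the buffer only after close(), which writes the GZIP trailer.
        return bytes.toByteArray();
    }

    /** Inverse of compressText: gunzip, then decode as UTF-8. */
    static String decompressText(byte[] content) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(content))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lower-cased English country name -> ISO 3166-1 alpha-2 code. */
    private static final Map<String, String> NAME_TO_ISO2 = new HashMap<>();

    static {
        for (String code : Locale.getISOCountries()) {
            String name = new Locale.Builder().setRegion(code).build()
                    .getDisplayCountry(Locale.ENGLISH);
            NAME_TO_ISO2.put(name.toLowerCase(Locale.ROOT), code);
        }
    }

    /** Normalizes an English country name to its ISO2 code, or returns null if unknown. */
    static String toIso2(String countryName) {
        return NAME_TO_ISO2.get(countryName.trim().toLowerCase(Locale.ROOT));
    }
}
```

For a file-based record, one would hash the raw bytes for the record ID, keep the file path as the value, and store the compressText(...) output as content. UUID.nameUUIDFromBytes is itself MD5-based, so both ID styles are equally repeatable for the same input.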
Author:
ubaldino
  • Field Details

    • source_id

      public String source_id
      Source ID
    • procdate

      public String procdate
      A processing date/time key with as much resolution as you need. This is a string because a lexical sort is likely easier to manage than an actual date/time field with date/time math.
    • content

      public byte[] content
    • state

      public Map<String,Object> state
      State flags indicate what state of processing the record is in or what processing has been applied to it.
    • tags

      public Map<String,Object> tags
    • stateMask

      public int stateMask
    • notes

      public String notes
      Notes are any text messages you wish to attach to a record. DeepEye is not responsible for how such a buffer is maintained. Not indexed.
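Because procdate is a string, its format determines whether lexical order matches chronological order. The following is a sketch, assuming a fixed-width UTC pattern yyyyMMddHHmmss and a prefix-based shard key; the class ProcDateKeys, the pattern, and the prefix scheme are illustrative choices, not something Record prescribes.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helpers for building procdate keys; not part of OpenSextant. */
final class ProcDateKeys {

    private ProcDateKeys() {}

    // Fixed-width, most-significant-field-first: lexical order equals chronological order.
    private static final DateTimeFormatter KEY_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Builds a procdate key at one-second resolution, normalized to UTC. */
    static String procdateKey(ZonedDateTime when) {
        return when.withZoneSameInstant(ZoneOffset.UTC).format(KEY_FORMAT);
    }

    /** Coarser key for sharding: the first n characters, e.g. n = 8 gives one shard per day. */
    static String shardKey(String procdate, int n) {
        return procdate.substring(0, Math.min(n, procdate.length()));
    }
}
```

Fixed width matters: with unpadded months, "2021-10" would sort before "2021-9". Truncating the key to 8 characters groups records by day, which is one way to use procdate as a shard key.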
  • Constructor Details

    • Record

      public Record()
    • Record

      public Record(String recid, String sid)
  • Method Details

    • toString

      public String toString()
      Overrides:
      toString in class Object
    • addState

      public void addState(String s)
    • addState

      public void addState(String s, int v)
    • addStates

      public void addStates(Collection<String> s)
    • addStates

      public void addStates(Map<String,Object> map)
    • addCollectionTag

      public void addCollectionTag(String s)
      "tags" are meant to be used at a data set or collection level. I.e., a source may have records
      Parameters:
      s - tag string
    • addCollectionTags

      public void addCollectionTags(Collection<String> tlist)
      Parses the given "a;b;c;..." format of tags into a Set.
      Parameters:
      tlist - list of tags
    • addCollectionTags

      public void addCollectionTags(Map<String,Object> map)
      Parameters:
      map - map of tags
    • getMap

      public Map<String,Object> getMap()
      Specified by:
      getMap in class DeepEyeData
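addCollectionTags(Collection&lt;String&gt;) is documented as parsing the "a;b;c;..." format of tags into a Set. The hypothetical TagParsing.parseTags below sketches what such parsing could look like; trimming whitespace, dropping empty entries, and preserving first-seen order are assumptions, not documented behavior of Record.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of "a;b;c" tag parsing; not the OpenSextant implementation. */
final class TagParsing {

    private TagParsing() {}

    /** Splits each entry on ';', trims each tag, skips blanks and nulls, keeps first-seen order. */
    static Set<String> parseTags(Collection<String> entries) {
        Set<String> tags = new LinkedHashSet<>();
        for (String entry : entries) {
            if (entry == null) {
                continue;
            }
            for (String part : entry.split(";")) {
                String tag = part.trim();
                if (!tag.isEmpty()) {
                    tags.add(tag); // a Set silently ignores duplicates
                }
            }
        }
        return tags;
    }

    /** Convenience overload for literal entries. */
    static Set<String> parseTags(String... entries) {
        return parseTags(Arrays.asList(entries));
    }
}
```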