Class Record


  • public class Record
    extends DeepEyeData
    A record is a representation of the raw original. It records a processing date (aka ingest date), metadata, source ID, record ID, and a payload. Conventions:
    • Record identity is very important in the context of a full pipeline. If the data arrives with its own identity, maintain that identity consistently. An MD5 digest or a UUID, for example, has often been used to create a compact identifier. If the ID is left null, database systems can assign object IDs, but transactional web services are not typically responsible for generating missing identifiers. The lesson is that identifiers should not be ignored: if for nothing else, a record ID is practical for debugging and logging.
    • The metadata "attributes" are considered optional, but usually helpful. Record the raw metadata as-is when you can.
    • If metadata attributes can be conditioned or normalized easily, do that; e.g., tag data with an ISO2 country code rather than with a country name or FIPS code.
    • The proc_date is usually determined at ingest time. It makes a good shard key for balancing the load of records across distributed storage or databases.
    • Record "value" vs. "content": content was intended to capture the textual content of files, since storing raw binary content quickly leads to performance problems. For file-based sources (file system/folder crawls, web crawls, etc.), content would store a compressed, UTF-8 encoded byte array, and the Record value would be the file path to the original. For records that are not file-based, however, it may make more sense to use .value to record the most obvious innate value.
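The identity and content conventions above can be sketched with the standard library alone. This is only an illustration: the class RecordConventionsSketch and its methods are hypothetical and not part of DeepEye, and GZIP is an assumed compression scheme, since the conventions do not say which one is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not a DeepEye class.
class RecordConventionsSketch {

    /** Repeatable, compact record ID: MD5 hex digest of a key supplied by the source. */
    static String md5Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Fallback when the source offers no usable identity: a random UUID. */
    static String randomId() {
        return UUID.randomUUID().toString();
    }

    /** Text content as a compressed (GZIP, assumed) UTF-8 byte array. */
    static byte[] compressText(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the GZIP trailer has been written.
        return buf.toByteArray();
    }

    /** Inverse of compressText, e.g. for debugging a stored record. */
    static String decompressText(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A digest of a stable source key yields the same ID every time the same item is ingested, which is what makes it useful for debugging and logging; a random UUID does not have that property and is better kept as a fallback.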
    Author:
    ubaldino
    • Field Summary

      Fields 
      Modifier and Type Field Description
      byte[] content  
      java.lang.String notes
      Notes are any text messages you wish to attach to a record. DeepEye is not responsible for how such a buffer is maintained.
      java.lang.String procdate
      A processing date/time key that has as much resolution as you need. This is a string because a lexical sort is likely easier to manage than an actual date/time field with date/time math.
      java.lang.String source_id
      Source ID
      java.util.Map<java.lang.String,​java.lang.Object> state
      State flags indicate what state of processing the record is in or what processing has been applied to it.
      int stateMask  
      java.util.Map<java.lang.String,​java.lang.Object> tags  
    • Constructor Summary

      Constructors 
      Constructor Description
      Record()  
      Record​(java.lang.String recid, java.lang.String sid)  
    • Method Summary

      Modifier and Type Method Description
      void addCollectionTag​(java.lang.String s)
      "tags" are meant to be used at a data set or collection level.
      void addCollectionTags​(java.util.Collection<java.lang.String> tlist)
      Parses the given "a;b;c;..." format of tags into a Set.
      void addCollectionTags​(java.util.Map<java.lang.String,​java.lang.Object> map)  
      void addState​(java.lang.String s)  
      void addState​(java.lang.String s, int v)  
      void addStates​(java.util.Collection<java.lang.String> s)  
      void addStates​(java.util.Map<java.lang.String,​java.lang.Object> map)  
      java.util.Map<java.lang.String,​java.lang.Object> getMap()  
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • source_id

        public java.lang.String source_id
        Source ID
      • procdate

        public java.lang.String procdate
        A processing date/time key that has as much resolution as you need. This is a string because a lexical sort is likely easier to manage than an actual date/time field with date/time math.
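As a rough illustration of why a string key can sort correctly, the sketch below formats timestamps with a fixed-width, most-significant-field-first pattern in UTC. The class ProcDateSketch and the pattern yyyyMMdd'T'HHmmss are assumptions made for this example; the documentation does not prescribe a format for procdate.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not a DeepEye class.
class ProcDateSketch {

    // Fixed-width fields ordered year to second, so string order matches time order.
    // The pattern itself is an assumption; choose the resolution you need.
    static final DateTimeFormatter KEY_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss").withZone(ZoneOffset.UTC);

    /** Formats an instant as a sortable processing-date key. */
    static String procDateKey(Instant t) {
        return KEY_FORMAT.format(t);
    }

    /** Sorts keys as plain strings; no date/time math is involved. */
    static List<String> sortedKeys(List<Instant> times) {
        List<String> keys = new ArrayList<>();
        for (Instant t : times) {
            keys.add(procDateKey(t));
        }
        Collections.sort(keys);
        return keys;
    }
}
```

Dropping the time portion gives a coarser key (one value per day), while appending milliseconds gives a finer one; either way the lexical ordering holds as long as every field stays fixed-width.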
      • content

        public byte[] content
      • state

        public java.util.Map<java.lang.String,​java.lang.Object> state
        State flags indicate what state of processing the record is in or what processing has been applied to it.
      • tags

        public java.util.Map<java.lang.String,​java.lang.Object> tags
      • stateMask

        public int stateMask
      • notes

        public java.lang.String notes
        Notes are any text messages you wish to attach to a record. DeepEye is not responsible for how such a buffer is maintained. Not indexed.
    • Constructor Detail

      • Record

        public Record()
      • Record

        public Record​(java.lang.String recid,
                      java.lang.String sid)
    • Method Detail

      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • addState

        public void addState​(java.lang.String s)
      • addState

        public void addState​(java.lang.String s,
                             int v)
      • addStates

        public void addStates​(java.util.Collection<java.lang.String> s)
      • addStates

        public void addStates​(java.util.Map<java.lang.String,​java.lang.Object> map)
      • addCollectionTag

        public void addCollectionTag​(java.lang.String s)
        "tags" are meant to be used at a data set or collection level. I.e, a source may have records
        Parameters:
        s - tag string
      • addCollectionTags

        public void addCollectionTags​(java.util.Collection<java.lang.String> tlist)
        Parses the given "a;b;c;..." format of tags into a Set.
        Parameters:
        tlist - list of tags
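The description mentions an "a;b;c;..." format, although this overload accepts a Collection; how DeepEye handles the delimited form internally is not shown here. A minimal sketch of parsing such a string into a Set, using a hypothetical helper named TagParseSketch, might look like this:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not a DeepEye class.
class TagParseSketch {

    /** Splits "a;b;c" into a set of trimmed, non-empty tags, keeping first-seen order. */
    static Set<String> parseTags(String delimited) {
        Set<String> tags = new LinkedHashSet<>();
        if (delimited == null) {
            return tags;
        }
        for (String piece : delimited.split(";")) {
            String tag = piece.trim();
            if (!tag.isEmpty()) {
                tags.add(tag); // duplicates are dropped by the Set
            }
        }
        return tags;
    }
}
```

The resulting Set could then be passed to addCollectionTags(Collection), since a Set is a Collection.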
      • addCollectionTags

        public void addCollectionTags​(java.util.Map<java.lang.String,​java.lang.Object> map)
        Parameters:
        map - map of tags
      • getMap

        public java.util.Map<java.lang.String,​java.lang.Object> getMap()
        Specified by:
        getMap in class DeepEyeData