java.lang.Object

org.opensextant.annotations.DeepEyeData

org.opensextant.annotations.Record

public class Record extends DeepEyeData

A record is a representation of the raw original. It records a processing date (aka ingest date), metadata, source ID, record ID, and a payload. Conventions:

Record identity is very important in the context of a full pipeline. If the data arrives with a given identity, maintain it consistently. An MD5 digest or UUID has often been used to create a compact identifier, for example. If the ID is left null, database systems can assign object IDs, but transactional web services are not typically responsible for generating missing identifiers. The lesson is not to ignore identifiers: for Record processing, a record ID is practical for debugging and logging, if for nothing else.
The metadata "attributes" are considered optional, but usually helpful. Record the raw metadata as-is when you can.
If metadata attributes can be conditioned or normalized easily, do that; e.g., tag data with an ISO2 country code rather than with a country name or FIPS code.
The proc_date is usually determined at ingest time; it makes a good shard key for balancing the load of records across distributed storage or databases.
Record "value" vs. "content": content was intended to capture the textual content of files, knowing that trying to store raw binary content quickly leads to performance problems. For file-based sources (file system/folder crawls, web crawls, etc.), content would store a compressed UTF-8 encoded byte array, and Record value would be the file path to the original. For non-file-based records, however, .value may make more sense for recording the most obvious innate value.
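
The identity and content conventions above can be illustrated with a small standalone sketch that uses only the JDK. It is not taken from the library: the class and method names (RecordConventionsSketch, md5Hex, nameBasedUuid, compressText, decompressText) are hypothetical, hashing a file path or URL as the stable key is an assumption, and GZIP is assumed as the compression scheme because the documentation above does not name one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of org.opensextant.annotations.
class RecordConventionsSketch {

    /** Compact record ID: hex-encoded MD5 digest of a stable key (assumed: a path or URL). */
    static String md5Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Alternative compact ID: a deterministic, name-based UUID for the same key. */
    static String nameBasedUuid(String key) {
        return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
    }

    /** Compress text as UTF-8 bytes; GZIP is an assumption made for this sketch. */
    static byte[] compressText(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and the GZIP trailer written) before the bytes are read.
        return buf.toByteArray();
    }

    /** Inverse of compressText, to recover the original text. */
    static String decompressText(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With the library on the classpath, one plausible use is to pass md5Hex(path) as the recid argument of Record(String recid, String sid) and to assign the result of compressText(text) to the public content field.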

Author: ubaldino

Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| byte[] | content | |
| String | notes | Notes are any text messages you wish to attach to a record. DeepEye is not responsible for how such a buffer is maintained. |
| String | procdate | A processing date/time key that has as much resolution as you need. This is a string because a lexical sort is likely easier to manage than an actual date/time field with date/time math. |
| String | source_id | Source ID |
| Map<String,Object> | state | State flags indicate what state of processing the record is in or what processing has been applied to it. |
| int | stateMask | |
| Map<String,Object> | tags | |

Fields inherited from class org.opensextant.annotations.DeepEyeData
attrs, id, value
Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| Record() | |
| Record(String recid, String sid) | |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| void | addCollectionTag(String s) | "tags" are meant to be used at a data set or collection level. |
| void | addCollectionTags(Collection<String> tlist) | Parses the given "a;b;c;..." format of tags into a Set. |
| void | addCollectionTags(Map<String,Object> map) | |
| void | addState(String s) | |
| void | addState(String s, int v) | |
| void | addStates(Collection<String> s) | |
| void | addStates(Map<String,Object> map) | |
| Map<String,Object> | getMap() | |
| String | toString() | |

Methods inherited from class org.opensextant.annotations.DeepEyeData
addAttribute, asMap, getAttributeNames, getAttributes, isValue, isValue, list, list, map, newAttributes

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- source_id
  
  public String source_id
  
  Source ID
- procdate
  
  public String procdate
  
  A processing date/time key that has as much resolution as you need. This is a string because a lexical sort is likely easier to manage than an actual date/time field with date/time math.
- content
  
  public byte[] content
- state
  
  public Map<String,Object> state
  
  State flags indicate what state of processing the record is in or what processing has been applied to it.
- tags
  
  public Map<String,Object> tags
- stateMask
  
  public int stateMask
- notes
  
  public String notes
  
  Notes are any text messages you wish to attach to a record. DeepEye is not responsible for how such a buffer is maintained. Not indexed.
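
The lexical-sort rationale given for procdate can be illustrated with a standalone sketch. The pattern used here (UTC, fixed width, most significant field first) is an assumption, not a format the library prescribes, and ProcDateSketch and procDate are hypothetical names. With such a pattern, plain string comparison orders keys chronologically, so no date/time math is needed to sort or range-scan records.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of org.opensextant.annotations.
class ProcDateSketch {

    // Assumed key format: fixed-width fields, largest unit first, always UTC.
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss").withZone(ZoneOffset.UTC);

    /** Format an instant as a string key whose lexical order matches time order. */
    static String procDate(Instant t) {
        return FMT.format(t);
    }

    public static void main(String[] args) {
        String earlier = procDate(Instant.parse("2020-01-02T03:04:05Z"));
        String later = procDate(Instant.parse("2020-11-02T03:04:05Z"));
        // String comparison alone is enough to order the two keys.
        System.out.println(earlier + " sorts before " + later + ": "
                + (earlier.compareTo(later) < 0));
    }
}
```

Truncating the pattern (for example, to yyyyMMdd) gives a coarser key, which is one way to choose "as much resolution as you need".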
Constructor Details
- Record
  
  public Record()
- Record
  
  public Record(String recid, String sid)
Method Details
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- addState
  
  public void addState(String s)
- addState
  
  public void addState(String s, int v)
- addStates
  
  public void addStates(Collection<String> s)
- addStates
  
  public void addStates(Map<String,Object> map)
- addCollectionTag
  
  public void addCollectionTag(String s)
  
  "tags" are meant to be used at a data set or collection level. I.e, a source may have records
  
  Parameters:
  
  s - tag string
- addCollectionTags
  
  public void addCollectionTags(Collection<String> tlist)
  
  Parses the given "a;b;c;..." format of tags into a Set.
  
  Parameters:
  
  tlist - list of tags
- addCollectionTags
  
  public void addCollectionTags(Map<String,Object> map)
  
  Parameters:
  
  map - map of tags
- getMap
  
  public Map<String,Object> getMap()
  
  Specified by:
  
  getMap in class DeepEyeData
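
The "a;b;c;..." tag format mentioned under addCollectionTags can be illustrated with a standalone sketch. How the library itself parses such strings is not shown on this page, so the splitting, trimming, and de-duplication rules below are assumptions, and TagParsingSketch and parseTags are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of org.opensextant.annotations.
class TagParsingSketch {

    /** Split a semicolon-delimited tag string into a Set, keeping first-seen order. */
    static Set<String> parseTags(String delimited) {
        Set<String> tags = new LinkedHashSet<>();
        if (delimited == null) {
            return tags;
        }
        for (String t : delimited.split(";")) {
            String trimmed = t.trim();
            // Assumed behavior: skip empty entries; the Set drops duplicates.
            if (!trimmed.isEmpty()) {
                tags.add(trimmed);
            }
        }
        return tags;
    }
}
```

The resulting Set is a Collection<String>, so it could be handed to addCollectionTags(Collection<String> tlist) on a Record.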
