Database Specification
Legend
keys or files
shape
data type
<variable>
HDF5 structure
key: group, or dtype if members have same shape and no identifiers
  if group:
    <sub-key-variable>: dtype variable sub-key without type
  if dtype: shape, same shape for all members
other-key: group
  <clustering>: dtype shape, definition for all group members
  specific-member: properties for some specific member of group
General Data Specification
all data is stored as HDF5
file names use underscores for spaces and dashes as key separators
all keys are singular
each data instance is one large file
there is one input data file per dataset and model
there is one attribution file per dataset, model and attribution method
there is one analysis file per dataset, model, attribution method and analysis topic
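The naming rule above can be sketched as a small helper. This is only an illustration: the function name `make_file_name`, the example keys and the `.h5` suffix are assumptions, not part of the specification.

```python
def make_file_name(keys, suffix=".h5"):
    """Join keys into a file name.

    Spaces inside a key become underscores, and keys are separated by dashes.
    The default suffix is an assumption made for this sketch.
    """
    parts = [key.strip().replace(" ", "_") for key in keys]
    return "-".join(parts) + suffix
```

For example, `make_file_name(["imagenet", "vgg 16", "gradient"])` yields `imagenet-vgg_16-gradient.h5`.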
Model Input Data
shape for image data is samples x channel x height x width
since preprocessing depends on the model, we supply a file <model>.input.h5 with all preprocessing steps applied
HDF5 structure
data: group, or float32 if every sample has the same dimensions
  if group:
    <data-id>: float32 channel x height x width; <data-id> can be a filename or an identifier
  if float32: samples x channel x height x width
label: group if data is a group, or uint if data is float32
  if group:
    <data-id>: uint 1 for single label, or bool classes for multi-label
  if uint: samples for single label, or bool samples x classes for multi-label
index: group, optional
  <data-id>: uint 1
  if data is a group, assigns indices to its keys; otherwise the natural sort order of the keys is assumed
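A minimal sketch of both input layouts using h5py and NumPy follows. The function names are hypothetical, and `uint32` is an assumed choice for the unspecified `uint` width. Only the single-label case is covered.

```python
import re

import h5py
import numpy as np


def write_dense_input(fd, data, label):
    """Store samples that all share one shape as single datasets."""
    fd.create_dataset("data", data=np.asarray(data, dtype=np.float32))
    fd.create_dataset("label", data=np.asarray(label, dtype=np.uint32))


def write_grouped_input(fd, samples, labels, indices=None):
    """Store samples of differing shape as one dataset per data ID."""
    data_group = fd.create_group("data")
    label_group = fd.create_group("label")
    for data_id, sample in samples.items():
        data_group.create_dataset(data_id, data=np.asarray(sample, dtype=np.float32))
        # Single label: one uint value of shape (1,) per data ID.
        label_group.create_dataset(data_id, data=np.asarray([labels[data_id]], dtype=np.uint32))
    if indices is not None:
        index_group = fd.create_group("index")
        for data_id, index in indices.items():
            index_group.create_dataset(data_id, data=np.asarray([index], dtype=np.uint32))


def natural_key(text):
    """Split digits from non-digits so that 'img10' sorts after 'img2'."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", text)]


def ordered_data_ids(fd):
    """Return data IDs ordered by 'index' if present, else in natural sort order."""
    if "index" in fd:
        return sorted(fd["data"], key=lambda data_id: int(fd["index"][data_id][0]))
    return sorted(fd["data"], key=natural_key)
```

Both writers accept an open `h5py.File` (or any group), so the caller decides where the file lives. Data IDs that contain a slash would create nested groups, so this sketch assumes flat identifiers.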
Attribution of Input Data
<attribution-strategy> can be:
  true: for the true label
  model: for the model prediction
  <integer>: for choosing a fixed label
  <else>: for something I did not think of
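One way to read the strategies listed above is as a rule for picking the label that gets attributed for each sample. The helper below is a hypothetical sketch of that rule, not the tooling's actual implementation. It rejects anything that falls under `<else>`.

```python
def resolve_target_label(strategy, true_label, predicted_label):
    """Pick the label to attribute for one sample, given an attribution strategy."""
    if strategy == "true":
        return true_label
    if strategy == "model":
        return predicted_label
    try:
        # Anything that parses as an integer is treated as a fixed label.
        return int(strategy)
    except (TypeError, ValueError):
        raise ValueError(f"unsupported attribution strategy: {strategy!r}") from None
```

For a sample with true label 3 and predicted label 5, the strategies `true`, `model` and `7` select the labels 3, 5 and 7 respectively.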
HDF5 structure
index: uint samples, indices of attributed input samples in the input file
attribution: group or float32, attributions with full channel information
  if group:
    <data-id>: float32 channel x height x width
  if float32: samples x channel x height x width
label: group or float32, attribution assigned for the model output, governed by <attribution-strategy>
  if group:
    <data-id>: samples x {1, <classes>}
  if uint: samples x {1, <classes>}
prediction: group or float32
  if group:
    <data-id>: float32 classes
  if float32: samples x classes
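The dense (non-group) attribution layout could be written as in the sketch below, which also shows how `index` ties attributions back to samples in the input file. The function names are hypothetical. Storing `label` as `uint32` with shape samples x 1 is an assumption, because the structure above names both float32 and uint for it.

```python
import h5py
import numpy as np


def write_dense_attribution(fd, index, attribution, prediction, label):
    """Write index, attribution, prediction and label as single datasets."""
    index = np.asarray(index, dtype=np.uint32)
    attribution = np.asarray(attribution, dtype=np.float32)
    prediction = np.asarray(prediction, dtype=np.float32)
    # Assumption: one uint label per sample, stored as samples x 1.
    label = np.asarray(label, dtype=np.uint32).reshape(len(index), -1)
    if not len(index) == len(attribution) == len(prediction) == len(label):
        raise ValueError("index, attribution, prediction and label must cover the same samples")
    fd.create_dataset("index", data=index)
    fd.create_dataset("attribution", data=attribution)
    fd.create_dataset("prediction", data=prediction)
    fd.create_dataset("label", data=label)


def attributed_inputs(input_fd, attribution_fd):
    """Return the input samples that the attributions refer to (dense layout only)."""
    index = attribution_fd["index"][()]
    # Load the inputs into memory first, then select rows with NumPy indexing.
    return input_fd["data"][()][index]
```

Loading the full input dataset into memory keeps the lookup simple; for large files, reading only the indexed rows would be preferable.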
Analysis Output Data
HDF5 structure
<analysis-identifier>: group with name of the analysis as sub-keys (not necessarily classes; WordNet ID for class-wise ImageNet analysis)
  name: string, verbose name of analysis
  index: uint32 samples, sample indices in the input attribution file
  embedding: group
    <embedding-id>
    spectral: group
      name: string, verbose name of embedding
      root: float32 samples x eigenvalues, eigenvectors of the eigendecomposition
      base: link, if not model input, link to the embedding used
      region: region reference, if not model input or not full embedding, region reference to the features used as input
      eigenvalue: float32 eigenvalues, eigenvalues for the spectral embedding
    tsne: group
      name: string, verbose name of embedding
      root: float32 samples x 2, t-SNE embedding
      base: link, if not model input, link to the embedding used
      region: region reference, if not model input or not full embedding, region reference to the features used as input
  clustering: group
    <clustering>: group, labels for clusters on an embedding
      name: string, verbose name of embedding
      root: uint32 samples, labels for clustering on embedding
      base: link, link to the embedding used for clustering
      region: region reference, if not model input or not full embedding, region reference to the features used as input
      #clusters: int, optional (omit if not applicable), number of clusters for this clustering
      prototype: group, multiple prototypes for each cluster
        average: group, member average prototypes for all clusters
          name: string, verbose name of prototype
          root: float32 <#clusters> x channel x height x width, prototype payload
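The sketch below writes a reduced analysis group with h5py: one t-SNE embedding, one clustering linked to it through `base`, and average prototypes computed as per-cluster means of the attributions. Everything named here is an assumption made for illustration: the function name, the clustering ID `kmeans`, the verbose names, and the placement of `prototype` inside the clustering group. Region references and the spectral embedding are left out.

```python
import h5py
import numpy as np


def write_analysis(fd, analysis_id, index, embedding, cluster_labels, attributions,
                   clustering_id="kmeans"):
    """Write one analysis group holding a t-SNE embedding, a clustering and prototypes."""
    text = h5py.string_dtype()  # variable-length UTF-8 strings
    analysis = fd.create_group(analysis_id)
    analysis.create_dataset("name", data="Example analysis", dtype=text)
    analysis.create_dataset("index", data=np.asarray(index, dtype=np.uint32))

    tsne = analysis.create_group("embedding/tsne")
    tsne.create_dataset("name", data="t-SNE", dtype=text)
    tsne.create_dataset("root", data=np.asarray(embedding, dtype=np.float32))

    clustering = analysis.create_group("clustering/" + clustering_id)
    clustering.create_dataset("name", data="Example clustering", dtype=text)
    cluster_labels = np.asarray(cluster_labels, dtype=np.uint32)
    clustering.create_dataset("root", data=cluster_labels)
    # 'base' is a soft link to the embedding the clustering was computed on.
    clustering["base"] = h5py.SoftLink(tsne["root"].name)
    # Assumes labels 0..k-1 with every cluster non-empty.
    n_clusters = int(cluster_labels.max()) + 1
    clustering.create_dataset("#clusters", data=np.int64(n_clusters))

    average = clustering.create_group("prototype/average")
    average.create_dataset("name", data="Average", dtype=text)
    attributions = np.asarray(attributions, dtype=np.float32)
    prototypes = np.stack(
        [attributions[cluster_labels == cluster].mean(axis=0) for cluster in range(n_clusters)]
    )
    average.create_dataset("root", data=prototypes)
```

A soft link keeps `base` valid as a path inside the same file without duplicating the embedding data.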