======================
Database Specification
======================

Legend
======
* ``keys`` or ``files``
* *shape*
* **data type**
* ````
* HDF5 structure

  * ``key``: **group**, or **dtype** if members have same shape and no identifiers

    * if **group**, ````: **dtype** variable sub-key without type
    * if **dtype**: *shape* same shape for all members

  * ``other-key``: **group** ```` **dtype** *shape* definition for all group members

    * ``specific-member``: properties for some specific member of group

General Data Specification
==========================
* all data is stored as HDF5
* file names with underscores for spaces and dashes as key-separators
* all keys are in singular
* each data instance is one large file
* input data is only one per dataset and model
* attribution is one per dataset, model and attribution method
* analysis is one per dataset, model, attribution method and analysis topic

Model Input Data
================
* shape for image data is *samples x channel x height x width*
* since preprocessing depends on the model, we supply a file ``.input.h5`` with all preprocessing steps applied
* HDF5 structure

  * ``data``: **group**, or **float32** if every sample has same dimensions

    * if **group**: ```` **float32** *channel x height x width*
      if group, ```` can be a filename or an identifier
    * if **float32**: **float32** *samples x channel x height x width*

  * ``label``: **group**, or **uint** if data is **group**

    * if **group**: ```` **uint** *1* for single label, or **bool** *classes* for multi label
    * if **uint**: *samples* for single label, **bool** *samples x classes* for multi label if ``data`` is **float32**

  * ``index``: **group** ```` **uint** *1*, optional

    * if ``data`` is **group**, assign indices to keys
    * otherwise natural sort order of keys is assumed

Attribution of Input Data
=========================
* ```` can be:

  * ``true``: for true label
  * ``model``: for model prediction
  * ```` for choosing a fixed label
  * ```` for something I did not think of

* HDF5 structure

  * ``index``: **uint** *samples* indices of attributed input samples in the input file
  * ``attribution``: **group** or **float32** attributions with full channel information

    * if **group**: ```` **float32** *channel x height x width*
    * if **float32**: *samples x channel x height x width*

  * ``label`` **group** or **float32**, attribution assigned for the model output, governed by ````

    * if **group**: ```` *samples x {1, }*
    * if **uint**: *samples x {1, }*

  * ``prediction``: **group** or **float32**

    * if **group**: ```` **float32** *classes*
    * if **float32**: *samples x classes*

Analysis Output Data
====================
* HDF5 structure

  * ```` **group** with name of the analysis as sub-keys (not necessarily classes!, WordNet-id for class-wise ImageNet Analysis)

    * ``name``: **string** verbose name of analysis
    * ``index``: **uint32** *samples* sample indices in the input attribution file
    * ``embedding``: **group** ````

      * ``spectral``: **group**

        * ``name``: **string**, verbose name of embedding
        * ``root``: **float32** *samples x eigenvalues* Eigenvectors of Eigen Decomposition
        * ``base``: **link**, if not model input, link to the embedding used
        * ``region``: **region reference**, if not model input or not full embedding, region reference to the features used as input
        * ``eigenvalue`` **float32** *eigenvalues* Eigenvalues for the spectral embedding

      * ``tsne``: **group**

        * ``name``: **string**, verbose name of embedding
        * ``root``: **float32** *samples x 2* t-SNE Embedding
        * ``base``: **link**, if not model input, link to the embedding used
        * ``region``: **region reference**, if not model input or not full embedding, region reference to the features used as input

    * ``clustering``: **group**

      * ````: **group** label for clusters on an embedding

        * ``name``: **string**, verbose name of embedding
        * ``root``: **uint32** *samples* labels for clustering on embedding
        * ``base``: **link** link to the embedding used for clustering
        * ``region``: **region reference**, if not model input or not full embedding, region reference to the features used as input
        * ``#clusters``: **int**, optional if not applying, number of clusters for this clustering
        * ``prototype``: **group** multiple prototypes for each cluster

          * ``average``: **group** member average prototypes for all clusters

            * ``name``: **string**, verbose name of prototype
            * ``root``: **float32** *<#clusters> x channel x height x width* prototype payload
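
The analysis layout above can be sketched with ``h5py`` and ``numpy``. This is only an illustration under stated assumptions, not a reference implementation: the function name ``write_analysis``, the analysis key ``example-analysis``, the clustering sub-key ``kmeans-example`` and all verbose names are hypothetical, ``base`` is written as an HDF5 soft link, ``#clusters`` is stored as a scalar dataset, and the optional ``region`` references are left out.

```python
import h5py
import numpy as np


def write_analysis(h5file, key, index, eigvec, eigval, tsne, labels, prototypes):
    """Write one analysis group; all data arguments are array-likes."""
    analysis = h5file.require_group(key)
    analysis['name'] = 'Example analysis'  # hypothetical verbose name
    analysis['index'] = np.asarray(index, dtype=np.uint32)

    # Spectral embedding: eigenvectors in 'root', eigenvalues alongside.
    spectral = analysis.require_group('embedding/spectral')
    spectral['name'] = 'Spectral Embedding'
    spectral['root'] = np.asarray(eigvec, dtype=np.float32)
    spectral['eigenvalue'] = np.asarray(eigval, dtype=np.float32)

    # t-SNE computed on the spectral embedding, so 'base' links back to it.
    tsne_group = analysis.require_group('embedding/tsne')
    tsne_group['name'] = 't-SNE Embedding'
    tsne_group['root'] = np.asarray(tsne, dtype=np.float32)
    tsne_group['base'] = h5py.SoftLink(spectral.name)

    # One clustering; the sub-key 'kmeans-example' is a made-up identifier.
    clustering = analysis.require_group('clustering/kmeans-example')
    clustering['name'] = 'Example clustering'
    clustering['root'] = np.asarray(labels, dtype=np.uint32)
    clustering['base'] = h5py.SoftLink(spectral.name)
    clustering['#clusters'] = np.asarray(prototypes).shape[0]

    # Average prototype per cluster: <#clusters> x channel x height x width.
    average = clustering.require_group('prototype/average')
    average['name'] = 'Average'
    average['root'] = np.asarray(prototypes, dtype=np.float32)
    return analysis


# Usage with random placeholder data, kept in memory (nothing is written to disk).
rng = np.random.default_rng(0)
n_samples, n_eigenvalues, n_clusters = 6, 4, 2
with h5py.File('example.analysis.h5', 'w', driver='core', backing_store=False) as fd:
    write_analysis(
        fd,
        'example-analysis',
        index=np.arange(n_samples),
        eigvec=rng.normal(size=(n_samples, n_eigenvalues)),
        eigval=rng.random(n_eigenvalues),
        tsne=rng.normal(size=(n_samples, 2)),
        labels=rng.integers(0, n_clusters, size=n_samples),
        prototypes=rng.normal(size=(n_clusters, 3, 8, 8)),
    )
    # Following 'base' resolves to the spectral embedding group.
    base = fd['example-analysis/embedding/tsne/base']
    print(base.name, base['root'].shape)
```

Soft links keep ``base`` valid as long as the target path exists in the same file; a hard link (assigning the group object itself) would be an equally plausible reading of **link** in the specification.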