===================
Project File Format
===================

A project file serves as a unified entity, binding together various components of a project, including label maps, datasets, attributions, and analyses. These files are created in the YAML format, facilitating readability and easy management. For instance, a sample project file for a VGG16 classifier trained on the CIFAR-10 dataset can be viewed below.

.. code-block:: yaml

    project:
        name: VGG16 CIFAR-10
        model: VGG16
        label_map: label-map.json
        dataset:
            name: CIFAR-10
            type: hdf5
            path: cifar-10.h5
            input_width: 32
            input_height: 32
            up_sampling_method: none
            down_sampling_method: none
        attributions:
            attribution_method: LRP Epsilon Gamma Box
            attribution_strategy: true_label
            sources:
                - attributions.h5
        analyses:
            - analysis_method: Spectral
              sources:
                  - analysis.h5

The paths of referenced files in the project are relative to the project file itself. HDF5 files adhere to a specific structure as outlined in the :doc:`database-specification`. A demonstration of how to structure HDF5 files for use with ViRelAy is presented in :repo:`docs/examples/hdf5_structure.py`.

A project YAML file consists of several key components, including a project name, a model name, a reference to the dataset file, a reference to the label map file, a reference to the attribution files, and a reference to the analysis files. The definition for each property is as follows:

* ``project`` → ``name``: The name of the project. Can be chosen arbitrarily and is only used for informational purposes.
* ``project`` → ``model``: The name of the classifier model. Can be chosen arbitrarily and is only used for informational purposes.
* ``project`` → ``label_map``: The path to the label map file.
* ``project`` → ``dataset``: The dataset that the classifier was trained on.
* ``project`` → ``dataset`` → ``name``: The name of the dataset. Can be chosen arbitrarily and is only used for informational purposes.
* ``project`` → ``dataset`` → ``type``: The type of the dataset, which is used to distinguish between image directory datasets and datasets that are stored in HDF5 files. Possible values are:

  - ``hdf5``: The dataset is stored in an HDF5 file. Please refer to :doc:`database-specification` for more information.
  - ``image_directory``: The dataset is stored in a hierarchical directory structure, where the top-level directory contains one directory per class and each class directory contains the samples of that class.

* ``project`` → ``dataset`` → ``path``: The path to the HDF5 file or the directory containing the dataset.
* ``project`` → ``dataset`` → ``input_width``: The width to which the images are re-sampled before they are fed into the classifier. This is needed when the dataset images have varying sizes or the classifier needs a different input size than the images of the dataset.
* ``project`` → ``dataset`` → ``input_height``: The height to which the images are re-sampled before they are fed into the classifier. This is needed when the dataset images have varying sizes or the classifier needs a different input size than the images of the dataset.
* ``project`` → ``dataset`` → ``up_sampling_method``: The up-sampling method determines how the images are scaled up when they are smaller than the specified input size. Possible values are:

  - ``none``: No up-sampling is performed.
  - ``fill_zeros``: A border of zeros will be added.
  - ``fill_ones``: A border of ones will be added.
  - ``edge_repeat``: The pixels at the edge of the image will be repeated to fill up the remaining space.
  - ``mirror_edge``: The pixels at the edge of the image will be mirrored to fill up the remaining space.
  - ``wrap_around``: The pixels from the opposite edge will be used to fill up the remaining space.
  - ``resize``: The image will be scaled to the desired size.

* ``project`` → ``dataset`` → ``down_sampling_method``: The down-sampling method determines how the images are scaled down when they are bigger than the specified input size. Possible values are:

  - ``none``: No down-sampling is performed.
  - ``center_crop``: A central part of the image with the desired size will be cut out.
  - ``resize``: The image will be scaled to the desired size.

* ``project`` → ``dataset`` → ``label_index_regex``: A regular expression, which is used to parse the path of a sample for the label index. The label index must be captured in the first group. Can be ``None``, but if the dataset type is ``image_directory``, then either ``label_index_regex`` or ``label_word_net_id_regex`` must be specified.
* ``project`` → ``dataset`` → ``label_word_net_id_regex``: A regular expression, which is used to parse the path of a sample for the WordNet ID of the label. The WordNet ID must be captured in the first group. Can be ``None``, but if the dataset type is ``image_directory``, then either ``label_index_regex`` or ``label_word_net_id_regex`` must be specified.
* ``project`` → ``attributions``: The attributions that were computed for the entire dataset using the classifier model.
* ``project`` → ``attributions`` → ``attribution_method``: The name of the method that was used to compute the attributions, e.g., the name of an LRP variant.
* ``project`` → ``attributions`` → ``attribution_strategy``: The label for which the attributions were computed. Possible values are:

  - ``true_label``: The attribution was computed for the ground-truth label.
  - ``predicted_label``: The attribution was computed for the label predicted by the classifier.

* ``project`` → ``attributions`` → ``sources``: A list of the attribution source HDF5 files. There can be one or more attribution databases in a project, e.g., one attribution database per dataset class could be created.
* ``project`` → ``analyses``: A list of the analyses that were performed on the attributions. There can be multiple different analyses with their own analysis files in a project.
* ``project`` → ``analyses`` → ``analysis_method``: The name of the method that was used to perform the analysis, e.g., "Spectral".
* ``project`` → ``analyses`` → ``sources``: A list of the analysis source HDF5 files. Each analysis can consist of one or more analysis databases, e.g., one analysis file could be created per embedding or attribution method.

A label map is a separate file that contains a mapping between label indices, class names, and optional WordNet IDs. This file enables accurate mapping between label indices or WordNet IDs and human-readable class names within the ViRelAy UI. The label map consists of an array of objects, each representing a single class with its index, name, and optional WordNet ID. An example label map for the CIFAR-10 dataset can be viewed below.

.. code-block:: json

    [
        { "index": 0, "word_net_id": "n02691156", "name": "Airplane" },
        { "index": 1, "word_net_id": "n02958343", "name": "Automobile" },
        { "index": 2, "word_net_id": "n01503061", "name": "Bird" },
        { "index": 3, "word_net_id": "n02121620", "name": "Cat" },
        { "index": 4, "word_net_id": "n02430045", "name": "Deer" },
        { "index": 5, "word_net_id": "n02084071", "name": "Dog" },
        { "index": 6, "word_net_id": "n01639765", "name": "Frog" },
        { "index": 7, "word_net_id": "n02374451", "name": "Horse" },
        { "index": 8, "word_net_id": "n04194289", "name": "Ship" },
        { "index": 9, "word_net_id": "n04490091", "name": "Truck" }
    ]

This label map demonstrates the structure of a typical label map file.
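
To illustrate how such a label map could be consumed, the following is a minimal sketch that parses a label map and builds lookup tables from label index and WordNet ID to class name. The function ``build_label_lookups`` is a hypothetical helper written for this example and is not part of the ViRelAy API; it only assumes the field names ``index``, ``name``, and ``word_net_id`` shown above and treats ``word_net_id`` as optional.

```python
import json


def build_label_lookups(label_map_json):
    """Hypothetical helper (not part of ViRelAy): builds lookup tables from a label map.

    Returns a tuple of two dictionaries: label index -> class name and
    WordNet ID -> class name.
    """
    entries = json.loads(label_map_json)
    if not isinstance(entries, list):
        raise ValueError("A label map must be a JSON array of class objects.")

    name_by_index = {}
    name_by_word_net_id = {}
    for entry in entries:
        # "index" and "name" are required for every class
        index = entry["index"]
        name = entry["name"]
        if index in name_by_index:
            raise ValueError(f"Duplicate label index: {index}")
        name_by_index[index] = name

        # "word_net_id" is optional, so it is only registered when present
        word_net_id = entry.get("word_net_id")
        if word_net_id is not None:
            name_by_word_net_id[word_net_id] = name
    return name_by_index, name_by_word_net_id


# Shortened label map for illustration; the last class has no WordNet ID
EXAMPLE_LABEL_MAP = """
[
    { "index": 0, "word_net_id": "n02691156", "name": "Airplane" },
    { "index": 1, "word_net_id": "n02958343", "name": "Automobile" },
    { "index": 2, "name": "Bird" }
]
"""

name_by_index, name_by_word_net_id = build_label_lookups(EXAMPLE_LABEL_MAP)
print(name_by_index[1])
print(name_by_word_net_id["n02691156"])
```

Rejecting duplicate indices is a design choice of this sketch, made so that every label index resolves to exactly one class name; whether ViRelAy itself performs such a check is not stated here.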