How to Analyze Classifiers and Datasets

In this guide we will explore how to use ViRelAy to analyze a real-world classifier and the dataset it was trained on. We will systematically use ViRelAy's features to explore the prediction behavior of the classifier and uncover Clever Hans classification strategies, which are based on defects in the training dataset. The term Clever Hans goes back to a German horse from around the turn of the 20th century that was claimed to be able to perform simple arithmetic. It was later discovered that the horse used subtle and involuntary cues in the body language of its trainer to arrive at the correct answer. In machine learning, Clever Hans classification strategies are characterized by the exploitation of spurious correlations in the training data that are usually absent from real-world data. A classifier that relies on them fails to generalize and will likely not be useful when it is deployed in the real world.

This guide is based on an analysis that was performed on a Fisher vector classifier trained on the Pascal VOC 2007 dataset. A detailed analysis of this classifier can be found in [Lapuschkin et al., 2019]. When the project is opened, the familiar user interface is displayed, which can be seen in Figure 1.

Fisher Vector Classifier Pascal VOC Project

Figure 1: The initial screen when opening the ViRelAy project for the Fisher vector classifier trained on Pascal VOC 2007.

When starting the analysis, it is usually a good idea to first get a feel for the data, e.g., by looking at the different categories (in this project, the categories correspond to the classes of the Pascal VOC 2007 dataset) using different kinds of embeddings. Depending on the project, different embedding methods may be better suited to reveal defects in the classifier or the dataset. In this case, the t-SNE embedding produces a more sensible layout, while the individual dimensions of the spectral embedding are generally rather uninformative.
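
The embeddings displayed in ViRelAy are pre-computed when the project is created and are only loaded for display. For readers who want to experiment with their own attribution data, the following is a minimal sketch of how a t-SNE embedding over attributions could be computed with scikit-learn; the array shapes and the randomly generated attributions are purely illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

# Purely illustrative stand-in for real attribution data: one flattened
# attribution map per sample (in a real project these would be loaded
# from the project's attribution files).
rng = np.random.default_rng(0)
attributions = rng.normal(size=(500, 32 * 32))

# Project the high-dimensional attributions into two dimensions, so that
# samples with similar attribution patterns end up close to each other.
embedding = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(attributions)

print(embedding.shape)  # (500, 2) -- one 2-D point per sample
```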

One should also examine the different clusterings of the embeddings. In this example, the spectral embeddings were clustered using k-means with different values of k. Since the embeddings are computed over the attributions, the different clusters can be interpreted as different classification strategies of the classifier. In some projects, the clusterings may better highlight these strategies, while in other projects the visual clustering implicitly produced by the embedding may be more informative. In this case, the k-means clustering does not seem to contain much information, so the visual clustering of the t-SNE embedding should be examined further. Therefore, in the next step the different categories are explored by looking for small clusters in the t-SNE embedding that stick out. Some of the categories have a very homogeneous embedding, while others are more heterogeneous and seem to have outliers. For example, compare the t-SNE embedding of the class bird with the embedding of the class horse in Figure 2.
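
As a brief aside before looking at Figure 2: a clustering like the one described above could be reproduced along the following lines. This is only a sketch on made-up embedding data using scikit-learn's k-means, not the code that was used to create this project.

```python
import numpy as np
from sklearn.cluster import KMeans

# Purely illustrative stand-in for a spectral embedding of the attributions
# (e.g., 32 eigenvector dimensions per sample).
rng = np.random.default_rng(0)
spectral_embedding = rng.normal(size=(500, 32))

# Cluster the embedded attributions with k-means for several values of k;
# each resulting cluster is a candidate classification strategy.
clusterings = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(spectral_embedding)
    for k in range(2, 9)
}

# Report the cluster sizes for each k; conspicuously small clusters are
# worth a closer look, just like small outlier clusters in the embedding plot.
for k, labels in clusterings.items():
    print(k, np.bincount(labels))
```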

Comparison of Class Bird vs. Class Horse

Figure 2: A comparison of the t-SNE embeddings of the classes bird and horse.

While the class bird is very homogeneous and has no clear outliers, the class horse has multiple smaller outlier clusters. Since the embedding of the bird class is very homogeneous, the classifier seems to have learned a coherent and robust classification strategy for it. For the horse class, on the other hand, the classifier seems to have learned multiple distinct classification strategies. Especially when the outlier clusters are small, this may hint at a Clever Hans classification strategy: a strategy that was learned for only a small subset of the training samples suggests that these samples have a special feature in common that the classifier can easily exploit.

Now that a likely candidate for an anomalous classification strategy has been identified, the samples need to be investigated further to find the features in the data on which the classifier bases its decision. To do this, the mouse pointer can be hovered over the samples of the outlier clusters to visually inspect the input images. For example, have a look at Figure 3, which shows the input image of one of the samples in the tear-shaped outlier cluster of the horse class.

Visual Inspection of an Input Sample of Class Horse

Figure 3: The visual inspection of the input image of a sample of the class horse.

At first glance, the image looks like a regular photo of a horse, and there is no obvious feature in the data that could be exploited by the classifier to learn a Clever Hans classification strategy. Therefore, the other samples in the outlier cluster need to be inspected as well. This can be done by left-clicking and dragging the mouse pointer to draw a selection around the samples. Figure 4 shows a subset of the samples from the tear-shaped outlier cluster.

Input Images of an Outlier Cluster in the Class Horse

Figure 4: The input images of the samples in an outlier cluster of the class horse.

Again, at first glance, the images don't seem to have any features in common that could be exploited by the classifier. But on closer inspection, something catches the eye: each of the images has a copyright notice at the bottom. This leads to the hypothesis that the classifier exploits the copyright notice as a feature to classify horses. To verify this hypothesis, the actual features that the classifier used for the classification need to be identified. Right now, the input images are viewed in the sample viewer, but there are two other modes of viewing the samples: attribution and overlay (cf. Figure 5).

Sample Viewer Display Mode

Figure 5: The display mode of the sample viewer is set to overlay.

The attribution view shows a heatmap in input space that highlights which pixels contributed positively or negatively to the classification result (the heatmap is a visual representation of the numerical attribution; the color map used to render the heatmaps can be selected in the toolbox at the top of the ViRelAy user interface). Positive attribution means that an image region contributed positively towards the classification result, whereas negative attribution means that it contributed negatively, i.e., it speaks against the class that was the classification result. The overlay mode superimposes the heatmap onto the input image, which makes it possible to see the underlying image features directly. When the attributions are fine and detailed, it usually makes sense to view the heatmaps directly, as the image details are recognizable in the heatmap itself. When the attributions are coarse, it is harder to relate the heatmap to the corresponding image regions, so the overlay mode makes it easier to find the actual image features that received attribution. In this example, the attributions are rather coarse, so the overlay mode should be favored. Figure 6 shows the attribution heatmaps superimposed onto the input images.
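
Before turning to Figure 6, the general idea behind the two display modes can be sketched with matplotlib; the image, the attribution map, the bwr color map, and the 50/50 blend below are illustrative assumptions rather than ViRelAy's actual rendering code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative inputs: an RGB image with values in [0, 1] and an attribution
# map of the same spatial size containing positive and negative values.
rng = np.random.default_rng(0)
image = rng.uniform(size=(224, 224, 3))
attribution = rng.normal(size=(224, 224))

# Normalize the attribution symmetrically around zero, so that the diverging
# color map maps zero attribution to its neutral center color.
bound = np.abs(attribution).max()
normalized = (attribution / bound + 1.0) / 2.0  # in [0, 1], 0.5 == no attribution

# Attribution mode: render the heatmap with a diverging color map
# (red = positive attribution, blue = negative attribution).
heatmap = plt.cm.bwr(normalized)[..., :3]

# Overlay mode: blend the heatmap onto the input image, which makes it easier
# to see which image features the attribution falls on.
overlay = 0.5 * image + 0.5 * heatmap

figure, axes = plt.subplots(1, 3, figsize=(12, 4))
for axis, picture, title in zip(axes, (image, heatmap, overlay), ('input', 'attribution', 'overlay')):
    axis.imshow(picture)
    axis.set_title(title)
    axis.axis('off')
plt.show()
```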

Input Images of an Outlier Cluster in the Class Horse with Attribution Overlay

Figure 6: The input images of the samples in an outlier cluster of the class horse with the attribution heatmap superimposed onto them (sample viewer display mode overlay).

This overlay provides strong evidence in favor of the hypothesis that the classifier bases its classification decisions on the copyright notices at the bottom of the images, because the positive attribution (bright red and yellow regions) falls on the image region that contains the copyright notice. In fact, the classifier seems to rely almost exclusively on the copyright notice, as there is little to no positive attribution on the image regions that contain the horses. In some cases, the horses even receive some negative attribution.

If we now go on to select some of the other outlier clusters, we will find that each of these clusters represents the same kind of Clever Hans effect, although the copyright notice changes from cluster to cluster.

Finally, it is good practice to save findings for later. This can be done by clicking the export button in the upper right corner of the toolbar, which generates a JSON file containing the entire current state of ViRelAy. The JSON file can later be restored using the import button. Saving the findings also makes it easier to document and share them later. The export, import, and share buttons can be seen in Figure 7.
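
Since the exported file is plain JSON, it can also be inspected or versioned outside of ViRelAy. A minimal sketch, assuming a hypothetical file name for the export:

```python
import json

# Hypothetical path to a state file previously exported from ViRelAy.
with open('virelay-findings.json', encoding='utf-8') as state_file:
    state = json.load(state_file)

# Print the top-level keys to get an overview of the saved state before
# sharing the file or re-importing it via the import button.
print(list(state))
```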

Import, Export & Share Buttons in the ViRelAy UI

Figure 7: The toolbar contains three buttons for importing, exporting, and sharing findings.