Scores on benchmarks

Ranks shown below are relative to all public models.

.127	average_vision, rank 368, 81 benchmarks
.254	behavior_vision, rank 152, 43 benchmarks
.191	Geirhos2021-error_consistency [reference], rank 143, 17 benchmarks
.300	Geirhos2021cueconflict-error_consistency v1 [reference], rank 60
.175	Geirhos2021edge-error_consistency v1 [reference], rank 45
.445	Geirhos2021eidolonI-error_consistency v1 [reference], rank 82
.463	Geirhos2021eidolonII-error_consistency v1 [reference], rank 80
.505	Geirhos2021falsecolour-error_consistency v1 [reference], rank 62
.139	Geirhos2021highpass-error_consistency v1 [reference], rank 53
.252	Geirhos2021lowpass-error_consistency v1 [reference], rank 81
.195	Geirhos2021phasescrambling-error_consistency v1 [reference], rank 81
.222	Geirhos2021powerequalisation-error_consistency v1 [reference], rank 74
.548	Geirhos2021silhouette-error_consistency v1 [reference], rank 87
.519	Baker2022, rank 48, 3 benchmarks
.524	Baker2022fragmented-accuracy_delta v1 [reference], rank 80
.666	Baker2022frankenstein-accuracy_delta v1 [reference], rank 53
.367	Baker2022inverted-accuracy_delta v1 [reference], rank 38
.460	Maniquet2024, rank 123, 2 benchmarks
.262	Maniquet2024-confusion_similarity v1 [reference], rank 156
.659	Maniquet2024-tasks_consistency v1 [reference], rank 78
.291	Hebart2023-match v1, rank 91
.209	BMD2024, rank 57, 4 benchmarks
.125	BMD2024.dotted_1Behavioral-accuracy_distance v1, rank 109
.258	BMD2024.texture_1Behavioral-accuracy_distance v1, rank 45
.315	BMD2024.texture_2Behavioral-accuracy_distance v1, rank 28
.137	BMD2024.dotted_2Behavioral-accuracy_distance v1, rank 94
.360	Coggan2024_behavior-ConditionWiseAccuracySimilarity v1, rank 79
.342	engineering_vision, rank 154, 25 benchmarks
.732	ImageNet-top1 v1 [reference], rank 82
.467	ImageNet-C-top1 [reference], rank 47, 4 benchmarks
.444	ImageNet-C-noise-top1 v2 [reference], rank 54
.357	ImageNet-C-blur-top1 v2 [reference], rank 74
.501	ImageNet-C-weather-top1 v2 [reference], rank 54
.566	ImageNet-C-digital-top1 v2 [reference], rank 39
.415	Geirhos2021-top1 [reference], rank 231, 17 benchmarks
.228	Geirhos2021cueconflict-top1 v1 [reference], rank 92
.269	Geirhos2021edge-top1 v1 [reference], rank 135
.494	Geirhos2021eidolonI-top1 v1 [reference], rank 144
.533	Geirhos2021eidolonII-top1 v1 [reference], rank 113
.975	Geirhos2021falsecolour-top1 v1 [reference], rank 39
.503	Geirhos2021highpass-top1 v1 [reference], rank 61
.463	Geirhos2021lowpass-top1 v1 [reference], rank 79
.627	Geirhos2021phasescrambling-top1 v1 [reference], rank 100
.804	Geirhos2021powerequalisation-top1 v1 [reference], rank 66
.519	Geirhos2021silhouette-top1 v1 [reference], rank 106
.631	Geirhos2021sketch-top1 v1 [reference], rank 93
.450	Geirhos2021stylized-top1 v1 [reference], rank 61
.554	Geirhos2021uniformnoise-top1 v1 [reference], rank 69
.094	Hermann2020 [reference], rank 271, 2 benchmarks
.188	Hermann2020cueconflict-shape_match v1 [reference], rank 98
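
For the groups whose sub-benchmarks are all listed above, the group score appears to equal the plain mean of its sub-benchmark scores. The sketch below checks this for four such groups. The averaging rule is an assumption inferred from the numbers on this page, not a documented formula, and the names `children`, `reported`, and `aggregate` are illustrative only. Groups that list fewer sub-benchmarks than their stated count (the Geirhos2021 and Hermann2020 groups) cannot be checked from this listing and are left out.

```python
from statistics import mean

# Sub-benchmark scores copied from the listing above.
children = {
    "Baker2022": [0.524, 0.666, 0.367],
    "Maniquet2024": [0.262, 0.659],
    "BMD2024": [0.125, 0.258, 0.315, 0.137],
    "ImageNet-C-top1": [0.444, 0.357, 0.501, 0.566],
}

# Group scores as reported in the listing above.
reported = {
    "Baker2022": 0.519,
    "Maniquet2024": 0.460,
    "BMD2024": 0.209,
    "ImageNet-C-top1": 0.467,
}


def aggregate(scores):
    """Assumed aggregation rule: unweighted mean of the sub-benchmark scores."""
    return mean(scores)


# The listed scores are rounded to three decimals, so compare with a tolerance.
TOLERANCE = 1e-3

for name, scores in children.items():
    difference = abs(aggregate(scores) - reported[name])
    status = "consistent" if difference <= TOLERANCE else "differs"
    print(f"{name}: {status} with the reported group score")
```

Because the page shows scores rounded to three decimals, agreement is only meaningful to within about 0.001.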

How to use

from brainscore_vision import load_model
model = load_model("inception_v3_pytorch")
model.start_task(...)
model.start_recording(...)
model.look_at(...)

Benchmarks bibtex

@article{geirhos2021partial,
  title={Partial success in closing the gap between human and machine vision},
  author={Geirhos, Robert and Narayanappa, Kantharaju and Mitzkus, Benjamin and Thieringer, Tizian and Bethge, Matthias and Wichmann, Felix A and Brendel, Wieland},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021},
  url={https://openreview.net/forum?id=QkljT4mrfs}
}
@article{BAKER2022104913,
  title={Deep learning models fail to capture the configural nature of human shape perception},
  author={Nicholas Baker and James H. Elder},
  journal={iScience},
  volume={25},
  number={9},
  pages={104913},
  year={2022},
  issn={2589-0042},
  doi={10.1016/j.isci.2022.104913},
  url={https://www.sciencedirect.com/science/article/pii/S2589004222011853},
  keywords={Biological sciences, Neuroscience, Sensory neuroscience},
  abstract={A hallmark of human object perception is sensitivity to the holistic configuration of the local shape features of an object. Deep convolutional neural networks (DCNNs) are currently the dominant models for object recognition processing in the visual cortex, but do they capture this configural sensitivity? To answer this question, we employed a dataset of animal silhouettes and created a variant of this dataset that disrupts the configuration of each object while preserving local features. While human performance was impacted by this manipulation, DCNN performance was not, indicating insensitivity to object configuration. Modifications to training and architecture to make networks more brain-like did not lead to configural processing, and none of the networks were able to accurately predict trial-by-trial human object judgements. We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition.}
}
@article{Maniquet2024.04.02.587669,
  title={Recurrent issues with deep neural network models of visual recognition},
  author={Maniquet, Tim and de Beeck, Hans Op and Costantino, Andrea Ivan},
  journal={bioRxiv},
  publisher={Cold Spring Harbor Laboratory},
  year={2024},
  elocation-id={2024.04.02.587669},
  doi={10.1101/2024.04.02.587669},
  url={https://www.biorxiv.org/content/early/2024/04/10/2024.04.02.587669},
  eprint={https://www.biorxiv.org/content/early/2024/04/10/2024.04.02.587669.full.pdf}
}
@inproceedings{5206848,
  title={ImageNet: A large-scale hierarchical image database},
  author={J. {Deng} and W. {Dong} and R. {Socher} and L. {Li} and {Kai Li} and {Li Fei-Fei}},
  booktitle={2009 IEEE Conference on Computer Vision and Pattern Recognition},
  pages={248-255},
  year={2009}
}
@article{Hendrycks2019-di,
  title={Benchmarking Neural Network Robustness to Common Corruptions and Perturbations},
  author={Hendrycks, Dan and Dietterich, Thomas},
  abstract={In this paper we establish rigorous benchmarks for image classifier robustness. Our first benchmark, ImageNet-C, standardizes and expands the corruption robustness topic, while showing which classifiers are preferable in safety-critical applications. Then we propose a new dataset called ImageNet-P which enables researchers to benchmark a classifier's robustness to common perturbations. Unlike recent robustness research, this benchmark evaluates performance on common corruptions and perturbations not worst-case adversarial perturbations. We find that there are negligible changes in relative corruption robustness from AlexNet classifiers to ResNet classifiers. Afterward we discover ways to enhance corruption and perturbation robustness. We even find that a bypassed adversarial defense provides substantial common perturbation robustness. Together our benchmarks may aid future work toward networks that robustly generalize.},
  month=mar,
  year={2019},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  eprint={1903.12261},
  url={https://arxiv.org/abs/1903.12261}
}
@article{hermann2020origins,
  title={The origins and prevalence of texture bias in convolutional neural networks},
  author={Hermann, Katherine and Chen, Ting and Kornblith, Simon},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  pages={19000--19015},
  year={2020},
  url={https://proceedings.neurips.cc/paper/2020/hash/db5f9f42a7157abe65bb145000b5871a-Abstract.html}
}

Layer Commitment

No layer commitments found for this model. Older submissions might not have stored this information but will be updated when evaluated on new benchmarks.

Visual Angle

Not specified for this model.