Scores on benchmarks
Model rank shown below is with respect to all public models.

| Score | Benchmark | Version | Reference | Rank | Benchmarks |
|---|---|---|---|---|---|
| .590 | average_language | | | 8 | 3 |
| .818 | neural_language | | | 7 | 2 |
| .818 | Pereira2018-linear | | | 7 | 2 |
| .815 | Pereira2018.384sentences-linear | v1 | | 7 | |
| .822 | Pereira2018.243sentences-linear | v1 | | 7 | |
| .361 | behavior_language | | | 3 | 1 |
| .361 | Futrell2018-pearsonr | v1 | [reference] | 3 | |
| .503 | engineering_language | | | 9 | 30 |
| .503 | SyntaxGym | | [reference] | 9 | 30 |
| .000 | syntaxgym-npi_src_ever | v1 | [reference] | 10 | |
| .000 | syntaxgym-reflexive_orc_fem | v1 | [reference] | 10 | |
| .579 | syntaxgym-number_prep | v1 | [reference] | 8 | |
| .368 | syntaxgym-reflexive_orc_masc | v1 | [reference] | 9 | |
| .105 | syntaxgym-number_orc | v1 | [reference] | 9 | |
| .026 | syntaxgym-npi_orc_any | v1 | [reference] | 10 | |
| .026 | syntaxgym-npi_orc_ever | v1 | [reference] | 10 | |
| .158 | syntaxgym-reflexive_src_fem | v1 | [reference] | 8 | |
| .474 | syntaxgym-reflexive_prep_masc | v1 | [reference] | 9 | |
| .725 | syntaxgym-cleft_modifier | v1 | [reference] | 9 | |
| .667 | syntaxgym-npz_ambig_mod | v1 | [reference] | 9 | |
| .786 | syntaxgym-mvrr_mod | v1 | [reference] | 5 | |
| .875 | syntaxgym-npz_obj_mod | v1 | [reference] | 8 | |
| .929 | syntaxgym-center_embed_mod | v1 | [reference] | 2 | |
| .875 | syntaxgym-fgd_pp | v1 | [reference] | 4 | |
| .667 | syntaxgym-npz_ambig | v1 | [reference] | 6 | |
| .105 | syntaxgym-reflexive_prep_fem | v1 | [reference] | 10 | |
| .875 | syntaxgym-fgd_object | v1 | [reference] | 9 | |
| .542 | syntaxgym-fgd_subject | v1 | [reference] | 3 | |
| .821 | syntaxgym-mvrr | v1 | [reference] | 2 | |
| .000 | syntaxgym-fgd_hierarchy | v1 | [reference] | 1 | |
| 1.0 | syntaxgym-cleft | v1 | [reference] | 1 | |
| .964 | syntaxgym-center_embed | v1 | [reference] | 3 | |
| .478 | syntaxgym-subordination_pp-pp | v1 | [reference] | 9 | |
| .957 | syntaxgym-subordination_orc-orc | v1 | [reference] | 6 | |
| .217 | syntaxgym-subordination | v1 | [reference] | 9 | |
| .526 | syntaxgym-reflexive_src_masc | v1 | [reference] | 7 | |
| .565 | syntaxgym-subordination_src-src | v1 | [reference] | 9 | |
| .000 | syntaxgym-npi_src_any | v1 | [reference] | 10 | |
| .789 | syntaxgym-number_src | v1 | [reference] | 2 | |
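The aggregate scores listed above appear consistent with unweighted means of the scores grouped beneath them, and the short Python sketch below checks that arithmetic. The grouping is an assumption inferred from the benchmark counts and the numbers themselves (for example, that average_language covers only neural_language and behavior_language); the page does not state how aggregates are computed. The helper names `mean` and `consistent` and the 0.001 tolerance, which allows for three-decimal rounding, are illustrative.

```python
# Scores copied from the listing above. The parent/child grouping is an
# assumption inferred from the benchmark counts, not stated by the page.
pereira = [0.815, 0.822]  # Pereira2018.384sentences-linear, Pereira2018.243sentences-linear
syntaxgym = [
    0.000, 0.000, 0.579, 0.368, 0.105, 0.026, 0.026, 0.158, 0.474, 0.725,
    0.667, 0.786, 0.875, 0.929, 0.875, 0.667, 0.105, 0.875, 0.542, 0.821,
    0.000, 1.0, 0.964, 0.478, 0.957, 0.217, 0.526, 0.565, 0.000, 0.789,
]  # the 30 syntaxgym-* scores, in listing order


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


def consistent(reported, children, tol=1e-3):
    """True if the reported score matches the mean of its children within tol."""
    return abs(reported - mean(children)) <= tol


checks = {
    "Pereira2018-linear": consistent(0.818, pereira),
    "SyntaxGym": consistent(0.503, syntaxgym),
    # Assumed: only neural_language and behavior_language feed the average.
    "average_language": consistent(0.590, [0.818, 0.361]),
}

for name, ok in checks.items():
    status = "consistent" if ok else "inconsistent"
    print(f"{name}: {status} with an unweighted mean")

# Including engineering_language in the average does not reproduce .590.
print("with engineering_language:", consistent(0.590, [0.818, 0.361, 0.503]))
```

If the grouping assumption holds, this suggests that engineering_language is reported alongside average_language but not folded into it.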
How to use
    from brainscore_language import load_model
    model = load_model("distilgpt2")
    model.start_task(...)
    model.start_recording(...)
    model.look_at(...)
Benchmarks bibtex
    @proceedings{futrell2018natural,
      title={The Natural Stories Corpus},
      author={Futrell, Richard and Gibson, Edward and Tily, Harry J. and Blank, Idan and Vishnevetsky, Anastasia and Piantadosi, Steven T. and Fedorenko, Evelina},
      conference={International Conference on Language Resources and Evaluation (LREC)},
      url={http://www.lrec-conf.org/proceedings/lrec2018/pdf/337.pdf},
      year={2018}
    }

    @inproceedings{gauthier-etal-2020-syntaxgym,
      title = "{S}yntax{G}ym: An Online Platform for Targeted Evaluation of Language Models",
      author = "Gauthier, Jon and Hu, Jennifer and Wilcox, Ethan and Qian, Peng and Levy, Roger",
      booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
      month = jul,
      year = "2020",
      address = "Online",
      publisher = "Association for Computational Linguistics",
      url = "https://www.aclweb.org/anthology/2020.acl-demos.10",
      pages = "70--76",
      abstract = "Targeted syntactic evaluations have yielded insights into the generalizations learned by neural network language models. However, this line of research requires an uncommon confluence of skills: both the theoretical knowledge needed to design controlled psycholinguistic experiments, and the technical proficiency needed to train and deploy large-scale language models. We present SyntaxGym, an online platform designed to make targeted evaluations accessible to both experts in NLP and linguistics, reproducible across computing environments, and standardized following the norms of psycholinguistic experimental design. This paper releases two tools of independent value for the computational linguistics community: 1. A website, syntaxgym.org, which centralizes the process of targeted syntactic evaluation and provides easy tools for analysis and visualization; 2. Two command-line tools, {`}syntaxgym{`} and {`}lm-zoo{`}, which allow any user to reproduce targeted syntactic evaluations and general language model inference on their own machine.",
    }