Kevin Gimpel

Below are resources that I helped develop in or before 2017. For more recent resources, see links under relevant paper entries.

Training and validation data created for the LAMBADA word prediction task; described here. [lambada-train-valid.tar.gz (330MB)]
Manual analysis of 100 LAMBADA instances from the paper above. [lambada-analysis.tar.gz]

Code for training charagram models and pre-trained models from this EMNLP16 paper (developed by John Wieting)  [link]

Who-did-What reading comprehension dataset from this EMNLP16 paper  [link]

Resources for commonsense knowledge representation from an ACL16 paper [link]

Code for training paragram phrase embeddings and other models from an ICLR16 paper (developed by John Wieting) [link]

Pretrained paragram word embeddings and annotated phrase similarity datasets (developed by John Wieting)  [link]

Rampion, a framework for training statistical machine translation models [link]

Twitter part-of-speech tagger and tweets manually annotated with part-of-speech tags [link]

NFL game data and aligned tweets [link]

Code for performing inference for monolingual and bilingual gappy pattern models [link] [sample patterns]

Code to find trigger word pairs using mutual information (reimplementation of Rosenfeld, 1994) [code]

Factoid question-answer pairs from Wikipedia articles with difficulty ratings [link]

Scripts for performing bootstrap resampling for BLEU significance testing [link]
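
For readers unfamiliar with the idea behind the last item, the following is a minimal Python sketch of paired bootstrap resampling for comparing two systems by BLEU, in the general style often attributed to Koehn (2004). It is not the linked scripts, whose interface and details are not shown here. The function names (`sentence_stats`, `corpus_bleu`, `paired_bootstrap`), the single-reference setup, the whitespace-tokenized toy data, and the unsmoothed BLEU computation are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def ngram_counts(tokens, max_n=4):
    """Count all n-grams of order 1..max_n, keyed by (order, ngram)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[(n, tuple(tokens[i:i + n]))] += 1
    return counts


def sentence_stats(hyp, ref, max_n=4):
    """Sufficient statistics for one sentence pair (single reference):
    [hyp_len, ref_len, match_1, total_1, ..., match_n, total_n]."""
    hyp_counts = ngram_counts(hyp, max_n)
    ref_counts = ngram_counts(ref, max_n)
    stats = [len(hyp), len(ref)]
    for n in range(1, max_n + 1):
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(c, ref_counts[g])
                      for g, c in hyp_counts.items() if g[0] == n)
        total = max(len(hyp) - n + 1, 0)
        stats.extend([matches, total])
    return stats


def corpus_bleu(stats_list):
    """Unsmoothed corpus-level BLEU from summed per-sentence statistics."""
    totals = [sum(col) for col in zip(*stats_list)]
    hyp_len, ref_len = totals[0], totals[1]
    n_orders = (len(totals) - 2) // 2
    log_prec = 0.0
    for k in range(n_orders):
        matches, total = totals[2 + 2 * k], totals[3 + 2 * k]
        if matches == 0 or total == 0:
            return 0.0
        log_prec += math.log(matches / total)
    # Log of the brevity penalty; zero when the hypothesis is long enough.
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)
    return math.exp(log_bp + log_prec / n_orders)


def paired_bootstrap(hyps_a, hyps_b, refs, n_samples=1000, seed=0):
    """Fraction of resampled test sets on which system A has strictly
    higher BLEU than system B. Both systems are scored on the same
    resampled sentence indices, which is what makes the test paired."""
    rng = random.Random(seed)
    stats_a = [sentence_stats(h, r) for h, r in zip(hyps_a, refs)]
    stats_b = [sentence_stats(h, r) for h, r in zip(hyps_b, refs)]
    n = len(refs)
    wins_a = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bleu_a = corpus_bleu([stats_a[i] for i in idx])
        bleu_b = corpus_bleu([stats_b[i] for i in idx])
        if bleu_a > bleu_b:
            wins_a += 1
    return wins_a / n_samples


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    refs = [s.split() for s in ["the cat sat on the mat",
                                "a dog barked at the mailman today"]]
    sys_a = [s.split() for s in ["the cat sat on the mat",
                                 "a dog barked at the mailman today"]]
    sys_b = [s.split() for s in ["cat the mat on sat the",
                                 "dog a barked mailman the at today"]]
    print(paired_bootstrap(sys_a, sys_b, refs, n_samples=200))
```

If A wins on, say, at least 95% of the resampled test sets, the difference is conventionally reported as significant at roughly the 0.05 level. Precomputing per-sentence statistics once and summing them per sample keeps each resampling step cheap, since BLEU is a corpus-level metric and cannot simply be averaged over sentences.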