Below are resources that I helped develop in or before 2017. For more recent resources, see links under relevant
paper entries.
Training and validation data created for the LAMBADA word prediction task; described here. [lambada-train-valid.tar.gz (330MB)]
Manual analysis of 100 LAMBADA instances from the paper above. [lambada-analysis.tar.gz]
Code for training charagram models and pre-trained models from this EMNLP16 paper (developed by John Wieting) [link]
Who-did-What reading comprehension dataset from this EMNLP16 paper [link]
Resources for commonsense knowledge representation from this ACL16 paper [link]
Code for training paragram phrase embeddings and other models from this ICLR16 paper (developed by John Wieting) [link]
Pretrained paragram word embeddings and annotated phrase similarity datasets (developed by John Wieting) [link]
Rampion, a framework for training statistical machine translation models [link]
Twitter part-of-speech tagger and tweets manually annotated with part-of-speech tags [link]
NFL game data and aligned tweets [link]
Code for performing inference for monolingual and bilingual gappy pattern models [link] [sample patterns]
Code to find trigger word pairs using mutual information (a reimplementation of Rosenfeld, 1994) [code]
Factoid question-answer pairs from Wikipedia articles with difficulty ratings [link]
Scripts for performing bootstrap resampling for BLEU significance testing [link]
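
For readers unfamiliar with the last item, the general idea of paired bootstrap resampling can be sketched as follows. This is only an illustration under stated assumptions, not the linked scripts: the function names `paired_bootstrap` and `unigram_precision` are hypothetical, and the toy clipped unigram precision metric stands in for a real corpus-level BLEU implementation, which would be passed in the same way.

```python
import random
from collections import Counter
from typing import Callable, Sequence

# A corpus-level metric: (hypotheses, references) -> score. Hypothetical type alias.
Metric = Callable[[Sequence[str], Sequence[str]], float]


def unigram_precision(hyps: Sequence[str], refs: Sequence[str]) -> float:
    """Toy stand-in for BLEU (assumption): corpus-level clipped unigram precision."""
    matched = 0
    total = 0
    for hyp, ref in zip(hyps, refs):
        ref_counts = Counter(ref.split())
        hyp_counts = Counter(hyp.split())
        # Clip each hypothesis word count by its count in the reference.
        matched += sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
        total += sum(hyp_counts.values())
    return matched / total if total else 0.0


def paired_bootstrap(
    hyps_a: Sequence[str],
    hyps_b: Sequence[str],
    refs: Sequence[str],
    metric: Metric,
    n_samples: int = 1000,
    seed: int = 0,
) -> float:
    """Return the fraction of resampled test sets on which system A outscores system B."""
    if not (len(hyps_a) == len(hyps_b) == len(refs)):
        raise ValueError("all inputs must have the same number of sentences")
    rng = random.Random(seed)
    n = len(refs)
    wins_a = 0
    for _ in range(n_samples):
        # Draw n sentence indices with replacement; use the same indices for both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_refs = [refs[i] for i in idx]
        score_a = metric([hyps_a[i] for i in idx], sample_refs)
        score_b = metric([hyps_b[i] for i in idx], sample_refs)
        if score_a > score_b:
            wins_a += 1
    return wins_a / n_samples


if __name__ == "__main__":
    refs = ["the cat sat", "a dog barked", "birds fly south"]
    sys_a = ["the cat sat", "a dog barked", "birds fly north"]
    sys_b = ["the cat ran", "one dog barked", "birds go north"]
    win_rate = paired_bootstrap(sys_a, sys_b, refs, unigram_precision)
    print(f"A beats B on {win_rate:.1%} of resampled test sets")
```

A win rate of at least 0.95 is commonly read as system A being better than system B at roughly the 95% level; the metric is recomputed at the corpus level on every resample, which is why it is passed in as a function rather than averaged per sentence.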