Below are resources that I helped develop in or before 2017. For more recent resources, see the links under the relevant paper entries.
			
			Training and validation data created for the LAMBADA word prediction task; described here.  [lambada-train-valid.tar.gz (330MB)]
			Manual analysis of 100 LAMBADA instances from the paper above.  [lambada-analysis.tar.gz]
			
			Code for training charagram models and pretrained models from this EMNLP16 paper (developed by John Wieting)  [link]
			
			Who-did-What reading comprehension dataset from this EMNLP16 paper  [link]
			
			Resources for commonsense knowledge representation from this ACL16 paper  [link]
			
			Code for training paragram phrase embeddings and other models from this ICLR16 paper (developed by John Wieting)  [link]
			
			Pretrained paragram word embeddings and annotated phrase similarity datasets (developed by John Wieting)  [link]
			
			Rampion, a framework for training statistical machine translation models  [link]
			
			Twitter part-of-speech tagger and tweets manually annotated with part-of-speech tags  [link]
			
			NFL game data and aligned tweets  [link]
			
			Code for performing inference for monolingual and bilingual gappy pattern models  [link] [sample patterns]
			
			Code to find trigger word pairs using mutual information (a reimplementation of Rosenfeld, 1994)  [code]
			
			Factoid question-answer pairs from Wikipedia articles with difficulty ratings  [link]
			Scripts for performing bootstrap resampling for BLEU significance testing  [link]