Publications and presentations from tinlab are listed below. Lab members are highlighted.
A systematic framework for generating novel experimental hypotheses from language models
arXiv
RExBench: Can coding agents autonomously implement AI research extensions?
ACL 2026
Are they lovers or friends? Evaluating LLMs’ Social Reasoning in English and Korean Dialogue
ACL 2026
Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity
ICLR 2026
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
NeurIPS 2025
CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists
EMNLP 2025
Mechanistic Understanding of Entity Tracking in Natural Language involving Multiple Operations
NEMI 2025
Is analogy enough to draw novel adjective-noun inferences?
SCiL 2025
Implicit mechanisms for symbol manipulation in RNNs
NENLP 2025
Transformers Struggle to Learn to Search Without In-context Exploration
ICLR 2025
Fake reefs are sometimes reefs and sometimes not, but are always compositional
ELM 3 (2025)
Semantic Training Signals Promote Hierarchical Syntactic Generalization in Neural Networks
EMNLP 2024
Code Pretraining Improves Entity Tracking Abilities of Language Models
arXiv
Personas as a Way to Model Truthfulness in Language Models
EMNLP 2024
Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don’t mimic the full human distribution
GenBench @ EMNLP 2024 👑 Best Paper Award
Structural Generalization of Modification in Adult Learners of an Artificial Language
CogSci 2024
Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation
AIES 2024
Syn-(QA)^2: Evaluating False Assumptions in Long-tail Questions with Synthetic QA Datasets
arXiv
Abstraction via exemplars? A representational case study on lexical category inference in BERT
BUCLD 47 (2023)
SLOG: A Structural Generalization Benchmark for Semantic Parsing
EMNLP 2023
Inverse scaling can become U-shaped
EMNLP 2023
Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
NeurIPS 2023
BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information
NeurIPS 2023 (Datasets and Benchmarks)
Inverse Scaling: When Bigger Isn’t Better
TMLR 2023 👑 Featured Certification
Finding Structure in One Child’s Linguistic Experience
Cognitive Science 2023
(QA)^2: Question Answering with Questionable Assumptions
ACL 2023
Entity Tracking in Language Models
ACL 2023 👑 Area Chair Award
LAMBADA: Backward Chaining for Automated Reasoning in Natural Language
ACL 2023
Reconstruction Probing
Findings of ACL 2023
Uncontrolled Lexical Exposure Leads to Overestimation of Compositional Generalization in Pretrained Models
arXiv (2022)
Compositional Linguistic Generalization in Artificial Neural Networks
PhD Dissertation, Johns Hopkins University (2021)
Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering
ACL 2021
Testing for Grammatical Category Abstraction in Neural Language Models
SCiL 2021
COGS: A Compositional Generalization Challenge Based on Semantic Interpretation
EMNLP 2020
Implicit Discourse Relation Classification: We Need to Talk About Evaluation
ACL 2020
Maximize presupposition and the Korean demonstrative ku
LSA 2020
Compositionality as Directional Consistency in Sequential Neural Networks
NeurIPS 2019 Workshop on Context and Compositionality
Probing What Different NLP Tasks Teach Machines About Function Word Comprehension
*SEM 2019 👑 Best Paper Award
How to Get Past Sesame Street: Sentence-Level Pretraining Beyond Language Modeling
ACL 2019
Automatic Scoring of Semantic Fluency
Frontiers in Psychology (2019)
Predicting the Argumenthood of English Prepositional Phrases
AAAI 2019
Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and Cognition
Interspeech 2018
Enhanced Sign Language Transcription System via Hand Tracking and Pose Estimation
Journal of Computing Science and Engineering, vol. 10, no. 3
A Morphological Approach to the Longitudinal Detection of Dementia
HCI Korea 2016