Yekyung Kim

I am a first-year Ph.D. candidate at UMass NLP, advised by Mohit Iyyer, with a research focus on natural language processing.

Before joining UMass, I worked at Hyundai Motors Group and LG Electronics as a research engineer. I was fortunate to be selected as a specialist in AI and conducted research at CMU LTI as a visiting scientist, mentored by Jaime Carbonell.

Email  /  Google Scholar  /  Github

Research

My research goal is to build an efficient and trustworthy system that connects humans and machines, using language as a bridge. I am interested in systems that enable reliable communication between humans and machines and that acquire knowledge with as little human supervision as possible. In particular, my research directions include:

1) Efficiency: In the real world, not all data are equally valuable for learning. How do we accurately calibrate the importance of data? How can we efficiently adapt models to new data distributions?

2) Generalization and Robustness: Models tend to learn shortcuts, yet they often face unexpected inputs in the wild. How can we detect various kinds of out-of-distribution data and make models generalize better from limited data?

3) Trustworthiness: Logical reasoning is a hallmark human capability that enables reliable communication. How can we control a model so that it gives reliable answers and expresses its uncertainty? Is it possible for models to solve logic puzzles (e.g., Einstein's Riddle) by comprehending the problem and applying logical deduction? How can we teach models to reason logically with human guidance?

LINDA: Unsupervised Learning to Interpolate in Natural Language Processing
Yekyung Kim, Seohyeong Jeong, Kyunghyun Cho

code

We propose an unsupervised learning approach to text interpolation for data augmentation. It requires neither heuristics nor manually crafted resources, and instead learns to interpolate between any pair of natural language sentences over a natural language manifold.

A Universal Framework for Dataset Characterization with Multidimensional Meta-information
Jaehyung Kim, Yekyung Kim, Karin Johanna Denton de Langis, Jinwoo Shin, Dongyeop Kang
ACL 2023
code

We propose a data-centric framework that constructs a new feature space capturing various characteristics of datasets, and a novel sampling method that selects a set of data points maximizing group informativeness.

Meta-Crafting: Improved Detection of Out-of-distributed Texts via Crafting Metadata Space
Ryan Koo, Yekyung Kim, Dongyeop Kang, Jaehyung Kim
AAAI 2024 Student Abstract and Poster Program (to appear)

We propose Meta-Crafting, a unified method that can capture both background and semantic shifts. Specifically, we construct a new discriminative feature space to detect OOD samples.

Deep Active Learning for Sequence Labeling Based on Diversity and Uncertainty in Gradient
Yekyung Kim
Workshop on Life-long Learning for Spoken Language Systems at AACL, 2021

I demonstrate that active learning can reduce the amount of labeled training data needed for sequence labeling when it incorporates both uncertainty and diversity.

Learning Sub-Character level representation for Korean Named Entity Recognition
Yejin Kim, Yekyung Kim (equal contributions)
The International FLAIRS Conference Proceedings, 2020

We propose an improved unigram-level Korean NER model with a sub-character level representation, jamo, which can represent the unique linguistic structure of Korean along with its syntactic properties and morphological variations.

#Nowplaying the Future Billboard: Mining Music Listening Behaviors of Twitter Users for Hit Song Prediction
Yekyung Kim, Bongwon Suh, Kyogu Lee
Workshop on Social Media Retrieval and Analysis (SoMeRA) at SIGIR, 2014

We collect users' music listening behavior from Twitter using music-related hashtags (e.g., #nowplaying). We then build a predictive model to forecast Billboard rankings and hit songs.

A Visual Analytics Approach to Summarizing Tweets
Ramik Sadana, Yekyung Kim, Bongwon Suh, Eunyee Koh
Industry Day at SIGIR, 2014

We build on key principles of visual analytics and describe an end-to-end visual exploration system for tweets that both presents overall summaries and supports analysis of any variations that exist in the activity.

Industry Project
Airstar - Incheon Airport Robot, LG Electronics
AI assistant for cars, Hyundai
Chatbot for home-appliances, LG Electronics