Siri-ously biased: Computer science major examines language models showing negativity toward LGBTQ+ terms

While studying at his Vietnamese high school, Bang Nguyen ’22 wanted to pursue a liberal arts education in the U.S. that empowered him to combine his interests in math and social sciences. At The College of Wooster, he bridged the gap between the two, applying his technical skills as a computer science major and infusing the social sciences with minors in statistical and data sciences and communication studies.

Bang Nguyen ’22, I.S. title: Queering NLP: A non-heteronormative approach to quantifying and investigating sentiment bias against LGBTQ+ identities in word embeddings

“Wooster was really generous with its financial aid package, but what resonated with me most was what I heard about the Independent Study and mentored research,” said Nguyen. “The interdisciplinary perspective has been helpful, and it culminates with my I.S. interest in computer science.”

Nguyen’s I.S. project examines the social issues within natural language processing (NLP) technologies that attempt to generate and understand human language. “My research looks at the deeper layout of how technology replicates social biases against the LGBTQ+ community,” said Nguyen. “Sometimes we are hurting members of queer identities, and it’s replicated in these technologies that we use every day.”

NLP is used in machines, including virtual assistants like Apple’s Siri and Amazon’s Alexa, to understand and respond to text or voice data. Their responses are based on a language model that converts each word in human language to a numerical vector (a word embedding) that the machine can process.
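To make the idea of a word embedding concrete, here is a minimal Python sketch. The three-dimensional vectors are invented purely for illustration (real embeddings are learned from large text collections and have hundreds of dimensions), and the names `toy_embeddings` and `cosine_similarity` are hypothetical, not taken from Nguyen's project. Cosine similarity is one common way to measure how close two word vectors are.

```python
import math

# Toy vectors, invented for illustration only.
# Real embeddings are learned from text and are much larger.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Words used in similar contexts end up with similar vectors,
# so "cat" should score closer to "dog" than to "car" here.
sim_cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
sim_cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])
print(sim_cat_dog > sim_cat_car)
```

In this toy setup the machine never sees the words themselves, only the numbers, which is why any pattern in the training text, including bias, can carry over into the vectors.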

Twitter and other platforms can block certain posts if a system built on word embeddings flags their content as toxic. Nguyen said that society, and with it language, operates with a worldview that treats heterosexuality as the normal or preferred sexual orientation, so when this bias is captured in word embeddings, words like ‘gay’ and ‘lesbian’ often get flagged as toxic.

“I looked at five specific identities in my research including lesbian, gay, bisexual, transgender, and queer,” said Nguyen. “I found biases against all five, and the word embeddings typically associate these with more negative words. ‘Gay’ for example, is close to weird, ugly, dumb, stupid, and pervert, among other terms.”
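Saying one word is "close to" another usually means their vectors are similar. The sketch below shows one plausible way to list a word's nearest neighbors by cosine similarity. It is an illustration under stated assumptions, not Nguyen's actual code: the function `nearest_neighbors` is hypothetical, and the two-dimensional vectors use neutral placeholder words with made-up values.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word  # skip the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder vocabulary with invented vectors, for illustration only.
toy_embeddings = {
    "apple": [0.9, 0.1],
    "pear": [0.8, 0.2],
    "truck": [0.1, 0.9],
    "bus": [0.2, 0.8],
}

neighbors = nearest_neighbors("apple", toy_embeddings, k=1)
print(neighbors[0][0])
```

Run against real embeddings, a lookup like this is how a researcher could see which words sit nearest to an identity term.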

He compared an existing language model trained on pre-2015 data with embeddings he trained himself on post-2015 data to see whether the bias against the LGBTQ+ community has eased over time. Nguyen found the older model to contain more extreme bias while the newer model showed more inclusivity. He called that progress a reflection of social change and the LGBTQ+ experience. “It’s interesting that when machines learn these things, they replicate whatever the human does,” said Nguyen. “Do we want to use these biases, or can we find ways to remove them computationally?”

Kowshik Bhowmik, visiting instructor of computer science, mentored Nguyen throughout the study. He pointed out that formulating the research question was tricky, but his mentee came up with two hypotheses that captured the goal of the work very well.

“Existing research in the field mostly looks at bias from a heteronormative point of view, such as how are biases regarding genders (male/female) embedded in existing word embeddings,” said Bhowmik. “Bang, on the other hand, investigates whether words with negative sentiments are more closely related to LGBTQ+ terms than words with positive sentiments.”
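The comparison Bhowmik describes can be sketched as a simple score: the average similarity between a target word and a set of negative words, minus its average similarity to a set of positive words. This is only an assumed, simplified stand-in, not the metric used in the I.S. The function name `sentiment_association`, the placeholder target names, and every vector value below are invented for illustration.

```python
import math
from statistics import mean


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentiment_association(target, positive, negative, embeddings):
    """Mean similarity to negative words minus mean similarity to positive words.

    A score above zero means the target sits closer to the negative set.
    """
    vec = embeddings[target]
    mean_neg = mean(cosine_similarity(vec, embeddings[w]) for w in negative)
    mean_pos = mean(cosine_similarity(vec, embeddings[w]) for w in positive)
    return mean_neg - mean_pos


# Invented toy vectors; they do not come from any real model.
toy_embeddings = {
    "target_a": [1.0, 0.0],
    "target_b": [0.0, 1.0],
    "good": [0.0, 1.0],
    "nice": [0.1, 0.9],
    "bad": [1.0, 0.0],
    "awful": [0.9, 0.1],
}

score_a = sentiment_association("target_a", ["good", "nice"], ["bad", "awful"], toy_embeddings)
score_b = sentiment_association("target_b", ["good", "nice"], ["bad", "awful"], toy_embeddings)
print(score_a > 0, score_b < 0)
```

Computing the same score for the same terms in two different embedding models would give one rough way to compare how strongly each model associates those terms with negative sentiment.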

This I.S. project wasn’t Nguyen’s first venture into research. He pursued experiential learning opportunities across two summers in the Applied Methods and Research Experience (AMRE) program, which lets Wooster students work as consultants for local companies. In the first summer, he worked as a consultant for a STEM success initiative at the College, analyzing data on students who took STEM courses. In the second, Nguyen worked on quality control processes for The Goodyear Tire & Rubber Company. His team proposed a new system that uses machine learning to show how Goodyear can better control tire quality.

Between the I.S. and AMRE experiences, Nguyen recognizes that he built transferable research and writing skills that he’ll use in grad school and his career. He also knows the importance of clearly communicating tech knowledge.

“I want to make science more accessible,” said Nguyen. “Science is already complicated enough, and when you add social issues on top of that it gets worse.” Nguyen plans to focus his studies on fairness and non-discrimination in machine learning this fall when he joins the University of Notre Dame as a Ph.D. student in computer science and engineering.

Posted in Independent Study on July 15, 2022.
