Date of Submission
6-2025
Document Type
Thesis
Department
Engineering and Applied Science Education
Advisor
Vahid Behzadan, Ph.D.
Committee Member
Khaled Sayed, Ph.D.
Committee Member
Muhammad Aminul Islam, Ph.D.
Committee Member
Binod Bhattarai, Ph.D.
LC Subject Headings
Artificial intelligence, Cross-language information retrieval, Multilingual communication, Misinformation, Cyber intelligence (Computer security)
Abstract
Large Language Models (LLMs) have demonstrated significant capabilities in understanding and generating natural language, excelling in tasks such as question-answering, text generation, and classification. However, these models are predominantly monolingual, exhibiting limited proficiency in non-English and low-resource languages. Despite their problem-solving abilities and potential to enhance productivity, LLMs pose substantial risks by generating harmful, toxic, biased, or nonfactual content. Furthermore, even robust proprietary models have been susceptible to jailbreaks through sophisticated prompt engineering, particularly leveraging low-resource languages. This dissertation investigates the critical intersection of multilingual capabilities and security vulnerabilities in LLMs. Through five interconnected studies, I address these dual challenges by introducing novel methodologies for enhancing multilingual performance while systematically exposing and mitigating security vulnerabilities in multilingual contexts.
First, I present Translation-Assisted Chain-of-Thought (TaCo), demonstrating how curriculum learning and translation processes can efficiently adapt instruction-tuned models to low-resource languages without extensive pretraining. Subsequently, I investigate three novel attack vectors: the "Sandwich Attack," which exploits multi-language mixtures; the "Working Memory Attack," which targets contextual processing limitations; and the "Tongue-Tied" attack, which reveals how fine-tuning on new languages compromises safety alignment. Finally, I introduce "X-Guard," a transparent multilingual safety agent that effectively defends against sophisticated language-based attacks.
Key findings reveal that cross-lingual knowledge transfer can be significantly enhanced through translation-assisted processes, while safety mechanisms in LLMs remain language-dependent and are particularly vulnerable in low-resource contexts. Empirical evaluations across commercial and open-source models demonstrate both the severity of multilingual vulnerabilities and the effectiveness of the proposed defensive approaches. This research advances our understanding of multilingual LLMs' capabilities and limitations while providing practical frameworks for developing safer, more inclusive language technologies that can serve diverse linguistic communities while minimizing potential harm.
Recommended Citation
Upadhayay, Bibek, "Efficient and Robust Language Adaptation in Multilingual LLMs" (2025). Doctoral Works at the University of New Haven. 66.
https://digitalcommons.newhaven.edu/dissertations/66