Date of Submission
6-2025
Document Type
Thesis
Department
Engineering and Applied Science Education
Advisor
Vahid Behzadan, Ph.D.
Committee Member
Khaled Sayed, Ph.D.
Committee Member
Muhammad Aminul Islam, Ph.D.
Committee Member
Binod Bhattarai, Ph.D.
LC Subject Headings
Artificial intelligence, Cross-language information retrieval, Multilingual communication, Misinformation, Cyber intelligence (Computer security)
Abstract
Large Language Models (LLMs) have demonstrated significant capabilities in understanding and generating natural language, excelling in tasks such as question-answering, text generation, and classification. However, these models are predominantly monolingual, exhibiting limited proficiency in non-English and low-resource languages. Despite their problem-solving abilities and potential to enhance productivity, LLMs pose substantial risks by generating harmful, toxic, biased, or nonfactual content. Furthermore, even robust proprietary models have been susceptible to jailbreaks through sophisticated prompt engineering, particularly leveraging low-resource languages. This dissertation investigates the critical intersection of multilingual capabilities and security vulnerabilities in LLMs. Through five interconnected studies, I address these dual challenges by introducing novel methodologies for enhancing multilingual performance while systematically exposing and mitigating security vulnerabilities in multilingual contexts.
First, I present Translation-Assisted Chain-of-Thought (TaCo), demonstrating how curriculum learning and translation processes can efficiently adapt instruction-tuned models to low-resource languages without extensive pretraining. Subsequently, I investigate three novel attack vectors: the "Sandwich Attack," which exploits multi-language mixtures; the "Working Memory Attack," which targets contextual processing limitations; and the "Tongue-Tied" attack, which reveals how fine-tuning on new languages compromises safety alignment. Finally, I introduce "X-Guard," a transparent multilingual safety agent that effectively defends against sophisticated language-based attacks.
Key findings reveal that cross-lingual knowledge transfer can be significantly enhanced through translation-assisted processes, while safety mechanisms in LLMs remain language-dependent and are particularly vulnerable in low-resource contexts. Empirical evaluations across commercial and open-source models demonstrate both the severity of multilingual vulnerabilities and the effectiveness of the proposed defensive approaches. This research advances our understanding of multilingual LLMs' capabilities and limitations while providing practical frameworks for developing safer, more inclusive language technologies that can serve diverse linguistic communities while minimizing potential harm.
Recommended Citation
Upadhayay, Bibek, "Efficient and Robust Language Adaptation in Multilingual LLMs" (2025). Doctoral Works at the University of New Haven. 66.
https://digitalcommons.newhaven.edu/dissertations/66