Date of Submission

6-2025

Document Type

Thesis

Department

Engineering and Applied Science Education

Advisor

Vahid Behzadan, Ph.D.

Committee Member

Khaled Sayed, Ph.D.

Committee Member

Muhammad Aminul Islam, Ph.D.

Committee Member

Binod Bhattarai, Ph.D.

LC Subject Headings

Artificial intelligence, Cross-language information retrieval, Multilingual communication, Misinformation, Cyber intelligence (Computer security)

Abstract

Large Language Models (LLMs) have demonstrated significant capabilities in understanding and generating natural language, excelling in tasks such as question-answering, text generation, and classification. However, these models are predominantly monolingual, exhibiting limited proficiency in non-English and low-resource languages. Despite their problem-solving abilities and potential to enhance productivity, LLMs pose substantial risks by generating harmful, toxic, biased, or nonfactual content. Furthermore, even robust proprietary models have been susceptible to jailbreaks through sophisticated prompt engineering, particularly leveraging low-resource languages. This dissertation investigates the critical intersection of multilingual capabilities and security vulnerabilities in LLMs. Through five interconnected studies, I address these dual challenges by introducing novel methodologies for enhancing multilingual performance while systematically exposing and mitigating security vulnerabilities in multilingual contexts.

First, I present Translation-Assisted Chain-of-Thought (TaCo), demonstrating how curriculum learning and translation processes can efficiently adapt instruction-tuned models to low-resource languages without extensive pretraining. Subsequently, I investigate three novel attack vectors: the "Sandwich Attack," which exploits multi-language mixtures; the "Working Memory Attack," which targets contextual processing limitations; and the "Tongue-Tied" attack, which reveals how fine-tuning with new languages compromises safety alignments. Finally, I introduce "X-Guard," a transparent multilingual safety agent that effectively defends against sophisticated language-based attacks.

Key findings reveal that cross-lingual knowledge transfer can be significantly enhanced through translation-assisted processes, while safety mechanisms in LLMs remain language-dependent and particularly vulnerable in low-resource contexts. Empirical evaluations across commercial and open-source models demonstrate both the severity of multilingual vulnerabilities and the effectiveness of the proposed defensive approaches. This research advances our understanding of multilingual LLMs' capabilities and limitations while providing practical frameworks for developing safer, more inclusive language technologies that can serve diverse linguistic communities while minimizing potential harm.
