Recreating Vanishing Languages Through AI Companions Who Speak Them

What Language Loss Actually Means

The extinction of a language is not like the extinction of a species, though the metaphor is commonly used. A species that goes extinct takes with it a lineage of biological innovation — metabolic pathways, behavioral adaptations, ecological relationships — that cannot be recovered. A language that goes extinct takes with it a system of conceptual categorization, a set of grammatical structures that impose a specific shape on perception and expression, and typically a body of oral literature and technical knowledge that was formulated in and through that language's particular capacities. What makes languages irreplaceable is not sentiment. It is the specificity of what each one makes possible. Every language has features that no other language has — particular ways of marking time, of distinguishing categories of relationship, of encoding spatial orientation or evidentiality (the grammatical requirement to specify how you know what you are asserting). When a language dies, the conceptual resources it made available die with it. Languages currently dying include many that have never been fully documented. There are no complete grammars, no comprehensive dictionaries, no extensive recorded corpora. In some cases there are only a handful of recordings of a single speaker, made in the final years of that speaker's life, often by researchers who did not fully understand what they were capturing.

The Revitalization Gap

Language revitalization, the effort to bring a critically endangered or dormant language back into active community use, is well studied, and the research shows that success is possible under specific conditions. Welsh, Hebrew, and Maori provide the canonical examples: languages that were severely diminished and have been substantially recovered through deliberate community and governmental effort over decades. The conditions for success are demanding. Revitalization requires committed communities, institutional support, economic incentives for learning and using the language, and a critical mass of fluent speakers or, in extreme cases, learners who reach fluency through immersive education and then raise children in the language. The Welsh, Hebrew, and Maori cases all involved these conditions at sufficient scale. For languages with fewer than a hundred speakers, most of them elderly, no path to revitalization that meets these conditions currently exists. These languages face a different situation: not revitalization but survival in some form, and the question of what survival means when full community transmission is no longer possible.

Where AI Enters the Picture

An AI system trained on substantial recordings and documentation of an endangered language can do things that static archives cannot. It can generate new text in the language, maintaining grammatical coherence and appropriate vocabulary. It can respond to questions in the language, providing practice interaction for learners who have no living interlocutors. It can serve as a reference system that speaks the language rather than merely describing it. Researchers at the MIT Media Lab's Language Preservation Initiative have worked with several language communities to develop AI tools built on existing documentation. Their assessment, published in the journal Language Documentation and Conservation, distinguishes between languages with substantial documentation — for which current AI tools can produce genuinely useful interactive resources — and languages with minimal documentation, for which AI cannot yet generate reliable output because the training data is insufficient. The gap is significant: languages with fewer than a few thousand hours of recorded speech and incomplete grammatical documentation are currently beyond the useful reach of AI-generated learning tools.

The Difference Between Dormant and Dead

An important distinction in language revitalization research is between dormant languages and dead ones. A dormant language — like Classical Hebrew before modern revitalization, or Cornish, which has undergone revival efforts — exists in substantial recorded form and in the cultural memory of a descended community. A dead language, in the stronger sense, has no descended community and no living memory. Most endangered languages today are more dormant than dead: they have communities of descent, living elder speakers, and some degree of cultural continuity even when daily use has diminished dramatically. For these languages, the AI tools currently being developed are most valuable not as replacements for human transmission but as supplements — enabling community members in diaspora to access the language, giving learners practice time between sessions with human speakers, and helping younger speakers develop facility before the elders are gone. A tangent worth noting: the decision about how a community's language is used in AI training raises the same data sovereignty questions that appear throughout indigenous knowledge preservation. The recordings and texts that form the training data belong, in a meaningful sense, to the communities that produced them. Whether those communities have agreed to their use, and whether they govern the AI system that results, are questions with significant ethical and practical implications that the research community has only recently begun to engage with seriously.

What Fluency Actually Requires

Language learning research consistently shows that reaching conversational fluency requires substantially more than exposure to vocabulary and grammar rules. It requires immersive use — extended periods of production and comprehension in the target language — and social feedback: responses from interlocutors who can signal comprehension failure, model correct usage, and provide the kind of nuanced correction that formal instruction cannot always replicate. AI can provide some of this. It can offer extended conversational practice in a target language, maintain coherent context across an interaction, and respond in ways that model fluent usage. What it currently cannot do is provide the culturally embedded social context that gives a language its full communicative function — the sense of using the language with the community for which it was developed. The goal of AI in language preservation is not to solve the problem of endangered languages. The problem is political and social and economic, not technological. The goal is to reduce the friction that stands between communities and their linguistic heritage, and to extend the window in which full revitalization remains possible.
