Like living organisms, languages evolve. Languages that descend from the same ancestor (called a “proto-language”) belong to the same language family. A language family can be subdivided into subfamilies: for example, Polish and Slovak are both West Slavic languages, a subdivision of the Slavic languages, which are themselves a branch of the larger Indo-European family.

Comparative linguistics, as the name implies, compares languages in order to establish their historical relatedness. This can be done by comparing their phonology, grammar and vocabulary, even in cases where there are no written accounts of their ancestors.

The more distantly related languages are, the harder it is to determine whether there is a genetic relationship between them. For example, no linguist doubts that Spanish and Italian are related, but the existence of the Altaic family (which would include Turkish and Mongolian) remains controversial and is rejected by many linguists. At present it is impossible to know whether all languages come from a common ancestor. If an original human language did exist, it would have been spoken tens of thousands of years ago (if not more), which makes comparisons extremely difficult or even impossible to perform.

Figure: Map of the main language families of the world (source: Wikimedia Commons)

List of language families

Linguists have identified more than a hundred primary language families (language families that are not known to be related to each other). Some of them include only a few languages, others more than a thousand. Here are some of the main language families of the world.

| Language family | Area | Languages |
| --- | --- | --- |
| Indo-European | Europe to India; nowadays all continents | More than 400 languages spoken by almost 3 billion people, including the Romance languages (Spanish, Italian, French…), Germanic languages (English, German, Swedish…), Baltic and Slavic languages (Russian, Polish…), Indo-Iranian languages (Persian, Hindi, Kurdish, Bengali and many other languages spoken from Turkey to northern India), as well as other languages such as Greek and Armenian |
| Sino-Tibetan | Asia | Chinese languages, Tibetan and Burmese |
| Niger-Congo | Sub-Saharan Africa | Swahili, Yoruba, Shona, Zulu |
| Afroasiatic | Middle East, North Africa, Horn of Africa | Semitic languages (Arabic, Hebrew…), Somali |
| Austronesian | Southeast Asia, Taiwan, Pacific, Madagascar | More than 1,000 languages, including Indonesian, Filipino, Malagasy, Hawaiian, Fijian… |
| Uralic | Central, Eastern and Northern Europe, North Asia | Hungarian, Finnish, Estonian, Sami languages, some languages of Russia (Udmurt, Mari, Komi…) |
| Altaic (controversial) | Turkey to Siberia | Turkic languages (Turkish, Kazakh…), Mongolic languages (Mongolian…), Tungusic languages (Manchu…); some proponents also include Japanese and Korean |
| Dravidian | South India | Tamil, Malayalam, Kannada, Telugu |
| Tai–Kadai | Southeast Asia | Thai, Lao |
| Austroasiatic | Southeast Asia | Vietnamese, Khmer |
| Na-Dene | North America | Tlingit, Navajo |
| Tupian | South America | Guarani |
| Caucasian (disputed) | Caucasus | Three separate families rather than a single one; the most widely spoken Caucasian language is Georgian |

Special cases

Language isolates

A language isolate is an “orphan”: a language that hasn’t been proven to belong to a known language family. The best example is the Basque language, spoken in Spain and France. Even though it is surrounded by Indo-European languages, it is very different from them. Linguists have compared Basque and other languages spoken in Europe, the Caucasus and even America, but no relationship has ever been demonstrated.

Korean is another well-known isolate, although some linguists have proposed a relationship with the Altaic languages or Japanese. Japanese itself is sometimes considered an isolate, but it is best described as belonging to the small Japonic family, which includes a few related languages such as Okinawan.

Pidgins and creoles

A pidgin is a simplified communication system that develops between two or more groups that do not have a language in common. It doesn't come directly from a single language, but can be made of features from several languages. When children begin to learn a pidgin as a native language, it becomes a full-fledged, stable language called a creole.

Most pidgins and creoles spoken today are the result of colonization and are based on English, French or Portuguese. One of the most widely spoken creoles is Tok Pisin, an official language of Papua New Guinea. It is based on English, but its grammar is different and its vocabulary includes words borrowed from German, Malay, Portuguese and several local languages.