Identifying European languages

Spending roughly 50% of my waking life playing GeoGuessr over the last few months has meant I’ve become, in my opinion, fairly alright at recognising written languages at a glance in order to work out where I am. This doesn’t mean speaking them — my French is alright although it has lost a bit of fluency over the years, but I don’t speak any other language (except English) above a very basic level. The skill is being able to see a road sign, place name or shop front, identify the language, and hence pinpoint the country you’re in. This is often difficult, because most road signs, place names and shop fronts don’t contain more than a few words. If you’re lucky enough to be faced with a block of text on a billboard for example, that makes things a bit easier, but you still need to grasp what a given language generally looks like so you can over-confidently go “aha! Estonian!”.

I thought I’d use this post to collate what I know so far about identifying written European languages, starting with an overview of how the languages relate to each other in the grand scheme of things. I may branch out to other continents at some point, but for people who speak English, other European languages are probably a good starting point. Most English-speaking people, I assume, already have a decent idea of what some European languages look like, even if they don’t speak them. At least in the UK, I think a fairly large proportion of people know when they’re looking at French, German, Spanish, Italian or Greek, and possibly even Portuguese, Dutch or Polish. This is roughly where I was at, but there was a lot I didn’t know. I knew when a language looked ‘Nordic’ or ‘Eastern European’ but I couldn’t really distinguish the languages within these vague groups. I didn’t know ‘Baltic’ languages were a thing, or what Albanian or Maltese looked like. I even used to think Hungarian was a Slavic language.

I’m going to cover most major European languages. I will leave aside the Celtic languages because, whilst some are still living, I’ve only really had to identify Welsh and Irish when playing GeoGuessr. If I cover those languages, I may as well also do Scottish Gaelic, Cornish, Manx and Breton at the same time, and I currently know precisely nothing about those. I will also exclude the languages of some countries often considered to be on the cusp of Europe and Asia: Turkey, Georgia, Armenia and Azerbaijan. This is only because it makes sense to group Turkish and Azerbaijani with other Turkic languages in a separate post concerning Asia, and it also makes geographical sense to group Georgian and Armenian (two ‘standalone’ languages) with these other two. I’m not going to delve into dialectal variations of single languages either, for example German versus Swiss German, or relatively minor languages such as Yiddish, because my brain is too small.

Two disclaimers: firstly, I don’t speak a word of most of these languages, so these are just my ramblings about how they appear to me on paper, and any stereotypes I come up with are likely to be incorrect, or only useful in the setting of GeoGuessr. It also means that when I whack a phrase through Google Translate to give a sample of the language, the end result might not make any sense. Secondly, I am not a linguist, so quite a lot of this post (it’s hard to estimate — 60%?) is bound to be wrong.

With all that said, allons-y.

An overview

We can see that most European languages are descended from Proto-Indo-European, a language that is thought to have been spoken between about 4500 and 2500 BC at the junction of Europe and Asia. This tongue split into different groups, which themselves split again, in a process which is still continuing today. Hungarian, Finnish and Estonian are descended from a completely separate ancestor, Proto-Uralic. Proto-Uralic was probably spoken at roughly the same time as Proto-Indo-European, but further northeast in the Ural Mountains. I haven’t included Maltese in the diagram as it fits into neither of these broad families, being a Semitic language with Proto-Afroasiatic as its ancestor (which is a different topic altogether). There seems to be a disproportionate number of South Slavic languages, although technically Serbian, Croatian, Bosnian and Montenegrin can all be classified under the umbrella of ‘Serbo-Croatian’, of which four different standard varieties are spoken.

Upon studying the tree, you might realise that a few things surprise you. Finnish isn’t at all related to Danish, Norwegian or Swedish. Hungarian, genetically, has absolutely nothing to do with Czech or Slovak. Romanian is a closer relative of Spanish than it is of Bulgarian (admittedly there is a clue in its name). Or maybe that doesn’t surprise you and you should have skipped this entire section.

However, just because two languages aren’t related doesn’t mean they can’t share similar features — speakers of two unrelated languages in close geographical proximity to one another can pick up features of the other language. This is known as ‘language contact’, and tends to concern vocabulary in particular, although other linguistic aspects can be affected too. I think of the difference between genetic and contact linguistics as being similar to the nature-nurture debate in biology. Interestingly, there is some debate about the Italic and Celtic languages potentially having a more recent common ancestor (Italo-Celtic) due to some obvious similarities between them once you start looking, but again, this could just be due to language contact.

Below is a map of how the main language groups are distributed geographically throughout Europe, based on each country’s most spoken language.

Hungarian is now very distinct from Finnish and Estonian, but eight colours fit quite nicely in the legend and I’m all about aesthetics, so I grouped them together under Finno-Ugric. Hope that’s okay. Malta is the little grey Semitic dot just south of Italy.

Now let’s move on to the individual languages.

The Romance languages

French: Une langue est un système de communication structuré utilisé par les humains, comprenant la parole, les gestes et l’écriture. La plupart des langues ont un système d’écriture composé de glyphes pour inscrire le son ou le geste d’origine et sa signification. If a word ends in a vowel, it’s probably an e. Plenty of shortened articles like l’ and d’ before words beginning with vowels.

Spanish: Un idioma es un sistema estructurado de comunicación utilizado por los seres humanos, que incluye el habla, los gestos y la escritura. La mayoría de los idiomas tienen un sistema de escritura compuesto por glifos para inscribir el sonido o gesto original y su significado. Vowels at the end of words are often a or o. ‘And’ is y and ‘or’ is o. If you see ñ, you’re looking at Spanish.

Italian: Una lingua è un sistema strutturato di comunicazione utilizzato dagli esseri umani, inclusi la parola, i gesti e la scrittura. La maggior parte delle lingue ha un sistema di scrittura composto da glifi per iscrivere il suono o il gesto originale e il suo significato. Most words end in vowels. There are a lot of double consonants, like tt, cc, zz and ll. If you see il on its own, you’re looking at Italian.

Portuguese: Uma linguagem é um sistema estruturado de comunicação usado por humanos, incluindo fala, gestos e escrita. A maioria das línguas tem um sistema de escrita composto de glifos para inscrever o som ou gesto original e seu significado. A lot of words end in -ão. In Portuguese, nh is the equivalent of the Spanish ‘ñ’. I feel like Portuguese seems to use the letter m more than Spanish. ‘And’ is e and ‘or’ is ou.

Romanian: Un limbaj este un sistem structurat de comunicare folosit de oameni, incluzând vorbirea, gesturile și scrierea. Majoritatea limbilor au un sistem de scriere compus din glife pentru a inscrie sunetul sau gestul original și semnificația acestuia. Romanian is a Romance language which has had a fair amount of Slavic influence due to geographic proximity. A lot of words end in ea. The letter ă crops up a lot, as do ș and ț. A common word is din, which means ‘of’ or ‘from’, as in ‘Dragostea Din Tei’. I always think Romanian actually looks closer to Latin than the other Romance languages, but Wikipedia tells me it’s only the third closest relative. Such is life.
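For anyone who thinks better in code, the Romance clues above can be written as a short checklist. This is only a rough sketch of my own rules of thumb: the function name guess_romance and the order of the checks are assumptions made for illustration, and a real snippet of text can easily fool it (French has the word ‘il’ too, for instance).

```python
import re


def guess_romance(text: str) -> str:
    """Guess a Romance language from the visual clues in this post.

    Hypothetical helper: the rule order is an assumption, chosen so that
    the most distinctive clues are checked first.
    """
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))  # Unicode-aware word split

    # Spanish: the letter ñ, or 'y' standing alone as 'and'.
    if "ñ" in lowered or "y" in words:
        return "Spanish"
    # Romanian: the letters ă, ș and ț.
    if any(ch in lowered for ch in "ășț"):
        return "Romanian"
    # Portuguese: the -ão ending, or nh where Spanish would have ñ.
    if "ão" in lowered or "nh" in lowered:
        return "Portuguese"
    # French: shortened articles l' and d' (straight or curly apostrophe).
    if re.search(r"\b[ld]['’]", lowered):
        return "French"
    # Italian: 'il' on its own.
    if "il" in words:
        return "Italian"
    return "unknown"


print(guess_romance("les gestes et l’écriture"))
print(guess_romance("per iscrivere il suono"))
```

The checks run from most to least distinctive, so a single ñ settles things before the vaguer clues get a look in.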

The Germanic languages

German: Eine Sprache ist ein strukturiertes Kommunikationssystem, das vom Menschen verwendet wird, einschließlich Sprache, Gesten und Schreiben. Die meisten Sprachen haben ein Schriftsystem, das aus Glyphen besteht, um den Originalton oder die Geste und ihre Bedeutung zu beschreiben. German features a lot of sch. Ein is an indefinite article. The letters ä, ö and ü are all seen, as is ß, which represents ‘ss’. German has a habit of running several nouns together to form one unnecessarily long one.

Dutch: Een taal is een gestructureerd communicatiesysteem dat door mensen wordt gebruikt, inclusief spraak, gebaren en schrijven. De meeste talen hebben een schrijfsysteem dat bestaat uit tekens om het oorspronkelijke geluid of gebaar en de betekenis ervan te beschrijven. Dutch may be confused with German but there are some obvious differences. Een is the equivalent of ‘ein’. In fact, double vowels are common, like aa, ee and oo. There is a lot of ij and k. Dutch also uses far fewer diacritics (signs added to letters) than German, though a diaeresis occasionally turns up, as in België.

Danish: Et sprog er et struktureret kommunikationssystem, der bruges af mennesker, herunder tale, gestus og skrivning. De fleste sprog har et skrivesystem sammensat af tegn for at indskrive den originale lyd eller gestus og dens betydning. In Danish, og means ‘and’. Danish also features the letters æ, ø and å.

Norwegian: Et språk er et strukturert kommunikasjonssystem brukt av mennesker, inkludert tale, gester og skriving. De fleste språk har et skrivesystem sammensatt av tegn for å skrive den originale lyden eller gesten og dens betydning. Norwegian is visually extremely similar to Danish, and in fact they use the same alphabet, so good luck distinguishing them. Again, og means ‘and’, and you’ll see æ, ø and å. Apparently spoken Norwegian sounds much closer to Swedish than it does to Danish, though.

Swedish: Ett språk är ett strukturerat kommunikationssystem som används av människor, inklusive tal, gester och skrivning. De flesta språk har ett skrivsystem som består av tecken för att skriva in det ursprungliga ljudet eller gesten och dess betydelse. Swedish looks fairly different to Danish and Norwegian. It does not feature ‘æ’ or ‘ø’, and instead makes use of å, ä and ö, reflecting the influence of German on the written language. In contrast to the previous two, in Swedish, ‘and’ is och, rather than ‘og’.

Icelandic: Tungumál er skipulagt samskiptakerfi notað af mönnum, þar með talað, látbragð og ritun. Flest tungumál hafa ritkerfi sem samanstendur af táknum til að skrifa upp á upprunalega hljóðið eða látbragðið og merkingu þess. Icelandic has retained a couple of letters from Old Norse, ð and þ, which both represent a ‘th’ sound. It is one of very few languages to use ð and the only one to use þ, so if you see these characters then Icelandic is a safe bet.
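The Germanic clues can be sketched the same way, with one twist: Danish and Norwegian share their giveaways, so the honest answer is sometimes two candidates rather than one. The function name guess_germanic and the rule order are my own assumptions for illustration. Icelandic is checked first because it also uses ‘og’, and Dutch is checked before German because Dutch words like ‘schrijven’ contain sch as well.

```python
import re


def guess_germanic(text: str) -> set[str]:
    """Return the Germanic languages that fit the clues in this post.

    Hypothetical helper: returns a set because Danish and Norwegian
    cannot be told apart with these clues alone.
    """
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))  # Unicode-aware word split

    # Icelandic: the Old Norse letters ð and þ.
    if "ð" in lowered or "þ" in lowered:
        return {"Icelandic"}
    # Swedish: 'och' for 'and'.
    if "och" in words:
        return {"Swedish"}
    # Danish or Norwegian: 'og' for 'and', or the letters æ and ø.
    if "og" in words or "æ" in lowered or "ø" in lowered:
        return {"Danish", "Norwegian"}
    # Dutch: 'een' as an article, or the ij combination.
    if "een" in words or "ij" in lowered:
        return {"Dutch"}
    # German: ß, 'ein' as an article, or sch.
    if "ß" in lowered or "ein" in words or "sch" in lowered:
        return {"German"}
    return set()


print(guess_germanic("gester och skrivning"))
print(guess_germanic("látbragð og ritun"))
```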

The Slavic languages

The West Slavic languages (Polish, Czech and Slovak) are written in Latin script, and the East Slavic languages (Russian, Ukrainian and Belarusian) are written in Cyrillic. South Slavic orthography can vary. Serbo-Croatian is written in both scripts in Bosnia, Serbia and Montenegro, but is only written in the Latin alphabet in Croatia. Slovenian is only written in Latin script. Bulgarian and Macedonian are written in both. Following? Good. To keep it simple, I’ll demonstrate Serbo-Croatian in both of the scripts that it uses (Serbian Cyrillic and Gaj’s Latin alphabet). I’ll write Bulgarian and Macedonian in Cyrillic because I feel like it’s more commonly used.

West Slavic

Polish: Język to ustrukturyzowany system komunikacji używany przez ludzi, obejmujący mowę, gesty i pisanie. Większość języków ma system pisma złożony z glifów, które wpisują oryginalny dźwięk lub gest i jego znaczenie. Polish is the easiest of the Latin Slavic languages to identify. It has a lot of consonant clusters that include z, like szcz. The letter ł only appears in Polish. Polish also features ą and ę, and words often end in -ów or -ie. Polish doesn’t use the caron (the inverted hat) seen in other Slavic languages, using ż in place of ‘ž’.

Czech: Jazyk je strukturovaný systém komunikace používaný lidmi, včetně řeči, gest a psaní. Většina jazyků má systém psaní složený z glyfů, které umožňují popsat původní zvuk nebo gesto a jeho význam. Czech has two unique characters, ř and ů.

Slovak: Jazyk je štruktúrovaný systém komunikácie používaný ľuďmi, vrátane reči, gest a písma. Väčšina jazykov má systém písania zložený z glyfov, ktorý umožňuje vpísať pôvodný zvuk alebo gesto a ich význam. Slovak appears extremely similar to Czech, but also has two unique characters within this group, ä and ŕ. Slovak words end in -ov where Czech words end in ‘ů’.

South Slavic

Serbo-Croatian (Latin): Jezik je strukturirani sustav komunikacije koji ljudi koriste, uključujući govor, geste i pisanje. Većina jezika ima sustav pisanja sastavljen od glifa za upis izvornog zvuka ili geste i njenog značenja. The Latin sample is written in the Croatian variant of Serbo-Croatian specifically, but I’m too much of a philistine to go into the differences between each variant. The unique character that you might see in Serbo-Croatian is đ. The frequency of the letter j is greater in Serbo-Croatian and Slovenian than in other Slavic languages.

Serbo-Croatian (Cyrillic): Језик је структурирани систем комуникације који користе људи, укључујући говор, кретње и писање. Већина језика има систем писања састављен од глифа за уписивање изворног звука или геста и његовог значења. The Cyrillic sample is written in the Serbian variant of Serbo-Croatian in particular. The unique characters in Serbo-Croatian when written in Cyrillic are ђ and ћ. Fortunately, ј is a character in the Cyrillic alphabet as well, so its common use in Serbo-Croatian still shows through. As above, j is common in both Serbo-Croatian and Slovenian, but Slovenian does not use the Cyrillic alphabet. As a result, if you see Cyrillic with a lot of јs, you’re probably looking at Serbo-Croatian written in Cyrillic. The version of Cyrillic found in Montenegro uses two additional characters: з́ and с́.

Slovenian: Jezik je strukturiran sistem komunikacije, ki ga uporabljajo ljudje, vključno z govorom, kretnjami in pisanjem. Večina jezikov ima sistem za pisanje, sestavljen iz glifov, da vpiše izvirni zvok ali potezo in njen pomen. Unfortunately, Slovenian doesn’t have any identifying characters, but again, j occurs quite frequently, which may help you distinguish it from the other Slavic languages. Except Serbo-Croatian. Unlucky.

Bulgarian: Езикът е структурирана система за комуникация, използвана от хората, включително реч, жестове и писане. Повечето езици имат система за писане, съставена от глифи за вписване на оригиналния звук или жест и неговото значение. Bulgarian has a couple of clues which I was made aware of by this useful page. Firstly, ъ, although not unique to Bulgarian, is used more commonly than in other Cyrillic alphabets. Secondly, a common word ending is -ата, which just transliterates to ‘-ata’.

Macedonian: Јазикот е структуриран систем на комуникација што го користат луѓето, вклучувајќи говор, гестови и пишување. Повеќето јазици имаат систем за пишување составен од глифи за да го напишат оригиналниот звук или гест и неговото значење. The letters unique to Macedonian are ѓ, ќ and ѕ. This means that if you see Cyrillic text with acute (upturned) accents scattered throughout, it’s probably Macedonian. Unless, in a very cruel twist of fate, you’re looking at Montenegrin with their two additional characters.

East Slavic

Russian: Язык — это структурированная система общения, используемая людьми, включая речь, жесты и письмо. В большинстве языков есть система письма, состоящая из глифов для обозначения исходного звука или жеста и его значения. Russian doesn’t have any identifying characters, so if you can’t see any of the ones discussed here then you’re probably looking at Russian. Then again, Russian is far more widely spoken than any other Cyrillic Slavic language so if you see Cyrillic text, it’s likely to be Russian anyway.

Ukrainian: Мова — це структурована система спілкування, що використовується людиною, включаючи мовлення, жести та письмо. У більшості мов є система письма, що складається з гліфів, щоб вписати оригінальний звук або жест та його значення. Ukrainian’s unique letters are ї, є and ґ.

Belarusian: Мова — гэта структураваная сістэма зносін, якая выкарыстоўваецца людзьмі, уключаючы маўленне, жэсты і пісьмо. У большасці моў ёсць сістэма пісьма, якая складаецца з гліфаў, каб упісаць арыгінальны гук альбо жэст і яго значэнне. If you see ў, it’s Belarusian.
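The Cyrillic languages lend themselves to a process of elimination: look for each language’s unique letters, then fall back on the weaker Bulgarian clues, and call it Russian if nothing turns up. Here is a minimal sketch of that, assuming the input is already known to be Cyrillic Slavic text. The names UNIQUE_LETTERS and guess_cyrillic are made up for this example, the Montenegrin extras are left out, and treating any single ъ as Bulgarian is a crude stand-in for ‘used more commonly’.

```python
import re
import unicodedata

# Unique letters as listed in this post, checked in this (assumed) order.
UNIQUE_LETTERS = [
    ("Serbo-Croatian", "ђћ"),
    ("Macedonian", "ѓќѕ"),
    ("Ukrainian", "їєґ"),
    ("Belarusian", "ў"),
]


def guess_cyrillic(text: str) -> str:
    """Guess a Cyrillic-script Slavic language by elimination.

    Hypothetical helper: assumes the text really is Cyrillic Slavic,
    so anything without a clue is reported as Russian.
    """
    # NFC makes accented letters single characters, so a letter typed as
    # base + combining accent still matches.
    lowered = unicodedata.normalize("NFC", text).lower()

    for language, letters in UNIQUE_LETTERS:
        for letter in unicodedata.normalize("NFC", letters):
            if letter in lowered:
                return language

    # Bulgarian: ъ, or the common word ending -ата.
    words = re.findall(r"\w+", lowered)
    if "ъ" in lowered or any(word.endswith("ата") for word in words):
        return "Bulgarian"

    # No identifying clue found: Russian by default.
    return "Russian"


print(guess_cyrillic("якая выкарыстоўваецца людзьмі"))
print(guess_cyrillic("използвана от хората"))
```

Russian does use ъ now and then and has its own words ending in -ата, so the Bulgarian step is the weakest link here.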

Thank God that’s over.

The Baltic languages

Latvian: Valoda ir strukturēta saziņas sistēma, ko lieto cilvēki, ieskaitot runu, žestus un rakstīšanu. Lielākajai daļai valodu ir rakstīšanas sistēma, kas sastāv no glifiem, lai ierakstītu oriģinālo skaņu vai žestu un tā nozīmi. The letters ā, ē and ī are giveaways for Latvian. It also uses ū, which it shares with Lithuanian. Latvian is more likely than Lithuanian to have word endings containing only one vowel, for example -i and -as, although -ai comes up a lot as well.

Lithuanian: Kalba yra struktūrizuota žmonių naudojama komunikacijos sistema, įskaitant kalbą, gestus ir rašymą. Daugumoje kalbų yra rašymo sistema, susidedanti iš glifų, kad būtų galima įrašyti originalų garsą ar gestą ir jo reikšmę. Lithuanian may be thought of as an ‘older’ version of Latvian, although I’m not sure how welcome that comparison is amongst Latvians or Lithuanians. In Lithuanian, it is common to find vowel combinations which have been contracted in Latvian to give shorter sounds. Lithuanian is more likely to have word endings like -iu, -iai and -aus. Like Latvian, Lithuanian may contain ū. It shares ą and ę with Polish, but is fairly unique in also using į, ų and ė.

The Finno-Ugric languages

Finnish: Kieli on jäsennelty viestintäjärjestelmä, jota ihmiset käyttävät, mukaan lukien puhe, eleet ja kirjoittaminen. Useimmilla kielillä on glyfistä koostuva kirjoitusjärjestelmä alkuperäisen äänen tai eleen ja sen merkityksen kirjoittamiseksi. Finnish is a weird-looking language. It’s immediately recognisable as quite distinct from those of its neighbouring Nordic countries. There are a lot of double vowels (aa, ee, ii, oo and uu can all be seen) as well as double consonants, including kk (Kimi Räikkönen), ll, mm and tt. Finnish uses ä and ö, and will often ram two äs together to give ää. Visually, I think the best way to describe Finnish would be a cross between Swedish and Dutch, although of course it’s genetically disparate.

Estonian: Keel on struktureeritud suhtlussüsteem, mida inimesed kasutavad, sealhulgas kõne, žestid ja kirjutamine. Enamikus keeltes on tähestikest koosnev kirjutamissüsteem originaalse heli või žesti ja selle tähenduse sisestamiseks. Estonian is closely related to Finnish and so looks pretty similar, i.e. weird. It shares ä and ö with Finnish, but to distinguish them, look for ü, õ, š and ž in Estonian.

Hungarian: A nyelv egy strukturált kommunikációs rendszer, amelyet az emberek használnak, beleértve a beszédet, a gesztusokat és az írást. A legtöbb nyelv rendelkezik írásjelekkel, amelyek karakterjelekből állnak, hogy felírják az eredeti hangot vagy gesztust és annak jelentését. Fairly different from the other two Finno-Ugric languages, Hungarian contains a lot of sz, gy, ly, ny and ty. It often has acute accents peppered throughout, so if á sénténcé lóóks líké thís, it’s probably Hungarian. Hungarian is also the only language to make use of ő.
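Telling these three apart boils down to three questions asked in order: are there acute accents or ő (Hungarian), are there any of ü, õ, š or ž (Estonian), and failing that, does it have the Finnish look of ä, ö and doubled letters? A rough sketch follows, assuming the text is already known to be one of the three. The name guess_finno_ugric is hypothetical, and a short Estonian phrase without its marker letters will come out as Finnish.

```python
import re
import unicodedata


def guess_finno_ugric(text: str) -> str:
    """Guess between Hungarian, Estonian and Finnish from this post's clues.

    Hypothetical helper: the checks run in order, so the Hungarian
    accents win over the letters it shares with the other two.
    """
    # NFC so that accented letters are single characters.
    lowered = unicodedata.normalize("NFC", text).lower()

    # Hungarian: acute accents peppered throughout, plus ő.
    if any(ch in lowered for ch in "áéíóúő"):
        return "Hungarian"
    # Estonian: ü, õ, š and ž set it apart from Finnish.
    if any(ch in lowered for ch in "üõšž"):
        return "Estonian"
    # Finnish: ä and ö, double vowels and double consonants.
    if re.search(r"[äö]|aa|ee|ii|oo|uu|kk|ll|mm|tt", lowered):
        return "Finnish"
    return "unknown"


print(guess_finno_ugric("Kieli on jäsennelty viestintäjärjestelmä"))
print(guess_finno_ugric("Keel on struktureeritud suhtlussüsteem"))
```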

Now just the standalone languages left to cover.

Greek

Albanian

Maltese

That’s it