Place names in Alsace are sometimes described as unpronounceable for French people. Most have a Germanic root and sound pretty exotic to French speakers. I recommend taking a regional train and listening to the voice announcing the Alsatian stations with a very French pronunciation; it is quite a funny experience. A recent train trip made me think about how similar the suffixes of Alsatian place names are, and I wanted to know how many places actually end in “-heim”, “-willer” and so on.
This post uses data from a Wikipedia table scraped in another post. It contains information on the communes in the département Bas-Rhin. Here, we only need the names of the places. We start by loading the packages and the data, and pulling out the column we are interested in.
The first thing to clean up is possible extensions of the place names, such as references to rivers or spas (e.g. -sur-Zorn or -les-Bains).
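The original code for this step is not shown here, so the following is only a minimal sketch of how such extensions could be stripped with a regular expression. The sample names, the `connectors` list and the helper `strip_extension()` are assumptions made for illustration; the real data may need more connecting words.

```r
# A small hand-picked sample of commune names (not the full scraped table).
places <- c("Waltenheim-sur-Zorn", "Niederbronn-les-Bains",
            "Schiltigheim", "Wangenbourg-Engenthal")

# Assumed list of French connecting words that introduce an extension.
connectors <- c("sur", "les")
pattern <- paste0("-(", paste(connectors, collapse = "|"), ")-.*$")

# Hypothetical helper: drop everything from the connecting word onwards.
strip_extension <- function(x) sub(pattern, "", x)

places_clean <- strip_extension(places)
print(places_clean)
```

Double names without a connecting word, such as Wangenbourg-Engenthal, are left untouched by this pattern and would need a separate decision.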
As we are interested in the suffixes of the place names, we need a method to split the strings into syllables. We will use the function hyphen() from the sylly package, which takes a vector of character strings and applies a hyphenation algorithm to each word (the algorithm was originally developed for automatic word hyphenation in LaTeX). The algorithm needs a set of hyphenation patterns, which are provided in dictionaries for different languages. Let’s try out the French (Alsace is in France!) and German (Alsatian is an Alemannic German dialect) dictionaries for the place names.
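As a rough sketch of this comparison, the snippet below hyphenates a few sample names with both dictionaries and counts how often the two splits agree. It assumes that the language support packages sylly.de and sylly.fr are installed alongside sylly (they are distributed separately), and the three sample names stand in for the full list of communes.

```r
library(sylly)
library(sylly.de)  # German hyphenation patterns (assumed to be installed)
library(sylly.fr)  # French hyphenation patterns (assumed to be installed)

# Sample names standing in for the full vector of commune names.
places <- c("Schiltigheim", "Ingwiller", "Batzendorf")

# Hyphenate the same words once per dictionary.
hyph_de <- hyphen(places, hyph.pattern = "de", quiet = TRUE)
hyph_fr <- hyphen(places, hyph.pattern = "fr", quiet = TRUE)

# hyphenText() returns a data frame; its `word` column holds the hyphenated words.
split_de <- hyphenText(hyph_de)$word
split_fr <- hyphenText(hyph_fr)$word

comparison <- data.frame(place = places, de = split_de, fr = split_fr,
                         same = split_de == split_fr)
print(comparison)
table(comparison$same)
```

Looking at the rows where `same` is `FALSE` is a quick way to judge by eye which dictionary handles these names better.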
There are 255 names split in the same way and 258 split differently, so the choice of dictionary clearly affects the outcome. After manually inspecting the differences, I opt for the German dictionary because it splits syllables better than the French one for the names under study (e.g. -heim rather than -sheim, and -wil-ler rather than -swiller).
However, some manual corrections are still necessary, for example reuniting some syllables (like -wil and -ler) to get the correct suffixes.
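One possible way to apply such corrections is a small lookup of split endings and their reunited form. This is a sketch only: the hyphenated strings are written by hand rather than taken from real hyphen() output, and `corrections` and `fix_suffixes()` are hypothetical names.

```r
# Hand-written hyphenated names, standing in for the hyphenation output.
hyphenated <- c("Ing-wil-ler", "Boux-wil-ler", "Schil-tig-heim")

# Assumed lookup: split ending -> reunited suffix. Extend as needed.
corrections <- c("wil-ler" = "willer")

# Hypothetical helper: replace each split ending at the end of a name.
fix_suffixes <- function(x, corrections) {
  for (split in names(corrections)) {
    x <- sub(paste0(split, "$"), corrections[[split]], x)
  }
  x
}

fixed <- fix_suffixes(hyphenated, corrections)
print(fixed)
```

Anchoring the pattern with `$` keeps the correction from touching the same letters elsewhere in a name.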
Now we are ready to extract the suffixes and count their occurrences.
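A minimal sketch of this step in base R, assuming the names are available as hyphenated strings: the suffix is taken to be everything after the last hyphen, and names without any hyphen are set aside as `NA`. The sample vector is illustrative and does not reproduce the real counts.

```r
# Hand-written hyphenated names, standing in for the corrected output.
hyphenated <- c("Schil-tig-heim", "Wey-ers-heim", "Ing-willer",
                "Bat-zen-dorf", "See-bach", "Rohr")

# Keep only the part after the last hyphen; names without a hyphen get NA.
suffix <- ifelse(grepl("-", hyphenated),
                 sub("^.*-", "", hyphenated),
                 NA_character_)

# Count occurrences (table() drops NA by default) and convert to percentages.
counts <- sort(table(suffix), decreasing = TRUE)
shares <- round(100 * prop.table(counts), 1)

print(counts)
print(shares)
```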
By far the most common suffix in place names in Bas-Rhin is “-heim” (cf. also the graph below): more than a third of all the communes have this suffix. Other common suffixes are “-willer” (12%), “-bach” (7%), and “-dorf” (4%). As a last piece, here’s a bar chart visualizing this information.
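A chart of this kind can be sketched with base R's barplot(). The suffix vector below is a toy example and not the real data, so the bar lengths do not match the percentages quoted above; the axis label and title are likewise placeholders.

```r
# Toy suffix vector for illustration only (not the real distribution).
suffixes <- c("heim", "heim", "heim", "willer", "willer", "bach", "dorf")

# Sort ascending so that the most frequent suffix ends up at the top
# of a horizontal bar chart.
counts <- sort(table(suffixes))

barplot(counts, horiz = TRUE, las = 1,
        xlab = "Number of communes",
        main = "Suffixes of place names (toy data)")
```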
This analysis was limited to places in the département Bas-Rhin, but I might extend it to the other Alsatian département, Haut-Rhin, in the near future.