Young Sicilian Manifesto

Lingua e Dialettu

Diventa poviru e servu,
quannu i paroli non figghianu paroli
e si mancianu tra d'iddi.

A people becomes poor and servile,
when words do not spawn words
and they devour themselves instead.

— Gnazziu Buttitta

Google Translate can now translate over 100 languages, but not Sicilian. Microsoft's Bing Translator and Yandex Translate cannot translate Sicilian either.

To my knowledge, the only effort to create a machine translator for the Sicilian language was the Sicilian-Spanish translator that Uliana Sentsova created for Apertium during the 2016 Google Summer of Code (GSoC).

The dictionaries that she developed to translate the Sicilian language work very differently from the statistical translators developed by Google, Microsoft and Yandex.

Statistical translation works well for language pairs that are frequently translated (like the official languages of the European Union). But it works poorly when few parallel texts are available, and it works poorly with morphologically rich languages like Sicilian, because a statistical translator does not have a human ear.

A statistical translator does not hear the similarity between "mèttiri" and "mìettiri" that a human ear hears, so the statistical translator identifies them as two different words, whereas the human ear identifies them as two variants of the same word. If we had enough parallel texts, we could train a statistical translator to recognize that similarity, but Sicilian does not have enough parallel texts.
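As a rough illustration of that gap, the sketch below compares the two spellings first as whole strings, the way a purely word-level model would see them, and then character by character. It uses Python's standard difflib.SequenceMatcher as a crude stand-in for the "human ear". The helper names fold_accents and similarity are hypothetical, and this is not a description of how Google, Microsoft or Yandex actually process text.

```python
import unicodedata
from difflib import SequenceMatcher


def fold_accents(word: str) -> str:
    """Strip combining accent marks, e.g. 'è' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def similarity(a: str, b: str, ignore_accents: bool = False) -> float:
    """Character-level similarity between 0.0 and 1.0 (illustrative only)."""
    # Normalize first so that the same accented letter always has one encoding.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    if ignore_accents:
        a, b = fold_accents(a), fold_accents(b)
    return SequenceMatcher(None, a, b).ratio()


a, b = "mèttiri", "mìettiri"

# Whole-word view: the two spellings are simply different tokens.
print("same token:", a == b)

# Character-level view: the two spellings overlap heavily.
print("similarity:", round(similarity(a, b), 2))
print("similarity without accents:", round(similarity(a, b, ignore_accents=True), 2))
```

The whole-word comparison reports two unrelated tokens, while the character-level scores are high. A rule or a dictionary entry can encode that kinship directly, without waiting for parallel texts to teach it.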

What Sicilian has in abundance is people who love the Sicilian language.

Arthur Dieli recorded over 12,000 Sicilian words and phrases. Giuseppe Presicce recorded over 8,000 words from the Salentino dialect spoken in Scorrano. Orlando Accetta recorded over 1,000 words from the dialect spoken in Pizzo Calabro. And the Sicilian Wiktionary project has recorded over 18,000.

So it makes sense to build upon their dictionaries by developing dictionaries for a rule-based translator – like the ones that Uliana Sentsova developed for Apertium – because rule-based translators do not require parallel texts. They only require dictionaries.
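To make the idea concrete, here is a toy sketch of a dictionary plus one reordering rule. It is not Apertium's dictionary format or Uliana Sentsova's work. The lexicon holds only four words glossed from the verses in this manifesto, and the part-of-speech tags, the LEXICON table and the translate function are assumptions made up for the illustration.

```python
# Tiny Sicilian-to-English lexicon: word -> (English gloss, part of speech).
# The glosses come from the verses in this manifesto; the tags are assumed.
LEXICON = {
    "palori": ("words", "NOUN"),
    "siciliani": ("Sicilian", "ADJ"),
    "ngrisi": ("English", "ADJ"),
    "figghianu": ("spawn", "VERB"),
}


def translate(sentence: str) -> str:
    """Word-by-word lookup, then one rule: NOUN ADJ -> ADJ NOUN."""
    # Unknown words pass through unchanged with the tag "UNK".
    tagged = [LEXICON.get(tok, (tok, "UNK")) for tok in sentence.lower().split()]

    output = []
    i = 0
    while i < len(tagged):
        is_noun_adj = (
            i + 1 < len(tagged)
            and tagged[i][1] == "NOUN"
            and tagged[i + 1][1] == "ADJ"
        )
        if is_noun_adj:
            # Sicilian puts the adjective after the noun; English puts it before.
            output.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            output.append(tagged[i][0])
            i += 1
    return " ".join(output)


print(translate("palori siciliani figghianu palori ngrisi"))
```

No parallel text is involved: every piece of knowledge the sketch uses is a dictionary entry or a hand-written rule, which is the kind of resource that dictionary makers have already been collecting.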

By tying all of their work together, we can develop machine translators for the Sicilian language and begin translating documents into Sicilian.

The translator that we develop might not translate English into Sicilian as well as Google translates English into Italian, but that is not the relevant comparison. The relevant comparison is the quality of our translator relative to what is available now. And there is nothing else available right now.

The absence of tools to translate documents into the Sicilian language prevents the Sicilian language from growing. By contrast, the abundance of tools to translate documents into English and Italian keeps them growing. That's why (as of 18 May 2018) English Wikipedia has 5,651,597 articles, Italian Wikipedia has 1,437,899 and Sicilian Wikipedia has only 25,990.

Quannu palori ngrisi non figghianu palori siciliani,
mancu palori siciliani figghianu palori siciliani.

When English words do not spawn Sicilian words,
not even Sicilian words will spawn Sicilian words.

The Sicilian language has great poetry, but a language cannot survive on poetry alone. To develop the Sicilian language, we must develop tools that translate documents into the Sicilian language.

English is a language of scholarship because documents are translated into English. To make Sicilian a language of scholarship, we must translate documents into Sicilian. Then when Sicilian is a language of scholarship, people will have lots of reasons to learn the Sicilian language.

To get started, we need a machine translator. It does not have to be a good translator. Its results can be filled with errors. If fixing those errors consumes less time than a fully human translation would consume, then the machine translator will reduce the cost of translation.

Suppose our machine translator reduces the work of a human translator by 80 percent. Then a document which previously required five hours to translate would now require only one hour, and our human translators could translate five times as many documents as they could before.
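The arithmetic can be checked with a short sketch. The 80 percent figure is the supposition from the paragraph above, not a measured result, and the function names remaining_hours and throughput_multiplier are hypothetical. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def remaining_hours(hours_before, reduction):
    """Hours of human work left after the machine removes a share of it."""
    return hours_before * (1 - reduction)


def throughput_multiplier(reduction):
    """How many times more documents fit into the same human hours."""
    return 1 / (1 - reduction)


reduction = Fraction(80, 100)  # supposed 80 percent reduction in work

print("hours per document:", remaining_hours(5, reduction))
print("documents per translator, relative to before:", throughput_multiplier(reduction))
```

The same functions show why the translator does not have to be good to be useful: any reduction greater than zero leaves fewer hours per document than a fully human translation.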

All else equal, the lower marginal cost of producing documents in the Sicilian language should increase the number of documents published in the Sicilian language.

Quannu palori ngrisi figghianu palori siciliani,
puru palori siciliani figghianu palori ngrisi.

When English words spawn Sicilian words,
Sicilian words will spawn English words.

Wikipedia provides a great source of material to translate because its CC BY-SA license expressly permits us to share and adapt the work for commercial purposes, provided that we attribute the work to its authors and allow others to freely copy our work. The license encourages sharing and remixing for commercial purposes, because if it is profitable to develop material for Wikipedia, then people will contribute more material to Wikipedia.

Development of a machine translator will reduce our costs, making it profitable for us to translate works into Sicilian and publish books that readers will purchase, read and enjoy.

And when it is profitable to develop the Sicilian language, then people will speak the Sicilian language more frequently – with friends, in business, for science and for poetry.

Accussì i palori figghianu sempri chiù palori.

Then words will spawn more and more words.

Copyright © 2014-2024 Eryk Wdowiak