Can Computers Invent New Hebrew Words? AI meets the Akademiya, or: “Eliezer Bot Yehudah”
By Dr. Jeremy Benstein, HATC Senior Consultant
Like France and Spain, Israel has a national institution to guide language policy: the Academy of the Hebrew Language – האקדמיה ללשון העברית. One of the tasks it is charged with is coming up with good Hebrew equivalents for foreign words that have made their way into Israeli Hebrew from English or other tongues – known collectively as lo’azit, meaning any language that is not Hebrew. For this task the Academy has a special committee, composed of its own members along with representative writers, teachers and other language mavens. It has also started asking the public what they think about new ideas and suggestions for Hebrew neologisms (new words, coinages).
Indeed, in the end it is the public that will decide the fate of a new word, and sometimes it’s hard to predict what will catch on, and what will die an ignominious death of neglect. For instance, no Israeli would ever say kompyuter, since the universally accepted (Academy-coined) word is machshev (from the root ch-sh-v, meaning both think and compute). However, the Academy’s attempt at creating a Hebrew equivalent for telefon in sach-rachok (from words meaning ‘speak at a distance’) failed completely, and t-l-f-n has been accepted as a “Hebrew root,” and a person can even metalfen (call) someone else, though of course now that’s mostly on the nayad (portable), sometimes still called the cellulari.
This important mission of adapting the language to changing realities and needs has been central ever since the revival of Hebrew as a spoken language began. (Actually, even earlier: Hebrew in antiquity and in the medieval era also needed new words and concepts to deal with foreign novelties and changing times.) One of Eliezer Ben Yehudah’s central tasks in his journalistic and lexicographical work was indeed to enrich the contemporary vocabulary – either with “repurposed” words from the sources, or with new coinages based on earlier roots, or with borrowings from the closely related language of Arabic.
Neologizing, or creating a new Hebrew word, requires several different skill sets and knowledge bases. A thorough grounding in general linguistics and the structure of the Hebrew language is essential. Familiarity with the different historical strata and classic texts of the language is up there too, but so are imagination and creativity, as well as a common touch: a more emotional intelligence about how words function in society, and what will actually “work.”
Given that, could this complex task be entrusted to a computer? Even with the amazing progress that machine learning and artificial intelligence have demonstrated recently (in everything from self-driving cars, to chat-bots that are almost human), could they acquire and apply the various skills necessary to come up with new Hebrew words?
The answer seems to be yes. Hebrew University computer science students Moran Mizrahi and Stav Yardeni Seelig (under the direction of Prof. Dafna Shahaf) undertook to design a program that can suggest neologisms that are at least as likely and potentially attractive as what the Academy serves up. This brief description is based on their work (see below for reference).
What did they do? They designed a process whereby they take an original English word, tease out its semantic components, translate those components into Hebrew, identify the equivalent roots, run those through a generator with the relevant mishkalim, the nominal or verbal forms that the roots are expressed in, and voilà – out comes a (potential) new word. One example they go into in depth is the word palette (a board artists use to mix colors). Israelis generally say paleta, though there does exist a rarely-used Academy-coined word p’techa (from a Talmudic root p-t-ch that meant “to mix”). The word palette connects to: color (צבע), mix (ערבב, ערבל), and board (לוח, קרש). There are several possible mishkalim, but the most relevant is maf’ela, which can be used for tools. So out comes matzbe’ah, and also ma’arbelah. The generator also will make up compound words – such as luach tzeva, a color board – and even “portmanteau words,” mushing two words into one, like kaduregel for “football” (kadur + regel) or ramzor for “traffic light” (remez + or). In this case the result was irbuluach, “a mixing board.” They then submit the ideas to a rating process done by actual humans, who grade each idea on three scales: suitability (as a Hebrew translation of the original idea), likability, and creativity.
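For readers who like to see the moving parts, here is a minimal sketch of the generation step described above. It is not the researchers' program: the function names, the transliterated template string and the overlap rule for blending are all assumptions made for illustration, using only the palette example from the article. It slots a three-consonant root into a maf’ela-style template, joins two words into a compound, and blends two words where the end of one matches the start of the other.

```python
# Illustrative sketch only. Every name and data item here is an assumption,
# built from the transliterations given in the article, not from the
# researchers' actual code.

def apply_mishkal(root, template):
    """Slot the consonants of a three-letter root into a mishkal template."""
    return template.format(*root)


def compound(head, modifier):
    """Join two existing words into a two-word compound."""
    return f"{head} {modifier}"


def blend(first, second):
    """Portmanteau by overlap: merge where the end of `first` equals the
    start of `second`. Returns None when the two words share no overlap."""
    for size in range(min(len(first), len(second)), 0, -1):
        if first.endswith(second[:size]):
            return first + second[size:]
    return None


# Hypothetical transliterated template for the maf'ela (tool) mishkal.
MAFELA = "ma{0}{1}e{2}ah"
# The root tz-b-' (color), as three consonant slots.
COLOR_ROOT = ("tz", "b", "'")

candidates = [
    apply_mishkal(COLOR_ROOT, MAFELA),  # root + mishkal
    compound("luach", "tzeva"),         # compound word
    blend("irbul", "luach"),            # portmanteau
]
print(candidates)
```

Run on the palette example, this produces matzbe’ah, luach tzeva and irbuluach, and the same overlap rule turns kadur + regel into kaduregel. It does not handle ramzor, where the vowels of remez change, nor four-letter roots such as the one behind ma’arbelah; a real generator would need actual Hebrew morphology rather than string templates.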
Some words their generator came up with rated quite highly. For instance, Israelis use the word kapkeyk (i.e., cupcake) even though there is a “proper” word that has been proposed – עוגונית, oogonit, a diminutive of ‘ooga, “cake.” Their suggestion? גביעוגה, gevi’oogah, combining a word for cup, gevi’a, with cake, which may be a better proposal than the experts’. And occasionally the program comes up with a whole list of possibilities. You’d think that an argumentative culture like Israel’s would have a good word for debate, but most Israelis simply use the English word – dibeyt. The Academy has proposed ma’amat, from ‘imut, “confrontation,” but that has not been accepted at all. The program suggests: sichuach (from sicha, conversation), pilmus (from the originally Greek pulmus, polemic or argument), krav diyun (a compound meaning “discussion battle”) and others.
Who knows? Maybe some day we will have a self-driving Academy.
The Hebrew essay Eliezer Bot Yehudah, by Moran Mizrahi and Stav Yardeni Seelig (under the direction of Prof. Dafna Shahaf) of the Hebrew University is a “popular” Hebrew language version of their academic paper, “Coming to Terms: Automatic Formation of Neologisms in Hebrew,” (published in Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 4918–4929). The “bot” can be tried here.