Natural language handling

I am a (very amateur) natural-language enthusiast, and have started to write some software related to this interest. However, I'm not a computational linguist -- as far as the natural language side goes, this is strictly a hobby.

mulvo.el: This has become MuLVoc on SourceForge.
Show localizations of source files, e.g. translations of comments and perhaps of names. Intended for (typically open-source) developers in countries where English is not widely known, particularly where large organizations could have translators translate the comments in open-source code that their programmers then work on.
Switches input methods according to which column of a spreadsheet point is in. This requires a special row of the spreadsheet to set up its input methods. Useful for entering data for use with mulvo. Builds on csv-mode.el by Francis J. Wright.
Convert pinyin data coming in from files.
List the words occurring in the symbols defined in the currently running Emacs. Meant to give an idea of what to translate when preparing for Emacs work using localized-source.el.
Look up language codes, using the data from Ethnologue.
Read a Swadesh list file from Wiktionary, and convert it into mulvo format data.

Here is an extract from Wiktionary's description of the Swadesh list:

The below list of words was devised by the linguist Morris Swadesh. He used it as a means of determining the closeness of any pair of languages. It is a useful list of the most common words, which are essential to most languages and may be used in learning basic communication in other languages and even multiple languages at once since, for basic communication, vocabulary is generally more useful than a knowledge of the target language syntax. Sometimes it is even possible to learn basic communication with no knowledge of the target language syntax.

I think I'll have to produce a parts-of-speech table for the Swadesh list for myself, and will put it in my languages web pages when I've done it.

Add parts of speech to a Swadesh list.
Read a phrase list, such as the one in the Wikipedia article on the Basque language, and produce a CSV file from it, for mulvo to use.
Support for the Irish (Gaelic) language. Modifies capitalization to understand urú (eclipsis).
Convert escape sequences to the characters they represent.
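
To illustrate what eclipsis-aware capitalization involves: in Irish, eclipsis puts one or more letters in front of a word's initial consonant (b becomes mb, c becomes gc, d becomes nd, f becomes bhf, g becomes ng, p becomes bp, t becomes dt), and when such a word is capitalized it is the original initial that takes the capital, as in "i mBaile Átha Cliath". The following is only a minimal sketch of that idea, not the code from the Irish support file described above; the names sketch-irish-eclipsis-regexp and sketch-irish-capitalize-word are made up for illustration, and the sketch assumes a single word as input and ignores the n- that is prefixed to vowels.

```elisp
;; Hypothetical helper for illustration only; not taken from the
;; Irish support library described above.

(defconst sketch-irish-eclipsis-regexp
  "\\`\\(bhf\\|mb\\|gc\\|nd\\|ng\\|bp\\|dt\\)"
  "Regexp matching an eclipsed consonant cluster at the start of a word.")

(defun sketch-irish-capitalize-word (word)
  "Return WORD capitalized, keeping any eclipsing letters in lower case.
WORD is assumed to be a single word.  For example, \"mbaile\"
becomes \"mBaile\" rather than \"Mbaile\"."
  (let ((lower (downcase word)))
    (if (string-match sketch-irish-eclipsis-regexp lower)
        ;; The last letter of the matched cluster is the word's
        ;; original initial: leave the letters before it in lower
        ;; case and capitalize from that letter onwards.
        (let ((split (1- (match-end 0))))
          (concat (substring lower 0 split)
                  (capitalize (substring lower split))))
      ;; No eclipsis: ordinary capitalization.
      (capitalize lower))))
```

With these definitions, (sketch-irish-capitalize-word "mbaile") returns "mBaile" and (sketch-irish-capitalize-word "bhfear") returns "bhFear", while an uneclipsed word such as "baile" is capitalized in the usual way to "Baile".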

Last modified: Thu Sep 6 16:05:46 IST 2007