----------------------------- README ------------------------------------

This software was originally envisioned as an English-Esperanto
translator. Due to time constraints, the author was not able to achieve
his goal. Nevertheless, various components of this system may still be
useful as independent programs for sentence parsing and analysis.

Please be warned that the author does not claim to be proficient in
Esperanto. Unexpected failures of the programs can be attributed to the
author's poor programming skills as well as to his poor understanding of
languages (including English, sigh).

The author, however, assumes no responsibility for any hardware,
software, or mental damage caused by this package. For more details,
please read the file COPYING.

I am sorry for not being able to provide an Esperanto version of this
README file. I welcome comments and suggestions on the programs as well
as on this very document.

May 3, 1995
Jui-Yuan Fred Hsu, juiyuan@cs.cornell.edu,
(607) 844-3697
(607) 255-1041 (Upson Hall 323)
http://www.cs.cornell.edu/Info/People/fred/


HARDWARE AND SOFTWARE PLATFORM

I have only run this software in my own environment: a Sun SPARCstation
running SunOS 4.1. The C++ programs are compiled with GNU g++ 2.5.8, and
the Lisp programs run under Lucid Common Lisp/SPARC 4.1.


INTRODUCTION

Translating a sentence between two languages basically involves two
processes. The first takes a sentence in the source language and turns
it into some sort of logical form; the second converts that logical form
into the target language.

I have used the bottom-up (BU) chart parser that comes with "Natural
Language Understanding" by James Allen. Given lexicon entries, grammar
rules, and semantic information for a specific language, the BU parser
can parse a sentence and produce the corresponding logical form. The NLP
package also includes a simple sentence generator that converts a
logical form back into a sentence of the original language.
Allen's parser and generator are written in Lisp and work fine under
Lucid Common Lisp.

The lexicon and grammar rules for English are taken from a class
assignment on which YiChen Chen, my brother Richard, and I worked as a
group. The grammar understands a good many English sentences and
produces human-readable logical forms, but the lexicon is too small. So
far, one cannot generate sentences from these logical forms.

The lexicon and grammar rules for Esperanto are fully working. They do
not cover all the rules of Esperanto, but the lexicon is (relatively)
complete, and the system both parses and generates Esperanto sentences.

The parser, however, cannot process affixes: it lacks the ability to
break words into basic morphemes. This weakness can be tolerated for
English sentences, but for Esperanto it is a fatal blow, as Esperanto
relies heavily on affixes. I have therefore written a program to parse
"words" into morphemes. I have taken the Esperanto-English dictionary
compiled by Neal McBurnett in 1992 as the base for both the "word"
parser and Esperanto BU parsing. This list contains grammatical tags for
each word.

At this point in time, the translator as a whole does not work, as one
can easily observe. The components, however, can be used independently;
they are described in the following sections.


INSTALLATION

1. ftp and download a copy of the software, named "translator.tar.Z".
2. Run "uncompress translator.tar.Z"; the file becomes "translator.tar".
3. Run "tar xvf translator.tar"; a new directory translator/ is created
   and the files are unpacked from the tar file into it.
4. "cd translator"
5. Customize "./Makefile". Specifically, you may need to modify CC,
   INCLUDE_PATH, and LIBS for the C++ programs to compile and link
   correctly. Make sure your Common Lisp executable name is correctly
   identified by LISP_BIN.
6. "make all"
   (Optionally, if this is not the first time you run make, you may want
   to run "make clean" first to make sure old object files are removed.)


DIRECTORIES AND FILES

esperanto/                    main directory
    README                    this file
    COPYING                   GNU General Public License information
    TODO                      to-do list
    Makefile                  main makefile
    allen/                    BU parser and generator from James Allen's
                              book (I have modified some of his code)
    vortaro/                  dictionary directory
        esp-angla-vortoj.txt  main Esperanto dictionary, from Neal McBurnett
        suppl.vortaro         supplementary dictionary
        filter.cc.txt         filter file to retrieve useful lexical entries
        filter.lisp.txt
        exclude.cc.txt        filter file to exclude unwanted entries
        exclude.lisp.txt
        lex.for.cc            [final usable lexicon entries]
        lex.for.lisp
    tools/                    tool C++ libraries
    src/                      C++ programs
        common.h              common definitions and routines
        common.cc
        buildlex.cc           build lexicon files in the vortaro/ directory
        lispify.cc            PARSE WORDS INTO BASIC MORPHEMES (C++)
        glue.cc               GLUE MORPHEMES BACK TO WORDS (C++)
    eng/                      ENGLISH GRAMMAR for BU parser (Lisp)
        eng.lisp
    esp/                      ESPERANTO GRAMMAR for BU parser (Lisp)
        esp.lisp              LOOK AT THIS FILE!
    bin/                      executable directory
        buildlex*             one-time setup of lexicon files
        lispify*              filter: input an Esperanto sentence, output
                              morphemes. The output can be fed into Lisp.
        translate*            filter (script): send morphemes to the BU
                              parser, obtain the logical form, then apply
                              the Esperanto sentence generator to get back
                              the original Esperanto sentence
                              (Esperanto-Esperanto)
        glue*                 filter: glue morphemes back into words
        run*                  given a file containing an Esperanto
                              sentence, does lispify | translate | glue
        sentence?             Esperanto test sentences
        lispify?.test         test files for lispify ("lispify" can parse
                              more sentences than the BU parser can
                              understand)
    doc/                      PostScript documentation
        english.semantics.ps  on the English grammar and lexicon
        translator.ps         on the Esperanto-English translator


SAMPLE RUNS

While trying out this software, please keep in mind that it is just
classwork. Do not set your expectations too high.
To find out which kinds of sentences the software can process, please
look at the testing section at the bottom of the file esp.lisp.

The following are sample parses. Some samples contain very long
sentences; users are not encouraged to try sentences of such length
unless a great deal of patience resides in their hearts.

------------------------------------------------------------------------
~translator/bin> run sentence1
-- runing Esperanto-Esperanto on sentence
< de mi sur tablo al ni letero estas skribita >
-- result:
LETERO ESTAS SKRIBITA AL NI DE MI SUR TABLO
------------------------------------------------------------------------
~translator/bin> cat sentence2
la libroj estas bonaj
~translator/bin> cat sentence2 | lispify
la libr +o +j est +as bon +a +j
~translator/bin> cat sentence2 | lispify | translate
;;; Lucid Common Lisp/SPARC
;;; Development Environment Version 4.1 DBCS, 12 October 1992
;;; Copyright (C) 1985, 1986, 1987, 1988, 1992 by Lucid, Inc.
;;; All Rights Reserved
[omitted...]
# > (LA LIBR +O +J EST +AS BON +A +J)
> ;;; Loading source file "../allen/loadFunction"
;;; Loading source file "/amd/sundown/b/juiyuan/cornell/674/translator/allen
[omitted...]
Semantic Interpretation
(LA LIBR +O +J EST +AS BON +A +J)
[omitted...]
result
LA LIBR +O +J EST +AS BON +A +J
~translator/bin> cat sentence2 | lispify | translate | glue
[omitted...]
LA LIBR +O +J EST +AS BON +A +J
LA LIBROJ ESTAS BONAJ
------------------------------------------------------------------------
~translator/bin> run sentence4
-- runing Esperanto-Esperanto on sentence
< l' lernanto estas esperanta l' libron >
-- result:
LA LERNANTO ESTAS ESPERANTA LA LIBRON
---------------------------------------------------------------
~translator/bin> cat sentence3
viaj eksgepatretoj estas donintaj tiun cxi en mia malgrandega domego
~translator/bin> cat sentence3 | lispify
vi +a +j eks+ ge+ patr +et +o +j est +as don +int +a +j tiu +n cxi en mi +a mal+ grand +eg +a dom +eg +o
~translator/bin> cat sentence3 | lispify | translate
[omitted...]
Semantic Interpretation
(VI +A +J EKS+ GE+ PATR +ET +O +J EST +AS DON +INT +A +J TIU +N CXI EN MI +A MAL+ GRAND +EG +A DOM +EG +O)
~translator/bin> cat sentence3 | lispify | translate | glue
~translator/bin> run sentence3
-- runing Esperanto-Esperanto on sentence
< viaj eksgepatretoj estas donintaj tiun cxi en mia malgrandega domego >
-- result:
VIAJ EKSGEPATRETOJ ESTAS DONINTAJ CXI TIUN EN MIA MALGRANDEGA DOMEGO
---------------------------------------------------------------
~translator/bin> cat sentence5 | lispify | translate
[omitted...]
Semantic Interpretation
(DE EN LA TABL +O SALT +OS LA KAT +O)
result
LA KAT +O SALT +OS EL LA TABL +O
---------------------------------------------------------------
~translator/bin> run sentence6
-- runing Esperanto-Esperanto on sentence
< ili iras okcidente >
-- result:
ILI IRAS OKCIDENTE
---------------------------------------------------------------
~translator/bin> run sentence7
-- runing Esperanto-Esperanto on sentence
< al fred hsu de la universitato de cornell viaj eksgepatretoj estas donintaj tiun cxi en mia malgrandega domego >
-- result:
VIAJ EKSGEPATRETOJ ESTAS DONINTAJ CXI TIUN AL FRED HSU DE LA UNIVERSITATO DE CORNELL EN MIA MALGRANDEGA DOMEGO
---------------------------------------------------------------
~translator/bin> run sentence8
-- runing Esperanto-Esperanto on sentence
< sur la tablo al ni skribas leteron mi >
-- result:
MI SKRIBAS LETERON AL NI SUR LA TABLO
---------------------------------------------------------------
~translator/bin> run sentence9
-- runing Esperanto-Esperanto on sentence
< letero estas skribita de mi al ni sur la tablo >
-- result:
LETERO ESTAS SKRIBITA AL NI DE MI SUR LA TABLO