BATMAN

Description

BilinguAl TerM AligNer (BATMAN) (Arcan et al., 2014) is an open-source tool for aligning monolingual terminology, extracted from parallel texts, across different languages. BATMAN takes as input the monolingual terms of the source and target languages, together with the parallel documents from which those terms were extracted, and produces a list of aligned bilingual terms.

The tool extracts bilingual terms in two phases. In the first phase, a set of candidate translations is obtained for each source term using a translation system and a word aligner, both trained on the same parallel data from which the bilingual terminology is extracted. This increases the chance of obtaining good term translations even when only a small amount of parallel data is available.

The second phase identifies the best translation among the candidates by exploiting the parallelism between source and target sentences. Two strategies are investigated: sentence lookup and term lookup. With sentence lookup, a candidate translation is accepted as correct if it matches a span in the target sentence. With term lookup, a candidate is accepted only if it has also been identified as a term in the target sentence. Term lookup extracts fewer bilingual terms but yields higher-quality term alignments, whereas sentence lookup is more tolerant and identifies more bilingual terms.
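The two phases and the two lookup strategies can be illustrated with a minimal Python sketch. It is not BATMAN's implementation: the function names (`find_span`, `candidate_from_alignment`, `select_translation`), the toy English-Italian sentence pair, the alignment links and the hard-coded second candidate (a stand-in for the output of the translation system) are all hypothetical. Phase 1 is simplified to projecting a source term through word-alignment links onto the smallest target span that covers every linked word. Phase 2 then applies either sentence lookup or term lookup to the candidate list.

```python
from typing import List, Optional, Set, Tuple

# (source index, target index) links, as produced by a word aligner.
Alignment = Set[Tuple[int, int]]


def find_span(phrase: List[str], tokens: List[str]) -> Optional[int]:
    """Return the start index of the first occurrence of `phrase` in `tokens`, or None."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return i
    return None


def candidate_from_alignment(term: str, src: List[str], tgt: List[str],
                             alignment: Alignment) -> Optional[str]:
    """Phase 1 (simplified, hypothetical): project a source term onto the target
    sentence via alignment links, taking the smallest span covering all linked words."""
    words = term.split()
    start = find_span(words, src)
    if start is None:
        return None
    linked = [t for s, t in alignment if start <= s < start + len(words)]
    if not linked:
        return None
    return " ".join(tgt[min(linked):max(linked) + 1])


def select_translation(candidates: List[str], tgt: List[str],
                       tgt_terms: Set[str], strategy: str) -> Optional[str]:
    """Phase 2: return the first candidate accepted by the chosen lookup strategy."""
    for cand in candidates:
        if find_span(cand.split(), tgt) is None:
            continue  # both strategies require a match in the target sentence
        if strategy == "term" and cand not in tgt_terms:
            continue  # term lookup also requires the span to be a target-side term
        return cand
    return None


# Toy example (hypothetical data).
src = "the wind turbine rotates".split()
tgt = "la turbina eolica ruota".split()
# Deliberately noisy alignment: "turbine" is also linked to "ruota".
alignment = {(0, 0), (1, 2), (2, 1), (2, 3), (3, 3)}
cands = [candidate_from_alignment("wind turbine", src, tgt, alignment),
         "turbina eolica"]  # second candidate stands in for a translation-system output
target_terms = {"turbina eolica"}  # terms extracted from the target sentence

print(select_translation(cands, tgt, target_terms, "sentence"))  # turbina eolica ruota
print(select_translation(cands, tgt, target_terms, "term"))      # turbina eolica
```

With the noisy alignment, sentence lookup accepts the over-long span "turbina eolica ruota" simply because it occurs in the target sentence, while term lookup rejects it because it was not identified as a target term. This mirrors the trade-off between the two strategies: term lookup keeps fewer pairs but of higher quality.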

Acknowledgment

Development of BATMAN was supported by the EU-funded project MateCat (ICT-2011.4.2-287688).

License

BATMAN is distributed under the GNU Lesser General Public License (LGPL).

Manual

Installation, configuration and usage instructions will be available soon.

Source code

The source code will be available soon.

Reference

If you use BATMAN, please cite:

Arcan, Mihael, Marco Turchi, Sara Tonelli and Paul Buitelaar. 2014. “Enhancing Statistical Machine Translation with Bilingual Terminology in a CAT Environment”. Proceedings of the Association for Machine Translation in the Americas (AMTA ’14), Vancouver, Canada, pp. 54-68.

Contacts

For questions and support regarding BATMAN, please contact: turchi [at] fbk [dot] eu