BinQE is a collection of binary Quality Estimation (QE) datasets for different language pairs. Each entry consists of a source sentence, an automatic translation, and a binary label produced automatically by applying the method described in (Turchi et al., 2013).
This kind of judgement is particularly useful for training QE models for specific applications, such as integration into a computer-assisted translation environment, where a sharp distinction between "good" and "bad" translation suggestions is needed.
More specifically, BinQE contains:
The creation of BinQE was supported by the EU-funded project MateCat (ICT-2011.4.2-287688). BinQE is freely available for research purposes and is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike (BY-NC-SA) license.
To obtain BinQE, a request form must be filled in.
Whenever making reference to this resource, please cite the following paper:
Marco Turchi, Matteo Negri. "Automatic Annotation of Machine Translation Datasets with Binary Quality Judgements". In Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik, Iceland, 2014, pp. 1788-1792.
Additional References
Marion Potet, Emmanuelle Esperança-Rodier, Laurent Besacier, and Hervé Blanchon. 2012. "Collection of a Large Database of French-English SMT Output Corrections". In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey.
Marco Turchi, Matteo Negri, and Marcello Federico. 2013. "Coping with the Subjectivity of Human Judgements in MT Quality Estimation". In Proceedings of the 8th Workshop on Statistical Machine Translation (WMT’13), Sofia, Bulgaria.
Contacts
For questions and support about BinQE, please contact: negri [at] fbk [dot] eu or turchi [at] fbk [dot] eu