Sentence Level Pre-trained Models

We have released several pre-trained TransQuest models for two aspects of sentence-level quality estimation: predicting Direct Assessment scores and predicting HTER. We will keep releasing new models, so please stay in touch.

Predicting Direct Assessment

The current practice in machine translation (MT) evaluation is Direct Assessment (DA) of MT quality, in which raters score each machine translation on a continuous 1-100 scale. This method has been shown to improve the reproducibility of manual evaluation and to provide a more reliable gold standard for automatic evaluation metrics.

We have released several quality estimation models that predict DA scores. These include multilingual models that are intended to work on any language pair in any domain.

Available Models

| Language Pair | NMT/SMT | Domain | Algorithm | Model Link |
|---|---|---|---|---|
| Romanian-English | NMT | Wikipedia | MonoTransQuest | TransQuest/monotransquest-da-ro_en-wiki |
| Romanian-English | NMT | Wikipedia | SiameseTransQuest | TransQuest/siamesetransquest-da-ro_en-wiki |
| Estonian-English | NMT | Wikipedia | MonoTransQuest | TransQuest/monotransquest-da-et_en-wiki |
| Estonian-English | NMT | Wikipedia | SiameseTransQuest | TransQuest/siamesetransquest-da-et_en-wiki |
| Nepalese-English | NMT | Wikipedia | MonoTransQuest | TransQuest/monotransquest-da-ne_en-wiki |
| Nepalese-English | NMT | Wikipedia | SiameseTransQuest | TransQuest/siamesetransquest-da-ne_en-wiki |
| Sinhala-English | NMT | Wikipedia | MonoTransQuest | TransQuest/monotransquest-da-si_en-wiki |
| Sinhala-English | NMT | Wikipedia | SiameseTransQuest | TransQuest/siamesetransquest-da-si_en-wiki |
| Russian-English | NMT | Reddit, Wikiquotes | MonoTransQuest | TransQuest/monotransquest-da-ru_en-reddit_wikiquotes |
| Russian-English | NMT | Reddit, Wikiquotes | SiameseTransQuest | TransQuest/siamesetransquest-da-ru_en-reddit_wikiquotes |
| English-German | NMT | Wikipedia | MonoTransQuest | TransQuest/monotransquest-da-en_de-wiki |
| English-German | NMT | Wikipedia | SiameseTransQuest | TransQuest/siamesetransquest-da-en_de-wiki |
| English-Chinese | NMT | Wikipedia | MonoTransQuest | TransQuest/monotransquest-da-en_zh-wiki |
| English-Chinese | NMT | Wikipedia | SiameseTransQuest | TransQuest/siamesetransquest-da-en_zh-wiki |
| English-* | Any | Any | MonoTransQuest | TransQuest/monotransquest-da-en_any |
| English-* | Any | Any | SiameseTransQuest | |
| *-English | Any | Any | MonoTransQuest | TransQuest/monotransquest-da-any_en |
| *-English | Any | Any | SiameseTransQuest | |
| *-* | Any | Any | MonoTransQuest | TransQuest/monotransquest-da-multilingual |
| *-* | Any | Any | SiameseTransQuest | |

Note

* denotes any language. (*-* means any language to any language)
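Choosing between a pair-specific model and one of the multilingual models can be sketched as a small lookup. The helper below is hypothetical and not part of TransQuest: the name `pick_da_model`, the two-letter language codes (taken from the model identifiers), and the rule of preferring a pair-specific model over a multilingual one are all assumptions made for illustration. The identifiers themselves are copied from the Direct Assessment table above.

```python
# MonoTransQuest DA identifiers copied from the Direct Assessment table.
DA_MONO_MODELS = {
    ("ro", "en"): "TransQuest/monotransquest-da-ro_en-wiki",
    ("et", "en"): "TransQuest/monotransquest-da-et_en-wiki",
    ("ne", "en"): "TransQuest/monotransquest-da-ne_en-wiki",
    ("si", "en"): "TransQuest/monotransquest-da-si_en-wiki",
    ("ru", "en"): "TransQuest/monotransquest-da-ru_en-reddit_wikiquotes",
    ("en", "de"): "TransQuest/monotransquest-da-en_de-wiki",
    ("en", "zh"): "TransQuest/monotransquest-da-en_zh-wiki",
}


def pick_da_model(source_lang, target_lang):
    """Hypothetical helper: return a MonoTransQuest DA identifier for a pair.

    Assumption: a pair-specific model is preferred when one is listed;
    otherwise fall back to English-*, *-English, then *-*.
    """
    pair = (source_lang.lower(), target_lang.lower())
    if pair in DA_MONO_MODELS:
        return DA_MONO_MODELS[pair]
    if pair[0] == "en":
        return "TransQuest/monotransquest-da-en_any"
    if pair[1] == "en":
        return "TransQuest/monotransquest-da-any_en"
    return "TransQuest/monotransquest-da-multilingual"


print(pick_da_model("ro", "en"))  # pair-specific model
print(pick_da_model("en", "fr"))  # English-* model
print(pick_da_model("de", "fr"))  # *-* model
```

Whether a pair-specific model actually outperforms a multilingual one on your data is worth checking on a held-out sample; the fallback order here is only a starting point.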

Predicting HTER

The performance of quality estimation (QE) systems has typically been assessed using the semi-automatic HTER (Human-mediated Translation Edit Rate). HTER is an edit-distance-based measure that captures the distance between the automatic translation and a reference translation in terms of the number of modifications required to transform one into the other. In light of this, a QE system should be able to predict the percentage of edits required in the translation.
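To make the quantity being predicted concrete, here is a minimal sketch of an HTER-like score. It is a simplification, not the official metric: it counts only word-level insertions, deletions, and substitutions (full TER also allows block shifts), divides by the length of the human post-edited text, and caps the result at 1. The function names and the example post-edit are made up for illustration.

```python
def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    hyp, ref = hypothesis.split(), reference.split()
    # previous[j]: distance between the hyp words seen so far and ref[:j].
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(
                previous[j] + 1,         # delete hyp_word
                current[j - 1] + 1,      # insert ref_word
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def approximate_hter(machine_translation, post_edit):
    """Edits divided by post-edit length, capped at 1 (no shift operations)."""
    reference_length = len(post_edit.split())
    if reference_length == 0:
        return 0.0 if not machine_translation.split() else 1.0
    edits = word_edit_distance(machine_translation, post_edit)
    return min(1.0, edits / reference_length)


mt = "Reducing these conflicts is not important for preservation."
pe = "Reducing these conflicts is important for conservation."  # made-up post-edit
print(approximate_hter(mt, pe))  # 2 edits over 7 post-edit words
```

A score near 0 means the translation needed almost no editing; a score near 1 means it was essentially rewritten.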

We have released several quality estimation models that predict HTER. These include multilingual models that are intended to work on any language pair in any domain.

Available Models

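| Language Pair | NMT/SMT | Domain | Algorithm | Model Link |
|---|---|---|---|---|
| English-German | NMT | Wikipedia | MonoTransQuest | TransQuest/monotransquest-hter-en_de-wiki |
| English-German | NMT | Wikipedia | SiameseTransQuest | |
| English-German | NMT | IT | MonoTransQuest | TransQuest/monotransquest-hter-en_de-it-nmt |
| English-German | NMT | IT | SiameseTransQuest | |
| English-German | SMT | IT | MonoTransQuest | TransQuest/monotransquest-hter-en_de-it-smt |
| English-German | SMT | IT | SiameseTransQuest | |
| English-Latvian | SMT | Life Sciences | MonoTransQuest | TransQuest/monotransquest-hter-en_lv-it-smt |
| English-Latvian | SMT | Life Sciences | SiameseTransQuest | |
| English-Latvian | NMT | Life Sciences | MonoTransQuest | TransQuest/monotransquest-hter-en_lv-it-nmt |
| English-Latvian | NMT | Life Sciences | SiameseTransQuest | |
| English-Czech | SMT | IT | MonoTransQuest | TransQuest/monotransquest-hter-en_cs-pharmaceutical |
| English-Czech | SMT | IT | SiameseTransQuest | |
| German-English | SMT | Life Sciences | MonoTransQuest | TransQuest/monotransquest-hter-de_en-pharmaceutical |
| German-English | SMT | Life Sciences | SiameseTransQuest | |
| English-Chinese | NMT | Wikipedia | MonoTransQuest | TransQuest/monotransquest-hter-en_zh-wiki |
| English-Chinese | NMT | Wikipedia | SiameseTransQuest | |
| English-* | Any | Any | MonoTransQuest | |
| English-* | Any | Any | SiameseTransQuest | |
| *-* | Any | Any | MonoTransQuest | |
| *-* | Any | Any | SiameseTransQuest | |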

Note

* denotes any language. (*-* means any language to any language)

If you are using the MonoTransQuest architecture, you can use the following code to load the model. The full notebook is available here. Let's consider loading monotransquest-da-ro_en-wiki.

import torch
from transquest.algo.sentence_level.monotransquest.run_model import MonoTransQuestModel

# Load the Romanian-English DA model; run on the GPU when one is available.
model = MonoTransQuestModel(
    "xlmroberta",
    "TransQuest/monotransquest-da-ro_en-wiki",
    num_labels=1,
    use_cuda=torch.cuda.is_available(),
)
# Each input is a [source sentence, machine translation] pair.
predictions, raw_outputs = model.predict([["Reducerea acestor conflicte este importantă pentru conservare.", "Reducing these conflicts is not important for preservation."]])
print(predictions)

If you are using the SiameseTransQuest architecture, you can use the following code to load the model. The full notebook is available here. Let's consider loading siamesetransquest-da-ro_en-wiki.

from transquest.algo.sentence_level.siamesetransquest.run_model import SiameseTransQuestModel

# Load the Romanian-English DA model by its model identifier.
model = SiameseTransQuestModel("TransQuest/siamesetransquest-da-ro_en-wiki")
# Each input is a [source sentence, machine translation] pair.
predictions = model.predict([["Reducerea acestor conflicte este importantă pentru conservare.", "Reducing these conflicts is not important for preservation."]])
print(predictions)
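Both snippets pass a list of [source, translation] pairs to `predict`, so several translations can be scored in one call. The sketch below shows one way to pair each score with its input and sort the results. `score_pairs` and `fake_predict` are hypothetical names, not part of TransQuest, and the sketch assumes that the prediction call returns one score per pair, in input order. A stand-in predictor is used so that the example runs without downloading a model.

```python
def score_pairs(predict, pairs):
    """Attach a predicted score to each (source, translation) pair.

    `predict` is any callable that takes a list of [source, translation]
    pairs and returns one score per pair (an assumption of this sketch).
    Results are sorted by score, lowest first.
    """
    scores = predict([list(pair) for pair in pairs])
    scored = [
        (float(score), source, translation)
        for score, (source, translation) in zip(scores, pairs)
    ]
    return sorted(scored, key=lambda item: item[0])


def fake_predict(pairs):
    """Stand-in predictor: NOT a quality model, only here to run the sketch."""
    return [len(translation) / 100 for _, translation in pairs]


source = "Reducerea acestor conflicte este importantă pentru conservare."
pairs = [
    (source, "Reducing these conflicts is not important for preservation."),
    (source, "Reducing these conflicts is important for conservation."),
]
for score, _, translation in score_pairs(fake_predict, pairs):
    print(f"{score:.2f}\t{translation}")
```

With the models loaded as shown earlier, `predict` could be `lambda p: model.predict(p)[0]` for MonoTransQuest (its `predict` returns predictions together with raw outputs) or `model.predict` for SiameseTransQuest. Keep in mind that the direction of the score depends on the model: for DA models a higher score means better quality, while for HTER models a higher score means more edits are needed.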