Word Level Examples
We have provided several examples of how to use TransQuest in recent WMT word-level quality estimation shared tasks. They are included in the repository but are not shipped with the library. Therefore, if you want to run the examples, please clone the repository first:
Warning
Please don't run the examples in the same environment that you used to install TransQuest. Create a new environment.
```bash
git clone https://github.com/TharinduDR/TransQuest.git
cd TransQuest
pip install -r requirements.txt
```
In the examples/word_level folder you will find the following tasks.
WMT 2020 QE Task 2: Word-Level Post-editing Effort
This task consists of predicting word-level quality for a given source and target. Each word in the source and in the target has to be labelled as OK or BAD, and the "gaps" between target words also have to be labelled as OK or BAD.
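To make the labelling scheme concrete, here is a short sketch; the sentence pair and all tags below are made up for illustration, not taken from the shared task data.

```python
# Hypothetical word-level QE labels (illustrative only).
source = "the cat sat on the mat".split()
source_tags = ["OK", "OK", "OK", "OK", "OK", "BAD"]  # one tag per source word

target = "die Katze sass auf Matte".split()
target_tags = ["OK", "OK", "BAD", "OK", "OK"]  # one tag per target word

# One gap tag per position between/around target words (len(target) + 1 tags);
# BAD marks a gap where the MT output is missing a word.
gap_tags = ["OK", "OK", "OK", "OK", "BAD", "OK"]

assert len(source_tags) == len(source)
assert len(target_tags) == len(target)
assert len(gap_tags) == len(target) + 1
```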
To run the experiments for each language, please run this command from the root directory of TransQuest.
```bash
# Replace <language_pair> and <architecture> with the options below;
# the module path follows the examples/word_level folder layout.
python -m examples.word_level.wmt_2020_task2.<language_pair>.<architecture>
```
Language Pair options: en_zh (English-Chinese), en_de (English-German)
Architecture Options: microtransquest (MicroTransQuest)
As an example, to run the experiments on English-Chinese with the MicroTransQuest architecture, run the following command.
```bash
python -m examples.word_level.wmt_2020_task2.en_zh.microtransquest
```
Results
The MicroTransQuest architecture in TransQuest outperforms OpenKiwi in both language pairs.
| Language Pair | Algorithm | Source F1 Multi | Target F1 Multi |
| --- | --- | --- | --- |
| English-German | MicroTransQuest | 0.5456 | 0.6013 |
| | OpenKiwi | 0.3717 | 0.4111 |
| English-Chinese | MicroTransQuest | 0.4440 | 0.6402 |
| | OpenKiwi | 0.3729 | 0.5583 |
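In these tables, F1 Multi is the product of the per-class F1 scores for the OK and BAD labels. A minimal sketch of the metric, assuming scikit-learn is installed:

```python
from sklearn.metrics import f1_score

def f1_multi(gold, pred):
    # Product of the F1 scores of the OK and BAD classes,
    # the standard metric of the WMT word-level QE tasks.
    f1_ok = f1_score(gold, pred, pos_label="OK")
    f1_bad = f1_score(gold, pred, pos_label="BAD")
    return f1_ok * f1_bad

# Toy example with five word-level tags.
gold = ["OK", "OK", "BAD", "OK", "BAD"]
pred = ["OK", "BAD", "BAD", "OK", "OK"]
print(f1_multi(gold, pred))
```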
WMT 2019 QE Task 2: Word-Level QE
Participating systems are expected to predict word-level quality for a given source and target.
To run the experiments for each language, please run this command from the root directory of TransQuest.
```bash
# Replace <language_pair> and <architecture> with the options below.
python -m examples.word_level.wmt_2019_task2.<language_pair>.<architecture>
```
Language Pair options: en_ru (English-Russian)
Architecture Options: microtransquest (MicroTransQuest)
As an example, to run the experiments on English-Russian with the MicroTransQuest architecture, run the following command.
```bash
python -m examples.word_level.wmt_2019_task2.en_ru.microtransquest
```
Results
The MicroTransQuest architecture in TransQuest outperforms OpenKiwi in English-Russian.
| Language Pair | Algorithm | Source F1 Multi | Target F1 Multi |
| --- | --- | --- | --- |
| English-Russian | MicroTransQuest | 0.5543 | 0.5592 |
| | OpenKiwi | 0.2647 | 0.2412 |
WMT 2018 QE Task 2: Word-Level QE
Participating systems are expected to predict word-level quality for a given source and target.
To run the experiments for each language, please run this command from the root directory of TransQuest. If both NMT and SMT versions are available for a language pair, specify the MT type as well.
```bash
# <mt_type> (nmt or smt) is only needed where both are available.
python -m examples.word_level.wmt_2018_task2.<language_pair>.<mt_type>.<architecture>
```
Language Pair options: en_de (English-German; both NMT and SMT), en_lv (English-Latvian; both NMT and SMT), en_cs (English-Czech), de_en (German-English)
Architecture Options: microtransquest (MicroTransQuest)
As an example, to run the experiments on English-Latvian NMT with the MicroTransQuest architecture, run the following command.
```bash
python -m examples.word_level.wmt_2018_task2.en_lv.nmt.microtransquest
```
To run the English-Czech experiments with the MicroTransQuest architecture, run the following command.
```bash
python -m examples.word_level.wmt_2018_task2.en_cs.microtransquest
```
Results
The MicroTransQuest architecture in TransQuest outperforms Marmot in all language pairs.
| Language Pair | Algorithm | Source F1 Multi | Target F1 Multi | Gaps F1 Multi |
| --- | --- | --- | --- | --- |
| English-German (NMT) | MicroTransQuest | 0.2957 | 0.4421 | 0.1672 |
| | Marmot | 0.0000 | 0.1812 | 0.0000 |
| English-German (SMT) | MicroTransQuest | 0.5269 | 0.6348 | 0.4927 |
| | Marmot | 0.0000 | 0.3630 | 0.0000 |
| English-Latvian (NMT) | MicroTransQuest | 0.4880 | 0.5868 | 0.1664 |
| | Marmot | 0.0000 | 0.4208 | 0.0000 |
| English-Latvian (SMT) | MicroTransQuest | 0.4945 | 0.5939 | 0.2356 |
| | Marmot | 0.0000 | 0.3445 | 0.0000 |
| English-Czech | MicroTransQuest | 0.5327 | 0.6081 | 0.2018 |
| | Marmot | 0.0000 | 0.4449 | 0.0000 |
| German-English | MicroTransQuest | 0.4824 | 0.6485 | 0.4203 |
| | Marmot | 0.0000 | 0.4373 | 0.0000 |
Note
Please note that in WMT 2018 the organisers evaluated the gaps and the words in the target separately. This is different from WMT 2019 and WMT 2020.
Note
Please note that Marmot, the baseline used in WMT 2018, does not support predicting quality for words in the source or for gaps in the target. Hence, those values are set to 0.0000 for all language pairs.
Tip
Too tired to train QE models? Check out our model zoo.
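As a sketch of how a released model can be used, the snippet below loads a pretrained word-level model through the library's MicroTransQuestModel class; the model name here is only an example, so substitute any word-level model from the zoo.

```python
import torch
from transquest.algo.word_level.microtransquest.run_model import MicroTransQuestModel

# The model name is illustrative; pick any word-level model from the model zoo.
model = MicroTransQuestModel(
    "xlmroberta",
    "TransQuest/microtransquest-en_de-wiki",
    labels=["OK", "BAD"],
    use_cuda=torch.cuda.is_available(),
)

# predict() takes [source, target] pairs and returns word-level
# quality tags for the source and the target.
source_tags, target_tags = model.predict(
    [["if not , you may need to restart your computer .",
      "falls nicht , müssen Sie den Computer erneut starten ."]]
)
print(source_tags)
print(target_tags)
```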