Phrasal verb translation - tools

Last modified by Carlos Ramisch on 2014-08-22


Setting up a human judgment experiment requires a number of steps. We wrote scripts that can be used to set up an evaluation experiment (i.e., generate a Blast file) for any type of multi-word expression. If you only plan to use automatic metrics (BLEU, METEOR, etc.) for evaluation, you can use the scripts just to prepare a corpus for building and testing a machine translation system.

To set up an experiment, you will need three scripts: csv2text.exe (optional), extractsent.exe, and mweblast.exe. Two additional scripts, filterblast.exe and mixblast.exe, can improve your evaluation experiment. Each script folder contains test data and a README file describing the script in detail, and a manual explains how to set up an experiment.

The scripts are meant to be used together with mwetoolkit to identify multi-word expressions, Moses to build a machine translation system, and Blast to conduct the evaluation experiment.


1.1. Prerequisites

All scripts are written in C# and are compatible with .NET Framework 3.5 and later (csv2text only requires 2.0 or later). They can be run on Windows, Linux, and Mac OS X (on the latter two via Mono).

1.2. Description

  • csv2text extracts a plain-text corpus from a corpus that has been preprocessed by mwetoolkit.
  • splitcorpus divides a corpus into training and test sets; the test set contains all the sentences that contain an MWE.
  • mwe2blast generates a Blast file from a Moses-generated translation and word-to-word alignment information.
  • filterblast removes sentences from a Blast file based on several criteria: word count, annotation, and (optionally) patterns for wrongly identified phrasal verbs in split word order.
  • mergeblast joins two Blast files into a single one based on a user-defined dissimilarity criterion.
  • Manual.txt explains how to use the tools.

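
The train/test split performed by splitcorpus can be illustrated with a minimal Python sketch. This is not the C# implementation. It assumes the corpus is a list of whitespace-tokenized sentences and that the MWEs (here, phrasal verbs) are given as lowercase token tuples, whereas the actual tools rely on mwetoolkit to identify them. The function names `contains_mwe` and `split_corpus` are hypothetical.

```python
from typing import List, Sequence, Tuple


def contains_mwe(tokens: Sequence[str], mwes: Sequence[Tuple[str, ...]]) -> bool:
    """Return True if any MWE occurs as a contiguous token sequence in tokens."""
    for mwe in mwes:
        n = len(mwe)
        if n == 0:
            continue  # an empty pattern would trivially match; skip it
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == mwe:
                return True
    return False


def split_corpus(sentences: Sequence[str],
                 mwes: Sequence[Tuple[str, ...]]) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch: sentences with an MWE go to test, all others to train."""
    train: List[str] = []
    test: List[str] = []
    for sentence in sentences:
        tokens = sentence.lower().split()  # MWE tuples are assumed to be lowercase
        (test if contains_mwe(tokens, mwes) else train).append(sentence)
    return train, test


# Usage example with made-up sentences and phrasal verbs.
corpus = [
    "She gave up smoking last year .",
    "The weather was nice .",
    "Please turn it off now .",
]
phrasal_verbs = [("gave", "up"), ("turn", "off")]
train, test = split_corpus(corpus, phrasal_verbs)
print(test)   # ['She gave up smoking last year .']
print(train)  # ['The weather was nice .', 'Please turn it off now .']
```

Because this sketch only matches contiguous sequences, a phrasal verb in split word order such as "turn it off" is not detected and the sentence ends up in the training set. Handling such cases requires pattern-based identification of the kind mwetoolkit provides.
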
Created by Carlos Ramisch on 2014-02-07
