Setting up a human judgment experiment involves several steps. We wrote scripts that set up an evaluation experiment (that is, generate a Blast file) for any type of multi-word expression. If you only plan to use automatic metrics (BLEU, METEOR, etc.) for evaluation, you can still use the scripts to prepare a corpus for building and testing a machine translation system. To set up an experiment, you need three scripts: csv2text.exe (optional), extractsent.exe, and mweblast.exe. Two additional scripts, filterblast.exe and mixblast.exe, can improve your evaluation experiment. Each script folder contains test data and a README file that describes the script in detail. There is also a manual that shows how to set up an experiment.
The scripts are meant to be used together with three external tools: mwetoolkit (http://mwetoolkit.sourceforge.net) to identify multi-word expressions, Moses (www.statmt.org/moses) to build a machine translation system, and Blast (http://cameleon.imag.fr/xwiki/bin/view/Main/Phrasal_verbs_annotation or www.ida.liu.se/~sarst/blast) to conduct the evaluation experiment.
All scripts are written in C# and are compatible with .NET Framework 3.5 and later (csv2text.exe requires only 2.0). They run on Windows, Linux, and Mac OS X.
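As a rough illustration of how the scripts named above fit together, the following Python sketch lists them in the order they would be run. Only the script names come from this description. The helper `plan_pipeline`, its parameters, and the assumption that filterblast.exe and mixblast.exe run after mweblast.exe are hypothetical; the command-line arguments of each script are not shown here and should be taken from the README in each script folder.

```python
# Hypothetical planning helper. The script names come from the description
# above; the function, its parameters, and the placement of the optional
# steps are assumptions made for illustration only.
REQUIRED = ["extractsent.exe", "mweblast.exe"]


def plan_pipeline(input_is_csv=False, refine=False):
    """Return the script names in the order they would be run."""
    steps = []
    if input_is_csv:
        # csv2text.exe is described as optional.
        steps.append("csv2text.exe")
    steps.extend(REQUIRED)
    if refine:
        # Assumed to operate on the Blast file generated by mweblast.exe.
        steps.extend(["filterblast.exe", "mixblast.exe"])
    return steps


if __name__ == "__main__":
    for number, name in enumerate(plan_pipeline(input_is_csv=True, refine=True), 1):
        print(f"{number}. {name}")
```

The sketch only plans the order of the steps and does not invoke the scripts, because their actual arguments are documented in the per-script README files rather than here.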