1. INTRODUCTION

Setting up a human judgment experiment requires a number of steps. We wrote scripts that can be used to set up an evaluation experiment (that is, to generate a Blast file) for any type of multi-word expression (MWE). If you only intend to use automatic metrics (BLEU, METEOR, etc.) for your evaluation, you can use the scripts solely to prepare a corpus for building and testing a machine translation system.

To set up an experiment, you will need three scripts: csv2text.exe (optional), extractsent.exe, and mweblast.exe. Two additional scripts, filterblast.exe and mixblast.exe, can be used to improve your evaluation experiment. Each script folder contains test data and a README file describing the script in detail. There is also a manual that shows how to set up an experiment.

The workflow assumes that you use mwetoolkit (http://mwetoolkit.sourceforge.net) to identify multi-word expressions, Moses (www.statmt.org/moses) to build a machine translation system, and Blast (http://cameleon.imag.fr/xwiki/bin/view/Main/Phrasal_verbs_annotation or www.ida.liu.se/~sarst/blast) to conduct your evaluation experiment.

2. SCRIPTS

2.1. Prerequisites

All scripts are written in C# and are compatible with .NET Framework 3.5 (csv2text with 2.0) and later. They can be run on Windows, Linux, and Mac OS X.

2.2. Description

csv2text extracts a plain-text corpus after it has been preprocessed by mwetoolkit.
splitcorpus divides a corpus into train and test sets. The test set contains all the sentences that include an MWE.
mwe2blast generates a Blast file based on a Moses-generated translation and word-to-word alignment information.
filterblast deletes sentences from a Blast file based on a number of criteria: word count, annotation, and patterns for wrongly identified phrasal verbs in split word order (optional).
mergeblast joins two Blast files into a single one based on a user-defined dissimilarity criterion.
Manual.txt explains how to use the tools.
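
To make the train/test division described for splitcorpus more concrete, here is a minimal Python sketch of the idea. It is not the actual C# tool and does not reproduce its input format or options: the function name split_corpus, the plain list of sentences, and the set of MWE sentence indices are all assumptions made for illustration. In practice the indices would come from the MWE identification step.

```python
def split_corpus(sentences, mwe_sentence_ids):
    """Return (train, test) lists of sentences.

    Every sentence whose index is in mwe_sentence_ids goes to the test
    set; all remaining sentences go to the train set. Both arguments are
    hypothetical stand-ins for whatever the real tool reads from disk.
    """
    train, test = [], []
    for idx, sentence in enumerate(sentences):
        if idx in mwe_sentence_ids:
            test.append(sentence)   # sentence contains an MWE
        else:
            train.append(sentence)  # no MWE was identified here
    return train, test


# Toy example: sentences 0 and 2 are assumed to contain a phrasal verb.
sentences = [
    "he gave up smoking",
    "the cat sleeps",
    "she looked the word up",
]
train, test = split_corpus(sentences, {0, 2})
```

With this split, the machine translation system is trained only on sentences without identified MWEs, while every MWE sentence is held out for evaluation.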

Created by Carlos Ramisch on 2014-02-07

Phrasal verb translation - tools
