Metadata changes

| Property | Previous value | New value |
|---|---|---|
| Document author | Alexander Kobzar | Carlos Ramisch |

Content changes
= 1. INTRODUCTION =

= 2. SCRIPTS =

== 1.1. Prerequisites ==

== 1.2. Description ==

* (% style="font-size: 14px;" %)[[csv2text>>attach:csv2text.zip]] extracts a plain-text corpus from a corpus that has been preprocessed by mwetoolkit.
* (% style="font-size: 14px;" %)[[splitcorpus>>attach:splitcorpus.zip]] divides a corpus into train and test sets. The test set contains all the sentences that include an MWE.
* (% style="font-size: 14px;" %)[[mwe2blast>>attach:mwe2blast.zip]] generates a Blast file from a Moses-generated translation and word-to-word alignment information.
* (% style="font-size: 14px;" %)[[filterblast>>attach:filterblast.zip]] deletes sentences from a Blast file according to several criteria: word count, annotation and, optionally, patterns for phrasal verbs that were wrongly identified in split word order.
* (% style="font-size: 14px;" %)[[mergeblast>>attach:mergeblast.zip]] joins two Blast files into a single file based on a user-defined dissimilarity criterion.
* [[Manual.txt>>attach:Manual.txt]] explains how to use the tools.
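The splitting rule stated for **splitcorpus** is simple: every sentence containing an MWE goes to the test set, and all other sentences go to the train set. The rule can be illustrated with a minimal Python sketch. This is not the attached script. The function names `split_corpus` and `contains_any`, the input format (each sentence as a list of tokens), and the MWE detector are all hypothetical assumptions. The detector matches MWEs only as contiguous token spans, so it misses phrasal verbs in split word order such as "looked the word up".

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a sentence is a plain list of tokens (not mwetoolkit's own format).
Sentence = List[str]


def split_corpus(
    sentences: Sequence[Sentence],
    has_mwe: Callable[[Sentence], bool],
) -> Tuple[List[Sentence], List[Sentence]]:
    """Hypothetical helper: return (train, test); every sentence with an MWE goes to test."""
    train: List[Sentence] = []
    test: List[Sentence] = []
    for sentence in sentences:
        (test if has_mwe(sentence) else train).append(sentence)
    return train, test


def contains_any(mwes: Sequence[Tuple[str, ...]]) -> Callable[[Sentence], bool]:
    """Hypothetical MWE detector: true if any MWE occurs as a contiguous token span.

    Tokens are lower-cased before matching; the MWEs are assumed to be lower-case.
    Split word order (e.g. "looked the word up") is NOT detected.
    """
    def has_mwe(sentence: Sentence) -> bool:
        lowered = [tok.lower() for tok in sentence]
        for mwe in mwes:
            n = len(mwe)
            for i in range(len(lowered) - n + 1):
                if tuple(lowered[i:i + n]) == mwe:
                    return True
        return False
    return has_mwe


if __name__ == "__main__":
    corpus = [
        "She gave up smoking".split(),
        "The cat sat on the mat".split(),
        "He looked the word up".split(),
    ]
    train, test = split_corpus(corpus, contains_any([("gave", "up")]))
    print(len(train), len(test))  # 2 1
```

Passing the detector in as a function keeps the splitting rule separate from how MWEs are identified. A real pipeline would more likely read the MWE annotations that mwetoolkit has already added to the corpus.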