This project aims to create lexical resources for Portuguese annotated with semantic roles.
So far, we have annotated more than six thousand sentences in two corpora. One corpus contains Cardiology papers from three Brazilian journals, while the other is composed of articles from the Diário Gaúcho newspaper. The annotated sentences are organized by verb and subcategorization frame. Some verbs and sentences have no semantic role annotation, because the annotation was carried out on a sample.
More information about the development of the project can be found in the following publications:
Zanette, Adriano, Carolina Scarton and Leonardo Zilio. 2012. Automatic extraction of subcategorization frames from corpora: an approach to Portuguese. In: Proceedings of PROPOR 2012 - Demonstration Session. Coimbra, Portugal. (link)
Zilio, Leonardo, Adriano Zanette and Carolina Scarton. 2012. Extração automática de estruturas de subcategorização a partir de corpora em português. In: Anais do ELC 2012, XI Encontro de Linguística de Corpus, São Carlos - SP. (link)
Zilio, Leonardo, Carlos Ramisch and Maria José B. Finatto. 2013. Desenvolvimento de um recurso léxico com papéis semânticos para o português. Linguamática (Braga), v. 5, p. 23-41.
Zilio, Leonardo, Adriano Zanette and Carolina Scarton. 2014. Automatic extraction of subcategorization frames from Portuguese corpora. In: Aluisio, S. M.; Tagnin, S. E. O. (Org.). New Language Technologies and Linguistic Research: a Two-Way Road. 1st ed. Cambridge: Cambridge Scholars Publishing, p. 78-96.
Zilio, Leonardo. VerbLexPor: um recurso léxico com anotação de papéis semânticos para o português. Dissertation. Porto Alegre: UFRGS. (PDF)
The XML and SQL files are available for download at the links below. Both formats give access to all the sentences of the corpora, organized alphabetically by verb. Some of these sentences are already annotated with semantic roles, but most have only syntactic information, as explained above.
The XML files for both corpora can be downloaded using the following links:
The SQL files for both corpora are here.