Statistical Deep Parsing for Spanish: Abridged Version
DOI: https://doi.org/10.19153/cleiej.25.1.2
Abstract
This document presents the development of a statistical HPSG parser for Spanish. HPSG is a deep linguistic formalism that combines syntactic and semantic information in the same representation and can elegantly model many linguistic phenomena. We describe the HPSG grammar we designed and adapted to Spanish, and the construction of our corpus. We then present the parsing algorithms we implemented for this corpus and grammar: a bottom-up strategy, a CKY approach with a supertagger, and an LSTM top-down approach. Finally, we report experimental results comparing our parsers with one another and with external Spanish parsers, both on global metrics and on particular phenomena we wanted to test. The LSTM top-down approach obtained the best results on most metrics, among both our parsers and the external ones, including constituency metrics (87.57 unlabeled F1, 82.06 labeled F1), dependency metrics (91.32 UAS, 88.96 LAS), and SRL (87.68 unlabeled, 80.66 labeled), as well as most of the phenomenon-specific metrics, such as clitic reduplication, relative referent detection, and coordination chain identification.
License
Copyright (c) 2022 Luis Chiruzzo
This work is licensed under a Creative Commons Attribution 4.0 International License.
CLEIej is supported by its home institution, CLEI, and by the contributions of the Latin American and international research community, and it does not charge authors any fees for submitting or publishing. Since its creation in 1998, all contents have been made publicly accessible. The license currently applied is a (CC)-BY license (effective October 2015; between 2011 and 2015 a (CC)-BY-NC license was used).