Data-Driven Compilation of LFG Semantic Forms

Josef van Genabith, Louisa Sadler & Andy Way

In a recent paper, Van Genabith et al. (1999) describe a semi-automatic method for annotating tree banks with high-level Lexical Functional Grammar (LFG) f-structure representations. First, a CF-PSG is automatically induced from the tree bank using the method described in (Charniak, 1996). The CF-PSG is then manually annotated with functional schemata. The resulting LFG is used to deterministically "reparse" the original tree bank representations, simply following the c-structure defined by the original annotations, thereby inducing f-structures corresponding to the original c-structures. The annotated grammars, however, are not yet stand-alone LFGs: they cannot be used to analyse bare strings, only strings already annotated with c-structures, as in the original tree bank. What is missing is the LFG account of subcategorization in terms of semantic forms and completeness and coherence constraints. In the present paper we develop an automatic method for compiling LFG semantic forms from tree banks annotated with f-structure representations, effectively turning the grammars developed in (Van Genabith et al. 1999) into stand-alone LFG grammars. In addition, our method provides full semantic forms as PRED values for the f-structures obtained from the tree bank in (Van Genabith et al. 1999).
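To make the idea of compiling semantic forms from f-structures concrete, the following is a minimal sketch and not the procedure of the paper itself. It assumes that an f-structure is encoded as a nested Python dict, that set-valued attributes such as ADJUNCT are lists, and that the subcategorizable grammatical functions are those in the hypothetical inventory GOVERNABLE. The function name compile_semantic_forms and the example f-structure are likewise illustrative assumptions.

```python
# Assumed inventory of governable (subcategorizable) grammatical functions.
# The exact inventory is an assumption made for this sketch.
GOVERNABLE = ("SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP")


def compile_semantic_forms(fstr, forms=None):
    """Collect one semantic form per PRED-bearing level of an f-structure.

    fstr is a nested dict; list, tuple, or set values stand for set-valued
    attributes such as ADJUNCT. Returns a set of strings.
    """
    if forms is None:
        forms = set()
    if not isinstance(fstr, dict):
        return forms

    pred = fstr.get("PRED")
    if pred is not None:
        # The governable functions present at this level become the
        # argument list of the semantic form.
        args = [gf for gf in GOVERNABLE if gf in fstr]
        forms.add(f"{pred}<{','.join(args)}>" if args else pred)

    # Recurse into embedded f-structures and into set-valued attributes.
    for value in fstr.values():
        if isinstance(value, dict):
            compile_semantic_forms(value, forms)
        elif isinstance(value, (list, tuple, set)):
            for item in value:
                compile_semantic_forms(item, forms)
    return forms


# Hypothetical f-structure for "John saw Mary yesterday".
example = {
    "PRED": "see",
    "TENSE": "past",
    "SUBJ": {"PRED": "John"},
    "OBJ": {"PRED": "Mary"},
    "ADJUNCT": [{"PRED": "yesterday"}],
}

forms = compile_semantic_forms(example)
print(sorted(forms))
```

Under these assumptions, the adjunct contributes its own PRED but does not appear in the argument list of the verb, since ADJUNCT is not among the governable functions. Semantic forms gathered in this way are the kind of information that completeness and coherence checks require.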

(To appear at EACL-99, June, Bergen, Norway).