Semi-Automatic Generation of F-Structures from Treebanks
Josef Van Genabith, Andy Way & Louisa Sadler
Statistical approaches to processing Lexical Functional Grammars, such as LFG-DOP (Bod & Kaplan 1998), require large corpora of text annotated with c-structure and f-structure representations. To date, such corpora as exist have been constructed manually or semi-automatically. Manual construction is both time-consuming and error-prone. Semi-automatic construction usually proceeds as follows: an existing LFG grammar is used to parse the input text automatically. Typically, parsing produces a large number of c- and f-structure analyses for each sentence. A linguistic expert then inspects the analyses and selects the single best one for each sentence. For large grammars this can involve inspecting hundreds or thousands of proposed analyses for a single input sentence. In this paper we present an alternative (albeit still semi-automatic) methodology that avoids manual inspection of analyses for best fit. As input, the method requires a treebank, from which the corresponding context-free phrase structure grammar (CF-PSG) is automatically induced. This CF-PSG is then manually annotated with functional information, after which the treebank representations (not the strings) are simply "reparsed", thereby inducing f-structures corresponding to the original c-structures. The paper reports on an initial experiment and discusses some advantages and disadvantages of this approach.
(To appear at LFG-99, July, Manchester, UK)
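
The three steps named in the abstract (inducing a CF-PSG from a treebank, annotating its rules with functional information, and "reparsing" the trees to obtain f-structures) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the tree encoding, the toy sentence, the lexicon, the annotation notation ("=" standing for "the daughter's f-structure is the mother's", a function name such as "SUBJ" standing for "the daughter fills that function"), and the helper names `induce_rules`, `unify` and `f_structure`.

```python
import copy
from collections import Counter

# Hypothetical treebank entry: (label, [children]) for phrases,
# (tag, word) for preterminals.
treebank = [
    ("S", [("NP", [("N", "Mary")]), ("VP", [("V", "sleeps")])]),
]

def induce_rules(trees):
    """Step 1: read the CF-PSG rules off the treebank trees, with counts."""
    rules = Counter()
    def visit(node):
        label, kids = node
        if isinstance(kids, str):          # preterminal: no phrase rule
            return
        rules[(label, tuple(k[0] for k in kids))] += 1
        for kid in kids:
            visit(kid)
    for tree in trees:
        visit(tree)
    return rules

rules = induce_rules(treebank)

# Step 2: manual annotation, one entry per daughter of each induced rule.
# "=" merges the daughter into the mother; a function name embeds it.
annotations = {
    ("S", ("NP", "VP")): ("SUBJ", "="),
    ("NP", ("N",)): ("=",),
    ("VP", ("V",)): ("=",),
}

# Illustrative lexical information (assumed, not taken from the paper).
lexicon = {
    ("N", "Mary"): {"PRED": "Mary", "NUM": "sg"},
    ("V", "sleeps"): {"PRED": "sleep<SUBJ>", "TENSE": "pres"},
}

def unify(target, source):
    """Merge source into target in place; fail on conflicting atomic values."""
    for key, value in source.items():
        if key not in target:
            target[key] = value
        elif isinstance(target[key], dict) and isinstance(value, dict):
            unify(target[key], value)
        elif target[key] != value:
            raise ValueError(f"clash on {key}: {target[key]!r} vs {value!r}")

def f_structure(node):
    """Step 3: walk an existing tree (not the string) and build its f-structure."""
    label, kids = node
    if isinstance(kids, str):
        return copy.deepcopy(lexicon[(label, kids)])
    fs = {}
    rule = (label, tuple(k[0] for k in kids))
    for kid, annotation in zip(kids, annotations[rule]):
        sub = f_structure(kid)
        if annotation == "=":
            unify(fs, sub)
        else:
            unify(fs.setdefault(annotation, {}), sub)
    return fs

fs = f_structure(treebank[0])
print(fs)
```

Because the tree is already given, each node selects exactly one annotated rule, so the sketch yields a single f-structure per tree and there is no set of competing analyses to inspect by hand.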