Show simple item record

dc.contributor.authorNorris, Jennifer Vivien
dc.contributor.otherFaculty of Science and Engineeringen_US
dc.date.accessioned2013-10-29T12:56:03Z
dc.date.available2013-10-29T12:56:03Z
dc.date.issued1998
dc.identifierNOT AVAILABLEen_US
dc.identifier.urihttp://hdl.handle.net/10026.1/2453
dc.description.abstract

This thesis concerns the COMMIX system, which automatically extracts information on what a text is about, and generates that information in the highly compacted form of compound nominal expressions. The expressions generated are complex and may include novel terms which do not appear themselves in the input text. From the practical point of view, the work is driven by the need for better representations of content: for representations which are shorter and more concise than would appear in an abstract, yet more informative and representative of the actual aboutness than commonly occurs in indexing expressions and key terms. This additional layer of representation is referred to in this work as pertaining to the essence of a particular text. From a theoretical standpoint, the thesis shows how the compound nominal as a construct can be successfully employed in these highly informative representations. It involves an exploration of the claim that there is sufficient semantic information contained within the standard dictionary glosses for individual words to enable the construction of useful and highly representative novel compound nominal expressions, without recourse to standard syntactic and statistical methods. It shows how a shallow semantic approach to content identification which is based on lexical overlap can produce some very encouraging results. The methodology employed, and described herein, is domain-independent, and does not require the specification of templates with which the input text must comply. In these two respects, the methodology developed in this work avoids two of the most common problems associated with information extraction. As regards the evaluation of this type of work, the thesis introduces and utilises the notion of percentage attainment value, which is used in conjunction with subjects' opinions about the degree to which the aboutness terms succeed in indicating the subject matter of the texts for which they were generated.

en_US
dc.language.isoenen_US
dc.publisherUniversity of Plymouthen_US
dc.titleThe Generation of Compound Nominals to Represent the Essence of Text The COMMIX Systemen_US
dc.typeThesis
plymouth.versionFull versionen_US
dc.identifier.doihttp://dx.doi.org/10.24382/4279


Files in this item

Thumbnail
Thumbnail

This item appears in the following Collection(s)

Show simple item record


All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.
Theme by 
Atmire NV