
dc.contributor.author: Palomino, MA
dc.contributor.author: Aider, F
dc.date.accessioned: 2022-09-01T12:24:53Z
dc.date.available: 2022-09-01T12:24:53Z
dc.date.issued: 2022-08-31
dc.identifier.issn: 2076-3417
dc.identifier.other: ARTN 8765
dc.identifier.uri: http://hdl.handle.net/10026.1/19592
dc.description.abstract:

Practical demands and academic challenges have both contributed to making sentiment analysis a thriving area of research. Given that a great deal of sentiment analysis work is performed on social media communications, where text frequently ignores the rules of grammar and spelling, pre-processing techniques are required to clean the data. Pre-processing is also required to normalise the text before undertaking the analysis, as social media is inundated with abbreviations, emoticons, emojis, truncated sentences, and slang. While pre-processing has been widely discussed in the literature, and it is considered indispensable, recommendations for best practice have not been conclusive. Thus, we have reviewed the available research on the subject and evaluated various combinations of pre-processing components quantitatively. We have focused on the case of Twitter sentiment analysis, as Twitter has proved to be an important source of publicly accessible data. We have also assessed the effectiveness of different combinations of pre-processing components for the overall accuracy of a couple of off-the-shelf tools and one algorithm implemented by us. Our results confirm that the order of the pre-processing components matters and significantly improves the performance of naïve Bayes classifiers. We also confirm that lemmatisation is useful for enhancing the performance of an index, but it does not notably improve the quality of sentiment analysis.

dc.format.extent: 8765-8765
dc.language: en
dc.language.iso: en
dc.publisher: MDPI AG
dc.subject: text pre-processing
dc.subject: sentiment analysis
dc.subject: text mining
dc.subject: Twitter
dc.subject: social media
dc.subject: naive Bayes classifiers
dc.subject: lemmatisation
dc.title: Evaluating the Effectiveness of Text Pre-Processing in Sentiment Analysis
dc.type: journal-article
dc.type: Article
plymouth.author-url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000850941300001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008
plymouth.issue: 17
plymouth.volume: 12
plymouth.publisher-url: http://dx.doi.org/10.3390/app12178765
plymouth.publication-status: Published online
plymouth.journal: Applied Sciences
dc.identifier.doi: 10.3390/app12178765
plymouth.organisational-group: /Plymouth
plymouth.organisational-group: /Plymouth/Faculty of Science and Engineering
plymouth.organisational-group: /Plymouth/Faculty of Science and Engineering/School of Engineering, Computing and Mathematics
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA
plymouth.organisational-group: /Plymouth/REF 2021 Researchers by UoA/UoA11 Computer Science and Informatics
plymouth.organisational-group: /Plymouth/Users by role
plymouth.organisational-group: /Plymouth/Users by role/Academics
dcterms.dateAccepted: 2022-08-27
dc.rights.embargodate: 2022-9-3
dc.identifier.eissn: 2076-3417
dc.rights.embargoperiod: Not known
rioxxterms.versionofrecord: 10.3390/app12178765
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.type: Journal Article/Review
plymouth.funder: AGE'IN (Age Independently)::Interreg 2 Seas Mers Zeeën


All items in PEARL are protected by copyright law.
Author manuscripts deposited to comply with open access mandates are made available in accordance with publisher policies. Please cite only the published version using the details provided on the item record or document. In the absence of an open licence (e.g. Creative Commons), permissions for further reuse of content should be sought from the publisher or author.