A comparison of a novel optimized GSDMM Model with K-means clustering for topic modelling of free text
dc.contributor.author | Abdelmotaleb, H | |
dc.contributor.author | Wojtys, M | |
dc.contributor.author | McNeile, C | |
dc.date.accessioned | 2023-12-04T13:29:40Z | |
dc.date.available | 2023-12-04T13:29:40Z | |
dc.date.issued | 2023-12-06 | |
dc.identifier.uri | https://pearl.plymouth.ac.uk/handle/10026.1/21768 | |
dc.description.abstract |
Statistical topic modelling has become an important tool in text processing, as more applications use it to handle the increasing amount of available text data, e.g. from social media platforms. The aim of topic modelling is to discover the main themes or topics in a collection of text documents. While several models have been developed, there is no consensus on how to evaluate them or how to determine the best hyper-parameters of a model. In this research, we develop a method for evaluating topic models for short text that employs word embeddings and measures within-topic variability and separation between topics. We focus on the Dirichlet Mixture Model and on tuning its hyper-parameters. We also investigate using the K-means clustering algorithm. In empirical experiments, we present a novel case study on short text datasets related to the telecommunication industry. We find that the optimal values of the hyper-parameters obtained from our evaluation method do not agree with the fixed values typically used in the literature, and that they lead to a different clustering of the text corpora. Moreover, we compare the discovered topics with those obtained from K-means clustering. | |
dc.title | A comparison of a novel optimized GSDMM Model with K-means clustering for topic modelling of free text | |
dc.type | journal-article | |
plymouth.journal | Journal of Machine Intelligence and Data Science | |
dc.identifier.doi | 10.11159/jmids.2023.007 | |
plymouth.organisational-group | |Plymouth | |
plymouth.organisational-group | |Plymouth|Faculty of Science and Engineering | |
plymouth.organisational-group | |Plymouth|Faculty of Science and Engineering|School of Engineering, Computing and Mathematics | |
plymouth.organisational-group | |Plymouth|REF 2021 Researchers by UoA | |
plymouth.organisational-group | |Plymouth|Users by role | |
plymouth.organisational-group | |Plymouth|Users by role|Academics | |
plymouth.organisational-group | |Plymouth|REF 2021 Researchers by UoA|UoA10 Mathematical Sciences | |
plymouth.organisational-group | |Plymouth|REF 2021 Researchers by UoA|ZZZ Extended UoA 10 - Mathematical Sciences | |
plymouth.organisational-group | |Plymouth|REF 2028 Researchers by UoA | |
plymouth.organisational-group | |Plymouth|REF 2028 Researchers by UoA|UoA10 Mathematical Sciences | |
dcterms.dateAccepted | 2023-12-02 | |
dc.date.updated | 2023-12-04T13:29:39Z | |
dc.rights.embargodate | 2024-01-10 | |
rioxxterms.versionofrecord | 10.11159/jmids.2023.007 |