Automatically Generating Natural Language Descriptions of Images by a Deep Hierarchical Framework
dc.contributor.author | Huo, L | |
dc.contributor.author | Bai, L | |
dc.contributor.author | Zhou, Shang-Ming | |
dc.date.accessioned | 2021-11-05T11:50:04Z | |
dc.date.available | 2021-11-05T11:50:04Z | |
dc.date.issued | 2022-08 | |
dc.identifier.issn | 2168-2267 | |
dc.identifier.issn | 2168-2275 | |
dc.identifier.uri | http://hdl.handle.net/10026.1/18229 | |
dc.description.abstract |
Automatically generating an accurate and meaningful description of an image is very challenging. However, recent schemes that generate an image caption by maximizing the likelihood of target sentences lack the capacity to recognize human-object interactions (HOIs) and the semantic relationships between HOIs and scenes, which are essential parts of an image caption. This article proposes a novel two-phase framework that addresses these challenges: 1) a hybrid deep-learning phase and 2) an image description generation phase. In the hybrid deep-learning phase, a novel factored three-way interaction machine learns the relational features of human-object pairs hierarchically, thereby transforming the image recognition problem into a latent structured labeling task. In the image description generation phase, a lexicalized probabilistic context-free tree-growing scheme is integrated with a description generator, transforming the description generation task into a syntactic-tree generation process. Through extensive comparisons with state-of-the-art image captioning methods on benchmark datasets, we demonstrate that the proposed framework outperforms existing captioning methods in several respects: it significantly improves the prediction of HOIs and of relationships between HOIs and scenes (RHIS), and it produces image captions of higher quality in a semantically and structurally coherent manner. | |
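To make the abstract's two-phase design concrete, below is a minimal, self-contained Python sketch of such a pipeline: phase 1 is stubbed out (the paper itself uses a factored three-way interaction machine to predict HOI and RHIS labels), and phase 2 grows a caption from a toy lexicalized probabilistic context-free grammar. All names, the grammar, and the stub logic are hypothetical illustrations under these assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a two-phase image-captioning pipeline:
# phase 1 -> structured HOI/RHIS labels, phase 2 -> PCFG tree growing.
import random

def recognize_hoi_and_scene(image_path):
    # Phase 1 (stub): a real system would run a deep model here; we
    # return a fixed toy labeling for illustration only.
    return {"human": "man", "interaction": "riding",
            "object": "bicycle", "scene": "street"}

# Phase 2: a tiny lexicalized probabilistic context-free grammar.
# Each nonterminal maps to (children, probability) rules.
GRAMMAR = {
    "S":  [(["NP", "VP", "PP"], 1.0)],
    "NP": [(["DET", "HUMAN"], 1.0)],
    "VP": [(["VERB", "DET", "OBJ"], 1.0)],
    "PP": [(["PREP", "DET", "SCENE"], 1.0)],
}

def grow_tree(symbol, lexicon):
    # Pre-terminal symbols emit the word bound to them by the labels.
    if symbol in lexicon:
        return [lexicon[symbol]]
    rules = GRAMMAR[symbol]
    children, _ = random.choices(rules, weights=[p for _, p in rules])[0]
    words = []
    for child in children:
        words.extend(grow_tree(child, lexicon))
    return words

def generate_caption(labels):
    # Lexicalize the grammar with the phase-1 predictions, then read the
    # caption off the leaves of the grown syntactic tree.
    lexicon = {"DET": "a", "PREP": "on",
               "HUMAN": labels["human"], "VERB": labels["interaction"],
               "OBJ": labels["object"], "SCENE": labels["scene"]}
    return " ".join(grow_tree("S", lexicon))

if __name__ == "__main__":
    labels = recognize_hoi_and_scene("example.jpg")
    print(generate_caption(labels))  # e.g. "a man riding a bicycle on a street"
```

The sketch only shows how predicted HOI/RHIS labels can seed a grammar so that caption generation becomes a syntactic-tree generation process, as the abstract describes; the learning components are deliberately omitted.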
dc.format.extent | 7441-7452 | |
dc.format.medium | Print-Electronic | |
dc.language | eng | |
dc.language.iso | eng | |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | |
dc.subject | Task analysis | |
dc.subject | Context modeling | |
dc.subject | Visualization | |
dc.subject | Solid modeling | |
dc.subject | Image recognition | |
dc.subject | Hybrid power systems | |
dc.subject | Generators | |
dc.subject | Human-object interaction (HOI) | |
dc.subject | hybrid deep learning | |
dc.subject | image captioning | |
dc.subject | image context | |
dc.subject | natural language processing | |
dc.title | Automatically Generating Natural Language Descriptions of Images by a Deep Hierarchical Framework | |
dc.type | journal-article | |
dc.type | Journal Article | |
plymouth.author-url | https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000733254800001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=11bb513d99f797142bcfeffcc58ea008 | |
plymouth.issue | 8 | |
plymouth.volume | 52 | |
plymouth.publication-status | Published | |
plymouth.journal | IEEE Transactions on Cybernetics | |
dc.identifier.doi | 10.1109/tcyb.2020.3041595 | |
plymouth.organisational-group | /Plymouth | |
plymouth.organisational-group | /Plymouth/Faculty of Health | |
plymouth.organisational-group | /Plymouth/Faculty of Health/School of Nursing and Midwifery | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA | |
plymouth.organisational-group | /Plymouth/REF 2021 Researchers by UoA/UoA03 Allied Health Professions, Dentistry, Nursing and Pharmacy | |
plymouth.organisational-group | /Plymouth/Users by role | |
plymouth.organisational-group | /Plymouth/Users by role/Academics | |
dc.publisher.place | United States | |
dc.identifier.eissn | 2168-2275 | |
dc.rights.embargoperiod | Not known | |
rioxxterms.versionofrecord | 10.1109/tcyb.2020.3041595 | |
rioxxterms.licenseref.uri | http://www.rioxx.net/licenses/all-rights-reserved | |
rioxxterms.type | Journal Article/Review |