Automatically Generating Natural Language Descriptions of Images by a Deep Hierarchical Framework

ORCID

Shang Ming Zhou: 0000-0002-0719-9353

Abstract

Automatically generating an accurate and meaningful description of an image is very challenging. Recent approaches generate a caption by maximizing the likelihood of target sentences, but they lack the capacity to recognize human–object interactions (HOIs) and the semantic relationships between HOIs and scenes, which are essential parts of an image caption. This article proposes a novel two-phase framework that addresses these challenges: 1) hybrid deep learning and 2) image description generation. In the hybrid deep-learning phase, a novel factored three-way interaction machine is proposed to learn the relational features of human–object pairs hierarchically. In this way, the image recognition problem is transformed into a latent structured labeling task. In the image description generation phase, a lexicalized probabilistic context-free tree growing scheme is integrated with a description generator, transforming the description generation task into a syntactic-tree generation process. Through extensive comparisons with state-of-the-art image captioning methods on benchmark datasets, we demonstrate that the proposed framework outperforms existing captioning methods in several respects: it significantly improves the prediction of HOIs and of the relationships between HOIs and scenes (RHIS), and it improves the quality of generated image captions in a semantically and structurally coherent manner.

DOI Link

10.1109/tcyb.2020.3041595

Publication Date

2022-01-01

Publication Title

IEEE Transactions on Cybernetics

Volume

52

Issue

8

ISSN

2168-2267

Acceptance Date

2021-01-01

Deposit Date

2021-05-11

First Page

7441

Last Page

7452

Recommended Citation

Huo, L., Bai, L., & Zhou, S. (2022) 'Automatically Generating Natural Language Descriptions of Images by a Deep Hierarchical Framework', IEEE Transactions on Cybernetics, 52(8), pp. 7441-7452. Available at: 10.1109/tcyb.2020.3041595

School of Nursing and Midwifery