Please use this identifier to cite or link to this item:
https://hdl.handle.net/2440/137389
Type: | Conference paper |
Title: | TDM-CFC: Towards Document-Level Multi-label Citation Function Classification |
Author: | Zhang, Y.; Wang, Y.; Sheng, Q.Z.; Mahmood, A.; Emma Zhang, W.; Zhao, R. |
Citation: | Lecture Notes in Artificial Intelligence, 2021 / Zhang, W., Zou, L., Maamar, Z., Chen, L. (ed./s), vol.13081, pp.363-376 |
Publisher: | Springer |
Issue Date: | 2021 |
Series/Report no.: | Lecture Notes in Computer Science (LNCS, volume 13081) |
ISBN: | 9783030915599 |
ISSN: | 0302-9743; 1611-3349 |
Conference Name: | International Conference on Web Information Systems Engineering (WISE) (26 Oct 2021 - 29 Oct 2021 : Melbourne, Australia) |
Editor: | Zhang, W.; Zou, L.; Maamar, Z.; Chen, L. |
Statement of Responsibility: | Yang Zhang, Yufei Wang, Quan Z. Sheng, Adnan Mahmood, Wei Emma Zhang, Rongying Zhao |
Abstract: | Citation function classification is an indispensable constituent of citation content analysis, with numerous applications ranging from improving informative citation indexers to facilitating resource search. Existing research primarily treats citation function classification as a sentence-level single-label task, ignoring some essential realistic phenomena and thereby creating problems such as data bias and noisy information. For instance, one scientific paper contains many citations, and each citation context may contain rich discussion of the cited paper, which may reflect multiple citation functions. In this paper, we propose the novel task of Document-level Multi-label Citation Function Classification, considerably extending previous research from a sentence-level single-label task to a document-level multi-label task. Given the complicated nature of document-level citation function analysis, we propose a novel two-stage fine-tuning approach for large-scale pre-trained language models. Specifically, we represent each citation as an independent token and use the two-stage fine-tuning to better represent it in the document context. To enable this task, we introduce a new benchmark, TDMCite, encompassing 9594 citations (annotated for their function) from online scientific papers, built with a three-aspect citation function annotation scheme. Experimental results suggest that our approach yields a considerable improvement over state-of-the-art BERT classification fine-tuning approaches. |
Keywords: | Citation function; Masked language model; BERT; Natural language processing |
Rights: | © 2021 Springer Nature Switzerland AG |
DOI: | 10.1007/978-3-030-91560-5_26 |
Published version: | https://link.springer.com/book/10.1007/978-3-030-91560-5 |
Appears in Collections: | Computer Science publications |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
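The multi-label formulation described in the abstract means a single citation can carry several function labels at once, so the decision rule is per-label thresholding rather than a single argmax. A minimal sketch of that decision step (the label names and the 0.5 threshold are illustrative assumptions, not taken from the paper):

```python
import math

# Hypothetical citation-function labels (illustrative only; the paper
# uses a three-aspect annotation scheme whose exact label set is not
# listed in this record).
LABELS = ["background", "method", "comparison"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_functions(logits, threshold=0.5):
    """Multi-label decision: keep every label whose sigmoid score
    clears the threshold, so one citation may receive several
    citation functions (unlike single-label argmax)."""
    return [label for label, z in zip(LABELS, logits)
            if sigmoid(z) >= threshold]

print(predict_functions([2.1, -0.3, 0.8]))  # ['background', 'comparison']
```

In a model like the one the abstract describes, the logits would come from a classification head over the pre-trained language model's representation of the citation token; here they are hard-coded purely to show the multi-label thresholding.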