Please use this identifier to cite or link to this item:
https://hdl.handle.net/2440/137389
Type: | Conference paper |
Title: | TDM-CFC: Towards Document-Level Multi-label Citation Function Classification |
Author: | Zhang, Y.; Wang, Y.; Sheng, Q.Z.; Mahmood, A.; Emma Zhang, W.; Zhao, R. |
Citation: | Lecture Notes in Artificial Intelligence, 2021 / Zhang, W., Zou, L., Maamar, Z., Chen, L. (ed./s), vol.13081, pp.363-376 |
Publisher: | Springer |
Issue Date: | 2021 |
Series/Report no.: | Lecture Notes in Computer Science (LNCS, volume 13081) |
ISBN: | 9783030915599 |
ISSN: | 0302-9743; 1611-3349 |
Conference Name: | International Conference on Web Information Systems Engineering (WISE) (26 Oct 2021 - 29 Oct 2021 : Melbourne, Australia) |
Editor: | Zhang, W.; Zou, L.; Maamar, Z.; Chen, L. |
Statement of Responsibility: | Yang Zhang, Yufei Wang, Quan Z. Sheng, Adnan Mahmood, Wei Emma Zhang, Rongying Zhao |
Abstract: | Citation function classification is an indispensable constituent of citation content analysis, with numerous applications ranging from improving informative citation indexers to facilitating resource search. Existing research primarily treats citation function classification as a sentence-level single-label task, ignoring some essential realistic phenomena and thereby creating problems such as data bias and noisy information. For instance, one scientific paper contains many citations, and each citation context may contain rich discussion of the cited paper, which may reflect multiple citation functions. In this paper, we propose the novel task of Document-level Multi-label Citation Function Classification, considerably extending previous research from a sentence-level single-label task to a document-level multi-label task. Given the complicated nature of document-level citation function analysis, we propose a novel two-stage fine-tuning approach for large-scale pre-trained language models. Specifically, we represent each citation as an independent token and use the two-stage fine-tuning to better represent it in the document context. To enable this task, we introduce a new benchmark, TDMCite, encompassing 9594 citations (annotated for their function) from online scientific papers, built with a three-aspect citation function annotation scheme. Experimental results suggest that our approach yields a considerable improvement over state-of-the-art BERT classification fine-tuning approaches. |
Keywords: | Citation function; Masked language model; BERT; Natural language processing |
Rights: | © 2021 Springer Nature Switzerland AG |
DOI: | 10.1007/978-3-030-91560-5_26 |
Published version: | https://link.springer.com/book/10.1007/978-3-030-91560-5 |
Appears in Collections: | Computer Science publications |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
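The multi-label formulation described in the abstract means a single citation can carry several function labels at once, so the decision rule is per-label thresholding rather than a single argmax. A minimal sketch of that decision step (the label names and the 0.5 threshold are illustrative assumptions, not taken from the paper):

```python
import math

# Hypothetical citation-function labels (illustrative only; the paper
# uses a three-aspect annotation scheme whose exact label set is not
# listed in this record).
LABELS = ["background", "method", "comparison"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_functions(logits, threshold=0.5):
    """Multi-label decision: keep every label whose sigmoid score
    clears the threshold, so one citation may receive several
    citation functions (unlike single-label argmax)."""
    return [label for label, z in zip(LABELS, logits)
            if sigmoid(z) >= threshold]

print(predict_functions([2.1, -0.3, 0.8]))  # ['background', 'comparison']
```

In a model like the one the abstract describes, the logits would come from a classification head over the pre-trained language model's representation of the citation token; here they are hard-coded purely to show the multi-label thresholding.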