Please use this identifier to cite or link to this item:
https://hdl.handle.net/2440/139687
Type: | Journal article |
Title: | VPatho: a deep learning-based two-stage approach for accurate prediction of gain-of-function and loss-of-function variants |
Author: | Ge, F.; Li, C.; Iqbal, S.; Muhammad, A.; Li, F.; Thafar, M.A.; Yan, Z.; Worachartcheewan, A.; Xu, X.; Song, J.; Yu, D.J. |
Citation: | Briefings in Bioinformatics, 2023; 24(1):1-16 |
Publisher: | Oxford University Press (OUP) |
Issue Date: | 2023 |
ISSN: | 1467-5463; 1477-4054 |
Statement of Responsibility: | Fang Ge, Chen Li, Shahid Iqbal, Arif Muhammad, Fuyi Li, Maha A. Thafar, Zihao Yan, Apilak Worachartcheewan, Xiaofeng Xu, Jiangning Song and Dong-Jun Yu |
Abstract: | Determining the pathogenicity and functional impact (i.e. gain-of-function; GOF or loss-of-function; LOF) of a variant is vital for unraveling the genetic level mechanisms of human diseases. To provide a 'one-stop' framework for the accurate identification of pathogenicity and functional impact of variants, we developed a two-stage deep-learning-based computational solution, termed VPatho, which was trained using a total of 9619 pathogenic GOF/LOF and 138 026 neutral variants curated from various databases. A total number of 138 variant-level, 262 protein-level and 103 genome-level features were extracted for constructing the models of VPatho. The development of VPatho consists of two stages: (i) a random under-sampling multi-scale residual neural network (ResNet) with a newly defined weighted-loss function (RUS-Wg-MSResNet) was proposed to predict variants' pathogenicity on the gnomAD_NV + GOF/LOF dataset; and (ii) an XGBOD model was constructed to predict the functional impact of the given variants. Benchmarking experiments demonstrated that RUS-Wg-MSResNet achieved the highest prediction performance with the weights calculated based on the ratios of neutral versus pathogenic variants. Independent tests showed that both RUS-Wg-MSResNet and XGBOD achieved outstanding performance. Moreover, assessed using variants from the CAGI6 competition, RUS-Wg-MSResNet achieved superior performance compared to state-of-the-art predictors. The fine-trained XGBOD models were further used to blind test the whole LOF data downloaded from gnomAD and accordingly, we identified 31 nonLOF variants that were previously labeled as LOF/uncertain variants. As an implementation of the developed approach, a webserver of VPatho is made publicly available at http://csbio.njust.edu.cn/bioinf/vpatho/ to facilitate community-wide efforts for profiling and prioritizing the query variants with respect to their pathogenicity and functional impact. |
Keywords: | weighted-loss function; random under-sampling; 1D-ResNet; 2D-ResNet; gnomAD variants; pathogenic GOF/LOF |
Rights: | © The Author(s) 2022. Published by Oxford University Press. All rights reserved. |
DOI: | 10.1093/bib/bbac535 |
Grant ID: | http://purl.org/au-research/grants/arc/DP120104460; http://purl.org/au-research/grants/arc/LP110200333 |
Published version: | http://dx.doi.org/10.1093/bib/bbac535 |
Appears in Collections: | Medicine publications |
Files in This Item:
There are no files associated with this item.