Clear cell renal cell carcinoma (ccRCC) is highly heterogeneous and is the most lethal cancer of all urologic cancers. We developed an unsupervised deep learning method, stacked denoising autoencoders (SdA), by integrating multi-platform genomic data for subtyping ccRCC with the goal of assisting diagnosis, personalized treatments and prognosis. We successfully found two subtypes of ccRCC using five genomics datasets for Kidney Renal Clear Cell Carcinoma (KIRC) from The Cancer Genome Atlas (TCGA). Correlation analysis between the last reconstructed input and the original input data showed that all the five types of genomic data positively contribute to the identification of the subtypes. The first subtype of patients had significantly lower survival probability, higher grade on neoplasm histology and higher stage on pathology than the other subtype of patients. Furthermore, we identified a set of genes, proteins and miRNAs that were differential expressed (DE) between the two subtypes. The function annotation of the DE genes from pathway analysis matches the clinical features. Importantly, we applied the model learned from KIRC as a pre-trained model to two independent datasets from TCGA, Lung Adenocarcinoma (LUAD) dataset and Low Grade Glioma (LGG), and the model stratified the LUAD and LGG patients into clinical associated subtypes. The successful application of our method to independent groups of patients supports that the SdA method and the model learned from KIRC are effective on subtyping cancer patients and most likely can be used on other similar tasks. We supplied the source code and the models to assist similar studies at https://github.com/tjgu/cancer_subtyping.
Scientific reports. 2019 Nov 13*** epublish ***
Tongjun Gu, Xiwu Zhao
Bioinformatics, Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA. ., Department of Ophthalmology & Visual Sciences, University of Michigan, Ann Arbor, MI, USA. .