Transcriptional profiling of lncRNAs and novel transcribed regions across a diverse panel of archived human cancers
Molecular characterization of tumors has been critical for identifying important genes in cancer biology and for improving tumor classification and diagnosis. Long non-coding RNAs (lncRNAs) are a new and relatively unstudied class of transcripts, and they offer a rich opportunity to identify both functional drivers and cancer-type-specific biomarkers. However, despite the potential importance of lncRNAs to the cancer field, no comprehensive survey of lncRNA expression across cancers has been reported. We used 3'-End Sequencing for Expression Quantification (3SEQ) to quantify transcript abundance across 64 solid tumors representing 17 diagnostic subtypes of adenocarcinomas, squamous cell carcinomas, and sarcomas. Among the 1,065 known lncRNAs surveyed, we identified hundreds whose transcript levels vary between tumor types; these are therefore potential biomarker candidates. We also discovered 1,071 novel intergenic transcribed regions and show that they exhibit similar patterns of variability between tumor types. We find that many of these differentially expressed cancer transcripts are also expressed in normal tissues. One such novel transcript, expressed specifically in breast tissue, was further evaluated by RNA in situ hybridization on a panel of breast tumors; its expression correlated with low tumor grade and estrogen receptor expression, making it a potentially important new breast cancer biomarker. This study provides the first large survey of lncRNA expression within a panel of solid cancers and also identifies a number of novel transcribed regions, differentially expressed across distinct cancer types, that represent candidate biomarkers for future research.

Overall design: 3SEQ was performed on 64 formalin-fixed, paraffin-embedded (FFPE) human tumors representing 17 diagnostic cancer subtypes. Duplicate libraries were prepared for two of the tumors (ESS STT5520 and LMS STT516).
3SEQ was also performed on 27 normal human tissue samples, and RNA-Seq was performed on 6 breast cancer cell lines to examine the breast-specific transcript on chr10.

Series supplementary files:

GSE28866_raw_counts_54511_peaks_cancer_and_normal.txt (Raw_counts): total 3SEQ reads in each peak for each sample. The file contains read counts for 54,511 peaks across the 66 cancer libraries (64 tumors plus the two duplicate libraries) and the 27 normal libraries.

GSE28866_36048_normalized_peaks_cancer_and_normal.txt (Normalized_peaks): normalized 3SEQ expression data for the 36,048 filtered peaks across the same 66 cancer libraries and 27 normal libraries. Peaks were classified as coding, lncRNA (Rinn, Bartel, Hughes, Gencode_lnc), other known transcripts (ref_known_Gencode_nc, all_mrna, other known), downstream, intron, promoter, or novel_intergenic. Peaks found to be differentially expressed in one of the 17 cancer subtypes by a series of 2-class SAM (Significance Analysis of Microarrays) analyses are flagged, together with the cancer type showing upregulation. Expression data were normalized for sequencing depth by scaling each sample by its mean value, and were then compressed to reduce the influence of outliers by taking the square root of each value.
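The normalization and differential-expression steps described for the supplementary files can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names normalize_3seq and sam_d are hypothetical; "scaling by the mean value of each sample" is read as dividing each sample (column) by its mean over all peaks and rescaling to the average of those means; and the 2-class SAM step computes only SAM's relative difference statistic d for one subtype versus all remaining samples. The fudge factor s0 defaults to the median per-peak standard error as a simplification. SAM itself tunes s0 and estimates false discovery rates by permutation, which is omitted here.

```python
import numpy as np


def normalize_3seq(counts):
    """Depth-normalize a peaks x samples count matrix, then take square roots.

    Assumption: "scaling by the mean value of each sample" is read as dividing
    each sample (column) by its mean over all peaks and multiplying by the
    average of those means, so every sample ends up with the same mean.
    """
    counts = np.asarray(counts, dtype=float)
    sample_means = counts.mean(axis=0)  # one mean per sample (column)
    scaled = counts / sample_means * sample_means.mean()
    return np.sqrt(scaled)  # square root damps large values (outliers)


def sam_d(expr, in_group, s0=None):
    """SAM-style relative difference d for one subtype vs. all other samples.

    expr: peaks x samples matrix (e.g. output of normalize_3seq).
    in_group: boolean mask, True for samples of the subtype being tested.
    s0: fudge factor added to the denominator; if None, the median of the
        per-peak standard errors s is used (a simplification: SAM itself
        tunes s0 and adds permutation-based FDR, both omitted here).
    Positive d means higher expression in the subtype.
    """
    expr = np.asarray(expr, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    grp, rest = expr[:, in_group], expr[:, ~in_group]
    n1, n2 = grp.shape[1], rest.shape[1]
    mean_g, mean_r = grp.mean(axis=1), rest.mean(axis=1)
    # Pooled within-class sum of squares for each peak.
    ss = ((grp - mean_g[:, None]) ** 2).sum(axis=1) + (
        (rest - mean_r[:, None]) ** 2
    ).sum(axis=1)
    # Standard error of the difference in class means for each peak.
    s = np.sqrt((1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2) * ss)
    if s0 is None:
        s0 = float(np.median(s))
    return (mean_g - mean_r) / (s + s0)


if __name__ == "__main__":
    # Toy data: sample 2 is sample 1 sequenced at twice the depth.
    counts = np.array([[1, 2],
                       [3, 6]])
    print(normalize_3seq(counts))

    # Toy one-vs-rest test: the first two samples form the "subtype".
    expr = np.array([[5.0, 6.0, 1.0, 2.0],
                     [3.0, 4.0, 3.0, 4.0]])
    print(sam_d(expr, [True, True, False, False]))
```

In the toy count matrix the second sample has exactly twice the reads of the first, so after depth scaling the two columns become identical; in the toy SAM example the first peak gets a positive d (higher in the "subtype") and the second, unchanged peak gets d = 0. In a real one-vs-rest design each of the 17 subtypes would take its turn as in_group.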
PubMed ID: 22929540