
GSE30567: Long RNA-seq from ENCODE/Cold Spring Harbor Lab

Identifiers: SRA: SRP007461
GEO: GSE30567
Study Type: Transcriptome Analysis
Description: Summary: These tracks were generated by the ENCODE Consortium. They contain information about human RNAs longer than 200 nucleotides, obtained as short reads on the Illumina GAIIx platform. Data are available from biological replicates of several cell lines. In addition to profiling poly(A)+ and poly(A)- RNA from whole cells, we have also gathered data from various subcellular compartments. In many cases, Cap Analysis of Gene Expression (CAGE, RIKEN Institute), Small RNA-Seq (<200 nucleotides, CSHL), and Paired-End di-Tag RNA (PET-RNA, Genome Institute of Singapore) datasets are available from the same biological replicates.

Overall Design: We are using the published protocol. This protocol generates directional libraries that report each transcript's strand of origin. Exogenous RNA spike-ins (Round 5, pool 14), in development at the National Institute of Standards and Technology (NIST), were added to each endogenous RNA isolate and carried through library construction and sequencing. The Illumina PhiX control library was also spiked in at 1% to each completed human library just prior to cluster formation. Each RNA-Seq dataset is accompanied by a "Production Document" containing details about the RNA isolations and treatments, library construction, and spike-ins, as well as quality-control figures for individual libraries. The spike-in sequences and concentrations are available for download in the supplemental directory. The libraries are sequenced on the Illumina platform to an average depth of ~200 million reads (100 million mate-pairs). The data are mapped against hg19 using Spliced Transcript Alignment and Reconstruction (STAR), written by Alex Dobin (CSHL). More information about STAR, including the parameters used for these data, can be found at:

We provide the following processed "element" data files: de novo splice junctions, de novo transcripts, and contigs.
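The de novo splice junctions come from STAR alignments of directional libraries, so each junction can carry a strand of origin. The sketch below is a minimal Python reader that assumes the junction files follow STAR's standard `SJ.out.tab` layout (nine tab-separated columns: chromosome, first and last intronic base, strand code, intron motif, annotated flag, unique-read count, multi-mapping-read count, maximum overhang). This record does not state the processed file format, so check the downloaded files before relying on it; the `Junction` class, the `parse_sj_out` function, and the toy rows are hypothetical and for illustration only.

```python
import csv
import io
from dataclasses import dataclass

# ASSUMPTION: files use STAR's SJ.out.tab layout; column 4 strand codes are
# 0 = undefined, 1 = plus strand, 2 = minus strand.
STRAND_CODES = {"0": ".", "1": "+", "2": "-"}


@dataclass
class Junction:  # hypothetical container, not part of any ENCODE tool
    chrom: str
    intron_start: int  # first intronic base (1-based)
    intron_end: int    # last intronic base (1-based)
    strand: str
    annotated: bool
    unique_reads: int
    multi_reads: int


def parse_sj_out(handle, min_unique=1):
    """Yield junctions supported by at least `min_unique` uniquely mapped reads."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 9:
            continue  # skip blank or malformed lines
        junction = Junction(
            chrom=row[0],
            intron_start=int(row[1]),
            intron_end=int(row[2]),
            strand=STRAND_CODES.get(row[3], "."),
            annotated=row[5] == "1",
            unique_reads=int(row[6]),
            multi_reads=int(row[7]),
        )
        if junction.unique_reads >= min_unique:
            yield junction


# Two invented rows for illustration (not ENCODE data).
toy = (
    "chr1\t14830\t14969\t2\t2\t1\t12\t0\t38\n"
    "chr1\t15039\t15795\t2\t2\t1\t0\t3\t20\n"
)
juncs = list(parse_sj_out(io.StringIO(toy), min_unique=1))
for j in juncs:
    print(j.chrom, j.intron_start, j.intron_end, j.strand, j.unique_reads)
```

Filtering on uniquely mapped reads rather than the multi-mapping count is a conservative choice; raising `min_unique` gives a stricter junction set, and the second toy row is dropped here because it has no unique support.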
The de novo splice junctions, transcripts, and contigs are assessed for reproducibility using a nonparametric irreproducible discovery rate (IDR) script. The IDR value for each element is included in the files so that end users can apply their own threshold. An IDR value of 0.1 means that the probability of detecting that element in a third experiment, equivalent in depth to the sum of the biological replicates, is 90%. We also compute expression values for annotated genes, transcripts, and exons.
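Because the IDR values are shipped in the files for end users to threshold on, a typical step is to keep only elements at or below a chosen cutoff (a lower IDR corresponds to a more reproducible element). The column layout of the element files is not given in this record, so this minimal Python sketch assumes a tab-separated file with a header row and a hypothetical `idr` column; `filter_by_idr`, `element_id`, and the toy rows are illustrative names, not ENCODE's.

```python
import csv
import io


def filter_by_idr(handle, threshold=0.1, idr_column="idr"):
    """Yield rows whose IDR value is at or below `threshold`.

    ASSUMPTION: `idr_column` is a hypothetical column name; check the real
    header of the downloaded element file before using this.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        try:
            idr = float(row.get(idr_column, ""))
        except (TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric IDR
        if idr <= threshold:
            yield row


# Toy table (invented for illustration, not ENCODE data).
toy = "element_id\tidr\nJ1\t0.02\nJ2\t0.35\nJ3\t0.10\nJ4\tNA\n"
kept = [r["element_id"] for r in filter_by_idr(io.StringIO(toy), threshold=0.1)]
print(kept)  # ['J1', 'J3']
```

Rows with a missing or non-numeric IDR are dropped rather than kept, which errs on the side of reproducibility; the 0.1 default simply mirrors the example value in the text.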
Center Project: GSE30567: Long RNA-seq from ENCODE/Cold Spring Harbor Lab

Related SRA data

Experiments: 99 (99 samples)
Runs: 205 (3.2 Tbp; 1.9 Tb)
Additional objects:
File type    Count
fastq        319