|Run||Spots||Bases||Size||GC content||Published||Access Type|
This run has 5 reads per spot:
|L=4, 100%||L=12, 100%||L=2, 100%||L=19, 100%||L=235, 100%|
Legend: Technical read; Application read. "L=4, 100%" means the length is 4 and 100% of spots contain this read. "L̅=165, σ=92.8, 66%" means the average length is 165, the standard deviation is 92.8, and 66% of spots contain this read.
Submitter demultiplexed reads. Each read was assigned to a sample pool member for those samples that yielded data.
Pool of 32 samples
|Biosample||Sample name||Title||Spots||Bases||Member Name|
|SRS008987||M10||human hand microbiome||708||192,062||M10_V2|
|SRS008988||M12||human hand microbiome||461||127,746||M12_V2|
|SRS008989||M14||human hand microbiome||834||230,917||M14_V2|
|SRS008990||M16||human hand microbiome||667||183,597||M16_V2|
|SRS008991||M20||human hand microbiome||493||132,979||M20_V2|
|SRS008992||M22||human hand microbiome||613||169,737||M22_V2|
|SRS008993||M24||human hand microbiome||695||191,140||M24_V2|
|SRS008994||M26||human hand microbiome||710||191,556||M26_V2|
|SRS008995||M30||human hand microbiome||539||142,779||M30_V2|
|SRS008996||M32||human hand microbiome||445||120,012||M32_V2|
|SRS008997||M34||human hand microbiome||470||127,968||M34_V2|
|SRS008998||M36||human hand microbiome||438||118,158||M36_V2|
|SRS008999||F10||human hand microbiome||271||72,596||F10_V2|
|SRS009000||F12||human hand microbiome||557||150,969||F12_V2|
|SRS009001||F14||human hand microbiome||608||165,166||F14_V2|
|SRS009002||F16||human hand microbiome||572||154,398||F16_V2|
|SRS009003||F20||human hand microbiome||457||121,784||F20_V2|
|SRS009004||F22||human hand microbiome||300||82,449||F22_V2|
|SRS009005||F24||human hand microbiome||514||138,732||F24_V2|
|SRS009006||F26||human hand microbiome||326||91,325||F26_V2|
|SRS009007||F30||human hand microbiome||365||99,596||F30_V2|
|SRS009008||F32||human hand microbiome||426||117,827||F32_V2|
|SRS009009||F34||human hand microbiome||319||87,744||F34_V2|
|SRS009010||F36||human hand microbiome||507||140,538||F36_V2|
|SRS009011||M40||human hand microbiome||285||77,364||M40_V2|
|SRS009012||M42||human hand microbiome||321||88,031||M42_V2|
|SRS009013||M44||human hand microbiome||341||94,150||M44_V2|
|SRS009014||M46||human hand microbiome||328||88,161||M46_V2|
|SRS009015||F40||human hand microbiome||311||83,907||F40_V2|
|SRS009016||F42||human hand microbiome||333||89,345||F42_V2|
|SRS009017||F44||human hand microbiome||219||59,503||F44_V2|
|SRS009018||F46||human hand microbiome||368||98,931||F46_V2|
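The pool table can be checked against the read layout with a little arithmetic. The sketch below is illustrative only: `parse_row` and `bases_per_spot` are hypothetical helper names, and it assumes the `|a||b||c|` row layout shown above. It parses two rows copied from the table and computes average bases per spot.

```python
def parse_row(line: str) -> list[str]:
    """Split a '|a||b||c|' table row into its cell values."""
    return line.strip().strip("|").split("||")


def bases_per_spot(line: str) -> float:
    """Average number of bases per spot for one pool member row."""
    cells = parse_row(line)
    spots = int(cells[3].replace(",", ""))  # 4th column: Spots
    bases = int(cells[4].replace(",", ""))  # 5th column: Bases
    return bases / spots


# Two rows copied verbatim from the pool table.
rows = [
    "|SRS008987||M10||human hand microbiome||708||192,062||M10_V2|",
    "|SRS008988||M12||human hand microbiome||461||127,746||M12_V2|",
]

for row in rows:
    sample_name = parse_row(row)[1]
    print(sample_name, round(bases_per_spot(row), 2))
```

For M10 this gives about 271 bases per spot, close to the 272 obtained by summing the five read lengths listed for this run (4 + 12 + 2 + 19 + 235).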
|PRJNA34527||SRP000393||Global studies of microbial diversity on human skin|
This project aims to undertake global surveys of microbial diversity in a range of free-living and host-associated communities. The importance of the project is that it will provide a comparison of microbial diversity in a range of habitats and provide a platform to underpin many studies of community assembly, diversity, etc. Bacteria thrive on and within the human body. One of the largest human-associated microbial habitats is the skin surface, which harbors large numbers of bacteria that can have important effects on health. We examined the palmar surfaces of the dominant and nondominant hands of 51 healthy young adult volunteers to characterize bacterial diversity on hands and to assess its variability within and between individuals. We used a novel pyrosequencing-based method that allowed us to survey hand surface bacterial communities at an unprecedented level of detail. The diversity of skin-associated bacterial communities was surprisingly high; a typical hand surface harbored >150 unique species-level bacterial phylotypes, and we identified a total of 4,742 unique phylotypes across all of the hands examined. Although there was a core set of bacterial taxa commonly found on the palm surface, we observed pronounced intra- and interpersonal variation in bacterial community composition: hands from the same individual shared only 17% of their phylotypes, with different individuals sharing only 13%. Women had significantly higher diversity than men, and community composition was significantly affected by handedness, time since last hand washing, and an individual's sex. The variation within and between individuals in microbial ecology illustrated by this study emphasizes the challenges inherent in defining what constitutes a "healthy" bacterial community; addressing these challenges will be critical for the International Human Microbiome Project. Bacterial 16S ribosomal RNA sequences have been deposited in the Short Read Archive.
You need the SRA Toolkit to operate on SRA runs.
The default toolkit configuration enables it to find and retrieve SRA runs by accession. It also downloads (and caches) only the part of the data you really need. For example, quality scores represent the majority of the data volume, and you may not need them if you dump fasta only (versus fastq). Likewise, if you are looking at a particular gene, you may not need reads aligned to other regions or reads that are not aligned at all. In the same way, if you use GATK with SRA support enabled, you need only SRA run accessions to start your process.
fastq-dump will dump reads in a number of "standard" fastq and fasta formats.
vdb-dump is also capable of producing fasta and fastq (besides other formats). It dumps data much faster than fastq-dump, but the ordering of reads may be different and it does not produce split-read multi-file output.
The prefetch tool will help you cache all data in advance if you plan to run data analysis in an environment where getting data from NCBI at run time is not feasible.
Read more at SRA Knowledge Base on how to download SRA data using command line utilities.
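As a rough sketch of how these tools might be scripted, the snippet below only builds the command lines for `prefetch` and `fastq-dump`; it does not run them. The helper names and the accession `SRR0000000` are hypothetical placeholders (this page does not show the run accession), and `--fasta 0` is used on the assumption that a line width of zero disables line wrapping.

```python
import shlex


def prefetch_cmd(accession: str) -> list[str]:
    """Command that caches a run locally ahead of analysis."""
    return ["prefetch", accession]


def fasta_dump_cmd(accession: str) -> list[str]:
    """Command that dumps reads as FASTA only (no quality scores)."""
    return ["fastq-dump", "--fasta", "0", accession]


# Hypothetical accession; substitute the accession of the run you need.
accession = "SRR0000000"

for cmd in (prefetch_cmd(accession), fasta_dump_cmd(accession)):
    print(shlex.join(cmd))
    # To execute for real (requires the SRA Toolkit on PATH):
    # subprocess.run(cmd, check=True)
```

Building the argument list separately from execution makes it easy to inspect or log the exact command before anything is downloaded.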
The sections below show results of analysis run by software that is still at an experimental stage. Please take the provided results with a boatload of salt and let us know what you think.
-- SRA team
- Unidentified reads: 6.84%
- Identified reads: 93.16%
Results show distribution of reads mapping to specific taxonomy nodes as a percentage of total reads within the analyzed run. In cases where a read maps to more than one related taxonomy node, the read is reported as originating from the lowest shared taxonomic node. So when a read maps to two species belonging to the same genus, it is reported as having originated from their common genus. Under typical conditions where a single organism has been sequenced, the expectations are that reads will map to several taxonomy nodes across the organism’s lineage, and that the number of reads mapping to higher level nodes will be more than those that map to terminal nodes.
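The "lowest shared taxonomic node" rule can be sketched on a toy tree. Everything below is an assumption for illustration: the node names, the `PARENT` map, and the function names are hypothetical and are not taken from STAT.

```python
# Toy taxonomy as a child -> parent map (hypothetical node names).
PARENT = {
    "species_A1": "genus_A",
    "species_A2": "genus_A",
    "species_B1": "genus_B",
    "genus_A": "root",
    "genus_B": "root",
    "root": None,
}


def lineage(node: str) -> list[str]:
    """Path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_shared_node(nodes: list[str]) -> str:
    """Deepest node that lies on the lineage of every input node."""
    shared = set(lineage(nodes[0]))
    for node in nodes[1:]:
        shared &= set(lineage(node))
    # Walking upward from the first node, the first shared ancestor is the lowest.
    for ancestor in lineage(nodes[0]):
        if ancestor in shared:
            return ancestor
    raise ValueError("nodes do not share a root")


print(lowest_shared_node(["species_A1", "species_A2"]))  # same genus
print(lowest_shared_node(["species_A1", "species_B1"]))  # different genera
```

A read hitting two species of the same genus is reported at that genus, while a read hitting species from different genera moves further up the tree.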
STAT results are proportional to the size of sequenced genomes. So given a mixed sample containing several organisms at equal copy number, one expects proportionally more reads to originate from the larger genomes. This means that the percentages reported by STAT will reflect genome size and must be considered against the genomic complexity of the sequenced sample.
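A small worked example of the genome-size effect, assuming two hypothetical organisms present at equal copy number and reads sampled uniformly along each genome (the names and sizes are made up for illustration):

```python
def expected_read_fractions(genome_sizes: dict[str, int]) -> dict[str, float]:
    """Expected share of reads per organism at equal copy number.

    Assumes reads are drawn uniformly along each genome, so the share
    of reads is proportional to genome length.
    """
    total = sum(genome_sizes.values())
    return {name: size / total for name, size in genome_sizes.items()}


# Hypothetical genome sizes in base pairs.
sizes = {"organism_small": 2_000_000, "organism_large": 6_000_000}

for name, fraction in expected_read_fractions(sizes).items():
    print(f"{name}: {fraction:.0%}")
```

Although both organisms are equally abundant by copy number, the larger genome is expected to contribute three times as many reads.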
The NCBI SRA Taxonomy Analysis Tool (STAT) calculates the taxonomic distribution of reads from next generation sequencing runs. This analysis maps individual sequencing reads to a taxonomic hierarchy and reports the taxonomic composition of reads within a sequencing run.
STAT maps sequencing reads to a taxonomic hierarchy using a two-step strategy based on exact query read matches to precomputed k-mer dictionary databases. In the first pass, a small, “coarse” reference dictionary database is used to identify organisms matching a read set. In the second pass, organism-specific slices from a “fine” reference dictionary database are used to compute the distribution of reads between identified taxonomy classes (species and higher-order taxonomy nodes). When multiple taxonomy nodes are mapped for a single spot, we use the lowest non-ambiguous mapping.
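The two-pass idea can be sketched with toy dictionaries. This is a minimal illustration under stated assumptions, not STAT's implementation: the k-mer length of 4, the `COARSE` and `FINE` contents, and all names are hypothetical.

```python
K = 4  # toy k-mer length for readability; not the length STAT uses


def kmers(seq: str, k: int = K) -> list[str]:
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Coarse dictionary: a few k-mers per organism (hypothetical contents).
COARSE = {"ACGT": "org_A", "GGCC": "org_B"}

# Fine dictionary, sliced per organism: k-mer -> taxonomy node.
FINE = {
    "org_A": {"ACGT": "org_A", "CGTT": "org_A", "TTAG": "genus_A"},
    "org_B": {"GGCC": "org_B", "GCCA": "org_B"},
}


def first_pass(reads: list[str]) -> set[str]:
    """Identify organisms with at least one exact coarse k-mer match."""
    return {COARSE[km] for read in reads for km in kmers(read) if km in COARSE}


def second_pass(reads: list[str], organisms: set[str]) -> dict[str, set[str]]:
    """Map each read to taxonomy nodes using only the selected fine slices."""
    hits = {}
    for read in reads:
        nodes = set()
        for km in kmers(read):
            for org in organisms:
                node = FINE[org].get(km)
                if node is not None:
                    nodes.add(node)
        hits[read] = nodes
    return hits


reads = ["ACGTT", "TTAGC", "AAAAA"]
organisms = first_pass(reads)
print(organisms)
print(second_pass(reads, organisms))
```

Only the slice for `org_A` is consulted in the second pass, since no read matched `org_B` in the coarse pass. The read `TTAGC` has no coarse hit on its own but is still classified once the fine slice is loaded.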
STAT k-mer dictionaries are built using an iterative minhash-based approach against reference genomic databases. For every fixed-length segment of incoming reference nucleotide sequence, the k-mer representing that segment is selected as the one with the minimum FNV-1 hash value. Several strategies were used to enhance the specificity and accuracy of STAT results. Low-complexity k-mers composed of >50% homopolymer or dinucleotide repeats (e.g. AAAAAA or ACACACACACA) were filtered from the dictionaries, and discrete k-mers belonging to multiple taxonomic references were “merged” at the lowest common taxonomic node shared between the references. Finally, the specificity of representative k-mers was determined by searching against the source reference genomic database. When representative k-mers were found in multiple taxonomic reference nodes, they were merged at the lowest common taxonomic node as above.
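The selection step can be sketched as follows, assuming the hash named in the text is 64-bit FNV-1 and that segments are non-overlapping. The low-complexity test is an assumed heuristic (a homopolymer or dinucleotide-repeat run covering more than half of the k-mer), the toy values k=8 and segment length 16 are chosen for readability, and all function names are hypothetical.

```python
FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1_64(data: bytes) -> int:
    """64-bit FNV-1 hash: multiply by the prime, then XOR in each byte."""
    h = FNV_OFFSET_64
    for byte in data:
        h = (h * FNV_PRIME_64) & MASK_64
        h ^= byte
    return h


def is_low_complexity(kmer: str) -> bool:
    """Assumed heuristic: a period-1 or period-2 repeat run covers >50% of the k-mer."""
    n = len(kmer)
    for period in (1, 2):
        run = best = period
        for i in range(period, n):
            if kmer[i] == kmer[i - period]:
                run += 1
                best = max(best, run)
            else:
                run = period
        if best > n / 2:
            return True
    return False


def select_kmers(reference: str, k: int, segment_len: int) -> list[str]:
    """Pick one representative k-mer (minimum hash) per non-overlapping segment."""
    selected = []
    for start in range(0, len(reference) - k + 1, segment_len):
        segment = reference[start:start + segment_len]
        candidates = [segment[i:i + k] for i in range(len(segment) - k + 1)]
        candidates = [c for c in candidates if not is_low_complexity(c)]
        if candidates:
            selected.append(min(candidates, key=lambda c: fnv1_64(c.encode("ascii"))))
    return selected


# Made-up 32 nt reference, split into two 16 nt segments.
reference = "ACGTTGCAAGCTTAGGCATCGATCGGATTACA"
print(select_kmers(reference, k=8, segment_len=16))
```

Because the choice depends only on the hash values, the same reference always yields the same representatives, which keeps the dictionary reproducible across builds.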
Reference sequences were mapped to the taxonomy hierarchy using the NCBI taxonomy database. The database contained 48,180 taxonomy nodes in January, 2017.
Segment sizes and K-mer selection
K-mer dictionaries were built by computationally slicing reference genomes into sequential segments and selecting 32-mers to represent each segment. The “coarse” k-mer dictionary uses variable segment lengths, proportional to genome size and ranging from 200 to 8,000 nt. The “fine” k-mer dictionary uses a constant 64 nt segment length for all genomes. For a 32-mer index this gives a 32x reduction in space and I/O, at the cost of expecting at least one error-free 64-mer in every spot.
Can I get the software?
Yes, on GitHub:
git clone https://github.com/ncbi/ngs-tools.git --branch tax
cd ./ngs-tools/tools/tax
make Makefile
Helper *.sh scripts can be found in the ./examples folder.
How can I cite you?
No publication yet. We intend to post a preprint soon.