Similar articles
20 similar articles found (search time: 15 ms)
1.
2.
Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state-of-the-art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size by more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.

3.
Rapidly improving high-throughput sequencing technologies provide unprecedented opportunities for carrying out population-genomic studies with various organisms. To take full advantage of these methods, it is essential to correctly estimate allele and genotype frequencies, and here we present a maximum-likelihood method that accomplishes these tasks. The proposed method fully accounts for uncertainties resulting from sequencing errors and biparental chromosome sampling and, at moderately high depths of coverage, yields essentially unbiased estimates with minimal sampling variances, regardless of the mating system and structure of the population. Moreover, we have developed statistical tests for examining the significance of polymorphisms and their genotypic deviations from Hardy–Weinberg equilibrium. We examine the performance of the proposed method by computer simulations and apply it to low-coverage human data generated by high-throughput sequencing. The results show that the proposed method improves our ability to carry out population-genomic analyses in important ways. The software package of the proposed method is freely available from https://github.com/Takahiro-Maruki/Package-GFE.
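As an illustration of what a test for deviation from Hardy–Weinberg equilibrium checks, the following is a minimal sketch of the textbook chi-square test on known genotype counts. It is not the maximum-likelihood method of the abstract, which additionally models sequencing errors and chromosome sampling; the function name and the example counts are assumptions made for illustration.

```python
def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Textbook chi-square test of Hardy-Weinberg proportions at a biallelic site.

    Takes observed genotype counts (assumed to be known without error) and
    returns (p, chi2): the frequency of allele A and the chi-square statistic.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequency from genotype counts: each AA carries two A alleles.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    # Expected counts under Hardy-Weinberg equilibrium: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    # Skip classes with zero expectation (monomorphic site) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


if __name__ == "__main__":
    # Hypothetical counts with a complete lack of heterozygotes.
    p, chi2 = hwe_chi_square(50, 0, 50)
    print(p, chi2)
```

The statistic has one degree of freedom for a biallelic site, so values above about 3.84 indicate deviation at the 5% level. With low-coverage sequencing data the genotype counts themselves are uncertain, which is the situation the likelihood-based approach in the abstract is designed for.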

4.
We show that existing RNA-seq, DNase-seq, and ChIP-seq data exhibit overdispersed per-base read count distributions that are not matched to existing computational method assumptions. To compensate for this overdispersion we introduce a nonparametric and universal method for processing per-base sequencing read count data called Fixseq. We demonstrate that Fixseq substantially improves the performance of existing RNA-seq, DNase-seq, and ChIP-seq analysis tools when compared with existing alternatives.
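Overdispersion of per-base counts can be checked with a simple diagnostic: under a Poisson model the variance equals the mean, so a variance-to-mean ratio well above 1 signals overdispersion. The sketch below computes that ratio; it is a generic diagnostic, not the Fixseq procedure, and the function name and example counts are assumptions.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of per-base read counts.

    A Poisson model predicts a ratio close to 1; larger values indicate
    overdispersion. Requires at least two counts and a nonzero mean.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    # statistics.variance is the sample variance (n - 1 in the denominator).
    return variance(counts) / m


if __name__ == "__main__":
    # Hypothetical per-base counts: most positions empty, one pile-up.
    print(dispersion_index([0, 0, 0, 12]))
```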

5.
The most compelling models concerning the peopling of the Americas consider that modern Amerindians share a common biological pattern, showing affinities with populations of the Asian Northeast. The aim of the present study was to assess the degree of variation of craniofacial morphology of South American Amerindians in a worldwide context. Forty-three linear variables were analyzed on crania derived from American, Asian, Australo-Melanesian, European, South-Saharan African, and Polynesian regions. South America was represented by seven Amerindian samples. In order to understand morphologic diversity among Amerindians of South America, variation was estimated using regions and local populations as units of analysis. Variances and F(ST) values were calculated for each unit, respectively. Both analyses indicated that morphologic variation in Southern Amerindians is extremely high: an F(ST) of 0.01531 was obtained for Southern Amerindians, and values from 0.0371-0.1205 for other world regions. Some aspects linked to the time and mode of the peopling of the Americas and various microevolutionary processes undergone by Amerindians are discussed. Some of the alternatives proposed to explain this high variation include: a greater antiquity of the peopling than what is mostly accepted, a peopling by several highly differentiated waves, an important effect of genetic drift, and gene flow with Paleoamericans. A combination of some of these alternatives explains at least some of the variation.

6.
7.
The use and validation of a strategy that allows a universal set of bar-coded sequencing primers to be appended to an amplified PCR product is described. The strategy allows a modular approach, in that the same bar code can be used with two or more target-specific primer sets, even simultaneously.

8.
9.
10.
Human males are remarkable among mammals in the level of investment they provide to their wives and children. However, there has been debate as to the degree to which men actually invest and through which fitness pathways the benefits of familial investment are realized. Much of the previous research exploring these issues has focused on men's roles as providers, but few studies have explored correlates of men's direct parental care. Although this is reasonable given men's parental emphasis on provisioning, the providing of direct care is more straightforward, with a clear provider and recipient and little ambiguity as to the caregiver's intent. Here, we explore contextual correlates of men's direct care among the Tsimane of Bolivia to determine the extent to which such care is patterned to enhance its effectiveness in increasing child wellbeing and the efficient functioning of the family. We also explore whether Tsimane fathers provide care in ways that enhance the positive effect it has on the wife's perception of the care provider. Overall, we find that Tsimane men appear responsive to the needs of children and the family, but show that there is little evidence that men respond to factors expected to increase the impact that men's care has on their reputations with their wives. Am J Phys Anthropol 2009. © 2009 Wiley-Liss, Inc.

11.
Microbially induced concrete corrosion (MICC) is an important problem in sewers. Here, small-subunit (SSU) rRNA gene amplicon pyrosequencing was used to characterize MICC communities. Microbial community composition differed between wall- and ceiling-associated MICC layers. Acidithiobacillus spp. were present at low abundances, and the communities were dominated by other sulfur-oxidizing-associated lineages.

12.
13.

Background

Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage.

Methodology/Principal Findings

To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling.

Conclusions/Significance

These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

14.
15.
Takahiro Maruki, Michael Lynch. Genetics, 2014, 197(4): 1303-1313
Rapidly improving sequencing technologies provide unprecedented opportunities for analyzing genome-wide patterns of polymorphisms. In particular, they have great potential for linkage-disequilibrium analyses on both global and local genetic scales, which will substantially improve our ability to derive evolutionary inferences. However, there are some difficulties with analyzing high-throughput sequencing data, including high error rates associated with base reads and complications from the random sampling of sequenced chromosomes in diploid organisms. To overcome these difficulties, we developed a maximum-likelihood estimator of linkage disequilibrium for use with error-prone sampling data. Computer simulations indicate that the estimator is nearly unbiased with a sampling variance at high coverage asymptotically approaching the value expected when all relevant information is accurately estimated. The estimator does not require phasing of haplotypes and enables the estimation of linkage disequilibrium even when all individual reads cover just single polymorphic sites.
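For orientation, the quantity being estimated can be illustrated with the textbook definition of linkage disequilibrium between two biallelic sites, D = p_AB - p_A p_B, and its normalized form r². The sketch below assumes error-free, phased haplotype counts, which is exactly what the maximum-likelihood estimator in the abstract does not require; the function name and example counts are assumptions.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Textbook LD coefficient D and r^2 from phased two-site haplotype counts.

    Illustration only: assumes haplotypes are observed without error.
    Returns (D, r2); r2 is NaN when either site is monomorphic.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at the first site
    p_B = (n_AB + n_aB) / n  # frequency of allele B at the second site
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


if __name__ == "__main__":
    # Hypothetical counts: only AB and ab haplotypes, i.e. complete association.
    print(linkage_disequilibrium(50, 0, 0, 50))
```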

16.
17.
The tendency for chlorinated aliphatics and aromatic hydrocarbons to accumulate in environments such as groundwater and sediments poses a serious environmental threat. In this study, the metabolic capacity of hydrocarbon (aromatics and chlorinated aliphatics)-contaminated groundwater in the KwaZulu-Natal province of South Africa has been elucidated for the first time by analysis of pyrosequencing data. The taxonomic data revealed that the metagenomes were dominated by the phylum Proteobacteria (mainly Betaproteobacteria). In addition, Flavobacteriales, Sphingobacteria, Burkholderiales, and Rhodocyclales were the predominant orders present in the individual metagenomes. These orders included microorganisms (Flavobacteria, Dechloromonas aromatica RCB, and Azoarcus) involved in the degradation of aromatic compounds and various other hydrocarbons that were present in the groundwater. Although the metabolic reconstruction of the metagenome represented composite cell networks, the information obtained was sufficient to address questions regarding the metabolic potential of the microbial communities and to correlate the data to the contamination profile of the groundwater. Genes involved in the degradation of benzene and benzoate, together with heavy metal-resistance mechanisms, appeared to provide a survival strategy used by the microbial communities. Analysis of the pyrosequencing-derived data revealed that the metagenomes represent complex microbial communities that have adapted to the geochemical conditions of the groundwater, as evidenced by the presence of key enzymes/genes conferring resistance to specific contaminants. Thus, pyrosequencing analysis of the metagenomes provided insights into the microbial activities in hydrocarbon-contaminated habitats.

18.
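Objective: To use high-throughput sequencing to analyze the diversity of pathogens carried by mosquitoes in different border regions of China, and to provide a reference for the rapid screening of mosquito-borne pathogens. Methods: Field samples collected from different border regions were randomly divided into 6 groups; viral nucleic acid was extracted, sequenced on an Ion S5 XL sequencer, and analyzed bioinformatically. Results: Of the 6 pooled mosquito samples, 4 contained two or more suspected viruses. Conclusion: The high-throughput sequencing detection system effectively detected the presence of mosquito-borne viruses and gave a preliminary picture of the diversity of pathogens carried by mosquitoes.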

19.
In many crop species, DNA fingerprinting is required for the precise identification of cultivars to protect the rights of breeders. Many families of retrotransposons have multiple copies throughout the eukaryotic genome and their integrated copies are inherited genetically. Thus, their insertion polymorphisms among cultivars are useful for DNA fingerprinting. In this study, we conducted DNA fingerprinting based on the insertion polymorphisms of active retrotransposon families (Rtsp-1 and LIb) in sweet potato. Using 38 cultivars, we identified 2,024 insertion sites in the two families with an Illumina MiSeq sequencing platform. Of these insertion sites, 91.4% appeared to be polymorphic among the cultivars and 376 cultivar-specific insertion sites were identified, which were converted directly into cultivar-specific sequence-characterized amplified region (SCAR) markers. A phylogenetic tree was constructed using these insertion sites, which corresponded well with known pedigree information, thereby indicating their suitability for genetic diversity studies. Thus, the genome-wide comparative analysis of active retrotransposon insertion sites using the bench-top MiSeq sequencing platform is highly effective for DNA fingerprinting without any requirement for whole genome sequence information. This approach may facilitate the development of a practical polymerase chain reaction-based cultivar diagnostic system and could also be applied to the determination of genetic relationships.
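The two summary figures in this abstract (the share of polymorphic insertion sites and the number of cultivar-specific sites) can be derived from a presence/absence table of insertion sites. The following is a minimal sketch under assumed definitions: a site is taken as polymorphic if it is present in some but not all cultivars, and cultivar-specific if it is present in exactly one. The data layout, function name, and toy data are hypothetical and are not taken from the study.

```python
def summarize_insertions(presence, cultivars):
    """Summarize retrotransposon insertion sites across cultivars.

    presence  -- dict mapping site ID to the set of cultivars carrying it
    cultivars -- set of all cultivars surveyed
    Returns (fraction of polymorphic sites, dict of site -> its only cultivar).
    """
    if not presence:
        raise ValueError("no insertion sites given")
    n = len(cultivars)
    # Polymorphic: present in at least one cultivar but not in all of them.
    polymorphic = [s for s, carriers in presence.items() if 0 < len(carriers) < n]
    # Cultivar-specific: present in exactly one cultivar (candidate marker).
    specific = {s: next(iter(carriers))
                for s, carriers in presence.items() if len(carriers) == 1}
    return len(polymorphic) / len(presence), specific


if __name__ == "__main__":
    # Toy data with three hypothetical cultivars and four sites.
    cultivars = {"A", "B", "C"}
    presence = {
        "site1": {"A", "B", "C"},  # shared by all: not polymorphic
        "site2": {"A"},            # specific to A
        "site3": {"B", "C"},       # polymorphic, not specific
        "site4": {"C"},            # specific to C
    }
    print(summarize_insertions(presence, cultivars))
```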

20.