Similar literature (20 results)
1.
We offer a guide to de novo genome assembly using sequence data generated by the Illumina platform for biologists working with fungi or other organisms whose genomes are less than 100 Mb in size. The guide requires no familiarity with sequencing assembly technology or associated computer programs. It defines commonly used terms in genome sequencing and assembly; provides examples of assembling short-read genome sequence data for four strains of the fungus Grosmannia clavigera using four assembly programs; gives examples of protocols and software; and presents a commented flowchart that extends from DNA preparation for submission to a sequencing center, through to processing and assembly of the raw sequence reads using freely available operating systems and software.

2.
For many researchers, next generation sequencing data holds the key to answering a category of questions previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short read assembly can pose a significant challenge due to the lack of pre-existing EST or genome reference libraries. While many publications describe the overall process of sequencing and assembly, few address the topic of how many and what types of reads are best for assembly. The goal of this project was to use real-world data to explore the effects of read quantity and short read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities, we produced many assemblies in an automated manner. We observe how the properties of read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set.

3.
Multiple Displacement Amplification (MDA) of DNA using φ29 (phi29) DNA polymerase amplifies DNA several billion-fold, which has proved to be potentially very useful for evaluating genome information in a culture-independent manner. Whole genome sequencing using DNA from a single prokaryotic genome copy amplified by MDA has not yet been achieved due to the formation of chimeras and skewed amplification of genomic regions during the MDA step, which then precludes genome assembly. We have addressed this issue by using 10 ng of genomic Vibrio cholerae DNA extracted within an agarose plug to ensure circularity as a starting point for MDA and then sequencing the amplified yield using the SOLiD platform. We successfully assembled the entire genome of V. cholerae strain LMA3984-4 (environmental O1 strain isolated in urban Amazonia) using a hybrid de novo assembly strategy. Using our method, only 178 of 16,713 contigs (1%) could not be inserted into either chromosome scaffold, and of these 178, only 3 appeared to be chimeras. The other contigs seem to be the result of template-independent non-specific amplification during MDA, yielding spurious reads. Extraction of genomic DNA within an agarose plug in order to ensure circularity of the extracted genome might be key to minimizing amplification bias by MDA for WGS.

4.
Little is known about the variations of nematode mitogenomes (mtDNA). Sequencing a complete mtDNA using a PCR approach remains a challenge due to frequent genome reorganizations and low sequence similarities between divergent nematode lineages. Here, a genome skimming approach based on HiSeq sequencing (shotgun) was used to assemble de novo the first complete mtDNA sequence of a root-knot nematode (Meloidogyne graminicola). An AT-rich genome (84.3%) of 20,030 bp was obtained with a mean sequencing depth greater than 300. Thirty-six genes were identified with a semi-automated approach. A comparison with a gene map of the M. javanica mitochondrial genome indicates that the gene order is conserved within this nematode lineage. However, deep genome rearrangements were observed when comparing with other species of the superfamily Hoplolaimoidea. Repeat elements of 111 bp and 94 bp were found in a long non-coding region of 7.5 kb, as similarly reported in M. javanica and M. hapla. This study points out the power of next generation sequencing to produce complete mitochondrial genomes, even without a reference sequence, and may open new avenues for species/race identification, phylogenetics and population genetics of nematodes.
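The AT-richness figure quoted in this abstract is a simple base-composition ratio. The following is a minimal sketch of how such a value could be computed; the function name `at_content` and the toy sequence are illustrative assumptions and do not come from the study, and ambiguous bases such as N are simply ignored here.

```python
def at_content(sequence: str) -> float:
    """Return the fraction of A/T bases among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return at / total


# Toy sequence (hypothetical, not the M. graminicola mitogenome).
toy = "ATATTAGCATTTAAATGCAT"
print(f"AT content: {at_content(toy):.1%}")
```

Applied to a full 20,030 bp mitogenome sequence, the same ratio would be expected to give a value comparable to the 84.3% reported, assuming ambiguous bases are handled the same way.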

5.
6.
7.
The assembly of multiple genomes from mixed sequence reads is a bottleneck in metagenomic analysis. A single-genome assembly program (assembler) is not capable of resolving metagenome sequences, so assemblers designed specifically for metagenomics have been developed. MetaVelvet is an extension of the single-genome assembler Velvet. It has been proved to generate assemblies with higher N50 scores and higher quality than single-genome assemblers such as Velvet and SOAPdenovo when applied to metagenomic sequence reads and is frequently used in this research community. One important open problem for MetaVelvet is its low accuracy and sensitivity in detecting chimeric nodes in the assembly (de Bruijn) graph, which prevents the generation of longer contigs and scaffolds. We have tackled this problem of classifying chimeric nodes using supervised machine learning to significantly improve the performance of MetaVelvet and developed a new tool, called MetaVelvet-SL. A Support Vector Machine is used for learning the classification model based on 94 features extracted from candidate nodes. In extensive experiments, MetaVelvet-SL outperformed the original MetaVelvet and other state-of-the-art metagenomic assemblers, IDBA-UD, Ray Meta and Omega, to reconstruct accurate longer assemblies with higher N50 scores for both simulated data sets and real data sets of human gut microbial sequences.
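Several abstracts in this list compare assemblers by N50. As a reminder of what that statistic measures, here is a minimal sketch using the common definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the contig lengths are illustrative assumptions, and individual tools may differ slightly in how they handle ties or minimum contig lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50 contig.
        if running >= half_total:
            return length


# Hypothetical contig lengths in nucleotides.
contigs = [100, 80, 50, 30, 20, 10]
print("N50:", n50(contigs))
```

Because N50 depends only on the length distribution, a higher value does not by itself indicate a more accurate assembly, which is why the abstracts report it alongside error rates or other quality measures.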

8.
9.
10.
Illumina's Genome Analyzer generates ultra-short sequence reads, typically 36 nucleotides in length, and is primarily intended for resequencing. We tested the potential of this technology for de novo sequence assembly on the 6 Mbp genome of Pseudomonas syringae pv. syringae B728a with several freely available assembly software packages. Using an unpaired data set, velvet assembled >96% of the genome into contigs with an N50 length of 8289 nucleotides and an error rate of 0.33%. edena generated smaller contigs (N50 was 4192 nucleotides) and comparable error rates. ssake and vcake yielded shorter contigs with very high error rates. Assembly of paired-end sequence data carrying 400 bp inserts produced longer contigs (N50 up to 15 628 nucleotides), but with increased error rates (0.5%). Contig length and error rate were very sensitive to the choice of parameter values. Noncoding RNA genes were poorly resolved in de novo assemblies, while >90% of the protein-coding genes were assembled with 100% accuracy over their full length. This study demonstrates that, in practice, de novo assembly of 36-nucleotide reads can generate reasonably accurate assemblies from about 40 × deep sequence data sets. These draft assemblies are useful for exploring an organism's proteomic potential at very low cost.
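The "40 × deep" figure above refers to mean sequencing depth, which is usually estimated as (number of reads × read length) / genome size. The sketch below applies that relationship to the rounded values quoted in the abstract (36-nucleotide reads, a 6 Mbp genome); the function names are illustrative assumptions, and the resulting read count is only an order-of-magnitude estimate, not a number reported by the authors.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads whose total length reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Rounded values taken from the abstract: 36-nt reads, 6 Mbp genome, about 40x depth.
needed = reads_for_coverage(40, 36, 6_000_000)
print(f"reads needed: {needed:,}")
print(f"coverage check: {mean_coverage(needed, 36, 6_000_000):.2f}x")
```

This estimate ignores reads discarded by quality filtering and uneven coverage across the genome, so a real run would typically need more raw reads than the figure suggests.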

11.

Background

Novosphingobium sp. strain PP1Y is a marine α-proteobacterium adapted to grow at the water/fuel oil interface. It exploits the aromatic fraction of fuel oils as a carbon and energy source. PP1Y is able to grow on a wide range of mono-, poly- and heterocyclic aromatic hydrocarbons. Here, we report the complete functional annotation of the whole Novosphingobium genome.

Results

PP1Y genome analysis and its comparison with other Sphingomonadal genomes has yielded novel insights into the molecular basis of PP1Y’s phenotypic traits, such as its peculiar ability to encapsulate and degrade the aromatic fraction of fuel oils. In particular, we have identified and dissected several highly specialized metabolic pathways involved in: (i) aromatic hydrocarbon degradation; (ii) resistance to toxic compounds; and (iii) the quorum sensing mechanism.

Conclusions

In summary, the unraveling of the entire PP1Y genome sequence has provided important insight into PP1Y metabolism and, most importantly, has opened new perspectives about the possibility of its manipulation for bioremediation purposes.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-384) contains supplementary material, which is available to authorized users.

12.
13.
Plant microRNAs (miRNAs) are single-stranded 20-22 nt small RNAs (sRNA) that are produced from their own genes. We have developed a de novo genome-wide approach for the computational identification of novel plant miRNAs based on the integration of the complete genome sequence with sRNA libraries. It comprises three modules - the clustering module identifies genomic regions that have two closely-located unidirectional sRNA clusters, the mirplan module explores the secondary structure of the genomic regions, and the duplex module predicts miRNA/miRNA* duplexes. We applied our approach to the Brachypodium genome and publicly available sRNA libraries and predicted 102 miRNAs. Our results extend the list of known miRNAs with 58 novel miRNAs and define the genomic loci of all predicted miRNAs. Because this approach considers specific features of plant miRNAs, it can be employed for the analysis of the genome and sRNA libraries generated for plant species to achieve systematic miRNA discovery.

14.
The single‐humped dromedary (Camelus dromedarius) is the most numerous and widespread of domestic camel species and is a significant source of meat, milk, wool, transportation and sport for millions of people. Dromedaries are particularly well adapted to hot, desert conditions and harbour a variety of biological and physiological characteristics with evolutionary, economic and medical importance. To understand the genetic basis of these traits, an extensive resource of genomic variation is required. In this study, we assembled, at 65× coverage, a 2.06 Gb draft genome of a female dromedary whose ancestry can be traced to an isolated population from the Canary Islands. We annotated 21 167 protein‐coding genes and estimated ~33.7% of the genome to be repetitive. A comparison with the recently published draft genome of an Arabian dromedary resulted in 1.91 Gb of aligned sequence with a divergence of 0.095%. An evaluation of our genome with the reference revealed that our assembly contains more error‐free bases (91.2%) and fewer scaffolding errors. We identified ~1.4 million single‐nucleotide polymorphisms with a mean density of 0.71 × 10⁻³ per base. An analysis of demographic history indicated that changes in effective population size corresponded with recent glacial epochs. Our de novo assembly provides a useful resource of genomic variation for future studies of the camel's adaptations to arid environments and economically important traits. Furthermore, these results suggest that draft genome assemblies constructed with only two differently sized sequencing libraries can be comparable to those sequenced using additional library sizes, highlighting that additional resources might be better placed in technologies alternative to short‐read sequencing to physically anchor scaffolds to genome maps.
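The SNP density quoted in this abstract is a count divided by a number of bases. The abstract does not state which denominator was used, so the sketch below only illustrates the arithmetic: it computes the number of bases implied by the two rounded figures (~1.4 million SNPs, 0.71 × 10⁻³ per base). The function names are illustrative assumptions, and the result should be read as a rough consistency check rather than a value from the study.

```python
def snp_density(n_snps: float, n_bases: float) -> float:
    """SNPs per base over the region considered."""
    return n_snps / n_bases


def implied_bases(n_snps: float, density: float) -> float:
    """Number of bases implied by a SNP count and a per-base density."""
    return n_snps / density


# Rounded figures from the abstract; the denominator itself is not reported there.
bases = implied_bases(1_400_000, 0.71e-3)
print(f"implied region size: {bases / 1e9:.2f} Gb")
print(f"density over that region: {snp_density(1_400_000, bases):.2e} per base")
```

The implied size is of the same order as the genome and alignment sizes mentioned in the abstract, which is all that can be concluded from rounded inputs.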

15.
16.
Ou J, Meng Q, Li Y, Xiu Y, Du J, Gu W, Wu T, Li W, Ding Z, Wang W. Fish & Shellfish Immunology, 2012, 32(2): 345-352
The Chinese mitten crab Eriocheir sinensis is one of the most important freshwater aquaculture crustacean species in China. MicroRNAs (miRNAs) are small non-coding RNAs that are important effectors in the intricate host-pathogen interaction network. To increase the repertoire of miRNAs characterized in crustaceans and to examine the relationship between host miRNA expression and pathogen infection, we used the Illumina/Solexa deep sequencing technology to sequence two small RNA libraries prepared from haemocytes of E. sinensis under normal conditions and during infection with Spiroplasma eriocheiris. The high-throughput sequencing resulted in 30,975,151 and 30,826,277 raw reads corresponding to 12,077,088 and 16,271,545 high-quality mappable reads for the normal and infected haemocyte samples, respectively. Bioinformatic analyses identified 735 unique miRNAs, including 36 that are conserved in crustaceans, 134 that are novel to crabs but are present in other arthropods (PN-type), and 565 that are completely new (PC-type). Two hundred twenty-eight unique miRNAs displayed significant differential expression between the normal and infected haemocyte samples (p < 0.0001). Of these, 133 (58%) were significantly up-regulated and 95 (42%) were significantly down-regulated upon challenge with S. eriocheiris. Real-time quantitative PCR (RT-qPCR) experiments were performed for 10 miRNAs of the two samples, and agreement was found between the sequencing and RT-qPCR data. To our knowledge, this is the first report of comprehensive identification of E. sinensis miRNAs and of expression analysis of E. sinensis miRNAs after exposure to S. eriocheiris. Many miRNAs were differentially regulated when exposed to the pathogen, and these findings support the hypothesis that certain miRNAs might be essential in host-pathogen interactions. Our results suggest that elucidation of the molecular mechanisms responsible for miRNA regulation of the host’s innate immune system should help with the development of new control strategies to prevent or treat S. eriocheiris infections in crustaceans.

17.
By combining next‐generation sequencing technology (454) and reduced representation library (RRL) construction, the rapid and economical isolation of over 25 000 potential single‐nucleotide polymorphisms (SNP) and >6000 putative microsatellite loci from c. 2% of the genome of the non‐model teleost, Atlantic cod Gadus morhua from the Celtic Sea, south of Ireland, was demonstrated. A small‐scale validation of markers indicated that 80% (11 of 14) of SNP loci and 40% (6 of 15) of the microsatellite loci could be amplified and showed variability. The results clearly show that small‐scale next‐generation sequencing of RRL genomes is an economical and rapid approach for simultaneous SNP and microsatellite discovery that is applicable to any species. The low cost and relatively small investment in time allows for positive exploitation of ascertainment bias to design markers applicable to specific populations and study questions.

18.
Mutations in the GJB2 gene are the most common cause of nonsyndromic autosomal recessive sensorineural hearing loss (HL). A few mutations in GJB2 have also been reported to cause dominant nonsyndromic HL. Here we report a large inbred family including two individuals with nonsyndromic sensorineural hearing loss. A dominant GJB2 mutation, c.551G>A (p.R184Q), was detected in the proband, yet his parents were negative for the mutation. The second affected person had a heterozygous c.35delG mutation, which was inherited from his father. Large deletions of the GJB6 gene were not detected in this family. This study highlights the importance of mutation analysis in all affected cases within a pedigree.

19.
Across many tissues and organs, the ability to create an organoid, the smallest functional unit of an organ, in vitro is the key both to tissue engineering and preclinical testing regimes. The hair follicle is an organoid that has been much studied based on its ability to grow quickly and to regenerate after trauma. But hair follicle formation in vitro has been elusive. Replacing hair lost due to pattern baldness or more severe alopecia, including that induced by chemotherapy, remains a significant unmet medical need. By carefully analyzing and recapitulating the growth conditions of hair follicle formation, we recreated human hair follicles in tissue culture that were capable of producing hair. Our microfollicles contained all relevant cell types, and their structure and orientation resembled in some ways excised hair follicle specimens from human skin. This finding offers a new window onto hair follicle development. Having a robust culture system for hair follicles is an important step towards improved hair regeneration as well as towards an understanding of how marketed drugs or drug candidates, including cancer chemotherapy, will affect this important organ.

20.
One of the classical DNA-binding proteins, bacteriophage lambda Cro, forms a homodimer with a unique fold of alpha-helices and beta-sheets. We have computationally designed an artificial sequence of 60 amino acid residues to stabilize the backbone tertiary structure of the lambda Cro dimer by simulated annealing using knowledge-based structure-sequence compatibility functions. The designed amino acid sequence has 25% identity with that of natural lambda Cro and preserves Phe58, which is important for formation of the stably folded structure of lambda Cro. The designed dimer protein and its monomeric variant, which was redesigned by the insertion of a beta-hairpin sequence at the C-terminal region to prevent dimerization, were synthesized and biochemically characterized to be well folded. The designed protein was monomeric under a wide range of protein concentrations and its solution structure was determined by NMR spectroscopy. The solved structure is similar to that of a monomeric variant of natural lambda Cro with a root-mean-square deviation of the polypeptide backbones at 2.1 Å and has a well-packed protein core. Thus, our knowledge-based functions provide approximate but essential relationships between amino acid sequences and protein structures, and are useful for finding novel sequences that are foldable into a given target structure.
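The backbone root-mean-square deviation mentioned in this abstract measures the average distance between equivalent atoms in two structures. A minimal sketch of the formula follows; it assumes the two coordinate sets are already optimally superimposed and listed in the same atom order, which in practice requires a separate alignment step that is not shown. The function name `rmsd` and the coordinates are illustrative assumptions, not data from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two hypothetical three-atom backbones, coordinates in angstroms.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 2.0)]
print(f"RMSD: {rmsd(model, reference):.2f} A")
```

Without the superposition step the value depends on how the two structures happen to be positioned, so this function on its own would overestimate the deviation for unaligned coordinates.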
