Similar Articles
20 similar articles found.
1.
Research progress in sequencing technologies for complex genomes
Complex genomes are genomes that cannot be resolved directly with conventional sequencing and assembly approaches, typically those containing a high proportion of repetitive sequences, high heterozygosity, extreme GC content, or foreign DNA contamination that is difficult to remove. Solving the sequencing and assembly of complex genomes requires in-depth work on three fronts: genome sequencing experimental methods, sequencing technology platforms, and assembly algorithms and strategies. This review describes the existing techniques and methods for sequencing and assembling complex genomes, and uses classic examples to illustrate the technical solutions and their development, providing a reference for devising appropriate sequencing strategies for complex genomes.

2.
Hierarchical shotgun sequencing remains the method of choice for assembling high‐quality reference sequences of complex plant genomes. The efficient exploitation of current high‐throughput technologies and powerful computational facilities for large‐insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole‐genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high‐quality assemblies of a large number of clones to assemble map‐based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.
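A minimal sketch of the demultiplexing step such a multiplexed pipeline implies, assuming each read carries a short per-BAC barcode at its 5' end (the barcode length, the `BARCODE_TO_BAC` table and the BAC names are hypothetical, not taken from the paper):

```python
from collections import defaultdict

# Hypothetical mapping from a 6-bp barcode to a BAC clone identifier.
BARCODE_TO_BAC = {"ACGTAC": "BAC_001", "TGCAGT": "BAC_002"}
BARCODE_LEN = 6

def demultiplex(fastq_records):
    """Bin reads by their leading barcode so each BAC can be assembled separately."""
    bins = defaultdict(list)
    for name, seq, qual in fastq_records:
        bac = BARCODE_TO_BAC.get(seq[:BARCODE_LEN])
        if bac is not None:
            # Trim the barcode before handing the read to the per-BAC assembler.
            bins[bac].append((name, seq[BARCODE_LEN:], qual[BARCODE_LEN:]))
    return bins
```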

3.
The construction of large DNA molecules that encode pathways, biological machinery, and entire genomes has been limited to the reproduction of natural sequences. However, now that robust methods for assembling hundreds of DNA fragments into constructs >20 kb are readily available, optimization of large genetic elements for metabolic engineering purposes is becoming more routine. Here, various DNA assembly methodologies are reviewed and some of their potential applications are discussed. We consider the potential of DNA assembly to install rational changes in complex biosynthetic pathways, its potential for generating complex libraries, and how the various strategies are applicable to metabolic engineering.

4.
We develop and test machine learning methods for the prediction of coarse 3D protein structures, where a protein is represented by a set of rigid rods associated with its secondary structure elements (alpha-helices and beta-strands). First, we employ cascades of recursive neural networks derived from graphical models to predict the relative placements of segments. These are represented as discretized distance and angle maps, and the discretization levels are statistically inferred from a large and curated dataset. Coarse 3D folds of proteins are then assembled starting from topological information predicted in the first stage. Reconstruction is carried out by minimizing a cost function taking the form of a purely geometrical potential. We show that the proposed architecture outperforms simpler alternatives and can accurately predict binary and multiclass coarse maps. The reconstruction procedure proves to be fast and often leads to topologically correct coarse structures that could be exploited as a starting point for various protein modeling strategies. The fully integrated rod-shaped protein builder (predictor of contact maps + reconstruction algorithm) can be accessed at http://distill.ucd.ie/.
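A toy sketch of the reconstruction idea, cast as minimizing a purely geometric cost over segment coordinates with scipy; the 8 Å contact threshold and the quadratic penalty are illustrative assumptions, not the authors' actual potential:

```python
import numpy as np
from scipy.optimize import minimize

def reconstruct(contact_map, n_segments, contact_dist=8.0):
    """Place one point per secondary-structure segment so that predicted
    contacts end up closer than contact_dist and non-contacts farther away."""
    def cost(flat):
        xyz = flat.reshape(n_segments, 3)
        total = 0.0
        for i in range(n_segments):
            for j in range(i + 1, n_segments):
                d = np.linalg.norm(xyz[i] - xyz[j])
                if contact_map[i][j]:          # predicted contact: penalize being too far
                    total += max(0.0, d - contact_dist) ** 2
                else:                          # predicted non-contact: penalize being too close
                    total += max(0.0, contact_dist - d) ** 2
        return total

    x0 = np.random.default_rng(0).normal(size=n_segments * 3)
    return minimize(cost, x0, method="L-BFGS-B").x.reshape(n_segments, 3)
```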

5.
SUMMARY: AnnBuilder is an R package for assembling genomic annotation data. The system currently provides parsers to process annotation data from LocusLink, Gene Ontology Consortium, and Human Gene Project and can be extended to new data sources via user defined parsers. AnnBuilder differs from other existing systems in that it provides users with unlimited ability to assemble data from user selected sources. The products of AnnBuilder are files in XML format that can be easily used by different systems. AVAILABILITY: http://www.bioconductor.org. Open source.

6.
7.
Picky: oligo microarray design for large genomes
MOTIVATION: Many large genomes are being sequenced nowadays. Biologists are eager to start microarray analysis taking advantage of all known genes of a species, but existing microarray design tools are very inefficient for large genomes. Also, many existing tools operate in a batch mode that does not assure the best designs. RESULTS: Picky is an efficient oligo microarray design tool for large genomes. Picky integrates novel computer science techniques and the best known nearest-neighbor parameters to quickly identify sequence similarities and estimate their hybridization properties. Oligos designed by Picky are computationally optimized to guarantee the best specificity, sensitivity and uniformity under the given design constraints. Picky can be used to design arrays for whole genomes, or for only a subset of genes. The latter can still be screened against a whole genome to attain the same quality as a whole-genome array, thereby permitting low-budget, pathway-specific experiments to be conducted with large genomes. Picky is the fastest oligo array design tool currently available to the public, requiring only a few hours to process large gene sets from rice, maize or human.
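A rough sketch of the nearest-neighbor idea the abstract refers to: melting temperature is estimated from stacked dinucleotide enthalpy/entropy contributions. The parameter table below is abbreviated and illustrative, and initiation/salt corrections are omitted, so the numbers are indicative only; this is not Picky's implementation:

```python
import math

# Illustrative nearest-neighbor parameters (kcal/mol, cal/mol·K); a real design
# tool uses the full published unified parameter set plus initiation and salt terms.
NN_PARAMS = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "CG": (-10.6, -27.2),
    "GC": (-9.8, -24.4), "GG": (-8.0, -19.9),
}
DEFAULT = (-8.0, -22.0)  # fallback for stacks not listed above
R = 1.987                # gas constant, cal/mol·K

def rough_tm(seq, oligo_conc=1e-7):
    """Very rough nearest-neighbor melting temperature estimate (degrees C)."""
    dh = ds = 0.0
    for i in range(len(seq) - 1):
        h, s = NN_PARAMS.get(seq[i:i + 2], DEFAULT)
        dh += h
        ds += s
    return (dh * 1000.0) / (ds + R * math.log(oligo_conc)) - 273.15
```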

8.
Versatile and open software for comparing large genomes
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at .
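The core primitive MUMmer builds on is the maximal exact match between two sequences. A naive pure-Python sketch of that idea follows; real MUMmer uses suffix-tree/array data structures and additionally requires uniqueness for its MUMs, which this toy version does not:

```python
def maximal_exact_matches(ref, qry, min_len=20):
    """Enumerate maximal exact matches of length >= min_len between two sequences.
    Naive seed-and-extend, for illustration only (not genome-scale)."""
    k = min_len
    seeds = {}
    for i in range(len(ref) - k + 1):
        seeds.setdefault(ref[i:i + k], []).append(i)
    matches = set()
    for j in range(len(qry) - k + 1):
        for i in seeds.get(qry[j:j + k], []):
            # Extend left and right until a mismatch or a sequence end.
            li, lj = i, j
            while li > 0 and lj > 0 and ref[li - 1] == qry[lj - 1]:
                li, lj = li - 1, lj - 1
            ri, rj = i + k, j + k
            while ri < len(ref) and rj < len(qry) and ref[ri] == qry[rj]:
                ri, rj = ri + 1, rj + 1
            matches.add((li, lj, ri - li))  # (ref start, qry start, length)
    return sorted(matches)
```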

9.
10.
The genome of the nematode C. elegans is peppered with novel genes belonging to a superfamily whose members function in cellular cholesterol homeostasis (Niemann-Pick C1) and Hedgehog signal transduction (Patched) and biogenesis (Dispatched). In this issue of Developmental Cell, an analysis of a pair of Patched- and Dispatched-related proteins in C. elegans extends the superfamily's repertoire to include the formation of tubular organs.

11.
The whole-genome shotgun (WGS) assembly technique has been remarkably successful in efforts to determine the sequence of bases that make up a genome. WGS assembly begins with a large collection of short fragments that have been selected at random from a genome. The sequence of bases at each end of the fragment is determined, albeit imprecisely, resulting in a sequence of letters called a "read." Each letter in a read is assigned a quality value, which estimates the probability that a sequencing error occurred in determining that letter. Reads are typically cut off after about 500 letters, where sequencing errors become endemic. We report on a set of procedures that (1) corrects most of the sequencing errors, (2) changes quality values accordingly, and (3) produces a list of "overlaps," i.e., pairs of reads that plausibly come from overlapping parts of the genome. Our procedures, which we call collectively the "UMD Overlapper," can be run iteratively and as a preprocessor for other assemblers. We tested the UMD Overlapper on Celera's Drosophila reads. When we replaced Celera's overlap procedures in the front end of their assembler, it was able to produce a significantly improved genome.
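A small sketch of two ingredients the abstract describes: converting Phred quality values into error probabilities, and flagging read pairs whose suffix/prefix overlap is plausible. The 40-bp minimum overlap and 5% mismatch ceiling are arbitrary illustrative thresholds, not the UMD Overlapper's settings:

```python
def phred_to_error_prob(q):
    """Phred quality Q corresponds to an estimated error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10.0)

def plausible_overlap(a, b, min_len=40, max_mismatch_frac=0.05):
    """Return the longest suffix-of-a / prefix-of-b overlap whose mismatch
    fraction stays below the ceiling, or 0 if none qualifies."""
    for olen in range(min(len(a), len(b)), min_len - 1, -1):
        mism = sum(x != y for x, y in zip(a[-olen:], b[:olen]))
        if mism <= max_mismatch_frac * olen:
            return olen
    return 0
```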

12.
We present a graph-based method for the analysis of repeat families in a repeat library. We build a repeat domain graph that decomposes a repeat library into repeat domains, short subsequences shared by multiple repeat families, and reveals the mosaic structure of repeat families. Our method recovers documented mosaic repeat structures and suggests additional putative ones. Our method is useful for elucidating the evolutionary history of repeats and annotating de novo generated repeat libraries.
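One way to picture the idea, as a hedged sketch with networkx: repeat families become nodes, and an edge records that two families share a candidate "domain". The `shared_kmers` helper is a crude stand-in for the paper's actual decomposition, which works on aligned subsequences rather than exact k-mers:

```python
import itertools
import networkx as nx

def shared_kmers(a, b, k=30):
    """Toy stand-in for detecting a shared domain: exact k-mers common to two repeats."""
    return {a[i:i + k] for i in range(len(a) - k + 1)} & \
           {b[j:j + k] for j in range(len(b) - k + 1)}

def repeat_domain_graph(families, k=30):
    """Build a graph linking repeat families whenever they share a k-mer 'domain'."""
    g = nx.Graph()
    g.add_nodes_from(families)
    for (fa, sa), (fb, sb) in itertools.combinations(families.items(), 2):
        common = shared_kmers(sa, sb, k)
        if common:
            g.add_edge(fa, fb, shared=len(common))
    return g

# usage: repeat_domain_graph({"familyA": "ACGT...", "familyB": "GGCC..."})
```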

13.

Background

The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short reads are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to quickly search a set of reads for near-exact text matches.

Methods

A set of tools is provided to search a large data set of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC that can be used by anyone without prior knowledge of Linux and without having to install a Linux setup on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the data set and gathering counts of sequences in the reads.
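A minimal sketch of the kind of near-exact search such tools provide, allowing up to a fixed number of mismatches against every read. This is purely illustrative; the actual tools on the live CD are standalone utilities, not this Python:

```python
def near_exact_hits(reads, pattern, max_mismatches=1):
    """Yield (read_index, offset) for every occurrence of pattern in any read
    with at most max_mismatches substitutions."""
    m = len(pattern)
    for idx, read in enumerate(reads):
        for off in range(len(read) - m + 1):
            mism = 0
            for a, b in zip(read[off:off + m], pattern):
                if a != b:
                    mism += 1
                    if mism > max_mismatches:
                        break
            else:
                yield (idx, off)
```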

Results

Demonstrations are given of the use of the tools to help with checking an assembly against the fragment data set; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension.

Conclusion

The additional information contained in a pyrophosphate sequencing data set beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.

14.
Here, we present an improved amplified fragment length polymorphism (AFLP) protocol using restriction enzymes (AscI and SbfI) that recognize 8‐base pair sequences to provide alternative optimization suitable for species with a genome size over 70 Gb. This cost‐effective optimization massively reduces the number of amplified fragments using only +3 selective bases per primer during selective amplification. We demonstrate the effects of the number of fragments and genome size on the appearance of nonidentical comigrating fragments (size homoplasy), which has a negative impact on the informative value of AFLP genotypes. We also present various reaction conditions and their effects on reproducibility and the band intensity of the extremely large genome of Viscum album. The reproducibility of this octo‐cutter protocol was calculated using several species with genome sizes ranging from 1 Gb (Carex panicea) to 76 Gb (V. album). The improved protocol also succeeded in detecting high intraspecific variability in species with large genomes (V. album, Galanthus nivalis and Pinus pumila).
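Back-of-the-envelope arithmetic behind the design, assuming a random genome at 50% GC (real genomes deviate, so the counts are only indicative): an 8-bp recognition site occurs about once every 4^8 ≈ 65,536 bp, and each +3 selective extension retains roughly (1/4)^3 of the fragment ends.

```python
GENOME_SIZE = 76e9      # Viscum album, ~76 Gb (from the abstract)
SITE_LEN = 8            # AscI and SbfI recognize 8-bp sites
SELECTIVE_BASES = 3     # +3 selective bases per primer, two primers per fragment

expected_sites = GENOME_SIZE / 4 ** SITE_LEN           # roughly 1.2 million cut sites
# Each amplified fragment must match the selective extensions at both ends.
retained_fraction = (0.25 ** SELECTIVE_BASES) ** 2     # about 2.4e-4 of fragments
print(f"expected 8-bp sites: {expected_sites:,.0f}")
print(f"fraction of fragments amplified: {retained_fraction:.1e}")
```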

15.
Conserved RNA secondary structures in Picornaviridae genomes
The family Picornaviridae contains important pathogens including, for example, hepatitis A virus and foot-and-mouth disease virus. The genome of these viruses is a single messenger-active (+)-RNA of 7200–8500 nt. Besides coding for the viral proteins, it also contains functionally important RNA secondary structures, among them an internal ribosomal entry site (IRES) region towards the 5′-end. This contribution provides a comprehensive computational survey of the complete genomic RNAs and a detailed comparative analysis of the conserved structural elements in seven of the currently nine genera in the family Picornaviridae. Compared with previous studies we find: (i) that only smaller sections of the IRES region than previously reported are conserved at single base-pair resolution and (ii) that there is a number of significant structural elements in the coding region. Furthermore, we identify potential cis-acting replication elements in four genera where this feature has not been reported so far.
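As a single-sequence starting point for this kind of survey, the ViennaRNA Python bindings (an assumption: they must be installed separately, and the paper's comparative analysis additionally requires alignments and conservation scoring) predict a minimum-free-energy secondary structure:

```python
import RNA  # ViennaRNA Python bindings

# Hypothetical short fragment standing in for part of an IRES region.
seq = "GGGAAACGUCCUUCGGGACGUUUCCC"
structure, mfe = RNA.fold(seq)
print(structure)                              # dot-bracket notation
print(f"minimum free energy: {mfe:.2f} kcal/mol")
```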

16.
17.
Site-specific or target-specific mutagenesis of viral DNA genomes using a selectable marker system is a powerful tool for analysing the function of specific regions of large DNA genomes. These techniques make it possible to construct vectors capable of delivering vaccines for the prevention of infectious disease in humans and animals.

18.
MOTIVATION: Searching genomes for non-coding RNAs (ncRNAs) by their secondary structure has become an important goal for bioinformatics. For pseudoknot-free structures, ncRNA search can be effective based on the covariance model and CYK-type dynamic programming. However, the computational difficulty in aligning an RNA sequence to a pseudoknot has prohibited fast and accurate search of arbitrary RNA structures. Our previous work introduced a graph model for RNA pseudoknots and proposed to solve the structure-sequence alignment by graph optimization. Given k candidate regions in the target sequence for each of the n stems in the structure, we could compute a best alignment in time O(k^t n) based upon a tree decomposition of width t of the structure graph. However, to implement this method in programs that can routinely perform fast yet accurate RNA pseudoknot searches, we need novel heuristics to ensure that, without degrading the accuracy, only a small number of stem candidates need to be examined and a tree decomposition of small tree width can always be found for the structure graph. RESULTS: The current work builds on the previous one with newly developed preprocessing algorithms to reduce the values of parameters k and t and to implement the search method in a practical program, called RNATOPS, for RNA pseudoknot search. In particular, we introduce techniques, based on probabilistic profiling and distance penalty functions, which can identify for every stem just a small number k (e.g. k …
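The practical stakes of the preprocessing follow directly from the stated O(k^t n) bound; a tiny arithmetic sketch (the k, t and n values below are made up for illustration, not taken from RNATOPS):

```python
def alignment_cost(k, t, n):
    """Work proportional to k**t * n for aligning n stems with k candidates each,
    given a tree decomposition of width t (per the stated O(k^t n) bound)."""
    return k ** t * n

# Halving k and shaving one off t shrinks the work by orders of magnitude:
print(alignment_cost(k=10, t=5, n=30))   # 3,000,000
print(alignment_cost(k=5,  t=4, n=30))   # 18,750
```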

19.
Predicting failure rate of PCR in large genomes   总被引:1,自引:0,他引:1  
We have developed statistical models for estimating the failure rate of polymerase chain reaction (PCR) primers using 236 primer sequence-related factors. The model involved 1314 primer pairs and is based on more than 80 000 PCR experiments. We found that the most important factor in determining PCR failure is the number of predicted primer-binding sites in the genomic DNA. We also compared different ways of defining primer-binding sites (fixed-length word versus thermodynamic model; exact match versus matches including 1–2 mismatches). We found that the most efficient prediction of PCR failure rates can be achieved using a combination of four factors (number of primer-binding sites counted in different ways plus GC% of the primer) combined into a single statistical model, GM1. According to our estimations from experimental data, the GM1 model can reduce the average failure rate of PCR primers nearly 3-fold (from 17% to 6%). The GM1 model can easily be implemented in software to premask genome sequences for potentially failing PCR primers, thus improving large-scale PCR-primer design.
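A hedged sketch of how a GM1-style model could be fit with scikit-learn, using the four factor types named in the abstract as features. The logistic regression, the feature construction and the toy data are assumptions standing in for the authors' exact model definition:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed feature columns: exact-match binding sites, 1-2-mismatch sites,
# thermodynamically predicted sites, primer GC fraction (toy values).
X = np.array([
    [1,  3,  2, 0.55],
    [40, 90, 60, 0.35],
    [2,  5,  4, 0.50],
    [25, 70, 55, 0.30],
])
y = np.array([0, 1, 0, 1])  # 1 = PCR failed (toy labels)

model = LogisticRegression().fit(X, y)
# Predicted failure probability for a new primer pair:
print(model.predict_proba([[10, 20, 15, 0.45]])[:, 1])
```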

20.