
1.

Correcting errors in shotgun sequences

Tammi MT Arner E Kindlund E Andersson B 《Nucleic acids research》2003,31(15):4663-4672

Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly due to the failure of assembly programs to distinguish single base differences between repeat copies from erroneous base calls. In this paper, a new strategy designed to correct errors in shotgun sequence data using defined nucleotide positions, DNPs, is presented. The method distinguishes single base differences from sequencing errors by analyzing multiple alignments consisting of a read and all its overlaps with other reads. The construction of multiple alignments is performed using a novel pattern matching algorithm, which takes advantage of the symmetry between indices that can be computed for similar words of the same length. This allows for rapid construction of multiple alignments, with no previous pair-wise matching of sequence reads required. Results from a C++ implementation of this method show that up to 99% of sequencing errors can be corrected, while up to 87% of the single base differences remain and up to 80% of the corrected reads contain at most one error. The results also show that the method outperforms the error correction method used in the EULER assembler. The prototype software, MisEd, is freely available from the authors for academic use.
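The central idea of the abstract can be made concrete. A base that disagrees with its overlapping reads is only treated as an error when no other read shares it. A shared variant may instead be a real difference between repeat copies. The sketch below is a deliberately simplified column vote under assumed inputs: gapless overlaps given as `(read, offset)` pairs and a hypothetical `min_support` threshold. It is not MisEd's DNP algorithm or its pattern-matching index.

```python
from collections import Counter

def correct_read(read, overlaps, min_support=2):
    """Simplified column-vote correction (hypothetical sketch, not MisEd).

    `overlaps` holds (other_read, offset) pairs with other_read[j] aligned to
    read[offset + j]; alignments are assumed gapless for brevity.
    """
    corrected = list(read)
    for i, base in enumerate(read):
        # Bases that the overlapping reads contribute to column i.
        column = Counter()
        for other, offset in overlaps:
            j = i - offset
            if 0 <= j < len(other):
                column[other[j]] += 1
        if not column:
            continue
        # The read's own base plus agreeing reads: enough support means the
        # base may be a real repeat-copy difference (a DNP-like site), so keep it.
        if column[base] + 1 >= min_support:
            continue
        # Otherwise the base is unsupported; replace it only if an alternative
        # is backed by at least `min_support` other reads.
        best, count = column.most_common(1)[0]
        if count >= min_support:
            corrected[i] = best
    return "".join(corrected)
```

For example, `correct_read("ACGTTCGA", [("ACGTACGA", 0), ("GTACGA", 2), ("TACG", 3)])` changes the lone `T` at position 4 to `A`. If a second read such as `("GTTCGA", 2)` also carries that `T`, the base is kept as a possible repeat-copy difference.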

2.

Heterochromatic sequences in a Drosophila whole-genome shotgun assembly


Hoskins RA Smith CD Carlson JW Carvalho AB Halpern A Kaminker JS Kennedy C Mungall CJ Sullivan BA Sutton GG Yasuhara JC Wakimoto BT Myers EW Celniker SE Rubin GM Karpen GH 《Genome biology》2002,3(12):research0085.1-8516

Background

Most eukaryotic genomes include a substantial repeat-rich fraction termed heterochromatin, which is concentrated in centric and telomeric regions. The repetitive nature of heterochromatic sequence makes it difficult to assemble and analyze. To better understand the heterochromatic component of the Drosophila melanogaster genome, we characterized and annotated portions of a whole-genome shotgun sequence assembly.

Results

WGS3, an improved whole-genome shotgun assembly, includes 20.7 Mb of draft-quality sequence not represented in the Release 3 sequence spanning the euchromatin. We annotated this sequence using the methods employed in the re-annotation of the Release 3 euchromatic sequence. This analysis predicted 297 protein-coding genes and six non-protein-coding genes, including known heterochromatic genes, and regions of similarity to known transposable elements. Bacterial artificial chromosome (BAC)-based fluorescence in situ hybridization analysis was used to correlate the genomic sequence with the cytogenetic map in order to refine the genomic definition of the centric heterochromatin; on the basis of our cytological definition, the annotated Release 3 euchromatic sequence extends into the centric heterochromatin on each chromosome arm.

Conclusions

Whole-genome shotgun assembly produced a reliable draft-quality sequence of a significant part of the Drosophila heterochromatin. Annotation of this sequence defined the intron-exon structures of 30 known protein-coding genes and 267 protein-coding gene models. The cytogenetic mapping suggests that an additional 150 predicted genes are located in heterochromatin at the base of the Release 3 euchromatic sequence. Our analysis suggests strategies for improving the sequence and annotation of the heterochromatic portions of the Drosophila and other complex genomes.

3.

Correcting errors in synthetic DNA through consensus shuffling


Binkowski BF Richmond KE Kaysen J Sussman MR Belshaw PJ 《Nucleic acids research》2005,33(6):e55

Although efficient methods exist to assemble synthetic oligonucleotides into genes and genomes, these suffer from the presence of 1–3 random errors/kb of DNA. Here, we introduce a new method termed consensus shuffling and demonstrate its use to significantly reduce random errors in synthetic DNA. In this method, errors are revealed as mismatches by re-hybridization of the population. The DNA is fragmented, and mismatched fragments are removed upon binding to an immobilized mismatch binding protein (MutS). PCR assembly of the remaining fragments yields a new population of full-length sequences enriched for the consensus sequence of the input population. We show that two iterations of consensus shuffling improved a population of synthetic green fluorescent protein (GFPuv) clones from ~60 to >90% fluorescent, and decreased errors 3.5- to 4.3-fold to final values of ~1 error per 3500 bp. In addition, two iterations of consensus shuffling corrected a population of GFPuv clones where all members were non-functional, to a population where 82% of clones were fluorescent. Consensus shuffling should facilitate the rapid and accurate synthesis of long DNA sequences.
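The reported figures can be cross-checked with simple arithmetic. Suppose two rounds cut errors 3.5- to 4.3-fold and ended at ~1 error per 3500 bp. Then the starting population carried roughly 1.0 to 1.2 errors/kb, at the low end of the 1–3 errors/kb range quoted for oligonucleotide assembly. The snippet performs only that back-calculation. The function name is illustrative.

```python
def implied_initial_error_rate(final_bp_per_error, fold_reduction):
    """Errors per kb before shuffling, back-calculated from the reported
    final error spacing and the fold reduction."""
    final_errors_per_kb = 1000 / final_bp_per_error
    return final_errors_per_kb * fold_reduction

low = implied_initial_error_rate(3500, 3.5)   # about 1.0 error/kb
high = implied_initial_error_rate(3500, 4.3)  # about 1.23 errors/kb
print(f"implied starting rate: {low:.2f}-{high:.2f} errors/kb")
```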

4.

Correcting errors in short reads by multiple alignments

Salmela L Schröder J 《Bioinformatics (Oxford, England)》2011,27(11):1455-1461


5.

DNPTrapper: an assembly editing tool for finishing and analysis of complex repeat regions

Erik Arner Martti T Tammi Anh-Nhi Tran Ellen Kindlund Bjorn Andersson 《BMC bioinformatics》2006,7(1):155-11

Background

Many genome projects are left unfinished due to complex, repeated regions. Finishing is the most time consuming step in sequencing and current finishing tools are not designed with particular attention to the repeat problem.

6.

Correcting out-of-plane errors in two-dimensional imaging using nonimage-related information

Sih BL Hubbard M Williams KR 《Journal of biomechanics》2001,34(2):257-260

Two-dimensional imaging with a single camera assumes that the motion occurs in a calibrated plane perpendicular to the camera axis. It is well known that kinematic errors result if the object fails to remain in this plane and that if both the distance to the calibration plane from the camera and the distance out-of-plane are known, an analytical correction for the out-of-plane error can be made. Less well appreciated is that out-of-plane distance can frequently be acquired from other, nonimage-related information. In the two examples given, the mediolateral center of pressure coordinate of the foot measured from a force plate and the measured landing point of a shot put throw were used. In both cases, the resulting out-of-plane correction improved the accuracy of the 2-D kinematic data dramatically. These examples also demonstrate that the use of nonimage-related data can increase the accuracy of kinematic data without an increase in the complexity of the experiment.
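A minimal sketch of such a correction follows. It assumes an ideal pinhole camera, coordinates measured from the point where the optical axis meets the calibration plane, and an out-of-plane distance `d` taken as positive toward the camera. These assumptions are for illustration only; the paper's exact formulation may differ. A point at depth `D - d` projects onto the calibration plane magnified by `D / (D - d)`, so the true in-plane coordinates are the measured ones scaled by `(D - d) / D`.

```python
def correct_out_of_plane(x, y, camera_distance, out_of_plane):
    """Rescale calibration-plane coordinates for a point lying off that plane.

    Assumes an ideal pinhole camera whose optical axis meets the calibration
    plane at the origin; `out_of_plane` is positive toward the camera and in
    the same units as `camera_distance`.
    """
    depth = camera_distance - out_of_plane
    if depth <= 0:
        raise ValueError("point must lie in front of the camera")
    scale = depth / camera_distance
    return x * scale, y * scale
```

In the paper's examples, the out-of-plane distance would come from nonimage data such as the force-plate center of pressure or the measured shot put landing point. That value would then be passed as `out_of_plane`.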

7.

Correcting common errors in identifying cancer-specific serum peptide signatures

Villanueva J Philip J Chaparro CA Li Y Toledo-Crow R DeNoyer L Fleisher M Robbins RJ Tempst P 《Journal of proteome research》2005,4(4):1060-1072

"Molecular signatures" are the qualitative and quantitative patterns of groups of biomolecules (e.g., mRNA, proteins, peptides, or metabolites) in a cell, tissue, biological fluid, or an entire organism. To apply this concept to biomarker discovery, the measurements should ideally be noninvasive and performed in a single read-out. We have therefore developed a peptidomics platform that couples magnetics-based, automated solid-phase extraction of small peptides with a high-resolution MALDI-TOF mass spectrometric readout (Villanueva, J.; Philip, J.; Entenberg, D.; Chaparro, C. A.; Tanwar, M. K.; Holland, E. C.; Tempst, P. Anal. Chem. 2004, 76, 1560-1570). Because hundreds of peptides can be detected in microliter volumes of serum, the platform makes it possible to search for disease signatures, for instance in the presence of cancer. We have now evaluated, optimized, and standardized a number of clinical and analytical chemistry variables that are major sources of bias, ranging from blood collection and clotting to serum storage and handling, automated peptide extraction, crystallization, spectral acquisition, and signal processing. In addition, proper alignment of spectra and user-friendly visualization tools are essential for meaningful, certifiable data mining. We introduce a minimal entropy algorithm, "Entropycal", that simplifies alignment and subsequent statistical analysis and increases the proportion of highly distinguishing spectral information retained after feature selection of the datasets. Using the improved analytical platform and tools, and a commercial statistics program, we found that sera from thyroid cancer patients can be distinguished from healthy controls based on an array of 98 discriminant peptides. With adequate technological and computational methods in place, and using rigorously standardized conditions, potential sources of patient-related bias (e.g., gender, age, genetics, environmental, dietary, and other factors) may now be addressed.

8.

ReDiT: Repeat Discrepancy Tagger--a shotgun assembly finishing aid

Tammi MT Arner E Kindlund E Andersson B 《Bioinformatics (Oxford, England)》2004,20(5):803-804

Finishing, i.e. gap closure and editing, is the most time-consuming part of genome sequencing. Repeated sequences together with sequencing errors complicate the assembly and often result in misassemblies that are difficult to correct. Repeat Discrepancy Tagger (ReDiT) is a tool designed to aid in the finishing step. This software processes assembly results produced by any fragment assembly program that outputs ace files. The input sequences are analyzed to determine possible differences between repeated sequences. The output is written as tags in an ace file that can be viewed by, e.g. the Consed sequence editor. AVAILABILITY: The ReDiT program is freely available at http://web.cgb.ki.se/redit

9.

Functional interactions between the proline-rich and repeat regions of tau enhance microtubule binding and assembly.


B L Goode P E Denis D Panda M J Radeke H P Miller L Wilson S C Feinstein 《Molecular biology of the cell》1997,8(2):353-365

Tau is a neuronal microtubule-associated protein that promotes microtubule assembly, stability, and bundling in axons. Two distinct regions of tau are important for the tau-microtubule interaction, a relatively well-characterized repeat region in the carboxyl terminus (containing either three or four imperfect 18-amino acid repeats separated by 13- or 14-amino acid long inter-repeats) and a more centrally located, relatively poorly characterized proline-rich region. By using amino-terminal truncation analyses of tau, we have localized the microtubule binding activity of the proline-rich region to Lys215-Asn246 and identified a small sequence within this region, 215KKVAVVR221, that exerts a strong influence on microtubule binding and assembly in both three- and four-repeat tau isoforms. Site-directed mutagenesis experiments indicate that these capabilities are derived largely from Lys215/Lys216 and Arg221. In marked contrast to synthetic peptides corresponding to the repeat region, peptides corresponding to Lys215-Asn246 and Lys215-Thr222 alone possess little or no ability to promote microtubule assembly, and the peptide Lys215-Thr222 does not effectively suppress in vitro microtubule dynamics. However, combining the proline-rich region sequences (Lys215-Asn246) with their adjacent repeat region sequences within a single peptide (Lys215-Lys272) enhances microtubule assembly by 10-fold, suggesting intramolecular interactions between the proline-rich and repeat regions. Structural complexity in this region of tau also is suggested by sequential amino-terminal deletions through the proline-rich and repeat regions, which reveal an unusual pattern of loss and gain of function. Thus, these data lead to a model in which efficient microtubule binding and assembly activities by tau require intramolecular interactions between its repeat and proline-rich regions. 
This model, invoking structural complexity for the microtubule-bound conformation of tau, is fundamentally different from previous models of tau structure and function, which viewed tau as a simple linear array of independently acting tubulin-binding sites.

10.

Applications of the double-barreled data in whole-genome shotgun sequence assembly and analysis

HAN Yujun NI Peixiang U Hong YE Jia HU Jianfei CHEN Chen HUANG Xiangang CONG Lijuan Li Guangyuan WANG Jing GU Xiaocheng YU Jun Li Songgang 《Science in China Series C》2005,48(3)

Double-barreled (DB) data have been widely used for the assembly of large genomes. Based on our experience building the whole-genome working draft of Oryza sativa L. ssp. indica, we present here the prevailing and improved uses of DB data in the assembly procedure and report novel applications in the subsequent data-mining processes, such as acquiring precise insert-fragment information for each clone across the genome, and a new kind of low-cost whole-genome microarray. With the increasing number of organisms being sequenced, we believe that DB data will play an important role both in other assembly procedures and in future genomic studies.
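One way to read "precise insert fragment information of each clone" is that each end-sequenced pair implies a clone insert length once both reads are placed on the assembly. The sketch below assumes inward-facing mates on the same contig with 1-based inclusive coordinates. The function names are hypothetical. It is not the pipeline used for the rice draft.

```python
from statistics import mean, stdev

def insert_size(fwd_start, rev_end):
    """Insert length implied by one read pair (1-based, inclusive coordinates).

    Assumes the forward read maps to the + strand starting at `fwd_start` and
    its mate maps to the - strand ending at `rev_end` on the same contig.
    """
    if rev_end < fwd_start:
        raise ValueError("mates are not in the expected inward-facing orientation")
    return rev_end - fwd_start + 1

def library_stats(pairs):
    """Mean and sample standard deviation of insert sizes for a clone library."""
    sizes = [insert_size(a, b) for a, b in pairs]
    return mean(sizes), stdev(sizes)
```

Per-clone sizes from `insert_size` record where each clone lies in the genome. `library_stats` summarizes a whole library, for example when checking its nominal insert length.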

11.

Correcting for signal saturation errors in the analysis of microarray data

Hsiao LL Jensen RV Yoshida T Clark KE Blumenstock JE Gullans SR 《BioTechniques》2002,32(2):330-2, 334, 336


12.

Correcting developmental errors by apoptosis: lessons from Drosophila JNK signaling

Tatsushi Igaki 《Apoptosis : an international journal on programmed cell death》2009,14(8):1021-1028

Spatio-temporal regulation of the cell death machinery is essential for normal development and homeostasis of multicellular organisms. While the molecular basis for the central cell death machinery driven by caspases is now well documented, its regulatory mechanisms, especially in the context of living animals, remain to be clarified. The c-Jun N-terminal kinase (JNK) pathway is an evolutionarily conserved kinase cascade that regulates the apoptotic machinery. In mammals, JNK signaling has been implicated in stress-induced apoptosis. Drosophila genetics has now provided evidence of a novel role for JNK-mediated cell death signaling in eliminating developmentally aberrant cells from a tissue. The JNK-dependent cell-elimination system is orchestrated by cell-cell communication between normal and aberrant cells and is essential for ensuring developmental robustness, as well as for protecting organisms against fatal abnormalities such as neoplastic development. These processes are mediated by cell competition, morphogenetic apoptosis, and intrinsic tumor suppression. A combinatorial approach using both genetic and live-imaging systems in Drosophila will be extremely powerful to decipher how JNK-mediated apoptosis works within multicellular communities.

13.

AMASS: a structured pattern matching approach to shotgun sequence assembly.

S Kim A M Segre 《Journal of computational biology》1999,6(2):163-186

In this paper, we propose an efficient, reliable shotgun sequence assembly algorithm based on a fingerprinting scheme that is robust to both noise and repetitive sequences in the data, two primary roadblocks to effective whole-genome shotgun sequencing. Our algorithm uses exact matches of short patterns randomly selected from fragment data to identify fragment overlaps, construct an overlap map, and deliver a consensus sequence. We show how statistical clues made explicit in our approach can easily be exploited to correctly assemble results even in the presence of extensive repetitive sequences. Our approach is both accurate and exceptionally fast in practice: e.g., we have correctly assembled the whole Mycoplasma genitalium genome (approximately 580 kbp) in roughly 8 minutes of 64-MB 200-MHz Pentium Pro CPU time from real shotgun data, where most existing algorithms can be expected to run for several hours to a day on the same data. Moreover, experiments with artificially shotgunned data prepared from real DNA sequences from a wide range of organisms (including human DNA) and containing complex repeating regions demonstrate our algorithm's robustness to input noise and the presence of repetitive sequences. For example, we have correctly assembled a 238-kbp human DNA sequence in less than 3 min of 64-MB 200-MHz Pentium Pro CPU time.
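The overlap-detection step described above can be pictured as voting. Short patterns sampled at random from each fragment are looked up by exact match in the other fragments. Each hit implies an alignment offset, and a true overlap shows up as many patterns agreeing on the same offset. The sketch below illustrates only that voting idea. The pattern length `k`, sample size `n`, `min_hits` threshold, and function names are assumptions. The sketch does not reproduce AMASS's statistical treatment of repeats or its consensus step.

```python
import random
from collections import defaultdict

def sample_patterns(fragment, k, n, rng):
    """Pick up to n random length-k patterns (start, word) from a fragment."""
    n_positions = max(0, len(fragment) - k + 1)
    starts = rng.sample(range(n_positions), min(n, n_positions))
    return [(s, fragment[s:s + k]) for s in starts]

def find_overlaps(fragments, k=8, n=5, min_hits=2, seed=0):
    """Vote for fragment overlaps using exact matches of sampled patterns.

    Returns {(query, target, offset): votes}, where query[i] lines up with
    target[i + offset]. Offsets backed by several patterns suggest a true
    overlap; isolated single hits are typical of repeats or noise.
    """
    rng = random.Random(seed)
    # Index every k-word position of every fragment.
    index = defaultdict(list)
    for fid, frag in enumerate(fragments):
        for p in range(len(frag) - k + 1):
            index[frag[p:p + k]].append((fid, p))
    votes = defaultdict(int)
    for qid, frag in enumerate(fragments):
        for s, word in sample_patterns(frag, k, n, rng):
            for tid, p in index.get(word, []):
                if tid != qid:
                    votes[(qid, tid, p - s)] += 1
    return {key: c for key, c in votes.items() if c >= min_hits}
```

Two 30-bp fragments that share 15 bp illustrate the result. With every pattern sampled, they collect 8 votes for offset -15 in one direction and offset 15 in the other. Because only exact matches are counted, a sequencing error inside a pattern simply removes that pattern's vote.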

14.

Tandem repeat copy-number variation in protein-coding regions of human genes


O'Dushlaine CT Edwards RJ Park SD Shields DC 《Genome biology》2005,6(8):R69-12

Background

Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles.

Results

Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.
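The percentages in this paragraph follow from simple arithmetic on the quoted figures. 218 of 13,749 clusters gives 1.59%. The frameshift bound is presumably obtained by multiplying the ~6% overall frequency by the 13.9% non-triplet share, which gives about 0.8%, i.e. "up to 1%". The variable names below are illustrative.

```python
# Figures quoted in the Results paragraph.
polymorphic_clusters = 218
clusters_examined = 13_749
estimated_mean_frequency = 0.06   # combined estimate across studies
non_triplet_share = 0.139         # polymorphisms not a multiple of 3 nt

detected_share = polymorphic_clusters / clusters_examined
frameshift_share = estimated_mean_frequency * non_triplet_share

print(f"detected in this screen: {detected_share:.2%}")
print(f"possible frameshifting polymorphisms: {frameshift_share:.2%}")
```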

Conclusion

Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.

15.

Tandem repeat copy-number variation in protein-coding regions of human genes


Colm T O'Dushlaine Richard J Edwards Stephen D Park Denis C Shields 《Genome biology》2004,6(8):R69


16.

Methionine-rich repeat proteins: a family of membrane-associated proteins which contain unusual repeat regions

Jamie L. Weiss Tanweer Ahmed Shukria Khan Jeffrey N. Keen John B.C. Findlay 《Biochimica et Biophysica Acta (BBA) - Biomembranes》2005,1668(2):164-174


17.

Methionine-rich repeat proteins: a family of membrane-associated proteins which contain unusual repeat regions

Weiss JL Evans NA Ahmed T Wrigley JD Khan S Wright C Keen JN Holzenburg A Findlay JB 《Biochimica et biophysica acta》2005,1668(2):164-174


18.

MapLinker: a software tool that aids physical map-linked whole genome shotgun assembly

Xu J Gordon JI 《Bioinformatics (Oxford, England)》2005,21(7):1265-1266

MapLinker is an analysis tool, as well as a browsing interface, that facilitates integration of whole genome sequence assembly with a clone-based physical map. Using the locations of sequence markers on the physical map, MapLinker generates a tentative sequence map of the genome that serves to verify the map and to guide genome-wide finishing.
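The marker-based placement can be sketched in a few lines. Each contig is placed at the median physical-map coordinate of the markers it contains, and contigs are then ordered by that value. A contig is flagged when its markers are spread farther apart on the map than a chosen `max_spread`, which may indicate a misassembly or a map error. The data layout, threshold, and function name are hypothetical and do not describe MapLinker's actual implementation.

```python
from statistics import median

def tentative_sequence_map(marker_map_pos, contig_markers, max_spread):
    """Order contigs along a physical map using shared sequence markers.

    marker_map_pos: {marker: coordinate on the physical map}
    contig_markers: {contig: [markers found in that contig's sequence]}
    Returns (ordered contigs, contigs whose markers disagree on placement).
    """
    placed, flagged = [], []
    for contig, markers in contig_markers.items():
        positions = [marker_map_pos[m] for m in markers if m in marker_map_pos]
        if not positions:
            continue  # no mapped marker: cannot be placed
        if max(positions) - min(positions) > max_spread:
            flagged.append(contig)  # map and assembly disagree; check by hand
        placed.append((median(positions), contig))
    placed.sort()
    return [contig for _, contig in placed], flagged
```

The flagged list corresponds to the "verify the map" use, and the ordered list approximates a scaffold to guide finishing.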

19.

Optimization of sequence alignment for simple sequence repeat regions

Abdulqader Jighly Aladdin Hamwieh Francis C Ogbonnaya 《BMC research notes》2011,4(1):239


20.

Binding dynamics of isolated nucleoporin repeat regions to importin-beta

Isgro TA Schulten K 《Structure (London, England : 1993)》2005,13(12):1869-1879

The nuclear pore complex, through the interaction of its proteins with transport receptors, controls the transport of large molecules into and out of the cell's nucleus. There is ample evidence for proteins with FG sequence repeats playing an essential role in this control. Previous studies have elucidated binding spots for FG sequence repeats on the surface of the transport receptor importin-beta by X-ray crystallography and mutational studies. Molecular dynamics simulations have been performed to characterize the interaction of FG sequence repeats with the transport receptor. Observed binding spots have been verified and novel sites discovered, suggesting that importin-beta features many more binding spots than suspected so far. The observed binding spots are in accord with several models of nucleocytoplasmic transport, and the large number of binding spots on importin-beta may be necessary for the pore complex to distinguish between importin-beta and inert proteins, and to allow for its passage through the pore.