Similar articles
20 similar articles found.
1.
A novel algorithm, GS-Aligner, that uses bit-level operations was developed for aligning genomic sequences. GS-Aligner is efficient in both time and space when aligning two very long genomic sequences and when identifying genomic rearrangements such as translocations and inversions. It is suitable for aligning fairly divergent sequences, such as human and mouse genomic sequences. It consists of several efficient components: bit-level coding, a search for matching segments between the two sequences to serve as alignment anchors, a longest increasing subsequence (LIS) step, and optimal local alignment. Effort has gone into reducing the program's execution time so that it is truly practical for aligning very long sequences. Empirical tests suggest that, for relatively divergent sequences (such as sequences from different mammalian orders, or from a mammal and a nonmammalian vertebrate), GS-Aligner performs better than existing methods. The program and data can be downloaded from http://pondside.uchicago.edu/~lilab/ and http://webcollab.iis.sinica.edu.tw/~biocom.
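The LIS step can be illustrated generically. The sketch below is not GS-Aligner's code: it assumes each anchor is a hypothetical `(pos_in_a, pos_in_b)` pair, sorts anchors by their position in the first sequence, and runs an O(n log n) longest increasing subsequence on the second coordinate to keep the largest set of anchors that are collinear in both sequences. Real anchors are segments with lengths, so a production tool would also have to handle overlaps.

```python
from bisect import bisect_left


def chain_anchors(anchors):
    """Return the longest chain of anchors that increases in both coordinates.

    anchors: list of (pos_in_a, pos_in_b) tuples (hypothetical format).
    Ties in pos_in_a are sorted by pos_in_b descending, so two anchors at the
    same position in sequence A can never both end up in the chain.
    """
    ordered = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails = []     # tails[k]: smallest pos_in_b that ends a chain of length k + 1
    tail_idx = []  # index into `ordered` of the anchor holding tails[k]
    parent = [-1] * len(ordered)
    for i, (_, b) in enumerate(ordered):
        k = bisect_left(tails, b)  # bisect_left keeps the chain strictly increasing
        if k == len(tails):
            tails.append(b)
            tail_idx.append(i)
        else:
            tails[k] = b
            tail_idx[k] = i
        parent[i] = tail_idx[k - 1] if k > 0 else -1
    # Walk the parent pointers back from the end of the longest chain.
    chain = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(ordered[i])
        i = parent[i]
    return chain[::-1]
```

For example, `chain_anchors([(1, 1), (2, 5), (3, 2), (4, 3), (5, 4), (6, 0)])` returns `[(1, 1), (3, 2), (4, 3), (5, 4)]`, discarding the out-of-order anchors `(2, 5)` and `(6, 0)`.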

2.
3.
Storage of sequence data is a major concern, because the amount of data generated at many sites is growing exponentially. Techniques are therefore needed to store such data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large numbers of DNA sequences of varying length, and we analyze its performance within a distributed bioinformatics computing system for DNA sequence analysis. OPTSDNA was used to store DNA sequences ranging in size from very small to very large in a database, and both the resulting storage size and the response time were measured. Compared with other known sequential approaches, the algorithm shows high efficiency and performance in terms of storage size, expressed as a percentage.
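The abstract does not say how OPTSDNA encodes sequences. Purely as a point of comparison, the sketch below shows a generic 2-bit packing of A/C/G/T, a common way to store DNA in a quarter of the space of one-byte-per-base text. It is not OPTSDNA; the function names are hypothetical, and N and other IUPAC codes are deliberately not handled.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string into bytes, 4 bases per byte (first base in the high bits)."""
    seq = seq.upper()
    out = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # raises KeyError for non-ACGT symbols
        byte <<= 2 * (4 - len(chunk))         # left-align a short final chunk
        out.append(byte)
    return bytes(out), len(seq)  # the length is needed to drop padding on unpack


def unpack(data, length):
    """Inverse of pack(): recover the first `length` bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])
```

A 9-base sequence packs into 3 bytes, and `pack("ACGT")` gives the single byte `0b00011011` (27).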

4.
In this paper, we propose a simple method for analyzing the similarity of biological sequences. Taking the average contents of the sequences and their information entropies as variables, we cluster the sequences with a fuzzy clustering method. Applications show that the method is relatively simple and fast. Unlike other approaches, such as graphical representation methods, which usually require computing invariants of matrices derived from the graphical representation (often a complex task), our method concentrates on the information contained in the sequences themselves. With the help of statistical software such as SPSS, it is particularly convenient to apply. It may therefore be used to study new biological sequences, for example their evolutionary relationships and structures.
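The abstract does not name the fuzzy method or define the variables precisely. Assuming nucleotide sequences, the sketch below uses the A/C/G/T frequencies plus their Shannon entropy (in bits) as each sequence's feature vector and clusters the vectors with standard fuzzy c-means (fuzzifier m = 2). The feature choice, function names, and parameters are our assumptions, not the authors' procedure.

```python
import math
import random


def features(seq):
    """Base frequencies of A, C, G, T plus the Shannon entropy (bits) of that distribution."""
    seq = seq.upper()
    counts = [seq.count(b) for b in "ACGT"]
    total = sum(counts)
    freqs = [c / total for c in counts]
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return freqs + [entropy]


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0, eps=1e-9):
    """Standard fuzzy c-means. Returns (centers, memberships), where
    memberships[i][j] is the degree to which point i belongs to cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    for _ in range(iters):
        # Each center is the mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Memberships from relative Euclidean distances to the centers.
        for i in range(n):
            dist = [max(math.dist(points[i], centers[j]), eps) for j in range(c)]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[k]) ** (2 / (m - 1))
                                    for k in range(c))
    return centers, u
```

Each sequence can then be assigned to the cluster in which its membership is highest, while the fractional memberships show how ambiguous the assignment is.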

5.
PICDI is a very simple program designed to calculate the Intrinsic Codon Deviation Index (ICDI). The program is available in Macintosh as well as PC format. Input requirements for the sequences have been kept to a minimum, and sequences of up to 2000 codons are analyzed very quickly. The ICDI is very useful for estimating the codon bias of genes from species in which the optimal codons are not known. The availability of a computer program for calculating it should increase its usefulness in molecular biology and biotechnology.
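The abstract does not give the ICDI formula, so the following is only an illustration of the kind of calculation involved, not PICDI's implementation. For each amino acid with k synonymous codons and relative codon frequencies n_i, it computes S = Σ(k·n_i − 1)² / (k(k − 1)), which is 0 for uniform codon usage and 1 when only one codon is used, and then averages S over the degenerate amino acids present in the gene. The averaging step and the treatment of six-codon amino acids as single families are our assumptions; check the original ICDI definition before relying on the numbers.

```python
from collections import Counter
from itertools import product

# Standard genetic code with codons enumerated in TCAG order (a common compact encoding).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codon families, excluding stop codons.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def codon_deviation(cds):
    """Mean normalized deviation from uniform synonymous-codon usage (hypothetical helper).

    Per amino acid: S = sum((k * n_i - 1) ** 2) / (k * (k - 1)), where n_i is the
    relative frequency of codon i among the k synonymous codons.
    """
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    scores = []
    for aa, codons in FAMILIES.items():
        k = len(codons)
        total = sum(counts[c] for c in codons)
        if k < 2 or total == 0:
            continue  # Met/Trp have no synonymous choice; unused amino acids are skipped
        dev = sum((k * counts[c] / total - 1) ** 2 for c in codons)
        scores.append(dev / (k * (k - 1)))
    return sum(scores) / len(scores) if scores else 0.0
```

For instance, a gene that always uses TTT for phenylalanine scores 1.0 for that amino acid, while equal use of TTT and TTC scores 0.0.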

6.
Howard D, Benson K. Bio Systems, 2003, 72(1-2): 19-27
This paper develops an evolutionary method that learns inductively to recognize the makeup and position of very short consensus sequences (cis-acting sites), which are a typical feature of promoters in genomes. The method combines finite state automata (FSA) and genetic programming (GP) to discover candidate promoter sequences in primary sequence data. An experiment measures the method's success at promoter prediction in the human genome. Methods of this class can jump over large numbers of base pairs, which may enable them to process very long genomic sequences and discover gene-specific cis-acting sites, as well as genes that are regulated together.
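In the paper the automata are evolved by genetic programming, and that search is not reproduced here. To show what a finite state automaton over DNA looks like, the sketch below hand-builds a deterministic automaton (the Knuth–Morris–Pratt construction) that reports every occurrence of a fixed consensus string. `TATAAA`, a TATA-box-like motif, is only an illustrative choice, and the function names are hypothetical.

```python
def build_dfa(pattern, alphabet="ACGT"):
    """KMP-style DFA: dfa[state][symbol] -> next state; state len(pattern) means a match."""
    m = len(pattern)
    dfa = [dict.fromkeys(alphabet, 0) for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state the automaton would reach after reading pattern[1:j]
    for j in range(1, m + 1):
        for a in alphabet:
            dfa[j][a] = dfa[restart][a]  # on mismatch, behave like the restart state
        if j < m:
            dfa[j][pattern[j]] = j + 1   # on match, advance one state
            restart = dfa[restart][pattern[j]]
    return dfa


def scan(sequence, pattern):
    """Return 0-based start positions of every (possibly overlapping) match."""
    dfa = build_dfa(pattern)
    state, hits = 0, []
    for i, base in enumerate(sequence.upper()):
        state = dfa[state].get(base, 0)  # non-ACGT symbols reset the automaton
        if state == len(pattern):
            hits.append(i - len(pattern) + 1)
    return hits
```

The automaton reads each base exactly once and never backtracks, so scanning time is linear in sequence length. For example, `scan("GGTATAAAGTATAAATATAAA", "TATAAA")` returns `[2, 9, 15]`.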

7.
Yves Quentin. Genetica, 1994, 93(1-3): 203-215
The past few years have brought new insight into the evolution of retroposon families. These families consist of a very small number of master sequences able to duplicate and a large majority of copies that are inactive for retroposition. Over time, successive replacements of master sequences have produced waves of amplification that are recognizable as subfamilies. In the Alu and B1 families, two evolutionary periods can be distinguished. The first involves only monomeric elements, now extinct (fossil elements), and is characterized by deep remodeling of the sequences. This period ends, in primates, with the fusion of a free left and a free right Alu monomer, producing the first modern dimeric Alu element; in rodents, it ends with a tandem duplication of 29 bp that created the first modern B1 element. The second period is characterized by great stability of the master sequences. The observed turnover of master sequences is still an enigma. However, analysis of the contemporary master sequences and of the oldest master sequences provides some clues. Here, we review the very first stages of the appearance of the Alu and B1 families in mammalian genomes.

8.
Hood ME. Genetica, 2005, 124(1): 1-10
The small genomes of fungi are expected to have little repetitive content other than rDNA genes. Moreover, among asexual or highly selfing lineages, the diversity of repetitive elements is also expected to be very low. However, in the automictic fungus Microbotryum violaceum, a very large proportion of random DNA fragments from the autosomes and the fungal sex chromosomes are repetitive, consisting of either retrotransposon or helicase sequences. Among the retrotransposon sequences, examples were found of each major kind of element, including copia, gypsy, and non-LTR sequences. The most numerous were copia-like elements, which are believed to be rare in fungi, particularly among basidiomycetes. The many helicase sequences appear to belong to the recently discovered Helitron type of transposable element. Sequences that could not be identified as any known type of gene were also highly repetitive within the database of random fragments from M. violaceum. The differentiated pair of fungal sex chromosomes and the suppression of recombination may be the major forces determining the highly repetitive content of the small M. violaceum genome.

9.
It is now well established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences. Phylogenetic analyses based on protein sequences are generally considered more reliable than those derived from the corresponding DNA sequences, because the use of encoded protein sequences is believed to circumvent the problems caused by nucleotide compositional bias. There exists, however, a correlation between AT/GC bias at the nucleotide level and the content of AT- and GC-rich codons and their corresponding amino acids. Consequently, protein sequences can also be affected secondarily by nucleotide compositional bias. Here, we report that DNA bias may not only affect phylogenetic analysis based on DNA sequences but also drive a protein bias that may affect analyses based on protein sequences. We present a striking example in which common phylogenetic tools fail to recover the correct tree from complete animal mitochondrial protein-coding sequences. The data set is very extensive, containing several thousand sites per sequence, and the incorrect phylogenetic trees are statistically very well supported. Additionally, neither the LogDet/paralinear transform nor the removal of positions in the protein alignment with AT- or GC-rich codons allowed recovery of the correct tree. Two taxa with a large compositional bias consistently group together in these analyses, despite a lack of close biological relatedness. We conclude that even protein-based phylogenetic trees may be misleading, and we advise caution in phylogenetic reconstruction using protein sequences, especially those that are compositionally biased.
Received: 19 February 1998 / Accepted: 28 August 1998
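For reference, here is a minimal sketch of the paralinear distance, Lake's member of the LogDet family named in this abstract, for two aligned DNA sequences: d = −(1/4) ln[det J / √(det D_x · det D_y)], where J is the 4 × 4 matrix of site-pattern proportions and D_x, D_y are diagonal matrices of the two sequences' base frequencies. Which LogDet variant and which alphabet the study used is not stated here. Skipping gaps and ambiguity codes is our simplification, and the distance is undefined when a base is absent from either sequence.

```python
import math

BASES = "ACGT"


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, result = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def paralinear_distance(x, y):
    """Paralinear (LogDet-family) distance between two aligned DNA sequences."""
    pairs = [(p, q) for p, q in zip(x.upper(), y.upper())
             if p in BASES and q in BASES]  # skip gaps and ambiguous sites
    n = len(pairs)
    idx = {b: i for i, b in enumerate(BASES)}
    J = [[0.0] * 4 for _ in range(4)]
    for p, q in pairs:
        J[idx[p]][idx[q]] += 1.0 / n        # joint site-pattern proportions
    fx = [sum(row) for row in J]            # base frequencies in x (row sums)
    fy = [sum(col) for col in zip(*J)]      # base frequencies in y (column sums)
    det_j = det(J)
    det_dx = math.prod(fx)                  # determinant of a diagonal matrix
    det_dy = math.prod(fy)
    if det_j <= 0 or det_dx == 0 or det_dy == 0:
        raise ValueError("distance undefined for these sequences")
    return -0.25 * math.log(det_j / math.sqrt(det_dx * det_dy))
```

Identical sequences give a distance of zero, and any substitution increases it. Because the correction uses the full joint frequency matrix rather than assuming equal base composition, LogDet-type distances are designed to tolerate compositional differences between sequences, which is why their failure in this data set is notable.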

10.
Structure of the chicken apo very low density lipoprotein II gene   Total citations: 4; self-citations: 0; citations by others: 4
We describe two cloned genomic DNA fragments, both bearing the entire apo very low density lipoprotein II gene. Electron microscopy and restriction enzyme mapping showed that this gene is split into at least four coding sequences by three or more intervening sequences. A very short exon at the 5'-end of the gene is separated by a 1.5-kilobase intron from the second exon, which codes for the AUG initiation codon of the mRNA.

11.
12.
The compositional distributions of large (main-band) DNA fragments from eight birds belonging to eight different orders (including both paleognathous and neognathous species) are very broad and extremely close to each other. These findings, which are paralleled by the compositional similarity of homologous coding sequences and their codon positions, support the idea that birds are a monophyletic group.
The compositional distribution of third codon positions of genes from chicken, the only avian species for which a relatively large number of coding sequences is known, is very broad and bimodal, with the minor, GC-richer peak reaching 100% GC. The very high compositional heterogeneity of avian genomes is accompanied (as in the case of mammalian genomes) by a very high speciation rate compared with cold-blooded vertebrates, whose genomes are much less heterogeneous. The higher GC levels attained by avian compared with mammalian genomes might be correlated with the higher body temperature of birds (41–43°C) compared with mammals (37°C).
A comparison of GC levels of coding sequences and codon positions from man and chicken revealed very close average GC levels and standard deviations. Homologous coding sequences and codon positions from man and chicken showed a surprisingly high degree of compositional similarity, which was, however, higher for GC-poor than for GC-rich sequences. This indicates that the GC-poor isochores of warm-blooded vertebrates reflect the composition of the isochores of the genome of the common reptilian ancestor of mammals and birds, which underwent only a small compositional change at the transition from cold- to warm-blooded vertebrates. In contrast, the GC-rich isochores of birds and mammals are the result of large compositional changes at the same evolutionary transition, which were in part different in the two classes of warm-blooded vertebrates.
Correspondence to: G. Bernardi
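The GC levels of individual codon positions discussed in this abstract (including GC at third codon positions, often called GC3) can be computed per gene as in the sketch below. It assumes an in-frame coding sequence, ignores an incomplete trailing codon, and leaves non-ACGT symbols out of both numerator and denominator. Whether stop codons should be excluded is not specified in the abstract, so here they are counted like any other codon. The function name is hypothetical.

```python
def gc_by_codon_position(cds):
    """Percent GC at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for start in range(0, len(cds) - 2, 3):  # complete codons only
        for pos in range(3):
            base = cds[start + pos]
            if base in "ACGT":
                total[pos] += 1
                gc[pos] += base in "GC"  # bool adds as 0 or 1
    return [100.0 * g / t if t else 0.0 for g, t in zip(gc, total)]
```

For `ATGGCCAAG`, every third position is G or C, so the function reports 100% GC3 and about 33% GC at the first and second positions. A gene at the top of the chicken GC3 distribution would look like this.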

13.

Background  

Non-coding DNA sequences make up a very large proportion of the total genomic content of mammals, most other vertebrates, many invertebrates, and most plants. Understanding the functional significance of non-coding DNA depends on how well we can align non-coding DNA sequences. However, non-coding DNA sequences are more difficult to align than protein-coding sequences.

14.
There are at least nine, and probably ten, ribosomal RNA gene sets in the genome of Bacillus subtilis. Each gene set contains sequences complementary to 16S, 23S and 5S rRNAs. We have determined the nucleotide sequences of two DNA fragments, each of which contains 165 base pairs of the 16S rRNA gene, 191 base pairs of the 23S rRNA gene, and the spacer region between them. The smaller spacer region is 164 base pairs long, and the larger one includes an additional 180 base pairs. The extra nucleotides could be transcribed into tRNAIle and tRNAAla sequences. Evidence is also presented for the existence of a second spacer region that also contains tRNAIle and tRNAAla sequences. No other tRNAs appear to be encoded in the spacer regions between the 16S and 23S rRNA genes. Whereas the nucleotide sequences corresponding to the 16S rRNA, the 23S rRNA and the spacer tRNAs are very similar to those of E. coli, the sequences between these structural genes are very different.

15.
A database of the structural properties of all 32,896 unique DNA octamer sequences has been calculated, including information on stability, the minimum energy conformation and flexibility. The contents of the database have been analysed using a variety of Euclidean distance similarity measures. A global comparison of sequence similarity with structural similarity shows that the structural properties of DNA are much less diverse than the sequences, and that DNA sequence space is larger and more diverse than DNA structure space. Thus, there are many very different sequences that have very similar structural properties, and this may be useful for identifying DNA motifs that have similar functional properties that are not apparent from the sequences. On the other hand, there are also small numbers of almost identical sequences that have very different structural properties, and these could give rise to false-positives in methods used to identify function based on sequence alignment. A simple validation test demonstrates that structural similarity can differentiate between promoter and non-promoter DNA. Combining structural and sequence similarity improves promoter recall beyond that possible using either similarity measure alone, demonstrating that there is indeed information available in the structure of double-helical DNA that is not readily apparent from the sequence.
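The figure of 32,896 is consistent with counting octamers up to reverse complementation, since a double-stranded octamer reads as one sequence on one strand and its reverse complement on the other. Of the 4^8 = 65,536 strings, the 4^4 = 256 reverse-complement palindromes (whose last four bases are fixed by the first four) are their own partners, so the count is (65,536 + 256) / 2 = 32,896. This interpretation is our reading of the number; the brute-force enumeration below checks the arithmetic.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k):
    """All k-mers, with each k-mer and its reverse complement collapsed to one representative."""
    return {min(s, reverse_complement(s))
            for s in ("".join(p) for p in product("ACGT", repeat=k))}


n_octamers = len(canonical_kmers(8))
# Closed form for even k: (4**k + 4**(k // 2)) / 2
assert n_octamers == (4**8 + 4**4) // 2
print(n_octamers)  # 32896
```

For odd k no sequence can equal its own reverse complement (the middle base would have to be its own complement), so the count is exactly 4^k / 2.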

16.
The genomes of barley and wheat, two of the world's most important crops, are very large and complex because of their high content of repetitive DNA. To obtain a whole-genome sequence sample, we performed two runs of 454 (GS20) sequencing on genomic DNA of barley cv. Morex, which yielded approximately 1% of a haploid genome equivalent. Almost 60% of the sequences belonged to known transposable element (TE) families, and another 9% represented novel repetitive sequences. We also found large amounts of low-complexity DNA and non-genic low-copy DNA. We identified almost 2300 protein-coding gene sequences and more than 660 putative conserved non-coding sequences. Comparison of the 454 reads with previously published genomic sequences suggested that TE families are distributed unequally along chromosomes, which was confirmed by in situ hybridization of selected TEs. A comparison of these barley data with a large sample of publicly available wheat sequences showed that several TE families that are highly abundant in wheat are absent from the barley genome. This implies that the TE composition of the two genomes differs dramatically, despite their very similar genome sizes and close phylogenetic relationship.

17.
The RNA genome of the Moloney isolate of murine sarcoma virus (M-MSV) consists of two parts: a sarcoma-specific region with no homology to known leukemia viral RNAs, and a shared region also present in Moloney murine leukemia virus RNA. Complementary DNA specific for each part of the M-MSV genome was isolated. The DNA of a number of mammalian species was examined for nucleotide sequences homologous to the two M-MSV regions. Both sets of viral sequences had homologous nucleotide sequences in normal mouse cellular DNA. MSV-specific sequences found in mouse cellular DNA closely matched the corresponding sequences in M-MSV, as shown by comparison of thermal denaturation profiles. In all normal mouse cells tested, the cellular set of M-MSV-specific nucleotide sequences was present in DNA at one to a few copies per cell. The base substitution rate of M-MSV nucleotide sequences was compared with the evolutionary rate of both unique sequences and the hemoglobin gene in various species. Conservation of MSV-specific nucleotide sequences among species was similar to that of the mouse globin gene(s) and greater than that of average unique cellular sequences. In contrast, cellular nucleotide sequences homologous to the "common" region shared by M-MSV and murine leukemia virus were present in multiple copies in mouse cells and were less well matched, as shown by the reduced melting profiles of the hybrids. These cellular common sequences diverged very rapidly during evolution, with a base substitution rate similar to that reported for some primate and avian endogenous virogenes. The observation that two sets of covalently linked viral sequences evolved at very different rates suggests that M-MSV may have a different origin from endogenous helper viruses, and that cellular sequences homologous to MSV-specific nucleotide sequences may be important for survival.

18.

Background  

When several hundred or several thousand sequences, such as epidemic virus sequences or homologous/orthologous sequences from large gene families, are aligned to reconstruct their epidemiological history or phylogenies, analyzing and visualizing the resulting alignment becomes a new challenge for computational biologists. Although several tools are available for visualizing very long sequence alignments, few of them are suited to alignments of many sequences.

19.
Summary: Continued insertion of functional Alu sequences into the genome is expected to compensate for the functional eclipse of older sequences caused by structural adulteration, and can be presumed to establish a renewable store of functional sequences at a relatively high copy number. This store of functional sequences could be maintained at almost no selective cost. A strategy of maintaining function in multiple sequence copies, with selection limited to a very few master (source) sequences, may also be used by other types of DNA sequences that are generated repeatedly during evolution and spread over many sectors of the genome.

20.
A 1700-nucleotide cDNA clone for a bean (Phaseolus vulgaris cv Red Kidney) abscission cellulase (endo-(1,4)-β-D-glucanase) has been identified and sequenced. This cDNA clone contains a 1485-nucleotide open reading frame that includes coding sequences for a putative signal peptide and the mature protein. The nucleotide and deduced amino acid sequences of the bean abscission cellulase are compared with the previously reported sequences of an avocado fruit-ripening cellulase. Optimal alignment of these sequences shows 64% identically matched nucleotides and 50% identically matched amino acids. Analysis of the deduced amino acid sequences of the mature bean and avocado cellulases indicates that the two proteins have similar molecular weights, cysteine residue positions, and hydropathic character, but very different isoelectric points and glycosylation. Genomic blot data suggest that the avocado fruit cellulase belongs to a small gene family, whereas the bean abscission cellulase appears to be encoded by a single gene or a few very closely related genes.
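Percent identity figures such as the 64% and 50% quoted here depend on how the alignment is scored and on what the number of identical positions is divided by; the abstract specifies neither. The sketch below uses a plain Needleman–Wunsch global alignment with illustrative scores (match +1, mismatch −1, linear gap −2, chosen by us, not the authors' parameters) and reports identity as identical columns over alignment length, which is only one of several common conventions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns the two gapped strings."""
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)
```

For example, aligning `ACGT` with `AGT` gives `ACGT` over `A-GT`, or 75% identity under this convention. Dividing by the length of the shorter sequence instead would give 100%, which is why identity values from different studies are not always directly comparable.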

