Similar literature
1.
Programs for monitoring biological diversity over time are needed to detect changes that can constitute threats to biological resources. The Convention on Biological Diversity regards effective monitoring as necessary to halt the ongoing erosion of biological variation, and such programs at the ecosystem and species levels are enforced in several countries. However, at the level of genetic biodiversity, little has been accomplished, and monitoring programs need to be developed. We define “conservation genetic monitoring” as the systematic, temporal study of genetic variation within particular species/populations with the aim of detecting changes that indicate compromise or loss of such diversity. We also (i) identify basic starting points for conservation genetic monitoring, (ii) review the availability of such information using Sweden as an example, (iii) suggest categories of species for pilot monitoring programs, and (iv) identify some scientific and logistic issues that need to be addressed in the context of conservation genetic monitoring. We suggest that such programs are particularly warranted for species subject to large-scale enhancement and harvest, operations that are known to potentially alter the genetic composition and reduce the variability of populations.

2.
Summary. We describe and illustrate a simple heuristic approach to the Sankoff methods for construction of parsimonious evolutionary trees from nucleotide sequence data. The procedure is intended to permit more valid inferences, particularly from relatively short sequences, concerning relationships among taxa separated for long time intervals. The procedure is based on the great variability of evolutionary plasticity among sites in the molecules and removes from consideration the more highly variable sites. Editing is accomplished after classifying sites in carefully aligned arrays of sequences. Only “ditypic sites,” i.e., sites observed in only two evolutionary states within the array, are used in making phylogenetic inferences. This strategy makes possible the construction of good approximations to the most parsimonious Steiner trees, by means of efficient programs that require “dense species arrays,” i.e., species sets that differ from each other by relatively small numbers of differences in conservative sites. The technique is illustrated with 5S and 5.8S rRNA sequence data from published catalogs.
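
The site-editing step described in this abstract can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' program: the function names `ditypic_sites` and `edit_alignment` are hypothetical, the input is assumed to be an already aligned list of equal-length strings, and any character (including a gap symbol) is counted as a state.

```python
from typing import List


def ditypic_sites(alignment: List[str]) -> List[int]:
    """Return the column indices that show exactly two states in the alignment.

    Assumption: every character, including a gap symbol, counts as a state.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    keep = []
    for i in range(length):
        states = {seq[i] for seq in alignment}
        if len(states) == 2:  # "ditypic": observed in only two states
            keep.append(i)
    return keep


def edit_alignment(alignment: List[str], sites: List[int]) -> List[str]:
    """Keep only the selected columns of each aligned sequence."""
    return ["".join(seq[i] for i in sites) for seq in alignment]


# Toy alignment (invented data): columns 1 and 3 are ditypic,
# columns 0 and 2 are invariant, column 4 has four states.
toy = ["ACGTA", "ACGTC", "ATGAG", "ATGAT"]
sites = ditypic_sites(toy)
edited = edit_alignment(toy, sites)
print(sites, edited)
```

The edited array could then be passed to whatever parsimony program is in use; tree construction itself is outside this sketch.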

3.

Background

Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features.

Methodology

We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select the most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not.

Results

To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retention or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at http://www.cs.gmu.edu/~ashehu/?q=OurTools.

4.
In a companion paper a new functional architecture was proposed for the basal ganglia based on the premise that these brain structures play a central role in behavioural action selection. The current paper quantitatively describes the properties of the model using analysis and simulation. The decomposition of the basal ganglia into selection and control pathways is supported in several ways. First, several elegant features are exposed – capacity scaling, enhanced selectivity and synergistic dopamine modulation – which might be expected to exist in a well-designed action selection mechanism. The discovery of these features also lends support to the computational premise of selection that underpins our model. Second, good matches have been found between neurophysiological data and the model outputs of the globus pallidus external segment, the globus pallidus internal segment and the substantia nigra reticulata; these matches are indicative of common architectural features in the model and the biological basal ganglia. Third, the behaviour of the model as a signal selection mechanism has parallels with some kinds of action selection observed in animals under various levels of dopaminergic modulation. Received: 16 July 2000 / Accepted in revised form: 30 October 2000

5.
This article deals with the relationship between vocabulary (total number of distinct oligomers or “words”) and text-length (total number of oligomers or “words”) for a coding DNA sequence (CDS). For natural human languages, Heaps established a mathematical formula known as Heaps' law, which relates vocabulary to text-length. Our analysis shows that Heaps' law fails to model this relationship for CDSs. Here we develop a mathematical model to establish the relationship between the number of word types (vocabulary) and the number of words sampled (text-length) for CDSs, when non-overlapping nucleotide strings of the same length are treated as words. We use the hyperbolic tangent function, which captures the saturation property of vocabulary. Based on the parameters of the model, we formulate a mathematical equation, known as the “equation of word organization”, whose parameters essentially indicate that the nucleotide organization of coding sequences differs from one sequence to another. We also compare the word organization of CDSs with the random word distribution and conclude that a CDS is similar neither to a natural human language nor to a random one. Moreover, each of these sequences has its own unique nucleotide organization, which is completely structured for specific biological functioning. IM and AS contributed equally to this work.
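
The vocabulary versus text-length relationship discussed in this abstract can be tabulated with a few lines of Python. This is a minimal sketch, not the authors' model: the function name `vocabulary_growth` and the toy sequence are assumptions, words are read as non-overlapping k-mers from the first base, and no curve (Heaps' law or the hyperbolic tangent model) is fitted here because the abstract does not give the fitted equation.

```python
from typing import List, Tuple


def vocabulary_growth(cds: str, k: int = 3) -> List[Tuple[int, int]]:
    """Return (text_length, vocabulary) pairs for non-overlapping k-mers.

    text_length is the number of words read so far; vocabulary is the
    number of distinct words among them. A trailing partial word is ignored.
    """
    seen = set()
    growth = []
    n_words = len(cds) // k
    for j in range(n_words):
        word = cds[j * k:(j + 1) * k]
        seen.add(word)
        growth.append((j + 1, len(seen)))
    return growth


# Toy sequence (invented): words are ATG, GCC, ATG, GCC, TAA.
toy_cds = "ATGGCCATGGCCTAA"
growth = vocabulary_growth(toy_cds, k=3)

# With a 4-letter alphabet the vocabulary can never exceed 4**k words,
# which is why the curve must saturate for long sequences.
max_vocabulary = 4 ** 3
print(growth, max_vocabulary)
```

Plotting the second element of each pair against the first gives the empirical curve to which a candidate model would be fitted.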

6.
The recent mathematical formalization of the concepts of matter and extrinsical energy, which are used for the relational representation of biological systems, is employed in the analysis of the important experimental discoveries of Comorosan et al. related to low energy electromagnetic irradiations on enzyme substrates. By means of the present analysis one of the properties inherent to the experimental phenomena is more precisely exposed, and theoretical developments corresponding to “energetical evolutions” in a biological system (Leguizamón, 1976) may now have an experimental basis. Important limitations are introduced for the validity of the commutativity and associativity of the cartesian product of sets, when they represent matter and its linked extrinsical energy. In connection with this last aspect, new important knowledge is obtained for the relational mathematical representation of biological systems.

7.
An expressed sequence tag database from immune tissues was used to design the first high-density turbot (Scophthalmus maximus) oligo-microarray with the aim of identifying candidate genes for tolerance to pathogens. Specific oligonucleotides (60 mers) were successfully designed for 2,716 out of 3,482 unique sequences of the database. An Agilent custom oligo-microarray 8 × 15 k (five replicates/gene; eight microarrays/slide) was constructed. The performance of the microarray and the sources of variation along microarray analysis were examined on spleen pools of controls and Aeromonas salmonicida-challenged fish at 3 days postinfection. Only 48 out of 2,716 probes showed no hybridization signal on the 32 microarrays employed, thus demonstrating the consistency of the bioinformatic applications of our database. An asymmetric hierarchical design was employed to ascertain the noise associated with biological and technical (RNA extraction, labeling, hybridization, slide, and dye bias) factors using one-color (1C) and two-color (2C) labeling approaches. The high correlation coefficient between replicates at most factors tested demonstrated the high reproducibility of the signal. An analysis of random-effects variance revealed that technical variation was mostly negligible, and biological variation represented the main factor, even using pooled samples. The 1C approach performed at least as well as 2C, suggesting its usefulness given its higher design flexibility and lower cost. A relevant proportion of genes turned out to be differentially labeled depending on the fluorophore, which points to the likely need for dye-swap replication in 2C experiments. A set of differentially expressed genes and enriched functions related to immune/defense response were detected at 3 days postchallenge.

8.
The photoaffinity spin-labeled ATP analog, 2-N3-SL-adenosine triphosphate (ATP), was used to covalently modify isolated β-subunits from F1-ATPase of the thermophilic bacterium PS3. Approximately 1.2 mol of the nucleotide analog bound to the isolated subunit in the dark. Irradiation leads to covalent incorporation of the nucleotide into the binding site. ESR spectra of the complex show a signal that is typical for protein-immobilized radicals. Addition of isolated α-subunits to the modified β-subunits results in ESR spectra with two new signals indicative of two distinctly different environments of the spin-label, i.e., two distinctly different conformations of the catalytic sites. The relative ratio of the signals is approximately 2:1 in favor of the more closed conformation. The data show for the first time that when nucleotides are bound to isolated β-subunits, binding of α-subunits induces asymmetry in the catalytic sites even in the absence of the γ-subunit. This work was supported by a grant from the Deutsche Forschungsgemeinschaft to PDV.

9.
Approximately 39 to 49% of the genome of finger millet consists of repetitive DNA sequences which intersperse with 18% of single copy DNA sequences of 1900 nucleotide pairs. Agarose gel filtration and electrophoresis experiments have yielded the sizes of interspersed repeated sequences as 4000–4200 nucleotide pairs and 150–200 nucleotide pairs. Approximately 20% of the repeated DNA sequences (4000–4200 nucleotide pairs) are involved in a long-range interspersion pattern, while 60% of the repeated DNA sequences (150–200 nucleotide pairs) are involved in a short-period interspersion pattern. Based on the data available in the literature and the results described here on DNA sequence organization in plants, it is proposed that plants with a haploid DNA content of more than 2.5 pg exhibit mostly the short-period interspersion pattern, while those with a haploid DNA content of less than 2.5 pg show diverse patterns of genome organization. NCL Communication No.: 2708

10.
Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ for finding simple exact tandem repeats as well as non-simple repeats. E-TRA has several advanced repeat search parameters/options compared to other repeat finder programs as it not only accepts GenBank, FASTA and expressed sequence tags (EST) sequence files, but also analyzes multiple files with multiple sequences. The minimum and maximum tandem repeat motif lengths that E-TRA finds vary from one to one thousand. Advanced user-defined parameters/options let researchers use different minimum motif repeat search criteria for varying motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some of the tandem repeats play important roles in the regulation of gene expression whereas others do not have any known biological function as yet. Nevertheless, they have proven to be very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in number of repeats as well as repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs is not random, thus differing from the un-transcribed repeats found in genomes.
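
To make the notion of a simple exact tandem repeat concrete, the following Python sketch searches a single sequence with a back-referencing regular expression. It is not E-TRA and does not reproduce its file handling or keyword filtering; the function name `find_exact_tandem_repeats`, its parameters and the toy sequence are all assumptions made for illustration.

```python
import re
from typing import List, Tuple


def find_exact_tandem_repeats(
    seq: str, min_motif: int = 1, max_motif: int = 6, min_copies: int = 3
) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for exact tandem repeats in seq.

    Motifs that are themselves repeats of a shorter unit (e.g. "TT") are
    skipped so that the same run is not reported twice.
    """
    seq = seq.upper()
    hits = []
    for m in range(min_motif, max_motif + 1):
        # Group 1 is the motif; \1{n,} requires at least n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (m, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # A string is periodic if it occurs inside its own doubling
            # at an offset smaller than its length.
            if (motif + motif).find(motif, 1) < len(motif):
                continue
            copies = len(match.group(0)) // m
            hits.append((match.start(), motif, copies))
    return sorted(hits)


# Toy sequence (invented): a (CA)4 repeat at position 2 and a (T)5 run at 10.
toy_seq = "GGCACACACATTTTTGG"
repeats = find_exact_tandem_repeats(toy_seq, max_motif=3)
print(repeats)
```

Separate minimum copy numbers per motif length, which the abstract mentions as a user option, could be modelled by passing a dictionary instead of the single `min_copies` value.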

11.
In recent years, reports of sponge bleaching, disease, and subsequent mortality have increased alarmingly. Population recovery may depend strongly on colonization capabilities of the affected species. The giant barrel sponge Xestospongia muta is a dominant reef constituent in the Caribbean. However, little is known about its population structure and gene flow. The 5′-end fragment of the mitochondrial gene cytochrome oxidase subunit I is often used to address these kinds of questions, but it presents very low intraspecific nucleotide variability in sponges. In this study, the usefulness of the I3-M11 partition of COI to determine the genetic structure of X. muta was tested for seven populations from Florida, the Bahamas and Belize. A total of 116 sequences of 544 bp were obtained for the I3-M11 partition corresponding to four haplotypes. In order to make a comparison with the 5′-end partition, 10 sequences per haplotype were analyzed for this fragment. The 40 resulting sequences were of 569 bp and corresponded to two haplotypes. The nucleotide diversity of the I3-M11 partition (π = 0.00386) was higher than that of the 5′-end partition (π = 0.00058), indicating better resolution at the intraspecific level. Sponges with the most divergent external morphologies (smooth vs. digitate surface) had different haplotypes, while those with the most common external morphology (rough surface) presented a mixture of haplotypes. Pairwise tests for genetic differentiation among geographic locations based on F_ST values showed significant genetic divergence between most populations, but this genetic differentiation was not due to isolation by distance. While limited larval dispersal may have led to differentiation among some of the populations, the patterns of genetic structure appear to be most strongly related to patterns of ocean currents. Therefore, hydrological features may play a major role in sponge colonization and need to be considered in future plans for management and conservation of these important components of coral reef ecosystems. Communicated by Biology Editor Dr Ruth Gates
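
The nucleotide diversity (π) values quoted in this abstract are averages of pairwise differences per site. The Python sketch below shows that calculation on invented toy data; the function name `nucleotide_diversity` is an assumption, the sequences are assumed to be aligned and gap-free, and the sketch makes no attempt to reproduce the study's values.

```python
from itertools import combinations
from typing import List


def nucleotide_diversity(seqs: List[str]) -> float:
    """Average number of pairwise nucleotide differences per site.

    Assumes aligned sequences of equal length with no gaps or ambiguity codes.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


# Toy sample (invented): 6 differences summed over 6 pairs of 8 bp sequences.
toy_sample = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "ACCTACGT"]
pi = nucleotide_diversity(toy_sample)
print(pi)
```

Because identical haplotypes contribute zero differences, a sample dominated by one haplotype gives a small π even when several haplotypes are present.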

12.
We have improved an existing clone database management system written in FORTRAN 77 and adapted it to our software environment. Improvements are that the database can be interrogated for any type of information, not just keywords. Also, recombinant DNA constructions can be represented in a simplified ‘shorthand’, whereafter a program assembles the full nucleotide sequence from the contributing fragments, which may be obtained from nucleotide sequence databases. Another improvement is the replacement of the database manager by programs, running in batch to maintain the databank and verify its consistency automatically. Finally, graphic extensions are written in Graphical Kernel System, to draw linear and circular restriction maps of recombinants. Besides restriction sites, recombinant features can be presented from the feature lines of recombinant database entries, or from the feature tables of nucleotide databases. The clone database management system is fully integrated into the sequence analysis software package from the Pasteur Institute, Paris, and is made accessible through the same menu. As a result, recombinant DNA sequences can directly be analysed by the sequence analysis programs. Received on March 17, 1986; accepted on June 16, 1986

13.
GOBASE: the organelle genome database   Total citations: 3; self-citations: 1; citations by others: 2

14.
15.
Abstract. The nucleotide sequences of 16 newly reported and 8 previously reported actin-encoding macronuclear DNA molecules in spirotrichs have been compared. As described for the eight previously reported molecules, the first 50 bases (noncoding) inside the telomere at both 5′ strands in additional actin molecules are purine-rich. This anomalous base composition might serve as a signal to identify macronuclear molecules in micronuclear DNA during development. The 50-base segment upstream of the ATG in the 5′ leaders of the actin molecules contains extensive, conserved sequence motifs that are possibly promoter elements. The 3′ noncoding trailers contain virtually no conserved sequence motifs. With one exception, the 3′ trailers contain a second stop codon (TGA) 36 bases on average downstream of the primary stop codon. Excluding Moneuplotes crassus, amino acid identities in actin I range from 78 to 100%, with variations distributed nonrandomly along the sequence. Phylogenetic trees based on the actin nucleotide sequences of 22 spirotrichs define the evolutionary relationships of their actin-encoding molecules. The actin phylogeny, while well supported by posterior probabilities, does not always coincide with the phylogeny defined in rDNA analyses or classical taxonomic classifications.

16.
Fluorescence-based sequencing is playing an increasingly important role in efforts to identify DNA polymorphisms and mutations of biological and medical interest. The application of this technology in generating the reference sequence of simple and complex genomes is also driving the development of new computer programs to automate base calling (Phred), sequence assembly (Phrap) and sequence assembly editing (Consed) in high throughput settings. In this report we describe a new computer program known as PolyPhred that automatically detects the presence of heterozygous single nucleotide substitutions by fluorescence-based sequencing of PCR products. Its operations are integrated with the use of the Phred, Phrap and Consed programs and together these tools generate a high throughput system for detecting DNA polymorphisms and mutations by large scale fluorescence-based resequencing. Analysis of sequences containing known DNA variants demonstrates that the accuracy of PolyPhred with single pass data is >99% when the sequences are generated with fluorescent dye-labeled primers and approximately 90% for those prepared with dye-labeled terminators.

17.

Background  

Within the emerging field of synthetic biology, engineering paradigms have recently been used to design biological systems with novel functionalities. One of the essential challenges hampering the construction of such systems is the need to precisely optimize protein expression levels for robust operation. However, it is difficult to design mRNA sequences for expression at targeted protein levels, since even a few nucleotide modifications around the start codon may alter translational efficiency and dramatically (up to 250-fold) change protein expression. Previous studies have used ad hoc approaches (e.g., random mutagenesis) to obtain the desired translational efficiencies for mRNA sequences. Hence, the development of a mathematical methodology capable of estimating translational efficiency would greatly facilitate the future design of mRNA sequences aimed at yielding desired protein expression levels.

18.
Since it was first described 25 years ago, phosphorylation has come to be recognized as a widespread and dynamic post-translational modification of myelin protein. In this review, the phosphorylation characteristics of myelin basic protein, protein zero (P0), myelin-associated glycoprotein and 2′,3′-cyclic nucleotide 3′-phosphodiesterase are summarized. Emphasis is placed on recent advances in our knowledge concerning the protein kinases involved and, where known, the sites of phosphorylation in the amino acid sequences. The possible roles of myelin protein phosphorylation in modulating myelin structure, the process of myelin assembly and mediation of signal transduction events are discussed. Special issue dedicated to Dr. Marion E. Smith.

19.
The main idea of the S-curve diagram is to assign different angle values (from 0° to 180°) to different nucleotides or to different amino acids, and then to accumulate the values of cos α_j and sin α_j to construct an S-curve diagram that is in strict one-to-one correspondence with the biological sequence. In addition, the S-curve diagram is shown to be free of degeneracy, so that both the degeneracy problem of diagram-based representations and the problem of visualizing biological sequence data are solved. Meanwhile, a new measure for comparing biological sequences, the degree of similarity, is put forward on the basis of the S-curve diagram. In detail, the least squares approach is first used to fit a straight line to the S-curve diagram; then, using the formula for the distance from a point to a straight line, the average of the squared distances between the S-curve and the straight line is calculated; finally, the similarity of the biological sequences is expressed by the new measure, the degree of similarity. As the experimental results show, the S-curve diagram can better represent biological sequences (such as proteins), and the mutation points of a biological sequence, within a Cartesian coordinate system. The new measure, the degree of similarity, is therefore of clear advantage.
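
The construction described in this abstract can be sketched in Python as follows. Several details are assumptions rather than the authors' choices: the angle assigned to each nucleotide in `ANGLES_DEG` is hypothetical (the abstract only says the angles lie between 0° and 180°), the function names are invented, and `mean_squared_distance` is one plausible reading of the averaged squared point-to-line distance, not necessarily the published degree-of-similarity formula.

```python
import math
from typing import List, Tuple

# Hypothetical angle assignment; the abstract does not give the actual values.
ANGLES_DEG = {"A": 30.0, "C": 60.0, "G": 120.0, "T": 150.0}


def s_curve(seq: str) -> List[Tuple[float, float]]:
    """Accumulate (cos a_j, sin a_j) along the sequence.

    Because sin a > 0 for every angle strictly between 0 and 180 degrees,
    y grows at every step and the curve never folds back onto itself.
    """
    x = y = 0.0
    points = []
    for base in seq.upper():
        angle = math.radians(ANGLES_DEG[base])
        x += math.cos(angle)
        y += math.sin(angle)
        points.append((x, y))
    return points


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("cannot fit a line: all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_distance(points, slope: float, intercept: float) -> float:
    """Mean squared perpendicular distance from the points to the line."""
    denom = slope ** 2 + 1.0
    return sum((slope * x - y + intercept) ** 2 / denom for x, y in points) / len(points)


curve = s_curve("ACGTTGCA")  # toy sequence (invented)
slope, intercept = fit_line(curve)
score = mean_squared_distance(curve, slope, intercept)
print(score)
```

Two sequences could then be compared through their scores, or through the distance between their curves; which comparison the paper uses is not stated in the abstract.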

20.
Comparison of complete genome sequences for different variants of hepatitis C virus (HCV) reveals several different constraints on sequence change. Synonymous changes are suppressed in coding regions at both 5′ and 3′ ends of the genome. No evidence was found for the existence of alternative reading frames or for a lower mutation frequency in these regions. Instead, suppression may be due to constraints imposed by RNA secondary structures identified within the core and NS5b genes. Nonsynonymous substitutions are less frequent than synonymous ones except in the hypervariable region of E2 and, to a lesser extent, in E1, NS2, and NS5b. Transitions are more frequent than transversions, particularly at the third position of codons where the bias is 16:1. In addition, nucleotide substitutions may not occur symmetrically since there is a bias toward G or C at the third position of codons, while T ↔ C transitions were twice as frequent as A ↔ G transitions. These different biases do not affect the phylogenetic analysis of HCV variants but need to be taken into account in interpreting sequence change in longitudinal studies. Received: 9 September 1996 / Accepted: 20 April 1997
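
The transition and transversion counts by codon position mentioned in this abstract can be tallied as in the Python sketch below. It is an illustration only: the function names and the toy sequences are invented, the two sequences are assumed to be aligned, in frame from the first base, and written with T in place of U.

```python
from typing import Dict

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """True for purine-purine or pyrimidine-pyrimidine changes."""
    pair = {a, b}
    return pair <= PURINES or pair <= PYRIMIDINES


def count_substitutions(seq1: str, seq2: str) -> Dict[int, Dict[str, int]]:
    """Count transitions and transversions at codon positions 1, 2 and 3.

    Assumes both sequences are aligned and start at codon position 1.
    Sites with gaps or ambiguity codes are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    counts = {pos: {"transitions": 0, "transversions": 0} for pos in (1, 2, 3)}
    for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        position = i % 3 + 1
        kind = "transitions" if is_transition(a, b) else "transversions"
        counts[position][kind] += 1
    return counts


# Toy pair (invented): one G/T transversion at a first position,
# and T/C and A/G transitions at third positions.
counts = count_substitutions("ATGGCTAAA", "ATGTCCAAG")
print(counts)
```

Summing such counts over all pairwise comparisons in a data set gives the ratio per codon position; the 16:1 figure in the abstract comes from the authors' data, not from this sketch.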


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号