Similar articles
 20 similar articles found (search time: 15 ms)
1.
MOTIVATION: Noise in database searches resulting from random sequence similarities increases as the databases expand rapidly. The noise problems are not a technical shortcoming of the database search programs, but a logical consequence of the idea of homology searches. The effect can be observed in simulation experiments. RESULTS: We have investigated noise levels in pairwise alignment based database searches. The noise levels of 38 releases of the SwissProt database display perfect logarithmic growth with the total length of the databases. Clustering of real biological sequences reduces noise levels, but the effect is marginal.
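
The logarithmic growth reported above is what extreme-value statistics for local alignment scores would lead one to expect. The sketch below is not the authors' analysis. It assumes the Karlin-Altschul expectation E = K·m·n·exp(-λS) for ungapped local alignments and solves it for the score S that is expected to occur by chance once. The function name and the values of K and λ are illustrative assumptions, not taken from the abstract.

```python
import math


def chance_score(query_len, db_len, expect=1.0, K=0.13, lam=0.32):
    """Score S at which `expect` chance hits are predicted.

    Solves E = K * m * n * exp(-lam * S) for S. K and lam are
    illustrative placeholder values (assumption), not fitted parameters.
    """
    return math.log(K * query_len * db_len / expect) / lam


if __name__ == "__main__":
    # Doubling the database length raises the chance score by ln(2)/lam,
    # i.e. the noise level grows with the logarithm of database length.
    for db_len in (10_000_000, 20_000_000, 40_000_000):
        print(db_len, round(chance_score(300, db_len), 2))
```

Under these assumptions each doubling of the database adds the same constant to the noise score, which is the signature of logarithmic growth.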

2.
MOTIVATION: Protein sequence comparison methods are routinely used to infer the intricate network of evolutionary relationships found within the rapidly growing library of protein sequences, and thereby to predict the structure and function of uncharacterized proteins. In the present study, we detail an improved statistical benchmark of pairwise protein sequence comparison algorithms. We use bootstrap resampling techniques to determine standard statistical errors and to estimate the confidence of our conclusions. We show that the underlying structure within benchmark databases causes Efron's standard, non-parametric bootstrap to be biased. Consequently, the standard bootstrap underpredicts average performance when used in the context of evaluating sequence comparison methods. We have developed, as an alternative, an unbiased statistical evaluation based on the Bayesian bootstrap, a resampling method operationally similar to the standard bootstrap. RESULTS: We apply our analysis to the comparative study of amino acid substitution matrix families and find that using modern matrices results in a small but statistically significant improvement in remote homology detection compared with the classic PAM and BLOSUM matrices. AVAILABILITY: The sequence sets and code for performing these analyses are available from http://compbio.berkeley.edu/. Contact: brenner@compbio.berkeley.edu.
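
For readers unfamiliar with the Bayesian bootstrap mentioned above, here is a minimal generic sketch. It is not the authors' benchmark code. Instead of resampling items with replacement, each replicate draws continuous weights from a flat Dirichlet distribution (generated here by normalizing Exp(1) draws) and computes a weighted mean. The function names and the per-query scores are hypothetical.

```python
import random


def bayesian_bootstrap_means(values, n_replicates=1000, seed=0):
    """Return weighted means of `values` under Dirichlet(1, ..., 1) weights."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_replicates):
        # Normalized Exp(1) draws are distributed as Dirichlet(1, ..., 1).
        draws = [rng.expovariate(1.0) for _ in values]
        total = sum(draws)
        means.append(sum(w * v for w, v in zip(draws, values)) / total)
    return means


def standard_error(samples):
    """Sample standard deviation of the replicate statistics."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    return var ** 0.5


if __name__ == "__main__":
    # Hypothetical per-query performance scores for one comparison method.
    scores = [0.2, 0.4, 0.6, 0.8]
    replicates = bayesian_bootstrap_means(scores)
    print(standard_error(replicates))
```

Because every item receives a non-zero weight in every replicate, no item is ever dropped from a replicate, unlike in the standard bootstrap.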

3.
The Smith-Waterman (SW) algorithm is a typical technique for local sequence alignment in computational biology. However, the SW algorithm does not consider the local behaviours of the amino acids, which may result in loss of some useful information. Inspired by the success of the Markov Edit Distance (MED) method, this paper therefore proposes a novel Markov pairwise protein sequence alignment (MPPSA) method that takes the local context dependencies into consideration. The numerical results have shown its superiority to SW for pairwise protein sequence comparison.
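
As background for the baseline named above, the following is a minimal sketch of the classic Smith-Waterman recurrence with a linear gap penalty. It does not implement MPPSA or any context-dependent scoring. The match, mismatch and gap values are illustrative assumptions; a real protein comparison would use a substitution matrix such as BLOSUM.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTT", "GGACGGG"))
```

Each cell is scored from the residue pair alone, with no reference to neighbouring residues, which is the limitation the abstract sets out to address.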

4.
Satellite DNA sequences are known to be highly variable and to have been subjected to concerted evolution that homogenizes member sequences within species. We have analyzed the mode of evolution of satellite DNA sequences in four fishes from the genus Diplodus by calculating the nucleotide frequency of the sequence array and the phylogenetic distances between member sequences. Calculation of nucleotide frequency and pairwise sequence comparison enabled us to characterize the divergence among member sequences in this satellite DNA family. The results suggest that the evolutionary rate of satellite DNA in D. bellottii is about two-fold greater than the average of the other three fishes, and that the sequence homogenization event occurred in D. puntazzo more recently than in the others. The procedures described here are effective for characterizing the mode of evolution of satellite DNA. Published: March 4, 2003
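
The two quantities named above, nucleotide frequency and pairwise distance, can be illustrated with a small sketch. This is an assumption-laden simplification, not the procedure of the paper. It uses the uncorrected p-distance (the fraction of differing sites) on pre-aligned sequences of equal length, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def nucleotide_frequencies(seqs):
    """Relative frequency of A, C, G and T over all sequences."""
    counts = Counter(b for s in seqs for b in s.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T characters found")
    return {b: counts[b] / total for b in "ACGT"}


def p_distance(x, y):
    """Fraction of differing sites between two aligned sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    # Skip columns containing gaps or ambiguous characters.
    pairs = [(a, b) for a, b in zip(x.upper(), y.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_pairwise_distance(seqs):
    """Average p-distance over all pairs of member sequences."""
    dists = [p_distance(x, y) for x, y in combinations(seqs, 2)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    members = ["ACGT", "ACGA", "ACGT"]  # toy aligned repeat units
    print(nucleotide_frequencies(members))
    print(mean_pairwise_distance(members))
```

A low mean pairwise distance within a species is what one would expect after a recent homogenization event. A published analysis would normally apply a substitution-model correction rather than the raw p-distance.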

5.
6.
7.
We have determined the nucleotide sequence of the immunoglobulin epsilon gene cloned from newborn mouse DNA. The epsilon gene sequence allows prediction of the amino acid sequence of the constant region of the epsilon chain and comparison of it with sequences of the human epsilon and other mouse immunoglobulin genes. The epsilon gene was shown to be under the weakest selection pressure at the protein level among the immunoglobulin genes although the divergence at the synonymous position is similar. Our results suggest that the epsilon gene may be dispensable, which is in accord with the fact that IgE has only obscure roles in the immune defense system but has an undesirable role as a mediator of hypersensitivity. The sequence data suggest that the human and murine epsilon genes were derived from different ancestors duplicated a long time ago. The amino acid sequence of the epsilon chain is more homologous to those of the gamma chains than to those of the other mouse heavy chains. Two membrane exons, separated by an 80-base intron, were identified 1.7 kb 3' to the CH4 domain of the epsilon gene and shown to conserve a hydrophobic portion similar to those of other heavy chain genes. RNA blot hybridization showed that the epsilon membrane exons are transcribed into two species of mRNA in an IgE hybridoma.

8.
An extraordinarily large number of single nucleotide polymorphisms (SNPs) are now available in humans as well as in other model organisms. Technological advancements may soon make it feasible to assay hundreds of SNPs in virtually any organism of interest. One potential application of SNPs is the determination of pairwise genetic relationships in populations without known pedigrees. Although microsatellites are currently the marker of choice for this purpose, the number of independently segregating microsatellite markers that can be feasibly assayed is limited. Thus, it can be difficult to distinguish reliably some classes of relationship (e.g. full-sibs from half-sibs) with microsatellite data alone. We assess, via Monte Carlo computer simulation, the potential for using a large panel of independently segregating SNPs to infer genetic relationships, following the analytical approach of Blouin et al. (1996). We have explored a 'best case scenario' in which 100 independently segregating SNPs are available. For discrimination among single-generation relationships or for the identification of parent-offspring pairs, it appears that such a panel of moderately polymorphic SNPs (minor allele frequency of 0.20) will provide discrimination power equivalent to only 16-20 independently segregating microsatellites. Although newly available analytical methods that can account for tight genetic linkage between markers will, in theory, allow improved estimation of relationships using thousands of SNPs in highly dense genomic scans, in practice such studies will only be feasible in a handful of model organisms. Given the comparable amount of effort required for the development of both types of markers, it seems that microsatellites will remain the marker of choice for relationship estimation in nonmodel organisms, at least for the foreseeable future.

9.
10.
11.
We present the complete sequence of mouse 18 S rRNA. As indicated by comparison with yeast, Xenopus and rat, the conservation of eukaryotic 18 S rRNA sequences is extensive. However, this conservation is far from being uniform along the molecule: most of the base changes and the size differences between species are concentrated at specific locations. Two distinct classes of divergent tracts can be detected which differ markedly in their rates of nucleotide substitution during evolution, and should prove valuable in additional comparative analyses, both for eukaryotic taxonomy and for rRNA higher order organization. Mouse and rat 18 S rRNA sequences differ by only 14 point changes over the 1869 nucleotides of the molecule.

12.
13.
14.
15.
16.
Multiple sequence alignment by a pairwise algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
An algorithm is described that processes the results of a conventional pairwise sequence alignment program to automatically produce an unambiguous multiple alignment of many sequences. Unlike other, more complex, multiple alignment programs, the method described here is fast enough to be used on almost any multiple sequence alignment problem. Received on September 25, 1986; accepted on January 29, 1987

17.

Background  

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.

18.
We have purified apolipoprotein C-II (apo C-II) from cynomolgus monkey plasma, prepared antibody against it and used the antibody to isolate a cDNA containing the complete coding sequence for cynomolgus monkey apo C-II. Sequence analysis indicated that the monkey apo C-II cDNA was 200 bp longer than the human and the difference in size was all in the 5' untranslated region of the mRNA. This was confirmed by Northern analysis of human and monkey RNA. There was an open reading frame in the monkey apo C-II cDNA sequence encoding a preprotein of 101 amino acids, identical in size to the human protein. The carboxyl terminal 44 amino acids of the protein were 100% homologous to the human apo C-II amino acid sequence indicating evolutionary conservation of both structure and function. However, the amino terminal 35 amino acids of the protein were only 75% homologous and the amino terminal 19 amino acids were only 58% homologous to the human sequence. The amino acid sequence derived from the nucleotide sequence predicts a more basic protein than the human apo C-II and this is confirmed by isoelectric focusing and immunoblotting.

19.
SUMMARY: NdPASA is a web server specifically designed to optimize sequence alignment between distantly related proteins. The program integrates structure information of the template sequence into a global alignment algorithm by employing neighbor-dependent propensities of amino acids as a unique parameter for alignment. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. NdPASA is most effective in aligning homologous proteins sharing a low percentage of sequence identity. The server is designed to aid homologous protein structure modeling. A PSI-BLAST search engine was implemented to help users identify template candidates that are most appropriate for modeling the query sequences.

20.
Epistasis plays an essential role in the development of complex diseases. Interaction methods face the common challenge of seeking a balance between persistent power, model complexity, computation efficiency, and validity of identified bio-markers. We introduce a novel W-test to identify pairwise epistasis effects, which measures the distributional difference between cases and controls through a combined log odds ratio. The test is model-free, fast, and inherits a Chi-squared distribution with data adaptive degrees of freedom. No permutation is needed to obtain the P-values. Simulation studies demonstrated that the W-test is more powerful in low frequency variant environments than the alternative methods, namely the Chi-squared test, logistic regression and multifactor-dimensionality reduction (MDR). In two independent real bipolar disorder genome-wide association study (GWAS) datasets, the W-test identified significant interaction pairs that can be replicated, including SLIT3-CENPN, SLIT3-TMEM132D, CNTNAP2-NDST4 and CNTNAP2-RTN4R. The genes in the pairs play central roles in neurotransmission and synapse formation. A majority of the identified loci are undiscoverable by main effect and are low frequency variants. The proposed method offers a powerful alternative tool for mapping the genetic puzzle underlying complex disorders.
