Similar articles
20 similar articles found (search time: 46 ms)
1.
We propose and study a new approach for the analysis of families of protein sequences. This method is related to the LogDet distances used in phylogenetic reconstructions; it can be viewed as an attempt to embed these distances into a multidimensional framework. The proposed method starts by associating a Markov matrix to each pairwise alignment deduced from a given multiple alignment. The central objects under consideration here are matrix-valued logarithms L of these Markov matrices, which exist under conditions that are compatible with fairly large divergence between the sequences. These logarithms allow us to compare data from a family of aligned proteins with simple models (in particular, continuous reversible Markov models) and to test the adequacy of such models. If one neglects fluctuations arising from the finite length of sequences, any continuous reversible Markov model with a single rate matrix Q over an arbitrary tree predicts that all the observed matrices L are multiples of Q. Our method exploits this fact, without relying on any tree estimation. We test this prediction on a family of proteins encoded by the mitochondrial genome of 26 multicellular animals, which include vertebrates, arthropods, echinoderms, molluscs, and nematodes. A principal component analysis of the observed matrices L shows that a single rate model can be used as a rough approximation to the data, but that systematic deviations from any such model are unmistakable and related to the evolutionary history of the species under consideration.
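
As a rough illustration of the scalar quantity that the matrix logarithm L generalises: for a continuous-time model P = exp(Qt) we have det P = exp(t·tr Q), so −ln det P = −t·tr Q grows linearly with time. The sketch below estimates a Markov matrix from pairwise alignment counts and computes that quantity. The row normalisation, the function names, and the toy counts are assumptions made for illustration; the abstract does not specify how the authors build their matrices.

```python
import math

def markov_matrix(counts):
    """Row-normalise a matrix of aligned-residue pair counts into a Markov matrix."""
    return [[c / sum(row) for c in row] for row in counts]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def logdet_distance(counts):
    """-ln det(P) for the Markov matrix P estimated from pair counts."""
    return -math.log(determinant(markov_matrix(counts)))

# Toy 2-letter alphabet (made-up numbers): counts[i][j] is the number of
# alignment columns with letter i in sequence A and letter j in sequence B.
toy_counts = [[90, 10], [20, 80]]
distance = logdet_distance(toy_counts)
```

The logarithm fails when the determinant is zero or negative, which mirrors the abstract's remark that the logarithms exist only under certain conditions on divergence.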

2.
A method is proposed for reconstructing a matrix of genetic distances from a linear combination of the matrix of physical distances among populations and the matrix of mean population sizes. The resulting analogue of the genetic distance matrix correlates with the original matrix at a level of 0.59. The reconstruction may be used for populations spanning about 2-3 neighbouring districts. An index xi is introduced as a constant describing certain large regions. The reconstructed matrix of genetic distances is compared with several well-known matrices of genetic distances.

3.
Mao Y, Xu S. Heredity, 2005, 94(3): 305-315
Identity-By-Descent (IBD) is a general measurement of the relationship between two groups of genes. If the two groups consist of two homologous genes, one from each individual, the IBD is called the coancestry between the two individuals. Coancestry is an important concept in both population and quantitative genetics. It is the probability that both genes are copies of the same gene in the genealogy. The average coancestry value at a random locus in a population reflects the level of population diversity, effective population size, the level of inbreeding and other attributes. Coancestry is also the building block for the covariance structure used to estimate the additive genetic variance component for a quantitative trait. There are many other types of IBD matrices, depending on the natures of the genes included in each group, and these IBD matrices vary from locus to locus. Molecular markers distributed along the genome provide information that can be used to infer these locus-specific IBD matrices. As a result, we can estimate and test the variance components of a quantitative trait contributed by these loci using the inferred IBD matrices. In this study, we develop the concept of locus-specific epistatic IBD matrices and a Monte Carlo method to infer these IBD matrices. The method is suitable for large pedigrees with arbitrary complexity and various levels of missing marker information. With these locus-specific IBD matrices, we are ready to search for quantitative trait loci along the genome in complicated pedigrees.

4.
Kraulis P J, Jones T A. Proteins, 1987, 2(3): 188-201
A method to build a three-dimensional protein model from nuclear magnetic resonance (NMR) data using fragments from a data base of crystallographically determined protein structures is presented. The interproton distances derived from the nuclear Overhauser effect (NOE) data are compared to the precalculated distances in the known protein structures. An efficient search algorithm is used, which arranges the distances in matrices akin to a C alpha diagonal distance plot, and compares the NOE distance matrices for short sequential zones of the protein to the data base matrices. After cluster analysis of the fragments found in this way, the structure is built by aligning fragments in overlapping zones. The sequentially long-range NOEs cannot be used in the initial fragments search but are vital to discriminate between several possible combinations of different groups of fragments. The method has been tested on one simulated NOE data set derived from a crystal structure and one experimental NMR data set. The method produces models that have good local structure, but may contain larger global errors. These models can be used as the starting point for further refinement, e.g., by restrained molecular dynamics or interactive graphics.

5.
Amino acid substitution matrices play an essential role in protein sequence alignment, a fundamental task in bioinformatics. Most widely used matrices, such as PAM matrices derived from homologous sequences and BLOSUM matrices derived from aligned segments of PROSITE, did not integrate conformation information in their construction. There are a few structure-based matrices, which are derived from limited data of structure alignment. Using databases PDB_SELECT and DSSP, we create a database of sequence-conformation blocks which explicitly represent sequence-structure relationship. Members in a block are identical in conformation and are highly similar in sequence. From this block database, we derive a conformation-specific amino acid substitution matrix CBSM60. The matrix shows an improved performance in conformational segment search and homolog detection.

6.
Isozyme analysis is a valuable tool for determining genetic relationships among breeding lines and populations. The recently developed DNA technologies which can assay a greater proportion of the plant genome are providing a plentiful array of additional genomic markers. The objective of this research was to compare random amplified polymorphic DNA (RAPD) versus isozyme-based estimation of relationships among 24 accessions of a hexaploid wild oat, Avena sterilis L. The accessions were evaluated for variation in 23 enzyme systems and by 21 10-mer primers. A total of 77 polymorphic isozyme bands and 115 polymorphic RAPD bands were observed. Two matrices of genetic distances were estimated based on band presence/absence. These matrices were subsequently utilized in cluster analysis and principal coordinate analysis. Both isozymes and RAPDs were proficient at distinguishing between the 24 accessions. The correspondence between the elements of both distance matrices was moderate (r=0.36**). Nevertheless, the overall representation of relationships among accessions by cluster analysis and ordination was in considerable agreement. The two techniques contrasted most notably in pair-by-pair comparisons of relationships. RAPD analysis resulted in a more definitive separation of clusters of accessions. The most significant impact of the DNA-based markers probably will be the more accurate determination of relationships between accessions that are too close to be accurately differentiated by isozymes. The research reported in this publication was funded by the North Carolina Agricultural Research Service, the North Carolina Biotechnology Center, and by a Heisenberg Fellowship (HE 1497/3-2) provided by the German Research Council to Manfred Heun.

7.
Singer B, Sager R, Ramanis Z. Genetics, 1976, 83(2): 341-354
A novel mapping procedure is presented for organelle genes or any other genetic system exhibiting a measurable frequency of exchanges occurring at a constant rate over a measurable time interval. For a set of markers in a multiply-marked cross, the exchange rates measure relative map distances from a centromere-like attachment point. With this method, we present mapping data and a linear map of genes in the chloroplast genome of Chlamydomonas. The data are plotted as log (percent remaining heterozygotes) against time and map distances are taken as proportional to slope. A statistical method which is an adaptation of jackknife methodology to a regression problem was developed to estimate slope values. A single line is fitted to pooled data for each marker from several crosses, and then lines are re-fit to a series of pooled data sets in each of which the observations from a single cross have been omitted. From these data sets a final summary slope is computed as well as a statement of its variability. The relative positions of new markers present in single crosses can then be estimated utilizing data from many crosses. The method does not distinguish between one-armed and two-armed linear or circular maps. However, evaluation of this map in conjunction with cosegregation frequency data (Sager and Ramanis 1976b) provides unambiguous evidence of the genetic circularity of the Chlamydomonas chloroplast genome.
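
The leave-one-cross-out refitting described above can be sketched as follows. This is a minimal illustration that uses the standard delete-one-group jackknife with pseudo-values; the abstract does not give the authors' exact formulas, and the data values and function names are made up.

```python
import math

def ols_slope(points):
    """Ordinary least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

def jackknife_slope(crosses):
    """Delete-one-cross jackknife of the pooled regression slope.

    crosses: list of crosses, each a list of (time, log_percent_het) pairs.
    Returns (jackknife slope estimate, standard error).
    """
    g = len(crosses)
    pooled = [p for cross in crosses for p in cross]
    full = ols_slope(pooled)
    pseudo = []
    for i in range(g):
        # Refit with every observation from cross i left out.
        reduced = [p for j, cross in enumerate(crosses) if j != i for p in cross]
        pseudo.append(g * full - (g - 1) * ols_slope(reduced))
    estimate = sum(pseudo) / g
    variance = sum((v - estimate) ** 2 for v in pseudo) / (g - 1)
    return estimate, math.sqrt(variance / g)

# Hypothetical data for one marker in three crosses:
# (time, log of percent remaining heterozygotes).
crosses = [
    [(0, 4.60), (1, 4.40), (2, 4.21)],
    [(0, 4.61), (1, 4.38), (2, 4.19)],
    [(0, 4.59), (1, 4.41), (2, 4.22)],
]
slope, standard_error = jackknife_slope(crosses)
```

Under the abstract's convention, the magnitude of the fitted slope would then be read as proportional to the marker's map distance from the attachment point.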

8.
Allozyme electrophoresis has been employed to examine genetic differentiation among eight described species, and representatives of an additional 15 taxa, of Australian peripatopsid Onychophora. The data reveal extremely high genetic differentiation among the described species and among the other taxa, each of which warrants specific recognition. Rapid protein evolution cannot account for the large genetic distances and it is proposed that these are a consequence of ancient divergence times. A method is presented for extracting phylogenetic information from allozyme data sets which are not amenable to conventional analysis.

9.
We address the problem of comparing interindividual genomic sequence diversity between two populations. Although the methods are general, for concreteness we focus on comparing two human immunodeficiency virus (HIV) infected populations. From a viral isolate(s) taken from each individual in a sample of persons from each population, suppose one or multiple measurements are made on the genetic sequence of a coding region of HIV. Given a definition of genetic distance between sequences, the goal is to test if the distribution of interindividual distances differs between populations. If distances between all pairs of sequences within each group are used, then data dependencies arising from the use of multiple sequences from individuals invalidate the use of a standard two-sample test such as the t-test. Where this problem has been recognized, a typical solution has been to apply a standard test to a reduced dataset composed of one sequence or a consensus sequence from each patient. Disadvantages of this procedure are that the conclusion of the test depends on the choice of utilized sequences, often an arbitrary decision, and exclusion of replicate sequences from the analysis may needlessly sacrifice statistical power. We present a new test free of these drawbacks, which is based on a statistic that linearly combines all possible standard test statistics calculated from independent sequence subsamples. We describe statistical power advantages of the test and illustrate its use by application to nucleotide sequence distances measured from HIV-1 infected populations in southern Africa (GenBank accession numbers AF110959-AF110981) and North America/Europe. The test makes minimal assumptions, is maximally efficient and objective, and is broadly applicable.

10.
The aim of this study is to search for certain repeating phenotypic patterns, i.e. sets of complementary relationships across five isolated populations, which may represent the traces of expression of different genes or gene complexes. The study was conducted among isolates of five island populations of eastern Adriatic, Croatia, and the data were collected between 1979 and 1990. Selected phenotypic characteristics included measures of biological distances (e.g. anthropometrical body and head distances, physiological, dermatoglyphic and radiogrammetric bone distances), while other examined traits included sociocultural (linguistic), bio-cultural (migrational kinship) and genetic distances. The sample consisted of 6,286 examinees from 43 villages of five isolate populations. Correlations between distance matrices based on examined traits were analyzed in each of five populations using Mantel's test of matrix correspondence, and factor analysis (rotated principal component) was then performed over obtained correlation matrices. The results showed that there were several consistent and significant correlations between some analyzed traits across all of the studied isolate populations, which might indicate their regulation by the shared gene complexes or genome regions. The analyses identified three main clusters of correlations in all five isolate populations: the first one containing anthropometric measures (body and head measures and physiological properties in both sexes), the second one containing geographic distance-related traits (migrational kinship, linguistic and genetic distances), and the third one containing dermatoglyphic properties and radiogrammetric bone measures in both sexes. The higher order varimax rotation over the matrix of factor correlations revealed that the primary source of variation within all five analyzed populations was not sex-related, but rather variable-specific.

11.
Landscape genetics aims to assess the effect of the landscape on intraspecific genetic structure. To quantify interdeme landscape structure, landscape genetics primarily uses landscape resistance surfaces (RSs) and least-cost paths or straight-line transects. However, both approaches have drawbacks. Parameterization of RSs is a subjective process, and least-cost paths represent a single migration route. A transect-based approach might oversimplify migration patterns by assuming rectilinear migration. To overcome these limitations, we combined these two methods in a new landscape genetic approach: least-cost transect analysis (LCTA). Habitat-matrix RSs were used to create least-cost paths, which were subsequently buffered to form transects in which the abundance of several landscape elements was quantified. To maintain objectivity, this analysis was repeated so that each landscape element was in turn regarded as migration habitat. The relationship between explanatory variables and genetic distances was then assessed following a mixed modelling approach to account for the nonindependence of values in distance matrices. Subsequently, the best fitting model was selected using the statistic. We applied LCTA and the mixed modelling approach to an empirical genetic dataset on the endangered damselfly, Coenagrion mercuriale. We compared the results to those obtained from traditional least-cost, effective and resistance distance analysis. We showed that LCTA is an objective approach that identifies both the most probable migration habitat and landscape elements that either inhibit or facilitate gene flow. Although we believe the statistical approach to be an improvement for the analysis of distance matrices in landscape genetics, more stringent testing is needed.
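
The first step of the approach, deriving a least-cost path from a habitat-matrix resistance surface, might look like the sketch below. It assumes a small raster with 4-connected moves where entering a cell costs that cell's resistance, and it uses Dijkstra's algorithm. The resistance values (1 for habitat, 10 for matrix), the grid, and the function name are illustrative assumptions, and the later buffering of the path into a transect is not shown.

```python
import heapq

def least_cost_path(resistance, start, goal):
    """Dijkstra on a 4-connected grid; entering a cell costs its resistance.

    Returns (total cost, list of (row, col) cells from start to goal).
    Assumes the goal is reachable from the start.
    """
    rows, cols = len(resistance), len(resistance[0])
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if cost > best[cell]:
            continue  # stale queue entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + resistance[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    # Walk back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]

# Hypothetical habitat-matrix surface: 1 = migration habitat, 10 = matrix.
surface = [
    [1, 10, 1],
    [1, 10, 1],
    [1, 1, 1],
]
cost, path = least_cost_path(surface, (0, 0), (0, 2))
```

Here the path detours around the high-resistance column instead of crossing it, which is the behaviour a straight-line transect would miss.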

12.
Aim The aim of this study was to understand the roles of landscape features in shaping patterns of contemporary and historical genetic diversification among populations of the Andean tree frog (Hypsiboas andinus) across spatial scales. Location Andes mountains, north‐western Argentina, South America. Methods Mitochondrial DNA control region sequences were utilized to assess genetic differentiation among populations and calculate population pair‐wise genetic distances. Three models of movement, namely traditional straight‐line distance and two effective distances based on habitat classification, were examined to determine which of these explained the most variation in pair‐wise population genetic differentiation. The two habitat classifications were based on digital vegetation and hydrology layers that were generated from a 90‐m resolution digital elevation model (DEM) and known relationships between elevation and habitat. Mantel tests were conducted to test for correlations between geographic and genetic distance matrices and to estimate the percentage variation explained by each type of geographic distance. To investigate the location of possible barriers to gene flow, we used Monmonier’s maximum difference algorithm as implemented in barrier 2.2. Results At both geographic scales, effective distances explained more variation in genetic differentiation than did straight‐line distance. The least‐cost distances based on the simple classification performed better than the more detailed habitat classification. We controlled for the effects of historical range fragmentation determined from previous nested clade analyses, and therefore evaluated the effect of different distances on the genetic variation attributable to more recent factors. Effective distances identified populations that were highly divergent as a result of isolation in unsuitable habitats. The proposed locations of barriers to gene flow identified using Monmonier’s maximum difference algorithm corresponded well with earlier analyses and supported findings from our partial Mantel tests. Main conclusions Our results indicate that landscape features have been important in both historical and contemporary genetic structuring of populations of H. andinus at both large and small spatial scales. A landscape genetic perspective offers novel insights not provided by traditional phylogeographic studies: (1) effective distances can better explain patterns of differentiation in populations, especially in heterogeneous landscapes where barriers to dispersal may be common; and (2) least‐cost path analysis can help to identify corridors of movement between populations that are biologically more realistic.

13.
We examined the efficiencies of ordination methods in the treatment of gene frequency data at intraspecific level, using metric and nonmetric distance measures (Nei's and Rogers' genetic distances, χ² distance). We assessed initial processes responsible for the geographical distribution of the Mediterranean land snail Helix aspersa. Seventeen enzyme loci from 30 North African snail populations were considered in the present analysis. Five combinations of distance/multivariate analysis were compared: correspondence analysis (CA), nonmetric multidimensional scaling (NMDS) on Nei's, Rogers', and χ² distances, and principal coordinates analysis on Rogers' distances. Configuration of the objects resulting from ordination was projected onto three-dimensional graphics with the minimum spanning tree or the relative neighborhood graph superimposed. Pre- and postordination or clustering distance matrices were compared by means of correlation methods. As expected, all combinations led to a clear west versus east pattern of variation. However, the intraregional relationships and degree of connectivity between pairs of operational taxonomic units were not necessarily constant from one method to another. Ordination methods when applied with Nei's and Rogers' distances provided the best fit with original distances (r = 0.98) compared with UPGMA clustering (r ≈ 0.75). The Nei/NMDS combination seems to be a good compromise (distortion index dt = 10%) between Rogers/NMDS, which produces a more confusing pattern of differentiation (dt = 24%), and χ²/CA, which tends to distort large distances (dt = 31%). NMDS obviously provides a powerful method to summarize relationships between populations, when neither hierarchical structure nor phylogenetic inference is required. These findings lead to a discussion of the good performance of NMDS, the appropriate distances to be used, and the potential application of this method to other types of allelic data (such as microsatellite loci) or data on nucleotide sequences of genes.
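
For readers unfamiliar with two of the distance measures compared above, the sketch below computes Nei's (1972) standard genetic distance and Rogers' (1972) distance from allele frequencies. The input layout (a list of loci, each a list of allele frequencies in a shared allele order) and the example frequencies are assumptions made for illustration and are not taken from the study.

```python
import math

def nei_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance D = -ln(I).

    I = J_xy / sqrt(J_x * J_y), where the J terms sum products of allele
    frequencies over loci (averaging over loci would cancel in the ratio).
    """
    j_x = sum(sum(p * p for p in locus) for locus in pop_x)
    j_y = sum(sum(q * q for q in locus) for locus in pop_y)
    j_xy = sum(
        sum(p * q for p, q in zip(locus_x, locus_y))
        for locus_x, locus_y in zip(pop_x, pop_y)
    )
    return -math.log(j_xy / math.sqrt(j_x * j_y))

def rogers_distance(pop_x, pop_y):
    """Rogers' (1972) distance: mean over loci of sqrt(0.5 * sum (p - q)^2)."""
    per_locus = [
        math.sqrt(0.5 * sum((p - q) ** 2 for p, q in zip(locus_x, locus_y)))
        for locus_x, locus_y in zip(pop_x, pop_y)
    ]
    return sum(per_locus) / len(per_locus)

# Hypothetical allele frequencies at two loci in two populations.
west = [[0.8, 0.2], [0.5, 0.3, 0.2]]
east = [[0.3, 0.7], [0.4, 0.4, 0.2]]
d_nei = nei_distance(west, east)
d_rogers = rogers_distance(west, east)
```

Nei's distance is undefined when two populations share no alleles at any locus (the logarithm of zero), whereas Rogers' distance stays bounded between 0 and 1.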

14.
This study provides statistical analyses of allele frequencies for populations of Thailand, with an attempt to trace the roles of differential malarial selection and genetic admixtures on the observed frequency variation of certain red cell genetic abnormalities (the two beta-globin variants, hemoglobin E and beta-thalassemia, and G-6PD deficiency), probably evolving under malarial endemicity. It is found that frequencies of hemoglobin E vary accordingly with those of G-6PD deficiency, and with diverse malarial ecology. The levels of genetic diversity are greater for hemoglobin E and G-6PD deficiency than for most other nonmalarial related genetic markers, suggesting the evolution of these two genetic abnormalities under differential selection. Results of Mantel's statistical test for correspondence between distance matrices suggest distinctive patterns of allele frequency differentiation between malarial-related and nonmalarial-related genetic loci. Correlations between beta-globin and G-6PD genetic distances, as well as those between both sets of distances and the malarial distances, are statistically significant. On the other hand, a correlation between malarial distances and the genetic distances for nonmalarial-related genetic loci is not significant statistically. A correlation between the beta-globin genetic distances and the genetic distances for nonmalarial-related genetic loci is, however, statistically significant. The latter result could be attributed largely to the clustering of relatively high hemoglobin E frequencies among genetically closely related populations of northeastern Thailand, whose recent homeland was Laos. The consistently low frequencies of beta-thalassemia observed in most studied populations are explained as a result of the replacement of this genetic variant by hemoglobin E, under long-term malarial selection.

15.
We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right eigenvector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well known data sets and estimate the uncertainty using bootstrap methods. Our results show that this method works well for these three data sets and that the uncertainty in these estimates is small. A simulation study is conducted to compare the performance of our method with several other distance based approaches (namely SDM, SDM* and ACS97), and we find the performances of all these approaches are comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent eigenvalues and eigenvectors may be used to identify if there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
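
One way to picture the idea is to write each gene's pairwise distances as one row of a genes-by-pairs matrix and take the dominant right singular vector of that matrix. The sketch below does so with plain power iteration on AᵀA. The matrix layout, the toy distances, and the function name are assumptions; the abstract does not say how the authors arrange or rescale the data.

```python
import math

def first_right_singular_vector(rows, iterations=200):
    """Dominant right singular vector of a matrix, by power iteration on A^T A."""
    n_cols = len(rows[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # A v: one value per gene.
        av = [sum(a * x for a, x in zip(row, v)) for row in rows]
        # A^T (A v): one value per taxon pair.
        atav = [
            sum(row[j] * av[i] for i, row in enumerate(rows))
            for j in range(n_cols)
        ]
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    return v

# Hypothetical distances for taxon pairs (A-B, A-C, B-C) from three genes
# that share one pattern but evolve at different rates.
gene_distances = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [0.5, 1.0, 1.5],
]
consensus = first_right_singular_vector(gene_distances)
```

The returned vector has unit length, so only the relative sizes of its entries are meaningful; it would need rescaling and reshaping into a square matrix before being passed to a distance-based tree method.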

16.
I explore the use of multiple regression on distance matrices (MRM), an extension of partial Mantel analysis, in spatial analysis of ecological data. MRM involves a multiple regression of a response matrix on any number of explanatory matrices, where each matrix contains distances or similarities (in terms of ecological, spatial, or other attributes) between all pair-wise combinations of n objects (sample units); tests of statistical significance are performed by permutation. The method is flexible in terms of the types of data that may be analyzed (counts, presence–absence, continuous, categorical) and the shapes of response curves. MRM offers several advantages over traditional partial Mantel analysis: (1) separating environmental distances into distinct distance matrices allows inferences to be made at the level of individual variables; (2) nonparametric or nonlinear multiple regression methods may be employed; and (3) spatial autocorrelation may be quantified and tested at different spatial scales using a series of lag matrices, each representing a geographic distance class. The MRM lag matrices model may be parameterized to yield very similar inferences regarding spatial autocorrelation as the Mantel correlogram. Unlike the correlogram, however, the lag matrices model may also include environmental distance matrices, so that spatial patterns in species abundance distances (community similarity) may be quantified while controlling for the environmental similarity between sites. Examples of spatial analyses with MRM are presented.
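
The permutation logic that MRM shares with Mantel analysis can be illustrated with the simple (single-predictor) Mantel test, sketched below. This is not MRM itself: it correlates only two matrices, and the one-sided p-value convention, the default of 999 permutations, and the toy data are assumptions made for illustration.

```python
import random

def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after relabelling objects by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Simple Mantel test; returns (r, one-sided permutation p-value)."""
    n = len(dist_a)
    identity = list(range(n))
    fixed = upper_triangle(dist_a, identity)
    observed = pearson(fixed, upper_triangle(dist_b, identity))
    rng = random.Random(seed)
    at_least = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of dist_b together
        if pearson(fixed, upper_triangle(dist_b, order)) >= observed:
            at_least += 1
    return observed, at_least / (permutations + 1)

# Hypothetical sites on a line; the second matrix is twice the first.
positions = [0, 1, 2, 4, 7]
geographic = [[abs(a - b) for b in positions] for a in positions]
ecological = [[2 * d for d in row] for row in geographic]
r, p_value = mantel_test(geographic, ecological)
```

Objects are permuted as whole rows and columns, rather than shuffling individual distances, because the entries of a distance matrix are not independent of one another.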

17.
Although several methods are available to study the extent of isolation by distance (IBD) among natural populations, comparatively few exist to detect the presence of sharp genetic breaks in genetic distance datasets. In recent years, Monmonier's maximum-difference algorithm has been increasingly used by population geneticists. However, this method does not provide means to measure the statistical significance of such barriers, nor to determine their relative contribution to population differentiation with respect to IBD. Here, we propose an approach to assess the significance of genetic boundaries. The method is based on the calculation of a multiple regression from distance matrices, where binary matrices represent putative genetic barriers to test, in addition to geographic and genetic distances. Simulation results suggest that this method reliably detects the presence of genetic barriers, even in situations where IBD is also significant. We also illustrate the methodology by analyzing previously published datasets. Conclusions about the importance of genetic barriers can be misleading if one does not take into consideration their relative contribution to the overall genetic structure of species.

18.
Substitution matrices have been useful for sequence alignment and protein sequence comparisons. The BLOSUM series of matrices, which had been derived from a database of alignments of protein blocks, improved the accuracy of alignments previously obtained from the PAM-type matrices estimated from only closely related sequences. Although BLOSUM matrices are scoring matrices now widely used for protein sequence alignments, they do not describe an evolutionary model. BLOSUM matrices do not permit the estimation of the actual number of amino acid substitutions between sequences by correcting for multiple hits. The method presented here uses the Blocks database of protein alignments, along with the additivity of evolutionary distances, to approximate the amino acid substitution probabilities as a function of actual evolutionary distance. The PMB (Probability Matrix from Blocks) defines a new evolutionary model for protein evolution that can be used for evolutionary analyses of protein sequences. Our model is directly derived from, and thus compatible with, the BLOSUM matrices. The model has the additional advantage of being easily implemented.

19.
Phylogenomic studies aim to build phylogenies from large sets of homologous genes. Such "genome-sized" data require fast methods, because of the typically large numbers of taxa examined. In this framework, distance-based methods are useful for exploratory studies and building a starting tree to be refined by a more powerful maximum likelihood (ML) approach. However, estimating evolutionary distances directly from concatenated genes gives poor topological signal as genes evolve at different rates. We propose a novel method, named super distance matrix (SDM), which follows the same line as average consensus supertree (ACS; Lapointe and Cucumel, 1997) and combines the evolutionary distances obtained from each gene into a single distance supermatrix to be analyzed using a standard distance-based algorithm. SDM deforms the source matrices, without modifying their topological message, to bring them as close as possible to each other; these deformed matrices are then averaged to obtain the distance supermatrix. We show that this problem is equivalent to the minimization of a least-squares criterion subject to linear constraints. This problem has a unique solution which is obtained by resolving a linear system. As this system is sparse, its practical resolution requires O(n^a k^a) time, where n is the number of taxa, k the number of matrices, and a < 2, which allows the distance supermatrix to be quickly obtained. Several uses of SDM are proposed, from fast exploratory studies to more accurate approaches requiring heavier computing time. Using simulations, we show that SDM is a relevant alternative to the standard matrix representation with parsimony (MRP) method, notably when the taxa sets of the different genes have low overlap. We also show that SDM can be used to build an excellent starting tree for an ML approach, which both reduces the computing time and increases the topological accuracy. We use SDM to analyze the data set of Gatesy et al. (2002, Syst. Biol. 51: 652-664) that involves 48 genes of 75 placental mammals. The results indicate that these genes have strong rate heterogeneity and confirm the simulation conclusions.

20.
The estimation of amino acid replacement frequencies during molecular evolution is crucial for many applications in sequence analysis. Score matrices for database search programs or phylogenetic analysis rely on such models of protein evolution. Pioneering work was done by Dayhoff et al. (1978) who formulated a Markov model of evolution and derived the famous PAM score matrices. Her estimation procedure for amino acid exchange frequencies is restricted to pairs of proteins that have a constant and small degree of divergence. Here we present an improved estimator, called the resolvent method, that is not subject to these limitations. This extension of Dayhoff's approach enables us to estimate an amino acid substitution model from alignments of varying degree of divergence. Extensive simulations show the capability of the new estimator to recover accurately the exchange frequencies among amino acids. Based on the SYSTERS database of aligned protein families (Krause and Vingron, 1998) we recompute a series of score matrices.
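
The Markov model behind the PAM family can be sketched in a few lines: a one-step transition matrix estimated at small divergence is extrapolated to larger divergence by raising it to a power. The example below uses a made-up three-letter alphabet, and its function names are illustrative; it shows only this extrapolation step and is not an implementation of the resolvent method.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def mat_power(m, exponent):
    """m raised to a non-negative integer power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while exponent:
        if exponent & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        exponent >>= 1
    return result

# Hypothetical one-step transition matrix on a 3-letter alphabet;
# each row sums to 1 and about 1% of positions change per step.
pam1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
]
pam250 = mat_power(pam1, 250)
```

Because every estimate is extrapolated from the one-step matrix, small errors in it compound at high divergence, which is the kind of limitation the abstract says the resolvent method is designed to avoid.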
