Similar literature
 20 similar documents found (search time: 31 ms)
1.
Selecting human immunodeficiency virus (HIV) sequences for inclusion within vaccines has been a difficult problem, as circulating HIV strains evolve relentlessly and become increasingly divergent over time. We report an assessment of this divergence from three perspectives: (i) across different hosts as a function of time of infection, (ii) between donors and recipients in known transmission pairs, and (iii) within individual hosts over time in relation to the initially replicating virus and to the deduced ancestral sequence of the intrahost viral population. Surprisingly, we consistently found less divergence between viruses from different individuals sampled in primary infection than in individuals sampled at more advanced stages of illness. Furthermore, longitudinal analysis of intrahost divergence revealed a 2- to 3-year period of evolution toward a common ancestral sequence at the start of infection, indicating that HIV recovers certain ancestral features when infecting a new host. These results have important implications for the study of HIV population genetics and rational vaccine design, including favoring the inclusion of viral gene sequences taken early in infection.

2.
H Tyson, Génome 1992, 35(2):360-371
Optimum alignment in all pairwise combinations among a group of amino acid sequences generated a distance matrix. These distances were clustered to evaluate relationships among the sequences. The degree of relationship among sequences was also evaluated by calculating specific distances from the distance matrix and examining correlations between patterns of specific distances for pairs of sequences. The sequences examined were a group of 20 amino acid sequences of scorpion toxins originally published and analyzed by M.J. Dufton and H. Rochat in 1984. Alignment gap penalties were constant for all 190 pairwise sequence alignments and were chosen after assessing the impact of changing penalties on resultant distances. The total distances generated by the 190 pairwise sequence alignments were clustered using complete (farthest neighbour) linkage. The square, symmetrical input distance matrix is analogous to diallel cross data where reciprocal and parental values are absent. Diallel analysis methods provided analogues for the distance matrix to genetical specific combining abilities, namely specific distances between all sequence pairs that are independent of the average distances shown by individual sequences. Correlation of specific distance patterns, with transformation to modified z values and a stringent probability level, was used to delineate subgroups of related sequences. These were compared with complete linkage clustering results. Excellent agreement between the two approaches was found. Three originally outlying sequences were placed within the four new subgroups.

3.
A simple model is put forward to explain the long-known three-base periodicity in coding DNA. We propose the concept of same-phase triplet clustering, i.e. a condition wherein a triplet appears several times in one phase without interruption by the two other possible phases. For instance, in the sequence (i): NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN (where N is any nucleotide but combinations producing TTG are excluded) there would be clustering of same-phase TTG because this triplet appears uninterruptedly in phase 2. In contrast, in the sequence (ii): TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN there is no same-phase clustering because neighboring TTGs are all in different phases. Observe also that in sequence (i) TTG triplets are separated by 3, 3 and 6 nucleotides (3n distances), while in sequence (ii) they are separated by 1, 4 and 5 nucleotides (non-3n distances). In this work, we demonstrate that in coding DNA the 3n distances generated by (i)-type sequences proportionally outnumber the non-3n distances generated by (ii)-type sequences; this condition would be the basis of three-base periodicity. Randomized sequences had (i)- and (ii)-type sequences too, but clustering was statistically different. To prove our model we generated (i)-type sequences in a randomized sequence by inducing clustering of same-phase triplets. In agreement with the model, this sequence displayed three-base periodicity. Furthermore, two- and four-base periodicities could also be induced by artificially inducing clustering of duplets and tetraplets.
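
The 3n versus non-3n distances in the abstract's two example sequences can be checked with a short Python sketch. The function names `same_triplet_gaps` and `fraction_3n` are hypothetical helpers introduced here, not part of the paper; the sketch treats `N` as a literal placeholder character and simply counts the nucleotides between consecutive occurrences of the triplet.

```python
import re


def same_triplet_gaps(seq, triplet="TTG"):
    """Return the number of nucleotides separating consecutive occurrences of `triplet`.

    Underscores are only visual codon separators and are removed first.
    """
    clean = seq.replace("_", "")
    starts = [m.start() for m in re.finditer(re.escape(triplet), clean)]
    # Gap = start of the next occurrence minus the end of the current one.
    return [nxt - (cur + len(triplet)) for cur, nxt in zip(starts, starts[1:])]


def fraction_3n(gaps):
    """Fraction of gaps that are multiples of three (the '3n distances')."""
    if not gaps:
        return 0.0
    return sum(1 for g in gaps if g % 3 == 0) / len(gaps)


# The two example sequences quoted in the abstract.
seq_i = "NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN"
seq_ii = "TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN"

print(same_triplet_gaps(seq_i))   # [3, 3, 6]
print(same_triplet_gaps(seq_ii))  # [1, 4, 5]
```

On real coding DNA one would presumably pool such gaps over all triplets and compare the 3n fraction against randomized sequences; that comparison is not attempted here.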

4.
Vignieri SN, Molecular Ecology 2005, 14(7):1925-1937
In species affiliated with heterogeneous habitat, we expect gene flow to be restricted due to constraints placed on individual movement by habitat boundaries. This is likely to impact both individual dispersal and connectivity between populations. In this study, a GIS-based landscape genetics approach was used, in combination with fine-scale spatial autocorrelation analysis and the estimation of recent intersubpopulation migration rates, to infer patterns of dispersal and migration in the riparian-affiliated Pacific jumping mouse (Zapus trinotatus). A total of 228 individuals were sampled from nine subpopulations across a system of three rivers and genotyped at eight microsatellite loci. Significant spatial autocorrelation among individuals revealed a pattern of fine-scale spatial genetic structure indicative of limited dispersal. Geographical distances between pairwise subpopulations were defined following four criteria: (i) Euclidean distance, and three landscape-specific distances, (ii) river distance (distance travelled along the river only), (iii) overland distance (similar to Euclidean, but includes elevation), and (iv) habitat-path distance (a least-cost path distance that models movement along habitat pathways). Pairwise Mantel tests were used to test for a correlation between genetic distance and each of the geographical distances. Significant correlations were found between genetic distance and both the overland and habitat-path distances; however, the correlation with habitat-path distance was stronger. Lastly, estimates of recent migration rates revealed that migration occurs not only within drainages but also across large topographic barriers. These results suggest that patterns of dispersal and migration in Pacific jumping mice are largely determined by habitat connectivity.

5.
6.
The present work describes an attempt to identify reliable criteria which could be used as distance indices between protein sequences. Seven different criteria have been tested: i and ii) the scores of the alignments as given by the BESTFIT and the FASTA programs; iii) the ratio parameter, i.e. the BESTFIT score divided by the length of the aligned peptides; iv and v) the statistical significance (Z-scores) of the scores calculated by BESTFIT and FASTA, as obtained by comparison with shuffled sequences; vi) the Z-scores provided by the program RELATE, which performs a segment-by-segment comparison of 2 sequences; and vii) an original distance index calculated by the program DOCMA from all the pairwise dotplots between the sequences. These 7 criteria have been tested against the amino acid sequences of 39 globins and those of the 20 aminoacyl-tRNA synthetases from E. coli. The distances between the sequences were analyzed by multivariate analysis techniques. The results show that the distances calculated from the scores of the pairwise alignments are not adequately sensitive. The Z-score from RELATE is not selective enough and too demanding in computer time. Three criteria gave a classification consistent with the known similarities between the sequences in the sets, namely the Z-scores from BESTFIT and FASTA and the multiple dotplot comparison distance index from DOCMA.

7.
The blind use of models of nucleotide substitution in evolutionary analyses is a common practice in the viral community. Typically, a simple model of evolution like the Kimura two-parameter model is used for estimating genetic distances and phylogenies, either because other authors have used it or because it is the default in various phylogenetic packages. Using two statistical approaches to model fitting, hierarchical likelihood ratio tests and the Akaike information criterion, we show that different viral data sets are better explained by different models of evolution. We demonstrate our results with the analysis of HIV-1 sequences from a hierarchy of samples: sequences within individuals, individuals within subtypes, and subtypes within groups. We also examine results for three different gene regions: gag, pol, and env. The Kimura two-parameter model was not selected as the best-fit model for any of these data sets, despite its widespread use in phylogenetic analyses of HIV-1 sequences. Furthermore, the model complexity increased with increasing sequence divergence. Finally, the molecular-clock hypothesis was rejected in most of the data sets analyzed, throwing into question clock-based estimates of divergence times for HIV-1. The importance of models in evolutionary analyses and their repercussions on the derived conclusions are discussed.
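
Model choice by the Akaike information criterion, one of the two approaches named above, reduces to computing AIC = 2k - 2 ln L for each fitted model and keeping the smallest value. The sketch below assumes the log-likelihoods have already been obtained from some phylogenetic package; the numbers in `example_fits` are made up purely for illustration and are not results from the paper. The parameter counts cover substitution-model parameters only (branch lengths are ignored).

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(fits):
    """Pick the model with the lowest AIC.

    `fits` maps a model name to a (log-likelihood, number of free parameters) pair.
    Returns (best model name, dict of AIC scores).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# HYPOTHETICAL log-likelihoods, invented for this example only.
example_fits = {
    "JC69": (-5230.4, 0),   # no free substitution parameters
    "K2P": (-5105.9, 1),    # transition/transversion ratio
    "HKY85": (-5050.2, 4),  # ratio plus three free base frequencies
    "GTR+G": (-4990.7, 9),  # five rate ratios, three frequencies, gamma shape
}

best, scores = best_model_by_aic(example_fits)
print(best)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.1f}")
```

With these invented inputs the richest model wins, but on real data the penalty term 2k can favour a simpler model when the likelihood gain is small.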

8.
Human immunodeficiency virus (HIV) infects different organs and tissues. During these infection events, subpopulations of HIV type 1 (HIV-1) develop and, if viral trafficking is restricted between subpopulations, the viruses can follow independent evolutionary histories, i.e., become compartmentalized. This phenomenon is usually detected via comparative sequence analysis and has been reported for viruses isolated from the central nervous system (CNS) and the genital tract. Several approaches have been proposed to study the compartmentalization of HIV sequences, but to date, no rigorous comparison of the most commonly employed methods has been made. In this study, we systematically compared inferences made by six different methods for detecting compartmentalization based on three data sets: (i) a sample of 45 patients with sequences gathered from the CNS, (ii) sequences from the female genital tract of 18 patients, and (iii) a set of simulated sequences. We found that different methods often reached contradictory conclusions. Methods based on the topology of a phylogenetic tree derived from clonal sequences were generally more sensitive in detecting compartmentalization than those that relied solely upon pairwise genetic distances between sequences. However, as the branching structure in a phylogenetic tree is often uncertain, especially for short, low-diversity, or recombinant sequences, tree-based approaches may need to be modified to take phylogenetic uncertainty into account. Given the frequently discordant predictions of different methods and the strengths and weaknesses of each particular methodology, we recommend that a suite of several approaches be used for reliable inference of compartmentalized population structure.

9.
We address the problem of comparing interindividual genomic sequence diversity between two populations. Although the methods are general, for concreteness we focus on comparing two human immunodeficiency virus (HIV) infected populations. From a viral isolate(s) taken from each individual in a sample of persons from each population, suppose one or multiple measurements are made on the genetic sequence of a coding region of HIV. Given a definition of genetic distance between sequences, the goal is to test if the distribution of interindividual distances differs between populations. If distances between all pairs of sequences within each group are used, then data dependencies arising from the use of multiple sequences from individuals invalidate the use of a standard two-sample test such as the t-test. Where this problem has been recognized, a typical solution has been to apply a standard test to a reduced dataset comprising one sequence or a consensus sequence from each patient. Disadvantages of this procedure are that the conclusion of the test depends on the choice of utilized sequences, often an arbitrary decision, and exclusion of replicate sequences from the analysis may needlessly sacrifice statistical power. We present a new test free of these drawbacks, which is based on a statistic that linearly combines all possible standard test statistics calculated from independent sequence subsamples. We describe statistical power advantages of the test and illustrate its use by application to nucleotide sequence distances measured from HIV-1 infected populations in southern Africa (GenBank accession numbers AF110959-AF110981) and North America/Europe. The test makes minimal assumptions, is maximally efficient and objective, and is broadly applicable.

10.
Using DNA sequence data from pathogens to infer transmission networks has traditionally been done in the context of epidemics and outbreaks. Sequence data could analogously be applied to cases of ubiquitous commensal bacteria; however, instead of inferring chains of transmission to track the spread of a pathogen, sequence data for bacteria circulating in an endemic equilibrium could be used to infer information about host contact networks. Here, we show, using simulated data, that multilocus DNA sequence data, based on multilocus sequence typing schemes (MLST), from isolates of commensal bacteria can be used to infer both local and global properties of the contact networks of the populations being sampled. Specifically, for MLST data simulated from small-world networks, the small world parameter controlling the degree of structure in the contact network can robustly be estimated. Moreover, we show that pairwise distances in the network (degrees of separation) correlate with genetic distances between isolates, so that how far apart two individuals in the network are can be inferred from MLST analysis of their commensal bacteria. This result has important consequences, and we show an example from epidemiology: how this result could be used to test for infectious origins of diseases of unknown etiology.

11.
Summary: Relationships among 18 peroxidase amino acid sequences of animal, microbial and plant origin were examined using optimum alignment of all pairwise sequence combinations to generate a total distance matrix. The matrix was used to cluster the sequences with complete linkage (farthest neighbour) procedures. Specific distances were calculated from the total distance matrix. The patterns of specific distances for each sequence were compared to evaluate the relationships between sequences, check their significance and construct subgroups of related sequences. The results were compared with those from clustering and its resultant dendrogram; good agreement was achieved. The 18 sequences fell into two principal groups, plant peroxidases and animal/microbial peroxidases. Within the plant peroxidases four subgroups were detected; the animal/microbial peroxidases formed a fifth subgroup. Profiles were constructed for the subgroups from lists of matching amino acids generated by the alignment calculations. Superimposed lists were realigned to recognise conserved areas and elements. Individual subgroup profiles for the plant peroxidases were then combined into a single profile which in turn was combined with profiles from the animal/microbial peroxidases. The final profile suggested that numerous sequence features (motifs) were common to peroxidases of widely different function and origins.

12.
To be effective, management programmes geared towards halting or reversing the spread of invasive species must focus on defined and defensible areas. This requires knowledge of the dispersal of non-native species targeted for control to better understand invasion and recolonisation scenarios. We investigated the genetic structure of invasive American mink (Neovison vison) in Scotland, and incorporated landscape genetic approaches to examine resultant patterns in relation to geographical features that may influence dispersal. Populations of mink sampled from 10 sites in two regions (Argyll and Northeast Scotland) show a distinct genetic structure. First, the majority of pairwise population comparisons yielded F_ST values that were significantly greater than zero. Second, AMOVA revealed that most of the genetic variance was attributable to differences among regions. Assignment tests placed 89% or more of individuals into their sampled region. Bayesian clustering methods grouped samples into two clusters according to their region of origin. A wombling approach identified the Cairngorms Mountains as a major impediment to gene flow between the regions. Mantel pairwise correlations between genetic and geographical distances estimated as least-cost distance, assuming a linear increase in the cost of movement with increasing elevation, were higher than those for Euclidean distances or distance along waterways. Spatial autocorrelation analyses revealed stronger spatial structuring for females than for males. These results suggest that gene flow by American mink is restricted by landscape features (mountain ranges) and that eradication attempts should in the first instance break down the connectivity between management units separated by mountains.

13.
Relative-rate tests have previously been developed to compare the substitution rates of two sequences or two groups of sequences. These tests usually assume that the process of nucleotide substitution is stationary and the same for all lineages, i.e., uniform. In this study, we conducted simulations to assess the performance of the relative-rate tests when the molecular-clock (MC) hypothesis is true (i.e., there is no rate difference between lineages), but the stationarity and uniformity assumptions are violated. Kimura's and bias-corrected LogDet distances were used. We found that the computation of the variances and covariances of LogDet distances had to be modified, because the constraint that the sum of the frequencies of the 16 nucleotide pair types is equal to 1 must be imposed. Comparison of the rates of two single sequences (Wu and Li's test) or two groups of sequences (Li and Bousquet's test) gave similar results. When the sequences are long (≥ 500 nt), the test based on LogDet distances and their appropriate variances and covariances is appropriate even when the substitution process is not stationary and/or not uniform. That is, at the 5% significance level, the test rejects the MC hypothesis in about 5% of the simulation replicates. In contrast, if the sequences are short (≤ 200 bases) and highly divergent, the LogDet test is very conservative due to overestimation of the variances of the distances. When the uniformity assumption is violated, the relative-rate test based on Kimura's distances can be severely misleading because of differences in base composition between sequences. However, if the uniformity assumption held and so the base frequencies remained similar among sequences, the rate of rejection turned out to be close to 5%, especially with short sequences. Under such conditions, the test using Kimura's distances performs better than the LogDet test. The reason seems to be that these distances are less affected by a reduction in the number of sites than the LogDet distances because they depend on only two parameters.

14.
SPAGeDi version 1.0 is software primarily designed to characterize the spatial genetic structure of mapped individuals or populations using genotype data of codominant markers. It computes various statistics describing genetic relatedness or differentiation between individuals or populations by pairwise comparisons and tests their significance by appropriate numerical resampling. SPAGeDi is useful for: (i) detecting isolation by distance within or among populations and estimating gene dispersal parameters; (ii) assessing genetic relatedness between individuals and its actual variance, a parameter of interest for marker-based inferences of quantitative inheritance; (iii) assessing genetic differentiation among populations, including the case of haploids or autopolyploids.

15.
Phylogenetic methods that use matrices of pairwise distances between sequences (e.g., neighbor joining) will only give accurate results when the initial estimates of the pairwise distances are accurate. For many different models of sequence evolution, analytical formulae are known that give estimates of the distance between two sequences as a function of the observed numbers of substitutions of various classes. These are often of a form that we call "log transform formulae". Errors in these distance estimates become larger as the time $t$ since divergence of the two sequences increases. For long times, the log transform formulae can sometimes give divergent distance estimates when applied to finite sequences. We show that these errors become significant when $t \approx \tfrac{1}{2}\,|\lambda_{\max}|^{-1} \log N$, where $\lambda_{\max}$ is the eigenvalue of the substitution rate matrix with the largest absolute value and $N$ is the sequence length. Various likelihood-based methods have been proposed to estimate the values of parameters in rate matrices. If rate matrix parameters are known with reasonable accuracy, it is possible to use the maximum likelihood method to estimate evolutionary distances while keeping the rate parameters fixed. We show that errors in distances estimated in this way only become significant when $t \approx \tfrac{1}{2}\,|\lambda_{1}|^{-1} \log N$, where $\lambda_{1}$ is the eigenvalue of the substitution rate matrix with the smallest nonzero absolute value. The accuracy of likelihood-based distance estimates is therefore much higher than those based on log transform formulae, particularly in cases where there is a large range of timescales involved in the rate matrix (e.g., when the ratio of transition to transversion rates is large). We discuss several practical ways of estimating the rate matrix parameters before distance calculation and hence of increasing the accuracy of distance estimates.
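
To make the idea of a log transform formula concrete, the sketch below uses the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. This model is chosen here only as the simplest well-known example; the abstract does not say which formulae were analysed. The sketch also shows how such a formula can diverge on finite sequences: once p reaches 3/4 the logarithm's argument is no longer positive, and the function (a hypothetical helper named `jc69_distance`) returns infinity.

```python
import math


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(a, b):
    """Jukes-Cantor distance, a log transform of the observed p-distance.

    Returns math.inf when the observed divergence is too high for the
    logarithm to be defined, which can happen by chance in short sequences.
    """
    p = p_distance(a, b)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return math.inf
    return -0.75 * math.log(arg)


# Toy sequences, invented for illustration.
print(jc69_distance("ACGTACGT", "ACGTACGT"))  # identical sequences
print(jc69_distance("AAAA", "AAAC"))          # one difference in four sites
print(jc69_distance("AAAA", "CCCC"))          # saturated: distance is undefined
```

A likelihood-based estimate with fixed rate parameters, as the abstract recommends, would replace the closed-form transform with a numerical optimisation over the distance.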

16.
Twenty-six Rhizobium galegae strains, representing the center of origin of the host plants Galega orientalis and G. officinalis as well as other geographic regions, were used in a polyphasic analysis of the relationships of R. galegae strains. Phage typing, lipopolysaccharide (LPS) profiling, pulsed field gel electrophoresis (PFGE) profiling and rep-PCR (use of repetitive sequences as PCR primers for genomic fingerprinting) with REP and ERIC primers investigated nonsymbiotic properties, whereas plasmid profiling and hybridisation with a nif gene probe, and with nodB, nodD, nod box and an IS sequence from the symbiotic region as probes, were used to reveal the relationships of symbiotic genes. The results were used in pairwise calculations of distances between the strains, and the distances were visualised as a dendrogram. Indexes of association were compared for all tests pooled, and for chromosomal tests and symbiotic markers separately, to display the input of the different categories of tests on the grouping of the strains. Our study shows that symbiosis-related genetic traits in R. galegae divide strains belonging to the species into two groups, which correspond to strains forming effective symbioses with G. orientalis and G. officinalis, respectively. We therefore propose that Rhizobium galegae strains forming an effective symbiosis with Galega orientalis are called R. galegae bv. orientalis and strains forming an effective symbiosis with Galega officinalis are called R. galegae bv. officinalis.

17.
Human immunodeficiency virus type 1 (HIV-1) sequences were generated from blood and from brain tissue obtained by stereotactic biopsy from six patients undergoing a diagnostic neurosurgical procedure. Proviral DNA was directly amplified by nested PCR, and 8 to 36 clones from each sample were sequenced. Phylogenetic analysis of intrapatient envelope V3-V5 region HIV-1 DNA sequence sets revealed that brain viral sequences were clustered relative to the blood viral sequences, suggestive of tissue-specific compartmentalization of the virus in four of the six cases. In the other two cases, the blood and brain virus sequences were intermingled in the phylogenetic analyses, suggesting trafficking of virus between the two tissues. Slide-based PCR-driven in situ hybridization of two of the patients' brain biopsy samples confirmed our interpretation of the intrapatient phylogenetic analyses. Interpatient V3 region brain-derived sequence distances were significantly less than blood-derived sequence distances. Relative to the tip of the loop, the set of brain-derived viral sequences had a tendency towards negative or neutral charge compared with the set of blood-derived viral sequences. Entropy calculations were used as a measure of the variability at each position in alignments of blood and brain viral sequences. A relatively conserved set of positions was found, with a significantly lower entropy in the brain- than in the blood-derived viral sequences. These sites constitute a brain "signature pattern," or a noncontiguous set of amino acids in the V3 region conserved in viral sequences derived from brain tissue. This brain-derived signature pattern was also well preserved among isolates previously characterized in vitro as macrophage tropic. Macrophage-monocyte tropism may be the biological constraint that results in the conservation of the viral brain signature pattern.
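
The per-position entropy mentioned above is usually taken to be the Shannon entropy of the residue frequencies in each alignment column. A minimal sketch follows, assuming that definition with base-2 logarithms; the abstract does not spell out the exact formula, and `column_entropies` and the toy alignment are hypothetical. Gap characters, if present, are simply counted as another symbol.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Shannon entropy (in bits) of each column of an equal-length alignment."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*alignment):
        total = len(column)
        counts = Counter(column)
        # H = -sum(p * log2(p)) over the residues observed in this column.
        entropies.append(sum(-(c / total) * math.log2(c / total) for c in counts.values()))
    return entropies


# Toy alignment, invented for illustration: column 1 is conserved, column 2 is split evenly.
toy_alignment = ["RK", "RQ", "RK", "RQ"]
print(column_entropies(toy_alignment))  # [0.0, 1.0]
```

Comparing two sample sets, such as brain-derived and blood-derived sequences, would then mean computing this list once per set and contrasting the values position by position.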

18.
In cooperatively breeding species, restricted dispersal of offspring leads to clustering of closely related individuals, increasing the potential both for indirect genetic benefits and inbreeding costs. In apostlebirds (Struthidea cinerea), philopatry by both sexes results in the formation of large (up to 17 birds), predominantly sedentary breeding groups that remain stable throughout the year. We examined patterns of relatedness and fine-scale genetic structure within a population of apostlebirds using six polymorphic microsatellite loci. We found evidence of fine-scale genetic structure within the study population that is consistent with behavioural observations of short-distance dispersal, natal philopatry by both sexes and restricted movement of breeding groups between seasons. Global F_ST values among breeding groups were significantly positive, and the average level of pairwise relatedness was significantly higher for individuals within groups than between groups. For individuals from different breeding groups, geographical distance was negatively correlated with pairwise relatedness and positively correlated with pairwise F_ST. However, when each sex was examined separately, this pattern was significant only among males, suggesting that females may disperse over longer distances. We discuss the potential for kin selection to influence the evolution and maintenance of cooperative breeding in apostlebirds. Our results demonstrate that spatial genetic structural analysis offers a useful alternative to field observations in examining dispersal patterns of cooperative breeders.

19.

Background  

Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, they are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we have previously implemented a web server, named OGtree, that allows the user to reconstruct genome trees of some prokaryotes according to their pairwise OG distances. By analogy to the analyses of gene content and gene order, the OG distance between two genomes we defined was based on a measure of combining OG content (i.e., the normalized number of shared orthologous OG pairs) and OG order (i.e., the normalized OG breakpoint distance) in their whole genomes. A shortcoming of using the concept of breakpoints to define the OG distance is its inability to analyze the OG distance of multi-chromosomal genomes. In addition, the amount of overlapping coding sequences between some distantly related prokaryotic genomes may be limited so that it is hard to find enough OGs to properly evaluate their pairwise OG distances.

20.
Genetic data are increasingly used in landscape ecology for the indirect assessment of functional connectivity, that is, the permeability of landscape to movements of organisms. Among available tools, matrix correlation analyses (e.g. Mantel tests or mixed models) are commonly used to test for the relationship between pairwise genetic distances and movement costs incurred by dispersing individuals. When organisms are spatially clustered, a population‐based sampling scheme (PSS) is usually performed, so that a large number of genotypes can be used to compute pairwise genetic distances on the basis of allelic frequencies. Because of financial constraints, this kind of sampling scheme implies a drastic reduction in the number of sampled aggregates, thereby reducing sampling coverage at the landscape level. We used matrix correlation analyses on simulated and empirical genetic data sets to investigate the efficiency of an individual‐based sampling scheme (ISS) in detecting isolation‐by‐distance and isolation‐by‐barrier patterns. Provided that pseudo‐replication issues are taken into account (e.g. through restricted permutations in Mantel tests), we showed that the use of interindividual measures of genotypic dissimilarity may efficiently replace interpopulation measures of genetic differentiation: the sampling of only three or four individuals per aggregate may be sufficient to efficiently detect specific genetic patterns in most situations. The ISS proved to be a promising methodological alternative to the more conventional PSS, offering much flexibility in the spatial design of sampling schemes and ensuring an optimal representativeness of landscape heterogeneity in data, with few aggregates left unsampled. Each strategy offering specific advantages, a combined use of both sampling schemes is discussed.
