DNA and amino acid substitution rates are known to be highly dependent on sequence context: for example, C → T substitutions in vertebrates may occur much more frequently at CpG sites, and cysteine substitution rates may depend on whether the context supports participation in a disulfide bond. Furthermore, many applications rely on quantitative models of nucleotide or amino acid substitution, including phylogenetic inference and identification of amino acid sequence positions involved in functional specificity. We describe quantification of the context dependence of nucleotide substitution rates using baboon, chimpanzee, and human genomic sequence data generated by the NISC Comparative Sequencing Program. Relative mutation rates, based on maximum likelihood calculations, are reported for the 96 classes of mutations of the form 5' αβγ 3' → 5' αδγ 3', where α, β, γ, and δ are nucleotides and β ≠ δ. Our results confirm that C → T substitutions are enhanced at CpG sites compared with other transitions, relatively independent of the identity of the preceding nucleotide. While, as expected, transitions generally occur more frequently than transversions, we find that the most frequent transversions involve the C at CpG sites (CpG transversions) and that their rate is comparable to the rate of transitions at non-CpG sites. A four-class model of the rates of context-dependent evolution of primate DNA sequences, CpG transitions > non-CpG transitions ≈ CpG transversions > non-CpG transversions, captures the qualitative features of the mutation spectrum. We find that, despite the qualitative similarity of mutation rates among different genomic regions, there are statistically significant differences.
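
The 96 mutation classes and the four-class model described in this abstract can be illustrated with a short sketch. It rests on an assumption that the abstract does not state: the 96 classes are obtained by collapsing strand-complementary mutations, here by writing every mutation with a pyrimidine (C or T) as the central base. The function names `mutation_classes` and `four_class` are hypothetical and do not come from the paper.

```python
from itertools import product

BASES = "ACGT"
PYRIMIDINES = "CT"  # assumption: strand-collapsed form, central base is C or T


def mutation_classes():
    """List (alpha, beta, gamma, delta) for 5' alpha-beta-gamma 3' -> 5' alpha-delta-gamma 3'."""
    return [
        (alpha, beta, gamma, delta)
        for alpha, beta, gamma, delta in product(BASES, PYRIMIDINES, BASES, BASES)
        if beta != delta  # the central base must actually change
    ]


def four_class(alpha, beta, gamma, delta):
    """Assign a mutation to one of the four classes named in the abstract."""
    # In the strand-collapsed form, a CpG site is a central C followed by G.
    is_cpg = beta == "C" and gamma == "G"
    # With a pyrimidine in the centre, the only transitions are C<->T.
    is_transition = {beta, delta} == {"C", "T"}
    site = "CpG" if is_cpg else "non-CpG"
    kind = "transition" if is_transition else "transversion"
    return f"{site} {kind}"


if __name__ == "__main__":
    from collections import Counter

    counts = Counter(four_class(*m) for m in mutation_classes())
    for name, n in sorted(counts.items()):
        print(name, n)
```

Under this convention the four classes are of unequal size (4 CpG transitions, 8 CpG transversions, 28 non-CpG transitions, 56 non-CpG transversions), so the reported ordering is between per-class rates, not raw counts.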

Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in the analysis of genomic datasets, for example, if introgression is assigned to the wrong lineages. In the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but is assumed in the analysis to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the time of introgression tends to be estimated towards the recent end of the period of continuous gene flow. When introgression events are assigned incorrectly to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, with introgression probabilities underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information concerning between-species gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular from hybridizing species.

Interest in methods that estimate speciation and extinction rates from molecular phylogenies has increased over the last decade. The application of such methods requires reliable estimates of tree topology and node ages, which are frequently obtained using standard phylogenetic inference combining concatenated loci and molecular dating. However, this practice disregards population-level processes that generate gene tree/species tree discordance. We evaluated the impact of employing concatenation and coalescent-based phylogeny inference in recovering the correct macroevolutionary regime using simulated data based on the well-established diversification rate shift of delphinids in Cetacea. We found that under scenarios of strong incomplete lineage sorting, macroevolutionary analysis of phylogenies inferred by concatenating loci failed to recover the delphinid diversification shift, while the coalescent-based tree consistently retrieved the correct rate regime. We suggest that ignoring microevolutionary processes reduces the power of methods that estimate macroevolutionary regimes from molecular data.

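Creating drought-tolerant germplasm is of great importance for water-saving agriculture and for breeding new rice varieties. In this study, SSR markers were used to examine 10 rice variants with stably inherited agronomic traits, obtained by soaking embryos in a DNA solution of Alternanthera philoxeroides (alligator weed). The results showed that all 10 rice variants had integrated different fragments of the donor Alternanthera philoxeroides DNA. On this basis, eight rice introgression lines carrying Alternanthera philoxeroides DNA and two controls were used as experimental material in a two-factor split-plot design, and the drought tolerance of the introgression lines was analysed with several statistical methods, including principal component analysis and stepwise regression. The results showed that a composite evaluation system combining a comprehensive evaluation index with the drought tolerance index can improve the reliability of drought tolerance evaluation in rice. Introgression line H8 was the most drought tolerant, and H6 and H7 were relatively drought tolerant; all three outperformed the Brazilian upland rice. These results are of considerable value for evaluating drought tolerance in rice and for breeding drought-tolerant varieties.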

Secondary contact in close relatives can result in hybridization and the admixture of previously isolated gene pools. However, after an initial period of hybridization, reproductive isolation can evolve through different processes and lead to the interruption of gene flow and the completion of the speciation process. Omocestus minutissimus and O. uhagonii are two closely related grasshoppers with partially overlapping distributions in the Central System mountains of the Iberian Peninsula. To analyse spatial patterns of historical and/or contemporary hybridization between these two taxa and understand how species boundaries are maintained in the region of secondary contact, we sampled sympatric and allopatric populations of the two species and obtained genome-wide single nucleotide polymorphism data using a restriction site-associated DNA sequencing approach. We used Bayesian clustering analyses to test the hypothesis of contemporary hybridization in sympatric populations and employed a suite of phylogenomic approaches and a coalescent-based simulation framework to evaluate alternative hypothetical scenarios of interspecific gene flow. Our analyses rejected the hypothesis of contemporary hybridization but revealed past introgression in the area where the distributions of the two species overlap. Overall, these results point to a scenario of historical gene flow after secondary contact followed by the evolution of reproductive isolation that currently prevents hybridization among sympatric populations.

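Rapeseed is one of the main oil crops currently grown in China, but the existing germplasm resources limit further yield increases. This study adopted a new breeding approach to broaden the germplasm of Brassica napus: distant hybridization combined with marker-assisted selection was used to partially replace the genome of existing Brassica napus varieties (AnAnCnCn) with the Ar genome of Brassica rapa and the Cc genome of Brassica carinata. Chromosome selection among the progeny of pentaploid hybrids (ArAnBcCcCn) identified material with a chromosome number of 38. To distinguish it from existing Brassica napus, the resulting material was designated new Brassica napus material. The experimental results showed that some of the new Brassica napus material had essentially normal meiosis, normal pollen germination, and normal embryo sac development, indicating that the new material had reached genetic balance. Molecular marker analysis showed that about 50% of the genome of the new Brassica napus material had been replaced by the Ar genome of Brassica rapa and the Cc genome of Brassica carinata, and that there was abundant genetic diversity among the new lines. Introgression of the Ar genome of Brassica rapa and the Cc genome of Brassica carinata is therefore clearly effective in enriching the existing Brassica napus germplasm.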

Bayesian inference operates under the assumption that the empirical data are a good statistical fit to the analytical model, but this assumption can be challenging to evaluate. Here, we introduce a novel R package, P2C2M, that utilizes posterior predictive simulation to evaluate the fit of the multispecies coalescent model used to estimate species trees. We conduct a simulation study to evaluate the consistency of different summary statistics in comparing posterior and posterior predictive distributions, the use of simulation replication in reducing error rates, and the utility of parallel process invocation in improving computation times. We also test P2C2M on two empirical data sets in which hybridization and gene flow are suspected of contributing to shared polymorphism, which violates the coalescent model: Tamias chipmunks and Myotis bats. Our results indicate that (i) probability-based summary statistics display the lowest error rates, (ii) the implementation of simulation replication decreases the rate of type II errors, and (iii) our R package displays improved statistical power compared to previous implementations of this approach. When probabilistic summary statistics are used, P2C2M corroborates the assumption that genealogies collected from Tamias and Myotis are not a good fit to the multispecies coalescent model. Taken as a whole, our findings argue that an assessment of the fit of the multispecies coalescent model should accompany any phylogenetic analysis that estimates a species tree.

The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods: concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full-likelihood methods of species tree estimation.
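
The stated scaling of the estimation error with the number of loci can be written compactly. The symbols here are assumptions introduced only for illustration and are not taken from the paper: $P_e(L)$ is the probability of inferring the wrong species tree from $L$ loci, $\Phi$ is the standard normal distribution function, and $a$ and $b > 0$ are method-dependent constants.

```latex
\Phi^{-1}\bigl(P_e(L)\bigr) \approx a - b\sqrt{L}
\quad\Longleftrightarrow\quad
P_e(L) \approx \Phi\bigl(a - b\sqrt{L}\bigr), \qquad b > 0 .
```

Read this way, a more efficient method would correspond to a larger slope $b$, so that the same error is reached with fewer loci.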

Urban-scale traffic monitoring plays a vital role in reducing traffic congestion. Owing to its low cost and wide coverage, floating car data (FCD) serves as a novel approach to collecting traffic data. However, sparse probe data represents the vast majority of the data available on arterial roads in most urban environments. In order to overcome the problem of data sparseness, this paper proposes a hidden Markov model (HMM)-based traffic estimation model, in which the traffic condition on a road segment is considered as a hidden state that can be estimated according to the conditions of road segments having similar traffic characteristics. An algorithm based on clustering and pattern mining rather than on adjacency relationships is proposed to find clusters with road segments having similar traffic characteristics. A multi-clustering strategy is adopted to achieve a trade-off between clustering accuracy and coverage. Finally, the proposed model is designed and implemented on the basis of a real-time algorithm. Results of experiments based on real FCD confirm the applicability, accuracy, and efficiency of the model. In addition, the results indicate that the model is practicable for traffic estimation on urban arterials and works well even when more than 70% of the probe data are missing.

Comparison of different foreground and background selection methods in marker-assisted introgression
Bai Junyan (白俊艳), Zhang Qin (张勤), Jia Xiaoping (贾小平). Acta Genetica Sinica (《遗传学报》), 2006, 33(12): 1073-1080
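Marker-assisted introgression is an important application of molecular genetic information in animal breeding. Its aim is to use marker information to introduce one or more favourable genes from one breed (the donor) into another breed (the recipient) while preserving the original genetic background of the recipient population as far as possible. The process consists of three stages. The first is crossing: donor and recipient are crossed to produce F1 individuals. The second is backcrossing: F1 individuals and the offspring of each subsequent generation are repeatedly backcrossed to the recipient so that the recipient's genetic background is recovered. The third is intercrossing: individuals obtained after repeated backcrossing are mated with one another to obtain individuals homozygous for the donor gene, so that the gene becomes fixed in the population. In both the backcrossing and intercrossing stages, the individuals used for mating must be selected, and foreground selection and background selection are carried out separately. Foreground selection is selection for the donor gene: individuals carrying the donor gene are chosen for mating so that the gene is not lost during backcrossing and is fixed as quickly as possible during intercrossing. Background selection is selection for the recipient genetic background: individuals with a high proportion of recipient genome are chosen for mating, which speeds up recovery of the recipient background. In this study, different foreground and background selection methods were compared by computer simulation. Three foreground selection methods were considered: direct selection on the donor gene (assuming the gene can be genotyped directly), indirect selection using a single linked marker, and indirect selection using two flanking markers. Four background selection methods were considered: random selection, genomic similarity selection, index selection, and marker-assisted BLUP (MBLUP) selection. For foreground selection, the results show that direct selection on the donor gene keeps the gene at a stable frequency (0.25) in every backcross generation and fixes it rapidly in the intercrossing stage (within 2 generations). Indirect selection using two flanking markers gives similar results, but selection on a single linked marker causes the frequency of the donor gene to decline during backcrossing and prevents its fixation during intercrossing. For background selection, if the ultimate goal is to recover the recipient's genetic background completely, genomic similarity selection or marker index selection is the best method: either recovers more than 98% of the recipient background after 3 backcross generations, whereas random selection or MBLUP selection needs at least 5 backcross generations to reach this level. If, however, the goal is only to recover certain favourable traits of the recipient, MBLUP selection is the recommended method: it raises the frequency of the recipient genes affecting these traits above 99% after 3 backcross generations and also yields the greatest genetic progress in these traits over the whole introgression process. Marker index selection gives similar results, but MBLUP is much cheaper and therefore more practical.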
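
Two of the figures quoted in this abstract, the donor gene frequency of 0.25 during backcrossing and the five backcross generations needed under random background selection, agree with textbook backcross expectations. The sketch below reproduces that arithmetic. It is not the simulation used in the study: it assumes a neutral genome, ignores linkage drag around the target gene, and treats every selected parent as heterozygous for the donor gene. All function names are hypothetical.

```python
def expected_recipient_fraction(backcrosses: int) -> float:
    """Expected recipient genome share after a number of unselected backcrosses.

    The F1 (0 backcrosses) is half recipient; each backcross to the recipient
    halves the remaining donor share.
    """
    return 1.0 - 0.5 ** (backcrosses + 1)


def backcrosses_to_reach(target: float) -> int:
    """Smallest number of backcrosses whose expected recipient share meets the target."""
    t = 0
    while expected_recipient_fraction(t) < target:
        t += 1
    return t


def donor_allele_frequency_in_backcross(carrier_fraction: float = 1.0) -> float:
    """Donor allele frequency among offspring of selected parents x pure recipient.

    Each heterozygous carrier passes the donor allele on with probability 1/2,
    and the recipient parent contributes none, so only one of the offspring's
    two alleles can be the donor allele.
    """
    carrier_offspring = carrier_fraction * 0.5
    return carrier_offspring * 0.5


if __name__ == "__main__":
    for t in range(6):
        print(t, expected_recipient_fraction(t))
    print(backcrosses_to_reach(0.98))
    print(donor_allele_frequency_in_backcross())
```

Without any background selection the expected recipient share after three backcrosses is 93.75%, which gives a sense of how much the marker-based background selection methods in the abstract add when they reach 98% by the same generation.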

We established a genomic model of quantitative traits with genomic additive and dominance relationships that parallels the traditional quantitative genetics model, which partitions a genotypic value as breeding value plus dominance deviation and calculates additive and dominance relationships using pedigree information. Based on this genomic model, two sets of computationally complementary but mathematically identical mixed model methods were developed for genomic best linear unbiased prediction (GBLUP) and genomic restricted maximum likelihood estimation (GREML) of additive and dominance effects using SNP markers. These two sets are referred to as the CE and QM sets, where the CE set was designed for large numbers of markers and the QM set was designed for large numbers of individuals. GBLUP and associated accuracy formulations for individuals in training and validation data sets were derived for breeding values, dominance deviations and genotypic values. A simulation study showed that GREML and GBLUP generally were able to capture small additive and dominance effects that each accounted for 0.00005–0.0003 of the phenotypic variance, and that GREML was able to differentiate true additive and dominance heritability levels. GBLUP of the total genetic value as the summation of additive and dominance effects had higher prediction accuracy than either additive or dominance GBLUP, causal variants had the highest accuracy of GREML and GBLUP, and predicted accuracies were in agreement with observed accuracies. Genomic additive and dominance relationship matrices using SNP markers were consistent with theoretical expectations. The GREML and GBLUP methods can be an effective tool for assessing the type and magnitude of genetic effects affecting a phenotype and for predicting the total genetic value at the whole genome level.

Introgression of mtDNA appears common in animals, but the implications of acquiring a novel mitochondrial genome are not well known. This study investigates mito-genome introgression between the lizard species Urosaurus graciosus, a thermal specialist, and U. ornatus, a species that occupies a wider range of thermal environments. As ectotherms, their metabolic rate is strongly influenced by the thermal environment; with mitochondria being linked to metabolic rates, overall energy budgets could be impacted by introgression. I use mitochondrial gene trees, inferred from Bayesian analyses of Cyt-B and ND1 gene sequences, along with morphology and microsatellites from nineteen populations of these two species to address two questions. First, do the direction and location of mito-nuclear discordance match predictions of introgression resulting from past population expansions? MtDNA is expected to move from resident species into expanding or invading species. Second, does having a heterospecific form of mitochondria impact body size, a trait strongly associated with fitness? Multiple independent introgression events of historic origin were detected. All introgression was unidirectional, with U. ornatus-type mtDNA found in U. graciosus parental-type individuals. This result was consistent with population expansions detected in U. graciosus but not U. ornatus. Females with heterospecific mtDNA were significantly smaller than homospecific forms, and heterospecific males had a different relationship of body mass to body length than those with homospecific mtDNA. These changes indicate a potential selective disadvantage for individuals with heterospecific mitochondria and are consistent with the theoretical expectation that deleterious alleles are more likely to persist in expanding populations.

Approximate nonparametric maximum likelihood estimation of the tumor incidence rate and comparison of tumor incidence rates between treatment groups are examined in the context of animal carcinogenicity experiments that have interval sacrifice data but lack cause-of-death information. The estimation procedure introduced by MALANI and VAN RYZIN (1988), which can result in a negative estimate of the tumor incidence rate, is modified by employing a numerical method to maximize the likelihood function iteratively, under the constraint that the tumor incidence rate is nonnegative. With the new procedure, estimates can be obtained even if sacrifices occur anywhere within an interval. The resulting estimates have reduced standard error and give more power to the test of two heterogeneous groups. Furthermore, a linear contrast of more than two groups can be tested using our procedure. The proposed estimation and testing methods are illustrated with an experimental data set.

