Similar articles
 20 similar articles found
1.
Following recent technological advances, there has been increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs), i.e., large-scale duplications and deletions. Although not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered fundamental processes underlying gene duplication and loss: duplicated genes are the result of 'successful' copies, fixed and maintained in the population, whereas many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific functional categories of genes, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families associated with successful and unsuccessful duplicates are associated with similar functional categories but are differentially placed in the interaction network. These trends likely reflect both CNV formation biases and natural selection, each of which differentially influences distinct protein families.

2.
Segmental duplications and copy-number variation in the human genome   Total citations: 33; self-citations: 0; citations by others: 33
The human genome contains numerous blocks of highly homologous duplicated sequence. This higher-order architecture provides a substrate for recombination and recurrent chromosomal rearrangement associated with genomic disease. However, the role of segmental duplications in normal variation has not yet been assessed. On the basis of the duplication architecture of the human genome, we defined a set of 130 potential rearrangement hotspots and constructed a targeted bacterial artificial chromosome (BAC) microarray (with 2,194 BACs) to assess copy-number variation in these regions by array comparative genomic hybridization. Using our segmental duplication BAC microarray, we screened a panel of 47 normal individuals, who represented populations from four continents, and identified 119 regions of copy-number polymorphism (CNP), 73 of which were previously unreported. We observed an equal frequency of duplications and deletions, as well as a 4-fold enrichment of CNPs within hotspot regions compared with control BACs (P < .000001), which suggests that segmental duplications are a major catalyst of large-scale variation in the human genome. Importantly, segmental duplications themselves were also significantly enriched (>4-fold) within regions of CNP. Almost without exception, CNPs were not confined to a single population, suggesting that they are either recurrent events, having occurred independently in multiple founders, or were present in early human populations. Our study demonstrates that segmental duplications define hotspots of chromosomal rearrangement, likely acting as mediators of normal variation as well as genomic disease, and it suggests that considering genomic architecture can significantly improve the ascertainment of large-scale rearrangements.
Our specialized segmental duplication BAC microarray and associated database of structural polymorphisms will provide an important resource for the future characterization of human genomic disorders.

3.
Teshima KM, Innan H. Genetics 2012, 190(3):1077-1086
We develop a coalescent-based simulation tool to generate patterns of single nucleotide polymorphisms (SNPs) in a wide region encompassing both the original and the duplicated gene. Selection on the new duplicated copy and interlocus gene conversion between the two copies are incorporated. This simulation enables us to explore how selection on duplicated copies affects the pattern of SNPs. The fixation of an advantageous duplicated copy causes a strong reduction in polymorphism not only in the duplicated copy but also in its flanking regions, which is a typical signature of a selective sweep by positive selection. After fixation, polymorphism gradually increases as neutral mutations accumulate and, if there is no gene conversion, eventually reaches its equilibrium value. When gene conversion is active, the number of SNPs in the duplicated copy increases quickly because SNPs are transferred from the original copy; as a result, the period during which the signature of selection can be recognized is shortened. Because this effect of gene conversion is restricted to the duplicated region, more power to detect selection is expected if a region flanking the duplicated copy is used.
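The simulator described above builds on the standard coalescent. As a neutral baseline only (no duplication, selection, or gene conversion, all of which the authors' tool adds), the following Python sketch simulates the total tree length for a sample of n genes, assuming time is measured in units of 2N generations so that k lineages coalesce at rate k(k - 1)/2. Under the infinite-sites model with mutations falling at rate θ/2 per unit of branch length, the expected number of SNPs is θ times the sum of 1/i for i = 1 to n - 1. The function names are illustrative and are not part of the authors' software.

```python
import random


def total_branch_length(n, rng):
    """Total branch length of one neutral coalescent tree for n sampled genes.

    Time is in units of 2N generations: while k lineages remain, the waiting
    time to the next coalescence is exponential with rate k * (k - 1) / 2.
    """
    length = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2)
        length += k * t_k  # k branches each persist for t_k
    return length


def expected_total_length(n):
    """Analytical expectation E[L] = 2 * sum_{i=1}^{n-1} 1/i."""
    return 2 * sum(1 / i for i in range(1, n))


def expected_snps(n, theta):
    """Expected number of SNPs (segregating sites): theta * sum_{i=1}^{n-1} 1/i.

    Mutations fall on branches at rate theta / 2 per unit of branch length.
    """
    return theta / 2 * expected_total_length(n)


if __name__ == "__main__":
    rng = random.Random(1)
    n, reps = 10, 20000
    sims = [total_branch_length(n, rng) for _ in range(reps)]
    print(f"simulated E[L]  = {sum(sims) / reps:.3f}")
    print(f"analytical E[L] = {expected_total_length(n):.3f}")
    print(f"expected SNPs for theta = 5: {expected_snps(n, 5.0):.2f}")
```

Comparing observed SNP counts in and around a duplicate with this neutral expectation is one simple way to see the reduction in polymorphism that a sweep produces.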

4.
Horne I, Haritos VS. Gene 2008, 411(1-2):27-37
We examined a highly dynamic section of the Drosophila melanogaster genome that contains neutral lipase family genes that have undergone multiple tandem duplication events. We identified the orthologous clusters, encoding between five and eight apparently functional lipases, in other Drosophila genomes: yakuba, ananassae, pseudoobscura, virilis, mojavensis, persimilis, grimshawi and willistoni. We examined their gene structure, duplication and pseudogene formation, and the presence of transposable elements. Based on phylogenetic comparisons, the lipase genes contained in each cluster fall into four distinct clades. Clades I and II are under evolutionary constraints distinct from those on clades III and IV. Multiple gene duplications have occurred in different lineages of clades I and II, whereas clades III and IV contain a single lipase gene from each species. Compared with lipases from other clades, clade IV genes contain an additional 3' domain of tandemly repeated sequence of varying length and composition, and the encoded proteins carry a substitution in the residue adjacent to the key catalytic serine. A comparison of non-synonymous to synonymous nucleotide substitution (dN/dS) rates within each clade showed that the highest rate of divergence was between paralogous lipase gene pairs, suggesting selection pressure on duplicated genes. Analysis of the encoded lipase protein sequences within each species using PAML identified positively selected sites; structural homology modeling based on human pancreatic lipase indicated that many of these residues form part of the enzyme's active site. As some of the cluster lipase genes are known to be expressed in the insect midgut and to respond to changes in dietary components, we propose that the lipase cluster has undergone dynamic evolutionary changes to maximize absorption of lipid nutrients from the diet.

5.
The coalescent with gene conversion   Total citations: 7; self-citations: 0; citations by others: 7
Wiuf C, Hein J. Genetics 2000, 155(1):451-462
In this article we develop a coalescent model with intralocus gene conversion. The distribution of the tract length is geometric, in concordance with results published in the literature. We derive a simulation scheme and deduce a number of analytical results for this coalescent with gene conversion. We compare patterns of variability in samples simulated under the coalescent with recombination with similar patterns simulated under the coalescent with gene conversion alone. Further, we derive an expression for the expected number of topology shifts caused by gene conversion events in a sample of present-day sequences.
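To make the geometric tract-length assumption concrete, here is a minimal Python sketch. It assumes the parameterization P(L = l) = q(1 - q)^(l - 1) for l ≥ 1, which gives a mean tract length of 1/q; the paper's own parameterization may differ, and the function names are hypothetical. Tract lengths are drawn by inverse-transform sampling.

```python
import math
import random


def tract_length_pmf(l, q):
    """P(L = l) = q * (1 - q)**(l - 1) for l >= 1 (geometric, mean 1/q)."""
    if l < 1:
        return 0.0
    return q * (1 - q) ** (l - 1)


def sample_tract_length(q, rng):
    """Draw one geometric tract length by inverse-transform sampling."""
    if q >= 1.0:
        return 1
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
    return 1 + int(math.floor(math.log(u) / math.log(1.0 - q)))


def prob_site_converted(d, q):
    """Probability that a tract starting at a site also covers the site d
    positions downstream, i.e. P(L > d) = (1 - q)**d."""
    return (1 - q) ** d


if __name__ == "__main__":
    rng = random.Random(42)
    q = 0.002  # hypothetical: mean tract length of 500 sites
    draws = [sample_tract_length(q, rng) for _ in range(10000)]
    print(f"mean tract length ~ {sum(draws) / len(draws):.1f} (expected {1 / q:.0f})")
    print(f"P(site 100 downstream is co-converted) = {prob_site_converted(100, q):.3f}")
```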

6.
Ancient and recent duplications of the rainbow trout Wilms' tumor gene   Total citations: 4; self-citations: 0; citations by others: 4
The Wilms' tumor suppressor (WT1) gene plays an important role in the development and functioning of the genitourinary system, and mutations in this gene are associated with nephroblastoma formation in humans. Rainbow trout (Oncorhynchus mykiss) is one of the rare animal models that readily form nephroblastomas, yet trout express three distinct WT1 genes, one of which is duplicated and inherited tetrasomically. Sequence analyses suggest that an ancient gene duplication in the common ancestor of bony fishes produced two WT1 gene families that conserve the splicing variations of tetrapod WT1, and that a second duplication event occurred in the trout lineage. The WT1 genes of one family map to linkage groups 6 and 27 in the trout genome map. Reverse transcription polymerase chain reaction (RT-PCR) expression analysis demonstrated little difference in W

7.
Island models and the coalescent process   Total citations: 2; self-citations: 1; citations by others: 1
Using a coalescent approach, we derive several classical results and extend them to more general models. We find that the classic result for constant population size and constant migration rates holds in models with varying population size and varying migration rates, with effective population size and mean migration fraction substituted in the obvious way. In addition, we derive the relationship of a 'local' F_ST to local gene flow. This result may be useful for analysing gene flow in a regional subset of a large global population using only data from that regional subset.
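The "classic result" for the island model is usually stated as Wright's approximation F_ST ≈ 1/(1 + 4Nm) for diploids. The sketch below assumes that form and, following the substitution described above, plugs in an effective size and a mean migration fraction. Using the harmonic mean of census sizes as the effective size is our assumption for illustration; the paper may define the effective quantities differently, and all inputs in the example are hypothetical.

```python
def fst_island(ne, m):
    """Wright's infinite-island approximation: F_ST ~= 1 / (1 + 4 * Ne * m)."""
    return 1.0 / (1.0 + 4.0 * ne * m)


def nm_from_fst(fst):
    """Invert the same approximation to estimate the number of migrants Ne * m."""
    return (1.0 / fst - 1.0) / 4.0


def harmonic_mean(values):
    """Harmonic mean, the usual effective size for a population whose size fluctuates over time."""
    return len(values) / sum(1.0 / v for v in values)


if __name__ == "__main__":
    sizes = [1000, 10, 1000]    # hypothetical deme sizes over three generations
    rates = [0.01, 0.02, 0.03]  # hypothetical per-generation migration fractions
    ne = harmonic_mean(sizes)
    m_bar = sum(rates) / len(rates)
    print(f"Ne = {ne:.1f}, mean m = {m_bar:.3f}, F_ST = {fst_island(ne, m_bar):.3f}")
```

The single bottleneck generation dominates the harmonic mean, which is why a brief crash in size can raise differentiation far more than the arithmetic mean would suggest.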

8.
Sheppard HW, Gutman GA. Cell 1982, 29(1):121-127
We have cloned DNA segments containing the Jκ genes from LOUVAIN rat liver and determined their nucleotide sequence. Seven readily identifiable Jκ coding regions (six expressible) are evident in the rat, compared with five in the mouse (four expressible). The two additional J segments in the rat appear to result from two sequential gene duplications that occurred after rats and mice diverged. The first involved a homologous but unequal crossing-over in a 14-bp region spanning the 3' end of the coding regions of J1 and J2. The second involved a crossing-over following unequal pairing of the two newly duplicated regions. We propose that the probability of a second duplication was greatly increased after the first because of the larger target for unequal pairing (370 bp of good homology versus 27 bp in the original pairing). Comparisons of rat and mouse J genes show a surprisingly high degree of sequence conservation, both inside and outside the coding regions, similar to the pattern we previously reported for the kappa constant-region gene. This provides additional evidence that constraints exist on the nucleotide sequences of these genes independent of the function of the encoded proteins.

10.
Chordoma is a rare bone cancer that is believed to originate from notochordal remnants. We previously identified germline T duplication as a major susceptibility mechanism in several chordoma families. Recently, a common genetic variant in T (rs2305089) was significantly associated with the risk of sporadic chordoma. We sequenced all T exons in 24 familial cases and 54 unaffected family members from eight chordoma families (three with T duplications), 103 sporadic cases, and 160 unrelated controls. We also measured T copy-number variation in all sporadic cases. We confirmed the association between the previously reported variant rs2305089 and the risk of familial [odds ratio (OR) = 2.6, 95% confidence interval (CI) = 0.93, 7.25, P = 0.067] and sporadic chordoma (OR = 2.85, 95% CI = 1.89, 4.29, P < 0.0001). We also identified a second common variant, rs1056048, that was strongly associated with chordoma in families (OR = 4.14, 95% CI = 1.43, 11.92, P = 0.0086). Among sporadic cases, another common variant (rs3816300) was significantly associated with risk when analyzed jointly with rs2305089. The association with rs3816300 was significantly stronger in cases with an early age at onset. In addition, we identified three rare variants that were observed only among sporadic chordoma cases, all of which have potential functional relevance based on in silico predictions. Finally, we did not observe T duplication in any sporadic chordoma case. Our findings further highlight the importance of the T gene in the pathogenesis of both familial and sporadic chordoma and suggest a complex susceptibility related to T.
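The odds ratios and confidence intervals above come from the authors' own analyses, which may involve regression models and family-based adjustments. For intuition only, the following sketch computes an unadjusted odds ratio from a hypothetical 2×2 table of allele counts, with a Woolf (log-scale) 95% confidence interval and a two-sided Wald p-value. The function name and the counts are illustrative assumptions.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_woolf(a, b, c, d, z=Z_95):
    """Odds ratio, Woolf (log-scale) confidence interval and Wald p-value.

    Table layout (allele counts):
                 risk allele   other allele
      cases           a             b
      controls        c             d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell needs a continuity correction")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    # Two-sided p-value: 2 * (1 - Phi(|log_or / se|)) = erfc(|z_stat| / sqrt(2))
    p_value = math.erfc(abs(log_or / se_log) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


if __name__ == "__main__":
    # Hypothetical allele counts, not data from the study.
    or_, lo, hi, p = odds_ratio_woolf(20, 10, 10, 20)
    print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}, {hi:.2f}, P = {p:.4f}")
```

Because the interval is built on the log scale, it is asymmetric around the OR itself, which matches the shape of the intervals reported above (for example, 0.93 to 7.25 around 2.6).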

11.
We extend Kingman's coalescent to a geographically structured population model with migration among colonies. It is described by a continuous-time Markov chain, which is proved to be a dual process of the diffusion process of the stepping-stone model. We derive a system of equations for the spatial distribution of the common ancestor of genes sampled from colonies and for the mean time until a single common ancestor is reached. These equations are solved for three particular models: a two-population model, the island model, and the one-dimensional stepping-stone model with symmetric nearest-neighbour migration.
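For the two-population case, the system of equations for mean coalescence times can be written down by first-step analysis. The sketch assumes a symmetric two-deme model with equal deme sizes, time scaled so that two lineages in the same deme coalesce at rate 1, and each lineage migrating to the other deme at rate mu; these conventions are ours, not necessarily the paper's. It solves the two linear equations exactly (Cramer's rule) and checks the result by simulating the Markov chain.

```python
import random


def mean_coalescence_times(mu):
    """Mean time to a common ancestor for two lineages in a symmetric two-deme model.

    States: S = both lineages in the same deme, D = lineages in different demes.
      From S: coalescence at rate 1, migration of either lineage at rate 2*mu.
      From D: migration of either lineage (back to S) at rate 2*mu.
    First-step analysis gives
      (1 + 2*mu) * T_S - 2*mu * T_D = 1
      -2*mu * T_S + 2*mu * T_D = 1
    """
    a11, a12, b1 = 1 + 2 * mu, -2 * mu, 1.0
    a21, a22, b2 = -2 * mu, 2 * mu, 1.0
    det = a11 * a22 - a12 * a21
    t_same = (b1 * a22 - a12 * b2) / det
    t_diff = (a11 * b2 - a21 * b1) / det
    return t_same, t_diff


def simulate_time(start_same, mu, rng):
    """Simulate one coalescence time by stepping through the Markov chain."""
    t, same = 0.0, start_same
    while True:
        rate = (1.0 + 2 * mu) if same else 2 * mu
        t += rng.expovariate(rate)
        if same and rng.random() < 1.0 / rate:
            return t  # the two lineages coalesce
        same = not same  # one lineage migrated to the other deme


if __name__ == "__main__":
    rng = random.Random(7)
    for mu in (0.05, 0.5, 5.0):
        ts, td = mean_coalescence_times(mu)
        sim = sum(simulate_time(True, mu, rng) for _ in range(5000)) / 5000
        print(f"mu={mu}: T_same={ts:.3f} (simulated {sim:.3f}), T_diff={td:.3f}")
```

With these conventions the within-deme mean time is 2 for every mu (equal to the number of demes), while the between-deme time, 2 + 1/(2·mu), grows as migration becomes rare.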

12.

Background  

Rice is an important staple food and, with the smallest cereal genome, serves as a reference species for studies on the evolution of cereals and other grasses. Decoding its entire genome will therefore be a prerequisite for applied and basic research on this species and all other cereals.

13.
Copy number variants (CNVs) are pervasive in several animal and plant genomes and contribute to shaping genetic diversity. In barley, there is evidence that changes in gene copy number underlie important agronomic traits. The recently released barley reference sequence is a valuable genomic resource for uncovering the incidence of CNVs that affect gene content and for identifying sequence features associated with CNV formation. Using exome sequencing and read-count data, we detected 16,605 deletions and duplications that affect barley gene content by surveying a diverse panel of 172 cultivars, 171 landraces, 22 wild relatives, and 32 other uncategorized domesticated accessions. A search for segmental duplications (SDs) in the reference sequence revealed many low-copy repeats, most of which overlap predicted coding sequences. Statistical analyses revealed that the incidence of CNVs increases significantly in SD-rich regions, indicating that these sequence elements act as hot spots for CNV formation. This work delivers a comprehensive genome-wide study of CNVs affecting barley gene content and implicates SDs in the molecular mechanisms that lead to the formation of this class of CNVs.
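A common way to turn read counts into gene-level copy-number calls is to compare each accession's normalized depth with a reference panel. The sketch below is a hypothetical illustration of that idea, not the pipeline used in the study: it normalizes counts to counts per million, takes the log2 ratio of the sample to the panel median, and applies arbitrary cut-offs (`del_cut`, `dup_cut`) that would need calibration on real exome data. Real callers also correct for factors such as GC content and capture efficiency, which this sketch ignores.

```python
import math
import statistics


def normalize_counts(counts):
    """Scale one accession's per-gene read counts to counts per million reads."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def call_cnvs(sample, panel, del_cut=-0.8, dup_cut=0.6, pseudo=1.0):
    """Label each gene as 'deletion', 'duplication' or 'normal'.

    sample: dict gene -> raw read count for the accession being tested.
    panel:  list of dicts with raw read counts for reference accessions.
    The cut-offs and the pseudocount are hypothetical defaults.
    """
    s = normalize_counts(sample)
    normed_panel = [normalize_counts(p) for p in panel]
    calls = {}
    for gene, value in s.items():
        ref = statistics.median(p[gene] for p in normed_panel)
        ratio = math.log2((value + pseudo) / (ref + pseudo))  # pseudocount avoids log(0)
        if ratio <= del_cut:
            calls[gene] = "deletion"
        elif ratio >= dup_cut:
            calls[gene] = "duplication"
        else:
            calls[gene] = "normal"
    return calls


if __name__ == "__main__":
    # Hypothetical read counts for four genes.
    panel = [{"g1": 100, "g2": 100, "g3": 100, "g4": 100} for _ in range(3)]
    sample = {"g1": 0, "g2": 200, "g3": 100, "g4": 100}
    print(call_cnvs(sample, panel))
```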

14.
Structural variation is an important cause of genetic variation. Whole-genome analysis techniques can efficiently identify copy-number variable regions, but targeted methods are needed to verify and accurately size variable regions and to diagnose large sample cohorts. We have developed a technique based on multiplex amplification of size-coded, selectively circularized genomic fragments that is robust, cheaper, and more rapid than current multiplex targeted copy-number assays.

15.
We describe a forward-time haploid reproduction model with a constant population size that includes life-history characteristics common to many marine organisms. We develop coalescent approximations for sample gene genealogies under this model and use them to predict patterns of genetic variation. Depending on the behavior of the model's underlying parameters, the approximations are either coalescent processes with simultaneous multiple mergers or Kingman's coalescent. Using simulations, we apply our model to data from the Pacific oyster and show that it predicts the observed data very well. We also show that a property that holds for Kingman's coalescent and for general coalescent trees, namely that the most frequent allele at a biallelic locus is likely to be the ancestral allele, does not hold for our model. Our work suggests that the power to detect a "sweepstakes effect" in a sample of DNA sequences from marine organisms depends on the sample size.

16.
Several eukaryotic genomes have been completely sequenced, providing an opportunity to investigate the extent and characteristics (e.g., single-gene duplication, block duplication) of gene duplication in a genome. Detecting duplicate genes in a genome, however, is not a simple problem because of complications such as domain shuffling, isoforms derived from alternative splicing, and annotation errors in databases. We describe a method for overcoming these difficulties and report the extent of gene duplication in the genomes of Drosophila melanogaster, Caenorhabditis elegans, and yeast inferred with this method. We also describe a method for detecting block duplications in a genome. Applying this method showed that block duplication is common in both yeast and nematode. The patterns of block duplication in the two species, however, are markedly different. Yeast shows much more extensive block duplication than nematode, with some chromosomes having more than 40% of their duplications derived from block duplications. Moreover, in yeast the majority of block duplications occurred between chromosomes, whereas in nematode most occurred within chromosomes.
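Block duplications can be detected by looking for runs of duplicate gene pairs that stay in the same order on both chromosomes. The following is a hypothetical greedy chaining sketch, not the authors' method: genes are identified by (chromosome, rank) pairs, consecutive pairs in a chain may be at most `max_gap` genes apart on each side, only same-orientation blocks are found, and a block needs at least `min_pairs` pairs. It also counts intra- versus inter-chromosomal blocks, the contrast drawn above between yeast and nematode.

```python
from collections import defaultdict


def find_duplication_blocks(pairs, max_gap=3, min_pairs=3):
    """Greedy chaining of duplicate gene pairs into collinear blocks.

    pairs: iterable of ((chrom_a, index_a), (chrom_b, index_b)) tuples, where
    index is a gene's rank along its chromosome. Inverted blocks are ignored.
    """
    by_chrom_pair = defaultdict(list)
    for (ca, ia), (cb, ib) in pairs:
        # Put each pair into a canonical order so (I, II) and (II, I) group together.
        if (ca, ia) > (cb, ib):
            (ca, ia), (cb, ib) = (cb, ib), (ca, ia)
        by_chrom_pair[(ca, cb)].append((ia, ib))

    blocks = []
    for (ca, cb), hits in by_chrom_pair.items():
        hits.sort()
        chain = [hits[0]]
        for ia, ib in hits[1:]:
            last_a, last_b = chain[-1]
            # Extend the chain only if both genes advance by 1..max_gap positions.
            if 0 < ia - last_a <= max_gap and 0 < ib - last_b <= max_gap:
                chain.append((ia, ib))
            else:
                if len(chain) >= min_pairs:
                    blocks.append((ca, cb, chain))
                chain = [(ia, ib)]
        if len(chain) >= min_pairs:
            blocks.append((ca, cb, chain))
    return blocks


def summarize_blocks(blocks):
    """Count blocks within a chromosome versus between chromosomes."""
    intra = sum(1 for ca, cb, _ in blocks if ca == cb)
    return {"intra": intra, "inter": len(blocks) - intra}


if __name__ == "__main__":
    # Hypothetical duplicate pairs: (chromosome, gene rank).
    pairs = [(("I", 5), ("II", 10)), (("I", 6), ("II", 11)), (("I", 8), ("II", 13)),
             (("IV", 1), ("IV", 30)), (("IV", 2), ("IV", 31)), (("IV", 3), ("IV", 32))]
    blocks = find_duplication_blocks(pairs)
    print(blocks, summarize_blocks(blocks))
```

Greedy chaining is simple but can split a block when a spurious pair intervenes; dynamic-programming chaining is a common, more robust alternative.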

17.
The interactions between tPA domains that are important for catalysis are poorly understood. We have probed the function of interdomain interactions by generating tPA variants in which domains are duplicated or rearranged. The proteins were expressed in a transient mammalian expression system and tested in vitro for their ability to activate plasminogen, induce fibrinolysis and bind to a forming fibrin clot. Duplication of the heavy chain domains of tPA produced enzymatically active tPA variants, many of which demonstrated similar in vitro amidolytic and fibrinolytic activity and similar fibrin affinity to the parent molecule. Zymographic analysis of the domain duplication tPA variants showed one major active species for each variant. Selection of the residues duplicated and the interdomain spacing were found to be critical considerations in the design of tPA variants with duplicated domains. We also rearranged the domains of tPA such that kringle 1 replaced the second kringle domain and vice versa. An analysis of these variants indicates that the first kringle domain can confer fibrin affinity to a tPA variant and function in place of kringle 2. Therefore, in wild-type tPA, the functions of kringle 1 and kringle 2 must be dependent partially on their orientation within the heavy chain of the protein. The functional autonomy of the heavy and light chains of tPA is demonstrated by the activity of a tPA variant in which the order of the heavy and light chains was reversed.

18.
The genealogies of samples of orthologous regions from multiple species can be classified by their shapes. Using a neutral coalescent model of two species, I give exact probabilities of each of four possible genealogical shapes: reciprocal monophyly, two types of paraphyly, and polyphyly. After the divergence that forms two species, each with population size N, polyphyly is the most likely genealogical shape for the lineages of the two species. At ∼1.300N generations after divergence, paraphyly becomes most likely, and reciprocal monophyly becomes most likely at ∼1.665N generations. For a given species, the time at which 99% of its loci acquire monophyletic genealogies is ∼5.298N generations, assuming all loci in its sister species are monophyletic. The probability that all lineages of two species are reciprocally monophyletic, given that a sample from the two species has a reciprocally monophyletic genealogy, increases rapidly with sample size, as does the probability that the most recent common ancestor (MRCA) of a sample is also the MRCA of all lineages from the two species. The results have potential applications for testing evolutionary hypotheses.
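The exact formulas are given in the paper; as an independent illustration, the sketch below estimates the probability of reciprocal monophyly by Monte Carlo under a neutral two-species coalescent. Assumptions (ours): each species contributes `n_a` or `n_b` sampled lineages that coalesce at rate k(k - 1)/2 while k remain, with `tau` the split time in those coalescent units (how this maps to generations depends on ploidy and is left open); after the split, lineages merge at random in a single ancestral population. A species counts as monophyletic if its full sample set appears as a clade.

```python
import random


def _coalesce(lineages, rng, clades):
    """Merge two randomly chosen lineages and record the new clade's leaf set."""
    i, j = rng.sample(range(len(lineages)), 2)
    merged = lineages[i] | lineages[j]
    for idx in sorted((i, j), reverse=True):
        lineages.pop(idx)
    lineages.append(merged)
    clades.add(merged)


def _within_species(n, label, tau, rng, clades):
    """Coalesce n lineages of one species until the split time tau is reached."""
    lineages = [frozenset({(label, k)}) for k in range(n)]
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)
        if t > tau:
            break  # remaining lineages enter the ancestral population
        _coalesce(lineages, rng, clades)
    return lineages


def prob_reciprocal_monophyly(n_a, n_b, tau, reps=20000, seed=0):
    """Monte Carlo estimate of P(both species' samples form clades)."""
    rng = random.Random(seed)
    species_a = frozenset(("A", k) for k in range(n_a))
    species_b = frozenset(("B", k) for k in range(n_b))
    hits = 0
    for _ in range(reps):
        clades = set()
        lineages = (_within_species(n_a, "A", tau, rng, clades)
                    + _within_species(n_b, "B", tau, rng, clades))
        clades.update(lineages)  # unmerged single leaves also count as clades
        while len(lineages) > 1:  # ancestral population: only the merge order matters
            _coalesce(lineages, rng, clades)
        if species_a in clades and species_b in clades:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for tau in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(f"tau={tau}: P(reciprocal monophyly) ~ {prob_reciprocal_monophyly(3, 3, tau):.3f}")
```

With `tau = 0` and two lineages per species, the estimate should approach 1/9, the chance that each species' pair merges before any mixing; for large `tau` it approaches 1.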

19.
A population genetic model with a single locus at which balancing selection acts and many linked loci at which neutral mutations can occur is analysed using the coalescent approach. The model incorporates geographic subdivision with migration, as well as mutation, recombination, and genetic drift of neutral variation. Geographic subdivision is found to affect genetic variation even with high rates of migration, provided that selection is strong enough to maintain different allele frequencies at the selected locus. Published sequence data from the alcohol dehydrogenase locus of Drosophila melanogaster fit the proposed model slightly better than a similar model without subdivision.

20.
Lohse K, Harrison RJ, Barton NH. Genetics 2011, 189(3):977-987
Analysis of genomic data requires an efficient way to calculate likelihoods across very large numbers of loci. We describe a general method for finding the distribution of genealogies: we allow migration between demes, splitting of demes [as in the isolation-with-migration (IM) model], and recombination between linked loci. These processes are described by a set of linear recursions for the generating function of branch lengths. Under the infinite-sites model, the probability of any configuration of mutations can be found by differentiating this generating function. Such calculations are feasible for small numbers of sampled genomes: as an example, we show how the generating function can be derived explicitly for three genes under the two-deme IM model. This derivation is done automatically, using Mathematica. Given data from a large number of unlinked and nonrecombining blocks of sequence, these results can be used to find maximum-likelihood estimates of model parameters by tabulating the probabilities of all relevant mutational configurations and then multiplying across loci. The feasibility of the method is demonstrated by applying it to simulated data and to a data set previously analyzed by Wang and Hey (2010) consisting of 26,141 loci sampled from Drosophila simulans and D. melanogaster. Our results suggest that such likelihood calculations are scalable to genomic data as long as the numbers of sampled individuals and mutations per sequence block are small.
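The core idea, obtaining the probability of a mutational configuration by differentiating the generating function of branch lengths, can be seen in the simplest possible case, far smaller than the two-deme IM derivation in the paper. Assume two genes from one panmictic population with coalescence time T ~ Exp(1) (units of 2N generations), so φ(ω) = E[exp(-ωT)] = 1/(1 + ω), and that the number of mutations K given T is Poisson with mean θT. Then P(K = k) = (-θ)^k / k! · φ^(k)(θ), which here evaluates to θ^k / (1 + θ)^(k + 1). The sketch hard-codes the derivative of this particular φ instead of using symbolic algebra; the function names are illustrative.

```python
import math


def laplace_pair(omega):
    """phi(omega) = E[exp(-omega * T)] for a coalescence time T ~ Exp(1)."""
    return 1.0 / (1.0 + omega)


def laplace_pair_derivative(omega, k):
    """k-th derivative of phi: (-1)^k * k! * (1 + omega)^-(k + 1)."""
    return (-1) ** k * math.factorial(k) * (1.0 + omega) ** (-(k + 1))


def prob_k_mutations(theta, k):
    """P(K = k) = (-theta)^k / k! * phi^(k)(theta) when K | T ~ Poisson(theta * T).

    This follows from P(K = k) = theta^k / k! * E[T^k exp(-theta * T)] and
    phi^(k)(omega) = (-1)^k * E[T^k exp(-omega * T)].
    """
    return (-theta) ** k / math.factorial(k) * laplace_pair_derivative(theta, k)


if __name__ == "__main__":
    theta = 2.0
    for k in range(5):
        print(k, round(prob_k_mutations(theta, k), 4))
```

The same recipe, with a multivariate generating function over branch classes, is what yields probabilities of full mutational configurations in the structured models the abstract describes.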


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417