Similar literature
Found 20 similar articles; search took 31 ms.
1.
Haplotyping as perfect phylogeny: a direct approach.   Total citations: 4, self-citations: 0, citations by others: 4
A full haplotype map of the human genome will prove extremely valuable as it will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A haplotype map project has been announced by NIH. The biological key to that project is the surprising fact that some human genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected (Helmuth, 2001; Daly et al., 2001; Stephens et al., 2001; Friss et al., 2001). In this paper we explore the algorithmic implications of the no-recombination in long blocks observation, for the problem of inferring haplotypes in populations. This assumption, together with the standard population-genetic assumption of infinite sites, motivates a model of haplotype evolution where the haplotypes in a population are assumed to evolve along a coalescent, which as a rooted tree is a perfect phylogeny. We consider the following algorithmic problem, called the perfect phylogeny haplotyping problem (PPH), which was introduced by Gusfield (2002) - given n genotypes of length m each, does there exist a set of at most 2n haplotypes such that each genotype is generated by a pair of haplotypes from this set, and such that this set can be derived on a perfect phylogeny? The approach taken by Gusfield (2002) to solve this problem reduces it to established, deep results and algorithms from matroid and graph theory. Although that reduction is quite simple and the resulting algorithm nearly optimal in speed, taken as a whole that approach is quite involved, and in particular, challenging to program. Moreover, anyone wishing to fully establish, by reading existing literature, the correctness of the entire algorithm would need to read several deep and difficult papers in graph and matroid theory. 
However, as stated by Gusfield (2002), many simplifications are possible and the list of "future work" in Gusfield (2002) began with the task of developing a simpler, more direct, yet still efficient algorithm. This paper accomplishes that goal, for both the rooted and unrooted PPH problems. It establishes a simple, easy-to-program, O(nm^2)-time algorithm that determines whether there is a PPH solution for input genotypes and produces a linear-space data structure to represent all of the solutions. The approach allows complete, self-contained proofs. In addition to algorithmic simplicity, the approach here makes the representation of all solutions more intuitive than in Gusfield (2002), and achieves another goal from that paper, namely, to prove a nontrivial upper bound on the number of PPH solutions, showing that this number is vastly smaller than the number of haplotype solutions (each solution being a set of n pairs of haplotypes that can generate the genotypes) when the perfect phylogeny requirement is not imposed.
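
The PPH question stated above can be illustrated with a small brute-force sketch. This is not the O(nm^2) algorithm of the paper; it only makes the problem definition concrete. It assumes genotypes are encoded as tuples over {0, 1, 2}, with 2 marking a heterozygous site, and it uses the standard four-gamete test for the unrooted case. All function names are hypothetical, and the search is exponential, so it is only suitable for toy inputs.

```python
from itertools import combinations, product


def explains(h1, h2, g):
    """True if haplotypes h1 and h2 combine into genotype g.

    Assumed encoding: 0/1 = homozygous site, 2 = heterozygous site.
    """
    return all(((a == b == s) if s != 2 else (a != b))
               for a, b, s in zip(h1, h2, g))


def four_gamete_ok(haps):
    """True if no pair of sites shows all four gametes 00, 01, 10, 11.

    For binary haplotypes this is the classical condition for an
    (unrooted) perfect phylogeny to exist.
    """
    m = len(haps[0])
    for i, j in combinations(range(m), 2):
        if len({(h[i], h[j]) for h in haps}) == 4:
            return False
    return True


def resolutions(g):
    """Yield every unordered haplotype pair that explains genotype g."""
    het = [i for i, s in enumerate(g) if s == 2]
    for bits in product((0, 1), repeat=len(het)):
        if bits and bits[0] == 1:
            continue  # mirror image of a pair already produced
        h1, h2 = list(g), list(g)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        yield tuple(h1), tuple(h2)


def brute_force_pph(genotypes):
    """Return one haplotype pair per genotype passing the four-gamete
    test, or None if no such choice exists (exponential search)."""
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        haps = [h for pair in choice for h in pair]
        if four_gamete_ok(haps):
            return choice
    return None
```

For example, `brute_force_pph([(2, 2), (0, 2), (1, 0)])` finds a solution, while four homozygous genotypes that already contain all four gametes at two sites admit none.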

2.
Inferring haplotype data from genotype data is a crucial step in linking SNPs to human diseases. Given n genotypes over m SNP sites, the haplotype inference (HI) problem deals with finding a set of haplotypes so that each given genotype can be formed by combining a pair of haplotypes from the set. The perfect phylogeny haplotyping (PPH) problem is one of the many computational approaches to the HI problem. Though it was conjectured that the complexity of the PPH problem was O(nm), the complexity of all the solutions presented until recently was O(nm^2). In this paper, we make complete use of the column-ordering that was presented earlier and show that there must be some interdependencies among the pairwise relationships between SNP sites in order for the given genotypes to allow a perfect phylogeny. Based on these interdependencies, we introduce the FlexTree (flexible tree) data structure that represents all the pairwise relationships in O(m) space. The FlexTree data structure provides a compact representation of all the perfect phylogenies for the given set of genotypes. We also introduce an ordering of the genotypes that allows the genotypes to be added to the FlexTree sequentially. The column ordering, the FlexTree data structure, and the row ordering we introduce make the O(nm) OPPH algorithm possible. We present some results on simulated data which demonstrate that the OPPH algorithm performs quite impressively when compared to the previous algorithms. The OPPH algorithm is one of the first O(nm) algorithms presented for the PPH problem.

3.
The Pure Parsimony Haplotyping (PPH) problem is an NP-hard combinatorial optimization problem that consists of finding the minimum number of haplotypes necessary to explain a given set of genotypes. PPH has attracted more and more attention in recent years due to its importance in the analysis of many fine-scale genetic data. Its application fields range from mapping complex disease genes to inferring population histories, passing through drug design, functional genomics and pharmacogenetics. In this article we investigate, for the first time, a recent version of PPH called the Pure Parsimony Haplotype problem under Uncertain Data (PPH-UD). This version mainly arises when the input genotypes are not accurate, i.e., when some single nucleotide polymorphisms are missing or affected by errors. We propose an exact approach to the solution of PPH-UD based on an extended version of the class representative model for PPH of Catanzaro et al. [1], currently the state-of-the-art integer programming model for PPH. The model is efficient, accurate, compact, polynomial-sized, easy to implement, solvable with any solver for mixed integer programming, and usable in all those cases for which the parsimony criterion is well suited for haplotype estimation.
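
To make the parsimony objective concrete, here is a minimal exhaustive sketch. It is not the integer programming model described in the abstract, and it does not handle uncertain data. It assumes error-free genotypes encoded as tuples over {0, 1, 2} (2 = heterozygous site). The function names are hypothetical, and the running time is exponential in the number of heterozygous sites.

```python
from itertools import product


def resolutions(g):
    """Yield every unordered haplotype pair that explains genotype g.

    Assumed encoding: 0/1 = homozygous site, 2 = heterozygous site.
    """
    het = [i for i, s in enumerate(g) if s == 2]
    for bits in product((0, 1), repeat=len(het)):
        if bits and bits[0] == 1:
            continue  # mirror image of a pair already produced
        h1, h2 = list(g), list(g)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        yield tuple(h1), tuple(h2)


def pure_parsimony(genotypes):
    """Return a smallest set of haplotypes explaining all genotypes.

    Tries every combination of per-genotype resolutions and keeps the
    one that uses the fewest distinct haplotypes.
    """
    best = None
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        used = {h for pair in choice for h in pair}
        if best is None or len(used) < len(best):
            best = used
    return best
```

On the toy input `[(2, 2), (0, 2), (2, 0)]`, resolving the first genotype as 01/10 lets three haplotypes explain all three genotypes, whereas resolving it as 00/11 would need four.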

4.
The individual haplotyping problem is the computational problem of reconstructing two haplotypes for an individual, under one of several optimality criteria, from that individual's fragment sequencing data. This paper builds on the fact that the length of a fragment and the number of fragments covering a SNP (single nucleotide polymorphism) site are both very small compared with the length of the sequenced region and the total number of fragments, and introduces the parameterized haplotyping problems. With m fragments whose maximum length is k_1, n SNP sites, and no more than k_2 fragments covering any SNP site, our algorithms solve the gapless MSR (Minimum SNP Removal) and MFR (Minimum Fragment Removal) problems in time O(n k_1 k_2 + m log m + n k_2 + m k_1) and O(m k_2^2 + m k_1 k_2 + m log m + n k_2 + m k_1), respectively. Since the values of k_1 and k_2 are both small (about 10) in practice, our algorithms are more efficient and applicable than the algorithms of V. Bafna et al., whose time complexities are O(mn^2) and O(m^2 n + m^3), respectively.

5.
A. Zoossmann-Diskin, HOMO, 2006, 57(1): 87-100
The article on the Y chromosomes of Ashkenazi Levites (Behar et al., 2003. Am. J. Hum. Genet. 73, 768-779) is the fourth in a series on the Y chromosomes of the three Jewish male castes: Cohanim (priests), Levites (priests' helpers) and Israelites (lay people). It became apparent that there is a problem with omission of samples when the second article, "Origins of Old Testament priests" (Thomas et al., 1998. Nature 394, 138-140), was published. In the fourth article a remarkable 55% of the Ashkenazi Levite samples from the earlier 1998 study are not included. This causes the "Levite modal haplotype" to double its frequency from 21% of the Ashkenazi Levite sample in 1998 to 42% of the Ashkenazi Levite sample in 2003. The authors offer three main explanations. The explanations offered for the problem of omitting samples from subsequent studies after their haplotypes or partial haplotypes are known are not convincing. Consequently, their sample sets cannot be considered random and unbiased. At the least, these laboratories have bad practices of sample handling and many typing errors, which are enough to invalidate their studies.

6.
In Part 1 of this study (Weinbaum et al., 1988) a short time model has been proposed to describe the initial time dependent leakage of macromolecules at short distances (5 microns or less) from the exit of a transient open junction which the authors have hypothesized as a characteristic feature of endothelial cells in the process of turnover (Weinbaum et al., 1985). This open junction pathway has also been proposed (Weinbaum et al., 1988) to be the primary ultrastructural correlate of the 20 nm diameter large pore suggested by Renkin et al. (1977) using the predictions of cylindrical pore theory. The short time model in (Weinbaum et al., 1988), however, has major limitations in that it neglects the interaction between leakage sites, macromolecular entry through other pathways, the finite thickness of the vessel wall and the curvature of the cell perimeter. The longer time model developed herein will attempt to describe each of these features and also present an improved model and analytic solution for the steady state flux and uptake. In the previous steady state model developed by Weinbaum et al. (1985) the effect of the resistance of the transient open junctions and the non-isotropic diffusion in the underlying tissue due to the internal elastic lamina (IEL) were both neglected. New solutions are first presented which describe the effect of these important model refinements on the steady state macromolecular permeability of the major arteries. Time dependent solutions are then presented to predict the transient longer time labeling following the introduction of tracer macromolecules of varying size. These solutions and the corresponding short time solutions in Weinbaum et al. (1988) are the first solutions to our knowledge to describe the difficult time-dependent boundary value problem to determine how the channel exit concentration and flux at a leaky junction vary with time. 
This is accomplished by casting the boundary value problem in the form of an integral equation for the unknown flux at the cleft exit and then solving this problem using a specially designed numerical technique. The theoretical predictions are used to interpret the behavior of the localized leaks to HRP and albumin that have been reported in Stemerman et al. (1986) and our own recent experiments (Lin et al., 1988).

7.
It is known (Reidys et al., 1997b. Bull. Math. Biol. 59(2), 339-397) that for any two secondary structures S, S' there exists an RNA sequence compatible with both, and that this result does not extend to more than two secondary structures. Indeed, a simple formula for the number of RNA sequences compatible with secondary structures S, S' plays a role in the algorithms of Flamm et al. (2001. RNA 7, 254-265) and of Abfalter et al. (2003. Proceedings of the German Conference on Bioinformatics, ) to design an RNA switch. Here we show that a natural extension of this problem is NP-complete. Unless P=NP, there is no polynomial time algorithm which, when given secondary structures S_1,...,S_k, for k ≥ 4, determines the least number of positions such that, after removal of all base pairs incident to these positions, there exists an RNA nucleotide sequence compatible with the given secondary structures. We also consider a restricted version of this problem with a "fixed maximum" number of possible stars and show that it has a simple polynomial time solution.
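
The notion of a sequence being compatible with a secondary structure can be sketched as follows. The sketch assumes structures are written in dot-bracket notation and that the allowed pairs are the Watson-Crick pairs plus the G-U wobble pair. Both are common conventions but are not stated in the abstract, and the function names are hypothetical.

```python
# Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Parse a dot-bracket string into a list of (i, j) index pairs."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def compatible(seq, structure):
    """True if every base pair of the structure joins two nucleotides
    of seq that are allowed to pair."""
    return len(seq) == len(structure) and all(
        (seq[i], seq[j]) in VALID_PAIRS for i, j in base_pairs(structure)
    )
```

For instance, the sequence `GCAAGCGAGC` is compatible with both `((..))....` and `....((..))`, which illustrates the two-structure case mentioned above.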

8.
Prostate cancer is one of the most common malignancies. The development and progression of prostate cancer are driven by a series of genetic and epigenetic events including gene amplification that activates oncogenes and chromosomal deletion that inactivates tumor suppressor genes. Whereas gene amplification occurs in human prostate cancer, gene deletion is more common, and a large number of chromosomal regions have been identified to have frequent deletion in prostate cancer, suggesting that tumor suppressor inactivation is more common than oncogene activation in prostatic carcinogenesis (Knuutila et al., 1998, 1999; Dong, 2001). Among the most frequently deleted chromosomal regions in prostate cancer, target genes such as NKX3-1 from 8p21, PTEN from 10q23 and ATBF1 from 16q22 have been identified by different approaches (He et al., 1997; Li et al., 1997; Sun et al., 2005), and deletion of these genes in mouse prostates has been demonstrated to induce and/or promote prostatic carcinogenesis. For example, knockout of Nkx3-1 in mice induces hyperplasia and dysplasia (Bhatia-Gaur et al., 1999; Abdulkadir et al., 2002) and promotes prostatic tumorigenesis (Abate-Shen et al., 2003), while knockout of Pten alone causes prostatic neoplasia (Wang et al., 2003). Therefore, gene deletion plays a causal role in prostatic carcinogenesis (Dong, 2001).

9.
Expression of recombinant protein in Escherichia coli (E. coli) is generally considered one of the ideal systems for producing proteins industrially. However, the majority of proteins usually fail to fold into their native state and accumulate as insoluble inclusion bodies with no biological activity in E. coli (Yang et al., 2003). Although numerous strategies and approaches have been proposed to solve the problem (Nygaard and Harlow, 2001; Mogk et al., 2002; Austin, 2003), they still fail to improve the solubility of protein and are not ideal for high-throughput applications (Waldo, 2003). Furthermore, as the expression conditions become stricter, the procedures become more complex and the costs grow higher, making them inappropriate for application in industrial production.

10.
Models of cellular osmotic behaviour depend on thermodynamic solution theories to calculate chemical potentials in the solutions inside and outside the cell. These solutions are generally thermodynamically non-ideal under cryobiological conditions. The molality-based Elliott et al. form of the multi-solute osmotic virial equation is a solution theory which has been demonstrated to provide accurate predictions in cryobiological solutions, accounting for the non-ideality of these solutions using solute-specific thermodynamic parameters called osmotic virial coefficients. However, this solution theory requires as inputs the exact concentration of every solute in the solution being modeled, which poses a problem for the cytoplasm, where such detailed information is rarely available. This problem can be overcome by using a grouped solute approach for modeling the cytoplasm, where all the non-permeating intracellular solutes are treated as a single non-permeating "grouped" intracellular solute. We have recently shown (Zielinski et al., J Physical Chemistry B, 2017) that such a grouped solute approach is theoretically valid when used with the Elliott et al. model, and Ross-Rodriguez et al. (Biopreservation and Biobanking, 2012) have previously developed a method for measuring the cell type-specific osmotic virial coefficients of the grouped intracellular solute. However, the Ross-Rodriguez et al. method suffers from a lack of precision, which, as we demonstrate in this work, can severely impact the accuracy of osmotic model predictions under certain conditions. Thus, we herein develop a novel method for measuring grouped intracellular solute osmotic virial coefficients which yields more precise values than the existing method and then apply this new method to measure these coefficients for human umbilical vein endothelial cells.

11.
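New sponge material from the Lower Cambrian Niutitang Biota of Guizhou   Total citations: 5, self-citations: 1, citations by others: 4
One new genus (Zunyispongia gen. nov.) and two new species (Zunyispongia triangularia gen. et sp. nov., Choia fanensis sp. nov.) of fossil sponges are described from the Lower Cambrian Niutitang Biota of Guizhou. Analysis and discussion of their functional morphology confirm that, in the early Cambrian, sponge skeletons evolved from fine spicules toward coarse spicules and that the skeletal framework developed from an unstable to a stable type.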

12.
13.
We have previously shown that superoxide radical anion (O2•−) reacts with hydroethidine (HE) to form a product that is distinctly different from ethidium (E+) (Zhao et al., Free Radic. Biol. Med. 34:1359; 2003). The structure of this product was recently determined as the 2-hydroxyethidium cation (2-OH-E+) (Zhao et al., Proc. Natl. Acad. Sci. USA 102:5727; 2005). In this study, using HPLC and mass spectrometry techniques, we show that 2-OH-E+ is formed from the reaction between HE and nitrosodisulfonate radical dianion (NDS) or Fremy's salt. The reaction kinetics and mechanism were determined using steady-state and time-resolved optical and EPR techniques. Within the first 50 ms, an intermediate was detected. Another intermediate absorbing strongly at 460 nm and weakly at 670 nm was detected within a second. The structure of this species was assigned to an imino quinone derivative of HE. The stoichiometry of the reaction indicates that two molecules of NDS were needed to oxidize a molecule of HE. We postulate that the first step of the reaction involves hydrogen atom abstraction from HE to form an aminyl radical that reacts with another molecule of NDS to form an adduct that decomposes to an imino quinone derivative of HE. A similar mechanism has been proposed for the reaction between HE and O2•−. The reaction between HE and Fremy's salt should provide a facile route for the synthesis of 2-OH-E+, a diagnostic marker product of the HE/O2•− reaction.

14.
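The Tremadocian is the lowermost stage of the Ordovician System, and graptolites are an important fossil group for high-resolution stratigraphic subdivision and correlation of the Tremadocian. The Jiangnan Slope Belt is one of the facies regions in China with the highest diversity and abundance of planktonic graptolites during the Tremadocian Age of the Early Ordovician. The Nanba section at Yiyang, Hunan, located in this region, has a complete upper Tremadocian graptolite succession, and the strata near the Tremadocian-Floian boundary are continuous. Considerable progress has been made on the upper Tremadocian graptolite stratigraphy, but the lower Tremadocian strata have lacked systematic study. In recent years, through continued collecting of graptolite specimens from this section, the lower Tremadocian Rhabdinopora flabelliformis parabola Biozone has been newly recognized. To date, five graptolite biozones can be recognized in the Tremadocian of the Nanba section at Yiyang, Hunan, in ascending order: the Rhabdinopora flabelliformis parabola, Adelograptus tenellus, Aorograptus victoriae, Araneograptus murrayi and Hunnegraptus copiosus biozones. Based on the biozones recognized so far, and with reference to coeval graptolite stratigraphic data from China and abroad, this paper discusses in detail the Tremadocian graptolite biozone sequence of South China and correlates it precisely with coeval strata in China and abroad.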

15.
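Large trilobites occur in the lower horizons of the Lower Cambrian Niutitang Biota at Heishapo, Zhongnan Village, Songlin, Zunyi, Guizhou. They comprise two genera and three species plus one indeterminate species: Zhenbaspis subconica S. G. Zhang in Lu et al., 1974; Z. longa Zhou in Lee et al., 1975; Zhenbaspis sp.; and Runnania similis Lee in Yin et Lee, 1978. Zhenbaspis (Zhenxiongaspis) Lin et Yin in Yin et Lee, 1978 is confirmed as a synonym of Zhenbaspis Chang et Chu in Lu et al., 1974. The palaeogeographic distribution and evolutionary trends of Zhenbaspis are discussed. The paper also describes Tsunyidiscus and Mianxiandiscus, which co-occur with Zhenbaspis and Runnania. Mianxiandiscus occurs in the lower biota, that is, near the base of the Niutitang Formation, which again confirms that the lower Niutitang biota is older than the Chengjiang Biota.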

16.
Understanding how stem cells are maintained in their microenvironment (the niche) is vital for their application in regenerative medicine. Studies of Drosophila male germline stem cells (GSCs) have served as a paradigm in niche-stem cell biology. It is known that the BMP and JAK-STAT pathways are necessary for the maintenance of GSCs in the testis (Kawase et al., 2004; Kiger et al., 2001; Schulz et al., 2004; Shivdasani and Ingham, 2003; Tulina and Matunis, 2001). However, our recent work strongly suggests that BMP signaling is the primary pathway leading to GSC self-renewal (Leatherman and DiNardo, 2010). Here we show that magu controls GSC maintenance by modulating the BMP pathway. We found that magu was specifically expressed from hub cells, and accumulated at the testis tip. Testes from magu mutants exhibited a reduced number of GSCs, yet maintained a normal population of somatic stem cells and hub cells. Additionally, BMP pathway activity was reduced, whereas JAK-STAT activation was retained in mutant testes. Finally, GSC loss caused by the magu mutation could be suppressed by overactivating the BMP pathway in the germline.

17.
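Late Eocene to Early Oligocene lithostratigraphy and biostratigraphy of the Burqin Basin, Xinjiang   Total citations: 1, self-citations: 1, citations by others: 0
The Tertiary strata in the Burqin Basin were formerly referred to as the "Ulungur River Formation". This succession has a distinctive lithological assemblage that differs from the Ulungur River Formation at its type section in the Ulungur River drainage. Based on the stratigraphic section exposed on the north bank of the Irtysh River, 20 km northwest of the Burqin county seat, two lithostratigraphic units are established: the lower Irtysh River Formation and the upper Keziletuogayi Formation. The Irtysh River Formation is a brightly coloured, iron-rich clastic deposit; the Keziletuogayi Formation consists of light yellowish-green sandstone intercalated with variegated mudstone. The type section near Keziletuogayi village is a continuous sedimentary section containing three well-constrained mammalian fossil horizons and five fossil localities. The lowest fossil horizon contains typical Eocene Ergilian mammals, while the fossils from the other two horizons are early Oligocene Shandgolian mammals. The section therefore records the Ergilian-Shandgolian (Late Eocene to Early Oligocene) transition and is an ideal section for further study of mammalian faunal turnover across this interval and of the Ergilian/Shandgolian stratigraphic boundary.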

18.
Permutations on strings representing gene clusters on genomes have been studied earlier by Uno and Yagiura (2000), Heber and Stoye (2001), Bergeron et al. (2002), Eres et al. (2003), and Schmidt and Stoye (2004), and the idea of a maximal permutation pattern was introduced by Eres et al. (2003). In this paper, we present a new tool for representation and detection of gene clusters in multiple genomes, using PQ trees (Booth and Lueker, 1976): this describes the inner structure and the relations between clusters succinctly, aids in filtering meaningful from apparently meaningless clusters, and also gives a natural and meaningful way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal pi pattern (Eres et al., 2003) and that each subgraph of the PQ tree corresponds to a nonmaximal permutation pattern. We present a general scheme to handle multiplicity in permutations and also give a linear time algorithm to construct the minimal consensus PQ tree. Further, we demonstrate the results on whole genome datasets. In our analysis of the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees; with the E. coli K-12 and B. subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters; and when comparing eight different chloroplast genomes, we found only 77 minimal consensus PQ trees out of about 6,700 gene clusters. Further, we show specific instances of functionally related genes in two of the cases.

19.
R.J. Flower, Aquatic Ecology, 2001, 35(3-4): 261-280
The CASSARINA Project is a co-ordinated joint study of recent environmental change in North African wetland lakes. Nine primary sites were selected for detailed study comprising three sites in each of Morocco, Tunisia and Egypt. Multi-disciplinary studies were undertaken by scientists from each of these countries working in co-operation with colleagues in the UK and Norway. The detailed results are presented in a consecutive suite of papers that describe both modern ecosystem attributes and the recent environmental histories of each site. This paper presents an overview of the aims, structure and initial results of the project. Modern site attributes measured were water quality and phytoplankton (Fathi et al., 2001), zooplankton (Ramdani et al., 2001b), fish (Kraïem et al., 2001) and littoral vegetation (Ramdani et al., 2001a). Baseline water quality data showed that one site (Megene Chitane) was acid with low salinity but the others had high alkalinities with varying degrees of brackishness. All the sites tended to be eutrophic and the phytoplankton was mainly dominated by green or blue-green algae. Where fish were present, growth rates were high with marginally highest rates in the Egyptian Delta lakes (Kraïem et al., 2001). Marginal vegetation surveys showed that emergent macrophytes were still extensive only in the Delta lakes (Ramdani et al., 2001a) where they form important refuges and restrict water pollution. In 1998, one Moroccan wetland lake (Merja Bokka) was drained completely for cultivation. Site specific environmental change records for the 20th century period were obtained using palaeolimnological techniques. Sediment core chronologies (Appleby et al., 2001) were based mainly on radio-isotopes (210Pb and 137Cs). Sedimentary remains of aquatic biota, diatoms, zooplankton, higher plants and benthic animals (Flower et al., 2001; Ramdani et al., 2001c; Birks et al., 2001a) and pollen (Peglar et al., 2001) were investigated (Birks et al., 2001b). 
Major differences in past species abundances were found and were interpreted in terms relevant to biodiversity and water quality/availability change. Metals and pesticide residues in sediment cores indicated that lake contamination was generally lower than in some European sites but some DDE profiles showed a close correspondence with known usage histories (Peters et al., 2001). Hydrological changes affecting water quality and availability mainly arose from land-use intensification during the 20th century and are shown to be the main driver of biodiversity disturbance at all nine CASSARINA sites. Summarizing floristic and faunistic changes using species richness values indicated that freshening of the Delta lakes during this century generally increased aquatic diversity. Species richness also increased during the final drainage of Bokka but tended to decline in acid Chitane. Modern sampling showed that phytoplankton and epiphytic diatom diversity was higher in the Delta lakes but this was not so for zooplankton. Each biological group reacted differently to environmental disturbance and this lack of concordance makes overall diversity changes difficult to predict.

20.
We investigate the effects of the stochastic nature of ion channels on the faithfulness, precision and reproducibility of electrical signal transmission in weakly active, dendritic membrane under in vitro conditions. The properties of forward and backpropagating action potentials (BPAPs) in the dendritic tree of pyramidal cells are the subject of intense empirical work and theoretical speculation (Larkum et al., 1999; Zhu, 2000; Larkum et al., 2001; Larkum and Zhu, 2002; Schaefer et al., 2003; Williams, 2004; Waters et al., 2005). We numerically simulate the effects of stochastic ion channels on the forward and backward propagation of dendritic spikes in Monte-Carlo simulations on a reconstructed layer 5 pyramidal neuron. We report that in most instances there is little variation in timing or amplitude for a single BPAP, while variable backpropagation can occur for trains of action potentials. Additionally, we find that the generation and forward propagation of dendritic Ca2+ spikes are susceptible to channel variability. This indicates limitations on computations that depend on the precise timing of Ca2+ spikes. Action Editor: Alain Destexhe
