1.

K-ary clustering with optimal leaf ordering for gene expression data Total citations: 2, self-citations: 0, citations by others: 2

Bar-Joseph Z Demaine ED Gifford DK Srebro N Hamel AM Jaakkola TS 《Bioinformatics (Oxford, England)》2003,19(9):1070-1078

MOTIVATION: A major challenge in gene expression analysis is effective data organization and visualization. One of the most popular tools for this task is hierarchical clustering. Hierarchical clustering allows a user to view relationships in scales ranging from single genes to large sets of genes, while at the same time providing a global view of the expression data. However, hierarchical clustering is very sensitive to noise, usually lacks a method to actually identify distinct clusters, and produces a large number of possible leaf orderings of the hierarchical clustering tree. In this paper we propose a new hierarchical clustering algorithm which reduces susceptibility to noise, permits up to k siblings to be directly related, and provides a single optimal order for the resulting tree. RESULTS: We present an algorithm that efficiently constructs a k-ary tree, where each node can have up to k children, and then optimally orders the leaves of that tree. By combining k clusters at each step our algorithm becomes more robust against noise and missing values. By optimally ordering the leaves of the resulting tree we maintain the pairwise relationships that appear in the original method, without sacrificing the robustness. Our k-ary construction algorithm runs in O(n^3) regardless of k and our ordering algorithm runs in O(4^k n^3). We present several examples that show that our k-ary clustering algorithm achieves results that are superior to the binary tree results in both global presentation and cluster identification. AVAILABILITY: We have implemented the above algorithms in C++ on the Linux operating system.

2.

Gene ordering in partitive clustering using microarray expressions

Ray SS Bandyopadhyay S Pal SK 《Journal of biosciences》2007,32(5):1019-1025

A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering the genes using gene expression data into homogeneous groups was shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in the hierarchical clustering framework for gene expression analysis, there is, to the best knowledge of the authors, no work addressing and evaluating the importance of gene ordering in the partitive clustering framework. Outside the framework of hierarchical clustering, different gene ordering algorithms are applied to the whole data set, and the domain of partitive clustering is still unexplored with gene ordering approaches. A new hybrid method is proposed for ordering genes in each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are hybridized individually with the self-organizing map to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that it improves the result quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the summation of gene expression distances, and maximizing the biological gene ordering using MIPS categorization. Moreover, the new hybrid approach finds comparable or sometimes superior biological gene order in less computation time than that obtained by optimal leaf ordering in a hierarchical clustering solution.

3.

Comparison of additive trees using circular orders.

V Makarenkov B Leclerc 《Journal of computational biology》2000,7(5):731-744

It has been postulated that existing species have been linked in the past in a way that can be described using an additive tree structure. Any such tree structure reflecting species relationships is associated with a matrix of distances between the species considered which is called a distance matrix or a tree metric matrix. A circular order of elements of X corresponds to a circular (clockwise) scanning of the subset X of vertices of a tree drawn on a plane. This paper describes an optimal algorithm using circular orders to compare the topology of two trees given by their distance matrices. This algorithm allows us to compute the Robinson and Foulds topologic distance between two trees. It employs circular order tree reconstruction to compute an ordered bipartition table of the tree edges for both given distance matrices. These bipartition tables are then compared to determine the Robinson and Foulds topologic distance, known to be an important criterion of tree similarity. The described algorithm has optimal time complexity, requiring O(n^2) time when performed on two n x n distance matrices. It can be generalized to get another optimal algorithm, which enables the strict consensus tree of k unrooted trees, given their distance matrices, to be constructed in O(kn^2) time.
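The Robinson and Foulds distance mentioned in this abstract counts the bipartitions (splits) that appear in one tree but not in the other. The following is a minimal sketch of that definition only, not the O(n^2) circular-order algorithm of the paper: it assumes the trees are already available as nested tuples of leaf labels rather than as distance matrices, and the function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(clade) < len(all_leaves) - 1:
            found.add(clade if ref not in clade else all_leaves - clade)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(robinson_foulds(t1, t2))
```

Because the splits are compared as unrooted bipartitions, two different rootings of the same topology give a distance of zero.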

4.

Forest species diversity mapping based on clustering algorithm

《Chinese Journal of Plant Ecology (植物生态学报)》1958,44(6):598

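This study used airborne light detection and ranging (LiDAR) and hyperspectral data to trace the intrinsic links between biochemical and spectral traits back to the physiology and chemistry of the leaves of forest species, and to examine how biochemical diversity, spectral diversity, and species diversity respond to one another. The optimal vegetation indices were selected and combined with the optimal structural parameters, a remote-sensing estimation model of forest species diversity was built with a clustering method, and forest tree species diversity was monitored in the Gutianshan Nature Reserve. The results show that: (1) of 16 leaf biochemical components, chlorophyll a, chlorophyll b, carotenoids, leaf water content, specific leaf area, cellulose, lignin, nitrogen, phosphorus, and carbon could be modelled effectively from leaf spectra by partial least squares (R² = 0.60-0.79, p < 0.01), and the following vegetation indices were selected to represent the corresponding optimal biochemical components: the transformed chlorophyll absorption in reflectance index / optimized soil-adjusted vegetation index (TCARI/OSAVI), the carotenoid reflectance index (CRI), the water band index (WBI), the ratio vegetation index (RVI), the physiological reflectance index (PRI), and the canopy chlorophyll concentration index (CCCI); (2) based on the airborne LiDAR data, a watershed algorithm combined with morphological crown control produced high-accuracy individual-tree segmentation (R² = 0.77, RMSE = 16.48), and stepwise regression selected tree height and skewness from the commonly used forest structural parameters as the optimal structural parameters (R² = 0.32, p < 0.01); (3) based on the 6 optimal vegetation indices and 2 optimal structural parameters, clustering with an adaptive fuzzy C-means method in 20 m × 20 m windows enabled mapping of forest tree species richness (Richness, R² = 0.56, RMSE = 1.81) and of the Shannon-Wiener (R² = 0.83, RMSE = 0.22) and Simpson (R² = 0.85, RMSE = 0.09) diversity indices in the study area. The study obtained biochemical, spectral, and structural parameters related to species diversity at the canopy scale, took the individual tree as the smallest unit, and used a clustering algorithm to estimate differences among species classes directly, without determining the specific species of each tree. It is a practical case of monitoring and mapping forest species diversity at the regional scale with remote sensing data, and can serve as a reference for monitoring species diversity in subtropical evergreen broad-leaved forests.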

5.

Monitoring forest tree species diversity using a clustering algorithm

Yi HY Zeng Y Zhao YJ Zheng CJ Xiong J Zhao D 《Chinese Journal of Plant Ecology (植物生态学报)》2020,44(6):598-615

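This study used airborne light detection and ranging (LiDAR) and hyperspectral data to trace the intrinsic links between biochemical and spectral traits back to the physiology and chemistry of the leaves of forest species, and to examine how biochemical diversity, spectral diversity, and species diversity respond to one another. The optimal vegetation indices were selected and combined with the optimal structural parameters, a remote-sensing estimation model of forest species diversity was built with a clustering method, and forest tree species diversity was monitored in the Gutianshan Nature Reserve. The results show that: (1) of 16 leaf biochemical components, chlorophyll a, chlorophyll b, carotenoids, leaf water content, specific leaf area, cellulose, lignin, nitrogen, phosphorus, and carbon could be modelled effectively from leaf spectra by partial least squares (R² = 0.60-0.79, p < 0.01), and the following vegetation indices were selected to represent the corresponding optimal biochemical components: the transformed chlorophyll absorption in reflectance index / optimized soil-adjusted vegetation index (TCARI/OSAVI), the carotenoid reflectance index (CRI), the water band index (WBI), the ratio vegetation index (RVI), the physiological reflectance index (PRI), and the canopy chlorophyll concentration index (CCCI); (2) based on the airborne LiDAR data, a watershed algorithm combined with morphological crown control produced high-accuracy individual-tree segmentation (R² = 0.77, RMSE = 16.48), and stepwise regression selected tree height and skewness from the commonly used forest structural parameters as the optimal structural parameters (R² = 0.32, p < 0.01); (3) based on the 6 optimal vegetation indices and 2 optimal structural parameters, clustering with an adaptive fuzzy C-means method in 20 m × 20 m windows enabled mapping of forest tree species richness (Richness, R² = 0.56, RMSE = 1.81) and of the Shannon-Wiener (R² = 0.83, RMSE = 0.22) and Simpson (R² = 0.85, RMSE = 0.09) diversity indices in the study area. The study obtained biochemical, spectral, and structural parameters related to species diversity at the canopy scale, took the individual tree as the smallest unit, and used a clustering algorithm to estimate differences among species classes directly, without determining the specific species of each tree. It is a practical case of monitoring and mapping forest species diversity at the regional scale with remote sensing data, and can serve as a reference for monitoring species diversity in subtropical evergreen broad-leaved forests.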
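The richness, Shannon-Wiener, and Simpson values reported in this abstract can be computed per window once every segmented tree carries a cluster label. The sketch below assumes the labels for one 20 m × 20 m window are already available as a plain list, and it uses the common 1 − Σp² form of the Simpson index; the abstract does not say which variant the authors used, and the function name is hypothetical.

```python
import math
from collections import Counter


def diversity_indices(cluster_labels):
    """Return (richness, Shannon-Wiener, Simpson) for one window.

    cluster_labels holds one label per individual tree; each distinct
    label stands in for a species class (an assumption of this sketch).
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    proportions = [count / total for count in counts.values()]

    richness = len(counts)
    shannon = -sum(p * math.log(p) for p in proportions)
    simpson = 1.0 - sum(p * p for p in proportions)  # Gini-Simpson form
    return richness, shannon, simpson


# Four trees in a window, split evenly between two clusters.
print(diversity_indices([0, 0, 1, 1]))
```

With two equally abundant classes, the Shannon-Wiener value is ln 2 and the Simpson value is 0.5, which gives a quick sanity check for the formulas.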

6.

Inferring phylogeny from whole genomes

Górecki P Tiuryn J 《Bioinformatics (Oxford, England)》2007,23(2):e116-e122

MOTIVATION: Inferring species phylogenies with a history of gene losses and duplications is a challenging and important task in computational biology. This problem can be solved by duplication-loss models in which the primary step is to reconcile a rooted gene tree with a rooted species tree. Most modern methods of phylogenetic reconstruction (from sequences) produce unrooted gene trees. This limitation leads to the problem of transforming an unrooted gene tree into a rooted tree, and then reconciling rooted trees. The main questions are 'What about biological interpretation of choosing rooting?', 'Can we find efficiently the optimal rootings?', 'Is the optimal rooting unique?'. RESULTS: In this paper we present a model of reconciling an unrooted gene tree with a rooted species tree, which is based on the concept of choosing the rooting which has minimal reconciliation cost. Our analysis leads to the surprising property that all the minimal rootings have identical distributions of gene duplications and gene losses in the species tree. It implies, in our opinion, that the concept of an optimal rooting is very robust, and thus biologically meaningful. Also, it has nice computational properties. We present a linear time and space algorithm for computing optimal rooting(s). This algorithm was used in two different ways to reconstruct the optimal species phylogeny of five known yeast genomes from approximately 4700 gene trees. Moreover, we determined locations (history) of all gene duplications and gene losses in the final species tree. It is interesting to notice that the top five species trees are the same for both methods. AVAILABILITY: Software and documentation are freely available from http://bioputer.mimuw.edu.pl/~gorecki/urec

7.

High-throughput inference of protein-protein interfaces from unassigned NMR data

Mettu RR Lilien RH Donald BR 《Bioinformatics (Oxford, England)》2005,21(Z1):i292-i301

SUMMARY: We cast the problem of identifying protein-protein interfaces, using only unassigned NMR spectra, into a geometric clustering problem. Identifying protein-protein interfaces is critical to understanding inter- and intra-cellular communication, and NMR allows the study of protein interaction in solution. However, it is often the case that NMR studies of a protein complex are very time-consuming, mainly due to the bottleneck in assigning the chemical shifts, even if the apo structures of the constituent proteins are known. We study whether it is possible, in a high-throughput manner, to identify the interface region of a protein complex using only unassigned chemical shifts and residual dipolar coupling (RDC) data. We introduce a geometric optimization problem where we must cluster the cells in an arrangement on the boundary of a 3-manifold. The arrangement is induced by a spherical quadratic form, which in turn is parameterized by SO(3)×R^2. We show that this formalism derives directly from the physics of RDCs. We present an optimal algorithm for this problem that runs in O(n^3 log n) time for an n-residue protein. We then use this clustering algorithm as a subroutine in a practical algorithm for identifying the interface region of a protein complex from unassigned NMR data. We present the results of our algorithm on NMR data for seven proteins from five protein complexes, and show that our approach is useful for high-throughput applications in which we seek to rapidly identify the interface region of a protein complex. AVAILABILITY: Contact authors for source code.

8.

Computing phylogenetic diversity for split systems

Spillner A Nguyen BT Moulton V 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2008,5(2):235-244

In conservation biology it is a central problem to measure, predict, and preserve biodiversity as species face extinction. In 1992 Faith proposed measuring the diversity of a collection of species in terms of their relationships on a phylogenetic tree, and to use this information to identify collections of species with high diversity. Here we are interested in some variants of the resulting optimization problem that arise when considering species whose evolution is better represented by a network rather than a tree. More specifically, we consider the problem of computing phylogenetic diversity relative to a split system on a collection of species of size n. We show that for general split systems this problem is NP-hard. In addition we provide some efficient algorithms for some special classes of split systems, in particular presenting an optimal O(n) time algorithm for phylogenetic trees and an O(n log n + nk) time algorithm for choosing an optimal subset of size k relative to a circular split system.

9.

An optimal algorithm for perfect phylogeny haplotyping.

Ravi Vijayasatya Amar Mukherjee 《Journal of computational biology》2006,13(4):897-928

Inferring haplotype data from genotype data is a crucial step in linking SNPs to human diseases. Given n genotypes over m SNP sites, the haplotype inference (HI) problem deals with finding a set of haplotypes so that each given genotype can be formed by combining a pair of haplotypes from the set. The perfect phylogeny haplotyping (PPH) problem is one of the many computational approaches to the HI problem. Though it was conjectured that the complexity of the PPH problem was O(nm), the complexity of all the solutions presented until recently was O(nm^2). In this paper, we make complete use of the column-ordering that was presented earlier and show that there must be some interdependencies among the pairwise relationships between SNP sites in order for the given genotypes to allow a perfect phylogeny. Based on these interdependencies, we introduce the FlexTree (flexible tree) data structure that represents all the pairwise relationships in O(m) space. The FlexTree data structure provides a compact representation of all the perfect phylogenies for the given set of genotypes. We also introduce an ordering of the genotypes that allows the genotypes to be added to the FlexTree sequentially. The column ordering, the FlexTree data structure, and the row ordering we introduce make the O(nm) OPPH algorithm possible. We present some results on simulated data which demonstrate that the OPPH algorithm performs quite impressively when compared to the previous algorithms. The OPPH algorithm is one of the first O(nm) algorithms presented for the PPH problem.

10.

Hierarchical ordering of reticular networks

Mileyko Y Edelsbrunner H Price CA Weitz JS 《PloS one》2012,7(6):e36715

The structure of hierarchical networks in biological and physical systems has long been characterized using the Horton-Strahler ordering scheme. The scheme assigns an integer order to each edge in the network based on the topology of branching such that the order increases from distal parts of the network (e.g., mountain streams or capillaries) to the "root" of the network (e.g., the river outlet or the aorta). However, Horton-Strahler ordering cannot be applied to networks with loops because they create a contradiction in the edge ordering in terms of which edge precedes another in the hierarchy. Here, we present a generalization of the Horton-Strahler order to weighted planar reticular networks, where weights are assumed to correlate with the importance of network edges, e.g., weights estimated from edge widths may correlate to flow capacity. Our method assigns hierarchical levels not only to edges of the network, but also to its loops, and classifies the edges into reticular edges, which are responsible for loop formation, and tree edges. In addition, we perform a detailed and rigorous theoretical analysis of the sensitivity of the hierarchical levels to weight perturbations. In doing so, we show that the ordering of the reticular edges is more robust to noise in weight estimation than is the ordering of the tree edges. We discuss applications of this generalized Horton-Strahler ordering to the study of leaf venation and other biological networks.
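For orientation, the classical Horton-Strahler order that this abstract generalizes can be sketched in a few lines for a loop-free tree. This is only the textbook scheme, not the authors' method for weighted reticular networks. It assumes the tree is given as a hypothetical `children` dictionary and labels each node; the order of an edge is then the order of the node at its distal end.

```python
def strahler_order(children, root):
    """Classical Horton-Strahler order for every node of a rooted tree.

    children maps a node to the list of its child nodes; nodes that do
    not appear as keys are leaves (order 1).
    """
    order = {}
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        kids = children.get(node, [])
        if not children_done:
            # Revisit this node after all of its children are processed.
            stack.append((node, True))
            stack.extend((kid, False) for kid in kids)
        elif not kids:
            order[node] = 1
        else:
            ranks = sorted((order[kid] for kid in kids), reverse=True)
            top = ranks[0]
            # The order rises only where two branches of equal top order meet.
            tied = len(ranks) > 1 and ranks[1] == top
            order[node] = top + 1 if tied else top
    return order


tree = {"outlet": ["a", "b"], "a": ["c", "d"]}
print(strahler_order(tree, "outlet"))
```

In the example, the two order-1 branches `c` and `d` merge into an order-2 branch `a`; joining `a` with the order-1 branch `b` leaves the outlet at order 2.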

11.

Clustering gene expression patterns. Total citations: 23, self-citations: 0, citations by others: 23

A Ben-Dor R Shamir Z Yakhini 《Journal of computational biology》1999,6(3-4):281-297

Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multicondition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is O(n^2 [log(n)]^c). We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its performance is demonstrated on simulated data and on real gene expression data, with very promising results.

12.

Neighbor joining algorithms for inferring phylogenies via LCA distances.

Ilan Gronau Shlomo Moran 《Journal of computational biology》2007,14(1):1-15

Reconstructing phylogenetic trees efficiently and accurately from distance estimates is an ongoing challenge in computational biology from both practical and theoretical considerations. We study algorithms which are based on a characterization of edge-weighted trees by distances to LCAs (Least Common Ancestors). This characterization enables a direct application of ultrametric reconstruction techniques to trees which are not necessarily ultrametric. A simple and natural neighbor joining criterion based on this observation is used to provide a family of efficient neighbor-joining algorithms. These algorithms are shown to reconstruct a refinement of the Buneman tree, which implies optimal robustness to noise under criteria defined by Atteson. In this sense, they outperform many popular algorithms such as Saitou and Nei's NJ. One member of this family is used to provide a new simple version of the 3-approximation algorithm for the closest additive metric under the l∞ (l-infinity) norm. A byproduct of our work is a novel technique which yields a time optimal O(n^2) implementation of common clustering algorithms such as UPGMA.
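As background for the UPGMA remark in this abstract, the sketch below shows the plain textbook form of UPGMA: repeatedly merge the two closest clusters and average their distances to every other cluster, weighted by cluster size. It is a naive cubic-time illustration, not the O(n^2) technique the authors describe, and the function name and input layout (a label list plus a symmetric distance matrix) are assumptions.

```python
def upgma(labels, dist):
    """Build a UPGMA tree as nested tuples from a symmetric distance matrix."""
    n = len(labels)
    # cluster id -> (subtree, number of leaves in it)
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Pair distances keyed as (larger id, smaller id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i)}
    next_id = n

    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters, with i > j
        (tree_i, size_i) = clusters.pop(i)
        (tree_j, size_j) = clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            d_ik = d.pop((max(i, k), min(i, k)))
            d_jk = d.pop((max(j, k), min(j, k)))
            # Size-weighted average keeps the distance an unweighted
            # mean over all leaf pairs.
            d[(next_id, k)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = ((tree_j, tree_i), size_i + size_j)
        next_id += 1

    return next(iter(clusters.values()))[0]


matrix = [[0, 2, 8],
          [2, 0, 8],
          [8, 8, 0]]
print(upgma(["a", "b", "c"], matrix))
```

The search for the minimum over all pairs in every round is what makes this version cubic; faster implementations avoid rescanning the full table.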

13.

ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time

Cai Y Sun Y 《Nucleic acids research》2011,39(14):e95

Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.

14.

Efficient detection of unusual words. Total citations: 3, self-citations: 0, citations by others: 3

A Apostolico M E Bock S Lonardi X Xu 《Journal of computational biology》2000,7(1-2):71-94

Words that are, by some measure, over- or underrepresented in the context of larger sequences have been variously implicated in biological functions and mechanisms. In most approaches to such anomaly detections, the words (up to a certain length) are enumerated more or less exhaustively and are individually checked in terms of observed and expected frequencies, variances, and scores of discrepancy and significance thereof. Here we take the global approach of annotating the suffix tree of a sequence with some such values and scores, having in mind to use it as a collective detector of all unexpected behaviors, or perhaps just as a preliminary filter for words suspicious enough to undergo a more accurate scrutiny. We consider in depth the simple probabilistic model in which sequences are produced by a random source emitting symbols from a known alphabet independently and according to a given distribution. Our main result consists of showing that, within this model, full tree annotations can be carried out in a time-and-space optimal fashion for the mean, variance and some of the adopted measures of significance. This result is achieved by an ad hoc embedding in statistical expressions of the combinatorial structure of the periods of a string. Specifically, we show that the expected value and variance of all substrings in a given sequence of n symbols can be computed and stored in (optimal) O(n^2) overall worst-case, O(n log n) expected time and space. The O(n^2) time bound constitutes an improvement by a linear factor over direct methods. Moreover, we show that under several accepted measures of deviation from expected frequency, the candidate over- or underrepresented words are restricted to the O(n) words that end at internal nodes of a compact suffix tree, as opposed to the Θ(n^2) possible substrings. This surprising fact is a consequence of properties in the form that if a word that ends in the middle of an arc is, say, overrepresented, then its extension to the nearest node of the tree is even more so. Based on this, we design global detectors of favored and unfavored words for our probabilistic framework in overall linear time and space, discuss related software implementations and display the results of preliminary experiments.

15.

Fast, optimal alignment of three sequences using linear gap costs Total citations: 2, self-citations: 0, citations by others: 2

Powell DR Allison L Dix TI 《Journal of theoretical biology》2000,207(3):325-336

Alignment algorithms can be used to infer a relationship between sequences when the true relationship is unknown. Simple alignment algorithms use a cost function that gives a fixed cost to each possible point mutation: mismatch, deletion, or insertion. These algorithms tend to find optimal alignments that have many small gaps. It is more biologically plausible to have fewer longer gaps rather than many small gaps in an alignment. To address this issue, linear gap cost algorithms are in common use for aligning biological sequence data. More reliable inferences are obtained by aligning more than two sequences at a time. The obvious dynamic programming algorithm for optimally aligning k sequences of length n runs in O(n^k) time. This is impractical if k ≥ 3 and n is of any reasonable length. Thus, for this problem there are many heuristics for aligning k sequences; however, they are not guaranteed to find an optimal alignment. In this paper, we present a new algorithm guaranteed to find the optimal alignment for three sequences using linear gap costs. This gives the same results as the dynamic programming algorithm for three sequences, but typically does so much more quickly. It is particularly fast when the (three-way) edit distance is small. Our algorithm uses a speed-up technique based on Ukkonen's greedy algorithm (Ukkonen, 1983) which he presented for two sequences and simple costs.
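The "obvious dynamic programming algorithm" referred to in this abstract can be illustrated for k = 3. The sketch below fills an O(n^3) table under simple unit costs with a sum-of-pairs score, which is an assumption made for brevity: it is neither the linear gap cost model nor the faster algorithm the paper proposes, and the function name is hypothetical.

```python
from itertools import product


def three_way_cost(a, b, c):
    """Minimum sum-of-pairs cost of aligning three sequences.

    Each alignment column costs 1 for every pair of rows that differ
    (two different symbols, or a symbol against a gap) and 0 for pairs
    that match or are both gaps.
    """
    seqs = (a, b, c)
    lengths = tuple(len(s) for s in seqs)
    # A move says which sequences contribute a symbol to the next column.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    table = {(0, 0, 0): 0}

    for pos in product(*(range(n + 1) for n in lengths)):
        if pos == (0, 0, 0):
            continue
        best = float("inf")
        for move in moves:
            prev = tuple(p - m for p, m in zip(pos, move))
            if min(prev) < 0:
                continue
            column = [seqs[t][pos[t] - 1] if move[t] else None for t in range(3)]
            cost = sum(column[x] != column[y] for x, y in ((0, 1), (0, 2), (1, 2)))
            best = min(best, table[prev] + cost)
        table[pos] = best

    return table[lengths]


print(three_way_cost("ACGT", "AGT", "ACT"))
```

The table has one cell per triple of prefix lengths and each cell looks at seven predecessors, which is where the O(n^3) running time for three sequences comes from.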

16.

Tsukuba BB: a branch and bound algorithm for local multiple alignment of DNA and protein sequences.

P Horton 《Journal of computational biology》2001,8(3):283-303

In this paper we present a branch and bound algorithm for local gapless multiple sequence alignment (motif alignment) and its implementation. The algorithm uses both score-based bounding and a novel bounding technique based on the "consistency" of the alignment. A sequence order independent search tree is used in conjunction with a technique for avoiding redundant calculations inherent in the structure of the tree. This is the first program to exploit the fact that the motif alignment problem is easier for short motifs. Indeed, for a short fixed motif width, the running time of the algorithm is asymptotically linear in the size of the input. We tested the performance of the program on a dataset of 300 E. coli promoter sequences and a dataset of 85 lipocalin protein sequences. For a motif width of 4, the optimal alignment of the entire set of sequences can be found. For the more natural motif width of 6, the program can align 21 sequences of length 100, more than twice the number of sequences which can be aligned by the best previous exact algorithm. The algorithm can relax the constraint of requiring each sequence to be aligned, and align 105 of the 300 promoter sequences with a motif width of 6. For the lipocalin dataset, we introduce a technique for reducing the effective alphabet size with a minimal loss of useful information. With this technique, we show that the program can find meaningful motifs in a reasonable amount of time by optimizing the score over three motif positions.

17.

Evolutionary profiles derived from the QR factorization of multiple structural alignments gives an economy of information

O'Donoghue P Luthey-Schulten Z 《Journal of molecular biology》2005,346(3):875-894

We present a new algorithm, based on the multidimensional QR factorization, to remove redundancy from a multiple structural alignment by choosing representative protein structures that best preserve the phylogenetic tree topology of the homologous group. The classical QR factorization with pivoting, developed as a fast numerical solution to eigenvalue and linear least-squares problems of the form Ax=b, was designed to re-order the columns of A by increasing linear dependence. Removing the most linearly dependent columns from A leads to the formation of a minimal basis set which spans well the phase space of the problem at hand. By recasting the problem of redundancy in multiple structural alignments into this framework, in which the matrix A now describes the multiple alignment, we adapted the QR factorization to produce a minimal basis set of protein structures which best spans the evolutionary (phase) space. The non-redundant and representative profiles obtained from this procedure, termed evolutionary profiles, are shown in initial results to outperform well-tested profiles in homology detection searches over a large sequence database. A measure of structural similarity between homologous proteins, Q(H), is presented. By properly accounting for the effect and presence of gaps, a phylogenetic tree computed using this metric is shown to be congruent with the maximum-likelihood sequence-based phylogeny. The results indicate that evolutionary information is indeed recoverable from the comparative analysis of protein structure alone. Applications of the QR ordering and this structural similarity metric to analyze the evolution of structure among key, universally distributed proteins involved in translation, and to the selection of representatives from an ensemble of NMR structures are also discussed.

18.

The utility of indels in population genetics: the Tpi intron for host race genealogy of Acrocercops transecta (Insecta: Lepidoptera)

Ohshima I Yoshizawa K 《Molecular phylogenetics and evolution》2011,59(2):469-476

We investigated the utility of indel data for genealogical and population genetic analyses using the Tpi intron of the leaf mining moth Acrocercops transecta (Insecta: Lepidoptera). Genealogical analyses revealed that indel data were less homoplasious than DNA sequence data and that indel data contained a sufficient signal to provide a high resolution tree that was highly congruent with the tree estimated from DNA sequences. Although some conflicts were identified in the distributions of multi-residue indels, such conflicts were especially useful for the unambiguous detection of recombinations. For the first time, we adopted a Bayesian clustering method for indel characters to infer genetic structure of the moth. We concluded that indel characters have the potential to be a powerful tool in the analysis of population genetics and population structure as well as in the detection of gene flow.

19.

Design and Performance Analysis of Divisible Load Scheduling Strategies on Arbitrary Graphs Total citations: 1, self-citations: 0, citations by others: 1

Jingnan Yao Bharadwaj Veeravalli 《Cluster computing》2004,7(2):191-207

In this paper, we consider the problem of scheduling divisible loads on arbitrary graphs with the objective to minimize the total processing time of the entire load submitted for processing. We consider an arbitrary graph network comprising heterogeneous processors interconnected via heterogeneous links in an arbitrary fashion. The divisible load is assumed to originate at any processor in the network. We transform the problem into a multi-level unbalanced tree network and schedule the divisible load. We design systematic procedures to identify and eliminate any redundant processor–link pairs (those pairs whose consideration in scheduling will penalize the performance) and derive an optimal tree structure to obtain an optimal processing time, for a fixed sequence of load distribution. Since the algorithm strives to determine an equivalent number of processors (resources) that can be used for processing the entire load, we refer to this approach as the resource-aware optimal load distribution (RAOLD) algorithm. We extend our study by applying the optimal sequencing theorem proposed in the literature for single-level tree networks to multi-level trees to obtain an optimal solution. We evaluate the performance for a wide range of arbitrary graphs with varying connectivity probabilities and processor densities. We also study the effect of network scalability and connectivity. We demonstrate the time performance when the point of load origination differs in the network and highlight certain key features that may be useful for algorithm and/or network system designers. We evaluate the time performance with rigorous simulation experiments under different system parameters for the ease of a complete understanding.

20.

Emergent spatial structure and pathogen epidemics: the influence of management and stochasticity in agroecosystems

《Ecological Complexity》2021

Organisms susceptible to disease, from humans to crops, inevitably have spatial geometry that influences disease dynamics. Understanding how spatial structure emerges through time in ecological systems and how that structure influences disease dynamics is of practical importance for natural and human management systems. Here we use the perennial crop, coffee, Coffea arabica, along with its pathogen, the coffee leaf rust, Hemileia vastatrix, as a model system to understand how spatial structure is created in agroecosystems and its subsequent influence on the dynamics of the system. Here, we create a simple null model of the socio-ecological process of death and stochastic replanting of coffee plants on a plot. We then use spatial networks to quantify the spatial structures and make comparisons of our stochastic null model to empirically observed spatial distributions of coffee. We then present a simple model of pathogen spread on spatial networks across a range of spatial geometries emerging from our null model and show how both local and regional management of agroecosystems interact with space and time to alter disease dynamics. Our results suggest that our null model of evolving spatial structure can capture many critical features of how the spatial arrangement of plants changes through time in coffee agroecosystems. Additionally, we find that small changes in management factors that influence the scale of pathogen transmission, such as shade tree removal, can result in a rapid transition to epidemics with lattice-like spatial arrangements but not with irregular planting geometries. The results presented here may have practical implications for farmers in Latin America who are in the process of replanting and overhauling management of their coffee farms in response to a coffee leaf rust epidemic in 2013. We suggest that shade reduction in conjunction with more lattice-like planting schemes may result in coffee being more prone to epidemic-like dynamics of the coffee leaf rust in the future.