期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Fast computation of genetic likelihoods on human pedigree data.

T M Goradia K Lange P L Miller P M Nadkarni 《Human heredity》1992,42(1):42-62

Gene mapping and genetic epidemiology require large-scale computation of likelihoods based on human pedigree data. Although computation of such likelihoods has become increasingly sophisticated, fast calculations are still impeded by complex pedigree structures, by models with many underlying loci and by missing observations on key family members. The current paper 'introduces' a new method of array factorization that substantially accelerates linkage calculations with large numbers of markers. This method is not limited to nuclear families or to families with complete phenotyping. Vectorization and parallelization are two general-purpose hardware techniques for accelerating computations. These techniques can assist in the rapid calculation of genetic likelihoods. We describe our experience using both of these methods with the existing program MENDEL. A vectorized version of MENDEL was run on an IBM 3090 supercomputer. A parallelized version of MENDEL was run on parallel machines of different architectures and on a network of workstations. Applying these revised versions of MENDEL to two challenging linkage problems yields substantial improvements in computational speed. 相似文献

2.

Modular implementation of dynamic algorithm switching in parallel simulations

Pilsung Kang 《Cluster computing》2012,15(3):321-332

We present a modular approach to implementing dynamic algorithm switching for parallel scientific software. By using a compositional framework based on function call interception techniques, our proposed method transparently integrates algorithm switching code with a given program without directly modifying the original code structure. Through fine-grained control of algorithmic behavior of an application at the level of functions, our approach supports design and implementation of application-specific switching scenarios in a modular way. Our approach encourages algorithm switching to dynamically perform at the loop end of a parallel simulation, where cooperating processes in concurrent execution typically synchronize and intermediate computation results are consistent. In this way, newly added switching operations do not cause race conditions that may produce unreliable computation results in parallel simulations. By applying our method to a real-world scientific application and adapting its algorithmic behavior to the properties of input problems, we demonstrate the applicability and effectiveness of our approach to constructing efficient parallel simulations. 相似文献

3.

High-throughput, detailed, cell-specific neuroanatomy of dendritic spines using microinjection and confocal microscopy

Dumitriu D Rodriguez A Morrison JH 《Nature protocols》2011,6(9):1391-1411

Morphological features such as size, shape and density of dendritic spines have been shown to reflect important synaptic functional attributes and potential for plasticity. Here we describe in detail a protocol for obtaining detailed morphometric analysis of spines using microinjection of fluorescent dyes, high-resolution confocal microscopy, deconvolution and image analysis with NeuronStudio. Recent technical advancements include better preservation of tissue, resulting in prolonged ability to microinject, and algorithmic improvements that compensate for the residual z-smear inherent in all optical imaging. Confocal imaging parameters were probed systematically to identify both optimal resolution and the highest efficiency. When combined, our methods yield size and density measurements comparable to serial section transmission electron microscopy in a fraction of the time. An experiment containing three experimental groups with eight subjects each can take as little as 1 month if optimized for speed, or approximately 4-5 months if the highest resolution and morphometric detail is sought. 相似文献

4.

Optimized combinatorial clustering for stochastic processes

Jumi Kim Wookey Lee Justin Jongsu Song Soo-Bok Lee 《Cluster computing》2017,20(2):1135-1148

As a new data processing era like Big Data, Cloud Computing, and Internet of Things approaches, the amount of data being collected in databases far exceeds the ability to reduce and analyze these data without the use of automated analysis techniques, data mining. As the importance of data mining has grown, one of the critical issues to emerge is how to scale data mining techniques to larger and complex databases so that it is particularly imperative for computationally intensive data mining tasks such as identifying natural clusters of instances. In this paper, we suggest an optimized combinatorial clustering algorithm for noisy performance which is essential for large data with random sampling. The algorithm outperforms conventional approaches through various numerical and qualitative thresholds like mean and standard deviation of accuracy and computation speed. 相似文献

5.

Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium

Po-Ru Loh Mark Lipson Nick Patterson Priya Moorjani Joseph K. Pickrell David Reich Bonnie Berger 《Genetics》2013,193(4):1233-1254

Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese. 相似文献

6.

High-performance electron tomography of complex biological specimens

Fernández JJ Lawrence AF Roca J García I Ellisman MH Carazo JM 《Journal of structural biology》2002,138(1-2):6-20

We have evaluated reconstruction methods using smooth basis functions in the electron tomography of complex biological specimens. In particular, we have investigated series expansion methods, with special emphasis on parallel computation. Among the methods investigated, the component averaging techniques have proven to be most efficient and have generally shown fast convergence rates. The use of smooth basis functions provides the reconstruction algorithms with an implicit regularization mechanism, very appropriate for noisy conditions. Furthermore, we have applied high-performance computing (HPC) techniques to address the computational requirements demanded by the reconstruction of large volumes. One of the standard techniques in parallel computing, domain decomposition, has yielded an effective computational algorithm which hides the latencies due to interprocessor communication. We present comparisons with weighted back-projection (WBP), one of the standard reconstruction methods in the areas of computational demand and reconstruction quality under noisy conditions. These techniques yield better results, according to objective measures of quality, than the weighted backprojection techniques after a very few iterations. As a consequence, the combination of efficient iterative algorithms and HPC techniques has proven to be well suited to the reconstruction of large biological specimens in electron tomography, yielding solutions in reasonable computation times. 相似文献

7.

A novel method for high accuracy sumoylation site prediction from protein sequences

Jialin Xu Yun He Boqin Qiang Jiangang Yuan Xiaozhong Peng Xian-Ming Pan 《BMC bioinformatics》2008,9(1):1-9

Background

The use of novel algorithmic techniques is pivotal to many important problems in life science. For example the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.

Results

To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use.

Conclusion

We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms. 相似文献

8.

Linkage mechanisms in the vertebrate skull: Structure and function of three‐dimensional,parallel transmission systems

下载免费PDF全文

Aaron M. Olsen Mark W. Westneat 《Journal of morphology》2016,277(12):1570-1583

Many musculoskeletal systems, including the skulls of birds, fishes, and some lizards consist of interconnected chains of mobile skeletal elements, analogous to linkage mechanisms used in engineering. Biomechanical studies have applied linkage models to a diversity of musculoskeletal systems, with previous applications primarily focusing on two‐dimensional linkage geometries, bilaterally symmetrical pairs of planar linkages, or single four‐bar linkages. Here, we present new, three‐dimensional (3D), parallel linkage models of the skulls of birds and fishes and use these models (available as free kinematic simulation software), to investigate structure–function relationships in these systems. This new computational framework provides an accessible and integrated workflow for exploring the evolution of structure and function in complex musculoskeletal systems. Linkage simulations show that kinematic transmission, although a suitable functional metric for linkages with single rotating input and output links, can give misleading results when applied to linkages with substantial translational components or multiple output links. To take into account both linear and rotational displacement we define force mechanical advantage for a linkage (analogous to lever mechanical advantage) and apply this metric to measure transmission efficiency in the bird cranial mechanism. For linkages with multiple, expanding output points we propose a new functional metric, expansion advantage, to measure expansion amplification and apply this metric to the buccal expansion mechanism in fishes. Using the bird cranial linkage model, we quantify the inaccuracies that result from simplifying a 3D geometry into two dimensions. We also show that by combining single‐chain linkages into parallel linkages, more links can be simulated while decreasing or maintaining the same number of input parameters. This generalized framework for linkage simulation and analysis can accommodate linkages of differing geometries and configurations, enabling novel interpretations of the mechanics of force transmission across a diversity of vertebrate feeding mechanisms and enhancing our understanding of musculoskeletal function and evolution. J. Morphol. 277:1570–1583, 2016. © 2016 Wiley Periodicals, Inc. 相似文献

9.

ABC-X: a generalized,automatically configurable artificial bee colony framework

Doğan Aydın Gürcan Yavuz Thomas Stützle 《Swarm Intelligence》2017,11(1):1-38

The artificial bee colony (ABC) algorithm is a popular metaheuristic that was originally conceived for tackling continuous function optimization tasks. Over the last decade, a large number of variants of ABC have been proposed, making it by now a well-studied swarm intelligence algorithm. Typically, in a paper on algorithmic variants of ABC algorithms, one or at most two of its algorithmic components are modified. Possible changes include variations on the search equations, the selection of candidate solutions to be explored, or the adoption of features from other algorithmic techniques. In this article, we propose to follow a different direction and to build a generalized ABC algorithm, which we call ABC-X. ABC-X collects algorithmic components available from known ABC algorithms into a common algorithm framework that allows not only to instantiate known ABC variants but, more importantly, also many ABC algorithm variants that have never been explored before in the literature. Automatic algorithm configuration techniques can generate from this template new ABC variants that perform better than known ABC algorithms, even when their numerical parameters are fine-tuned using the same automatic configuration process. 相似文献

10.

Integration of human and mouse genetics reveals pendrin function in hearing and deafness

Dror AA Brownstein Z Avraham KB 《Cellular physiology and biochemistry》2011,28(3):535-544

Genomic technology has completely changed the way in which we are able to diagnose human genetic mutations. Genomic techniques such as the polymerase chain reaction, linkage analysis, Sanger sequencing, and most recently, massively parallel sequencing, have allowed researchers and clinicians to identify mutations for patients with Pendred syndrome and DFNB4 non-syndromic hearing loss. While thus far most of the mutations have been in the SLC26A4 gene coding for the pendrin protein, other genetic mutations may contribute to these phenotypes as well. Furthermore, mouse models for deafness have been invaluable to help determine the mechanisms for SLC26A4-associated deafness. Further work in these areas of research will help define genotype-phenotype correlations and develop methods for therapy in the future. 相似文献

11.

Two-locus disease models with two marker loci: the power of affected-sib-pair tests. 总被引：5，自引：3，他引：2

下载免费PDF全文

M. Knapp S. A. Seuchter M. P. Baur 《American journal of human genetics》1994,55(5):1030-1041

Recently, Schork et al. found that two-trait-locus, two-marker-locus (parametric) linkage analysis can provide substantially more linkage information than can standard one-trait-locus, one-marker-locus methods. However, because of the increased burden of computation, Schork et al. do not expect that their approach will be applied in an initial genome scan. Further, the specification of a suitable two-locus segregation model can be crucial. Affected-sibpair tests are computationally simple and do not require an explicit specification of the disease model. In the past, however, these tests mainly have been applied to data with a single marker locus. Here, we consider sib-pair tests that make it possible to analyze simultaneously two marker loci. The power of these tests is investigated for different (epistatic and heterogeneous) two-trait-locus models, each trait locus being linked to one of the marker loci. We compare these tests both with the test that is optimal for a certain model and with the strategy that analyzes each marker locus separately. The results indicate that a straightforward extension of the well-known mean test for two marker loci can be much more powerful than single-marker-locus analysis and that is power is only slightly inferior to the power of the optimal test. 相似文献

12.

Linkage analysis versus association analysis: distinguishing between two models that explain disease-marker associations. 总被引：9，自引：5，他引：4

下载免费PDF全文

S E Hodge 《American journal of human genetics》1993,53(2):367-384

Human genetics researchers have been intrigued for many years by weak-to-moderate associations between markers and diseases. However, in most cases of association, the cause of this phenomenon is still not known. Recently, interest has grown in pursuing association studies for complex diseases, either instead of or in addition to linkage studies. Hence, it is timely to reconsider what a disease-marker association, particularly in the weak-to-moderate range (relative risk < 10), can tell us about disease etiology. To this end, this study accomplishes three aims: (1) It formulates two different models explaining weak-to-moderate associations and derives the relationship between them. One is a linkage disequilibrium model, and the other is a "susceptibility," or pure association, model. The importance of drawing the distinction between these two models and the implications for our understanding of the genetics of human disease will also be discussed. It will be argued that the linkage disequilibrium model represents true linkage but that the susceptibility model does not. (2) It examines two family-based association tests proposed recently by Parsian et al. and Spielman et al. and derives formulas for their behavior under the two models described above. It demonstrates that these tests yield almost identical results under these two models. It shows that, whereas these tests can confirm an association, they cannot determine whether the association is caused by the linkage disequilibrium model or the susceptibility model. The study also characterizes the probabilities yielded by the family association tests in the presence of weak-to-moderate associations, which will aid researchers using these tests. (3) It proposes two approaches, both based on linkage analysis, which can distinguish between the two models described above. One approach involves a straightforward linkage analysis of the data; the other involves a partitioned association-linkage (PAL) test, as suggested by Greenberg. Formulas are derived for testing identity by descent in affected sib pairs by using both approaches. (4) Finally, the formulas and arguments are illustrated with two examples from the literature and one computer-simulated data set. 相似文献

13.

Salt stress under the scalpel – dissecting the genetics of salt tolerance

Mitchell J. L. Morton Mariam Awlia Nadia Al‐Tamimi Stephanie Saade Yveline Pailles Snia Negro Mark Tester 《The Plant journal : for cell and molecular biology》2019,97(1):148-163

Salt stress limits the productivity of crops grown under saline conditions, leading to substantial losses of yield in saline soils and under brackish and saline irrigation. Salt tolerant crops could alleviate these losses while both increasing irrigation opportunities and reducing agricultural demands on dwindling freshwater resources. However, despite significant efforts, progress towards this goal has been limited, largely because of the genetic complexity of salt tolerance for agronomically important yield‐related traits. Consequently, the focus is shifting to the study of traits that contribute to overall tolerance, thus breaking down salt tolerance into components that are more genetically tractable. Greater consideration of the plasticity of salt tolerance mechanisms throughout development and across environmental conditions furthers this dissection. The demand for more sophisticated and comprehensive methodologies is being met by parallel advances in high‐throughput phenotyping and sequencing technologies that are enabling the multivariate characterisation of vast germplasm resources. Alongside steady improvements in statistical genetics models, forward genetics approaches for elucidating salt tolerance mechanisms are gaining momentum. Subsequent quantitative trait locus and gene validation has also become more accessible, most recently through advanced techniques in molecular biology and genomic analysis, facilitating the translation of findings to the field. Besides fuelling the improvement of established crop species, this progress also facilitates the domestication of naturally salt tolerant orphan crops. Taken together, these advances herald a promising era of discovery for research into the genetics of salt tolerance in plants. 相似文献

14.

Controlling false discoveries in genome scans for selection

下载免费PDF全文

Olivier François Helena Martins Kevin Caye Sean D. Schoville 《Molecular ecology》2016,25(2):454-469

Population differentiation (PD) and ecological association (EA) tests have recently emerged as prominent statistical methods to investigate signatures of local adaptation using population genomic data. Based on statistical models, these genomewide testing procedures have attracted considerable attention as tools to identify loci potentially targeted by natural selection. An important issue with PD and EA tests is that incorrect model specification can generate large numbers of false‐positive associations. Spurious association may indeed arise when shared demographic history, patterns of isolation by distance, cryptic relatedness or genetic background are ignored. Recent works on PD and EA tests have widely focused on improvements of test corrections for those confounding effects. Despite significant algorithmic improvements, there is still a number of open questions on how to check that false discoveries are under control and implement test corrections, or how to combine statistical tests from multiple genome scan methods. This tutorial study provides a detailed answer to these questions. It clarifies the relationships between traditional methods based on allele frequency differentiation and EA methods and provides a unified framework for their underlying statistical tests. We demonstrate how techniques developed in the area of genomewide association studies, such as inflation factors and linear mixed models, benefit genome scan methods and provide guidelines for good practice while conducting statistical tests in landscape and population genomic applications. Finally, we highlight how the combination of several well‐calibrated statistical tests can increase the power to reject neutrality, improving our ability to infer patterns of local adaptation in large population genomic data sets. 相似文献

15.

Efficient clustering of large EST data sets on parallel computers

下载免费PDF全文

Kalyanaraman A Aluru S Kothari S Brendel V 《Nucleic acids research》2003,31(11):2963-2974

Clustering expressed sequence tags (ESTs) is a powerful strategy for gene identification, gene expression studies and identifying important genetic variations such as single nucleotide polymorphisms. To enable fast clustering of large-scale EST data, we developed PaCE (for Parallel Clustering of ESTs), a software program for EST clustering on parallel computers. In this paper, we report on the design and development of PaCE and its evaluation using Arabidopsis ESTs. The novel features of our approach include: (i) design of memory efficient algorithms to reduce the memory required to linear in the size of the input, (ii) a combination of algorithmic techniques to reduce the computational work without sacrificing the quality of clustering, and (iii) use of parallel processing to reduce run-time and facilitate clustering of larger data sets. Using a combination of these techniques, we report the clustering of 168 200 Arabidopsis ESTs in 15 min on an IBM xSeries cluster with 30 dual-processor nodes. We also clustered 327 632 rat ESTs in 47 min and 420 694 Triticum aestivum ESTs in 3 h and 15 min. We demonstrate the quality of our software using benchmark Arabidopsis EST data, and by comparing it with CAP3, a software widely used for EST assembly. Our software allows clustering of much larger EST data sets than is possible with current software. Because of its speed, it also facilitates multiple runs with different parameters, providing biologists a tool to better analyze EST sequence data. Using PaCE, we clustered EST data from 23 plant species and the results are available at the PlantGDB website. 相似文献

16.

Efficient Record Linkage Algorithms Using Complete Linkage Clustering

Abdullah-Al Mamun Robert Aseltine Sanguthevar Rajasekaran 《PloS one》2016,11(4)

Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times. 相似文献

17.

Computation techniques in the conformational analysis of carbohydrates

A. G. Gerbst A. A. Grachev A. S. Shashkov N. E. Nifantiev 《Russian Journal of Bioorganic Chemistry》2007,33(1):24-37

A growing number of modern studies of carbohydrates is devoted to spatial mechanisms of their participation in the cell recognition processes and directed design of inhibitors of these processes. No progress in this field is possible without the development of theoretical conformational analysis of carbohydrates. In this review, we generalize literature data on the potentialities of using various molecular-mechanic force fields, the methods of quantum mechanics, and molecular dynamics to study the conformation of glycoside linkage. A possibility of analyzing the reactivity of carbohydrates with the computation techniques is also discussed in brief. 相似文献

18.

Massively parallel acceleration methods for image handling operations

Nakhoon Baek Kwan-Hee Yoo 《Cluster computing》2017,20(2):1149-1154

In the image handling and image processing areas, most operations can be executed in a pixel-by-pixel or cluster-by-cluster manner. These parallel and simultaneous executions have many benefits, and many researchers showed remarkable improvements. In this paper, we started from a specific and practical image handling and feature extraction sequences. We focused on the detailed design and robust implementation on the modern massively parallel architecture of CUDA. We present the enhanced features of our implementation and their design details. Our final result shows 13 times faster execution speed, in comparison with its previous CPU-based implementation. These methods can be applied to the variety of image manipulation processes. 相似文献

19.

Improvement of mapping accuracy by unifying linkage and association analysis

Lou XY Ma JZ Yang MC Zhu J Liu PY Deng HW Elston RC Li MD 《Genetics》2006,172(1):647-661

It is well known that pedigree/family data record information on the coexistence in founder haplotypes of alleles at nearby loci and the cotransmission from parent to offspring that reveal different, but complementary, profiles of the genetic architecture. Either conventional linkage analysis that assumes linkage equilibrium or family-based association tests (FBATs) capture only partial information, leading to inefficiency. For example, FBATs will fail to detect even very tight linkage in the case where no allelic association exists, while a violation of the assumption of linkage equilibrium will result in biased estimation and reduced efficiency in linkage mapping. In this article, by using a data augmentation technique and the EM algorithm, we propose a likelihood-based approach that embeds both linkage and association analyses into a unified framework for general pedigree data. Relative to either linkage or association analysis, the proposed approach is expected to have greater estimation accuracy and power. Monte Carlo simulations support our theoretical expectations and demonstrate that our new methodology: (1) is more powerful than either FBATs or classic linkage analysis; (2) can unbiasedly estimate genetic parameters regardless of whether association exists, thus remedying the bias and less precision of traditional linkage analysis in the presence of association; and (3) is capable of identifying tight linkage alone. The new approach also holds the theoretical advantage that it can extract statistical information to the maximum extent and thereby improve mapping accuracy and power because it integrates multilocus population-based association study and pedigree-based linkage analysis into a coherent framework. Furthermore, our method is numerically stable and computationally efficient, as compared to existing parametric methods that use the simplex algorithm or Newton-type methods to maximize high-order multidimensional likelihood functions, and also offers the computation of Fisher's information matrix. Finally, we apply our methodology to a genetic study on bone mineral density (BMD) for the vitamin D receptor (VDR) gene and find that VDR is significantly linked to BMD at the one-third region of the wrist. 相似文献

20.

QuickNGS elevates Next-Generation Sequencing data analysis to a new level of automation

Prerana Wagle Milo? Nikoli? Peter Frommolt 《BMC genomics》2015,16(1)

Background

Next-Generation Sequencing (NGS) has emerged as a widely used tool in molecular biology. While time and cost for the sequencing itself are decreasing, the analysis of the massive amounts of data remains challenging. Since multiple algorithmic approaches for the basic data analysis have been developed, there is now an increasing need to efficiently use these tools to obtain results in reasonable time.

Results

We have developed QuickNGS, a new workflow system for laboratories with the need to analyze data from multiple NGS projects at a time. QuickNGS takes advantage of parallel computing resources, a comprehensive back-end database, and a careful selection of previously published algorithmic approaches to build fully automated data analysis workflows. We demonstrate the efficiency of our new software by a comprehensive analysis of 10 RNA-Seq samples which we can finish in only a few minutes of hands-on time. The approach we have taken is suitable to process even much larger numbers of samples and multiple projects at a time.

Conclusion

Our approach considerably reduces the barriers that still limit the usability of the powerful NGS technology and finally decreases the time to be spent before proceeding to further downstream analysis and interpretation of the data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1695-x) contains supplementary material, which is available to authorized users. 相似文献