Similar Literature
20 similar documents retrieved (search time: 15 ms).
1.
Feng R, Zhou G, Zhang M, Zhang H. Biometrics 2009, 65(2):584-589
Summary. Twin studies are essential for assessing disease inheritance. Data generated from twin studies are traditionally analyzed using specialized computational programs. For many researchers, especially those who are new to twin studies, understanding and using those specialized programs can be a daunting task. Given that SAS (Statistical Analysis Software) is the most popular software for statistical analysis, we suggest that using SAS procedures for twin data may be a helpful alternative, and we demonstrate that SAS yields results similar to those produced by the specialized programs. This numerical validation is practically useful, because a natural concern with general statistical software is whether it can handle data generated from special study designs such as twin studies and whether it can test a particular hypothesis. We conclude from extensive simulation that SAS procedures can easily serve as a convenient alternative to specialized programs for twin data analysis.

2.

Background  

In research, each experiment is different, the focus changes, and data are generated by a continually evolving barrage of technologies. New techniques are introduced continually, with usage ranging from in-house protocols to high-throughput instrumentation. To support these requirements, data management systems are needed that can be built rapidly and readily adapted to new uses.

3.
As studies on vehicular ad hoc networks have been conducted actively in recent years, convenient and reliable services can be provided to vehicles through traffic information, surrounding information, and file sharing. To serve multiple requests, road side units (RSUs) must receive requests from vehicles and schedule data transfers according to priority. In this paper, we propose a new scheduling scheme in which multiple RSUs are connected through wired networks and data are transferred through the collaboration of RSUs. The proposed scheme transfers both safety and non-safety data by employing a collaborative strategy of multiple RSUs, reducing the deadline miss ratio and the average response time. When safety data are generated, the data are transferred from the previous RSU in advance, and priority is assigned considering the deadline and the reception rate. Since non-safety data are on-demand data driven by user requests, the proposed scheme provides a method that reduces the deadline miss ratio under the loads generated at RSUs. To demonstrate the superiority of the proposed scheme, we perform a performance evaluation in which the number and velocities of vehicles are varied. The evaluation shows that the proposed scheme achieves lower deadline miss ratios and faster response times than existing schemes.
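The abstract does not spell out the priority rule, so the following Python sketch only illustrates the general idea of deadline- and reception-rate-aware scheduling at an RSU using a min-heap. The `Request` fields, the weights and the example values are assumptions for illustration, not taken from the paper.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    priority: float
    vehicle_id: str = field(compare=False)
    deadline: float = field(compare=False)        # seconds until the data becomes useless
    reception_rate: float = field(compare=False)  # fraction of the item already received

def enqueue(queue, vehicle_id, deadline, reception_rate, w_deadline=0.7, w_rate=0.3):
    # Assumed rule: tighter deadlines and higher reception rates are served first.
    # Smaller value = higher priority, because heapq is a min-heap.
    priority = w_deadline * deadline - w_rate * reception_rate
    heapq.heappush(queue, Request(priority, vehicle_id, deadline, reception_rate))

queue = []
enqueue(queue, "car-17", deadline=2.0, reception_rate=0.8)
enqueue(queue, "car-42", deadline=5.0, reception_rate=0.1)
print(heapq.heappop(queue).vehicle_id)  # car-17 is served first
```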

4.

Background  

The mapping of diverse genomes in recent years has generated huge amounts of biological data that are currently dispersed across many databases. Integration of the information available in the various databases is required to unveil possible associations among already known data. Biological data are often imprecise and noisy. Fuzzy set theory is especially suitable for modeling imprecise data, while association rules are very appropriate for integrating heterogeneous data.

5.
6.
Next-generation sequencing technologies have generated, and continue to produce, an increasingly large corpus of biological data. The data generated are inherently compositional as they convey only relative information dependent upon the capacity of the instrument, experimental design and technical bias. There is considerable information to be gained through network analysis by studying the interactions between components within a system. Network theory methods using compositional data are powerful approaches for quantifying relationships between biological components and their relevance to phenotype, environmental conditions or other external variables. However, many of the statistical assumptions used for network analysis are not designed for compositional data and can bias downstream results. In this mini-review, we illustrate the utility of network theory in biological systems and investigate modern techniques while introducing researchers to frameworks for implementation. We overview (1) compositional data analysis, (2) data transformations and (3) network theory, along with insight on a battery of network types including static, temporal, sample-specific and differential networks. The intention of this mini-review is not to provide a comprehensive overview of network methods, but rather to introduce microbiology researchers to (semi-)unsupervised data-driven approaches for inferring latent structures that may give insight into biological phenomena or the abstract mechanics of complex systems.
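As a concrete example of point (2), the centered log-ratio (CLR) transform is one standard way to move compositional counts into real space before computing associations for a network. The sketch below is a generic NumPy illustration under that assumption, not code from the review; the pseudocount and correlation threshold are arbitrary.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform for compositional count data.
    Rows are samples, columns are components (e.g. taxa); zeros are offset
    by a small pseudocount before taking logs."""
    x = counts + pseudocount
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)

# Toy example: 4 samples x 3 taxa whose raw counts are distorted by sequencing depth.
counts = np.array([[120, 30, 50],
                   [240, 60, 100],
                   [10, 80, 10],
                   [20, 160, 20]])
z = clr(counts)

# A naive association network: threshold the correlation of CLR-transformed profiles.
adjacency = (np.abs(np.corrcoef(z.T)) > 0.7).astype(int)
np.fill_diagonal(adjacency, 0)
print(adjacency)
```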

7.
Roark DE. Biophysical Chemistry 2004, 108(1-3):121-126
Biophysical chemistry experiments, such as sedimentation-equilibrium analyses, require computational techniques to reduce the effects of random errors in the measurement process. Existing approaches have relied primarily on the assumption of polynomial models and least-squares approximation. By constraining the data to remove random fluctuations, such models may distort the data and cause loss of information: the better the removal of random errors, the greater the likelihood of introducing systematic errors through the constraining fit itself. An alternative technique, reverse smoothing, is suggested, which makes use of a more model-free approach of exponential smoothing of the first derivative. Exponential smoothing approaches have generally been unsatisfactory because they introduce significant data lag. The approach given here compensates for the lag defect and appears promising for smoothing many experimental data sequences, including the macromolecular concentration data generated by sedimentation-equilibrium experiments. Test results on simulated sedimentation-equilibrium data indicate that a 4-fold reduction in error may be typical compared with standard analysis techniques.
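The abstract gives only the outline of reverse smoothing, so the sketch below is an assumed reading of the idea: exponentially smooth the numerically differentiated concentration profile in both the forward and reverse directions and average the two passes so that their lags cancel. The function names, smoothing constant and simulated profile are illustrative, not the paper's algorithm.

```python
import numpy as np

def ewma(x, alpha):
    """Simple exponential smoothing (exponentially weighted moving average)."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]
    return y

def smooth_derivative(c, r, alpha=0.2):
    """Lag-compensated smoothing of dc/dr: smooth the derivative forward and
    backward, then average the two passes so the phase lags cancel
    (an assumed interpretation of 'reverse smoothing')."""
    d = np.gradient(c, r)
    forward = ewma(d, alpha)
    backward = ewma(d[::-1], alpha)[::-1]
    return 0.5 * (forward + backward)

# Toy noisy concentration profile c(r), loosely shaped like a sedimentation-equilibrium gradient.
rng = np.random.default_rng(0)
r = np.linspace(6.9, 7.2, 200)
c = 0.1 * np.exp(1.2 * (r**2 - r[0]**2)) + rng.normal(0, 0.02, r.size)
dc_dr = smooth_derivative(c, r)
```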

8.
Genomics 2019, 111(6):1387-1394
To decipher the genetic architecture of human disease, various types of omics data are generated. Two common omics data types are genotypes and gene expression. Often, genotype data are available for a large number of individuals while gene expression data are available for only a few, for biological and technical reasons, leading to unequal sample sizes across omics data. The lack of a standard statistical procedure for integrating such datasets motivates us to propose a two-step multi-locus association method using latent variables. Our method is more powerful than single or separate omics data analyses, and it comprehensively unravels deep-seated signals through a single statistical model. Extensive simulation confirms that it is robust to various genetic models and that its power increases with sample size and the number of associated loci. It also computes p-values quickly. Application to a real dataset on psoriasis identifies 17 novel SNPs, functionally related to psoriasis-associated genes, at a much smaller sample size than a standard GWAS.

9.

Background  

Many proteomics initiatives require seamless bioinformatic integration of a range of analytical steps between sample collection and systems modeling, immediately accessible to the participants involved in the process. From proteomic profiling by 2D gel electrophoresis to the putative identification of differentially expressed proteins by comparison of mass spectrometry results with reference databases, many components of sample processing, not just analysis and interpretation, are regularly revisited and updated. To support such updates and the dissemination of data, a suitable data structure is needed. However, no data structure is currently available for storing the data for the multiple gels generated in a single proteomic experiment in a single XML file. This paper proposes a data structure based on XML standards to fill the void between the data generated by proteomics experiments and their storage.

10.
SUMMARY: Large volumes of microarray data are generated and deposited in public databases. Most of these data are in the form of tab-delimited text files or Excel spreadsheets, and combining data from several such files for reanalysis is time consuming. Microarray Data Assembler is specifically designed to simplify this task. The program can list files and data sources, convert selected text files into Excel files, and assemble data across multiple Excel worksheets and workbooks. It thus makes data assembly easy, saves time and helps avoid manual errors. AVAILABILITY: The program is freely available for non-profit use, via email request from the author, after signing a Material Transfer Agreement with Johns Hopkins University.
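Since the program itself is available only by request, the pandas sketch below merely illustrates the kind of assembly task it automates: aligning several tab-delimited microarray files into one Excel workbook. The directory layout, the `ProbeID` key column and the output file name are assumptions, not part of the tool.

```python
import glob
import pandas as pd

# Assumption: each tab-delimited file has a 'ProbeID' column plus intensity columns.
frames = []
for path in glob.glob("arrays/*.txt"):
    df = pd.read_csv(path, sep="\t").set_index("ProbeID")
    df.columns = [f"{path}:{c}" for c in df.columns]  # keep the source file in the header
    frames.append(df)

# Align all files on ProbeID and write a single workbook (needs openpyxl or xlsxwriter).
combined = pd.concat(frames, axis=1, join="outer")
with pd.ExcelWriter("assembled_arrays.xlsx") as writer:
    combined.to_excel(writer, sheet_name="combined")
```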

11.
A pseudo-random generator is an algorithm that produces a sequence of objects which is determined by a truly random seed but is not itself truly random. It has been widely used in many applications, such as cryptography and simulation. In this article, we examine currently popular machine learning algorithms combined with various on-line algorithms on pseudo-randomly generated data, in order to find out which machine learning approach is more suitable for on-line prediction on this kind of data. To further improve prediction performance, we propose a novel sample-weighted algorithm that takes the generalization error at each iteration into account. We perform an intensive evaluation on real Baccarat data generated by casino machines and on random numbers generated by a popular Java program, two typical examples of pseudo-randomly generated data. The experimental results show that support vector machines and k-nearest neighbors perform better than the other methods, with and without the sample-weighted algorithm, on the evaluation data set.
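The paper's weighting rule is not described in this summary; the sketch below only conveys the general idea with a windowed on-line k-nearest-neighbours predictor in which stored samples that the model misclassified on arrival receive a larger vote. The window size, weight values and the bit-prediction toy task are illustrative assumptions.

```python
import random
from collections import deque
import numpy as np

def online_weighted_knn(stream, k=5, window=200):
    """Predict each label before seeing it, then store the example with a weight
    that is larger if the current model got it wrong (assumed weighting rule)."""
    X, y, w = deque(maxlen=window), deque(maxlen=window), deque(maxlen=window)
    n_pred, n_correct = 0, 0
    for features, label in stream:
        if len(y) >= k:
            dists = np.linalg.norm(np.array(X) - features, axis=1)
            nearest = np.argsort(dists)[:k]
            votes = np.bincount([y[i] for i in nearest],
                                weights=[w[i] for i in nearest], minlength=2)
            pred = int(np.argmax(votes))
            n_pred += 1
            n_correct += int(pred == label)
        else:
            pred = 0  # no model yet
        X.append(features)
        y.append(label)
        w.append(2.0 if pred != label else 1.0)  # hard examples count more in later votes
    return n_correct / max(1, n_pred)

# Toy pseudo-random stream: predict the next bit of a seeded Python RNG from the previous 3 bits.
rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(2000)]
stream = ((np.array(bits[i:i + 3], dtype=float), bits[i + 3]) for i in range(len(bits) - 3))
print(f"on-line accuracy: {online_weighted_knn(stream):.3f}")
```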

12.
The Biological Records Centre collects data on the occurrence of species at a particular time in a particular place. These data are used for the preparation of distribution maps, for lists of species from localities, and lists of localities for species.
The unit of recording at the national level is the 10 km square: from each square a list as complete as possible is collected. For common species presence alone is sufficient, but for rare or critical species more detailed data on locality and source are required.
Mapping data are stored on magnetic tape, from which 80-column cards are generated by computer. These cards are fed through a card reader to produce distribution maps on an electric typewriter. Details of locality, habitat, etc. are stored on two sets of 80-column individual record cards, one filed by species and the other by localities.
The Centre is encouraging the collection of data in counties, where the 2 × 2 km square is used, and for Europe, where the 50 × 50 km square is proposed. It is important to establish not only biological records centres but a complete biological recording network.

13.
Expressed sequence tags (ESTs) are generated and deposited in the public domain as redundant, unannotated, single-pass reactions with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA sequence data from chromatograms or FASTA files: it calls base quality, screens and removes vector sequences, assembles redundant input files and rewrites consensus sequences into a unigene EST data set, and finally adds translations, amino acid sequence similarity searches, annotation from public databases and functional data. PipeOnline generates an annotated database that retains the processed unigene sequence, the clone/file history, alignments with similar sequences and, if available, a proposed functional classification. Functional annotation is automatic and based on a novel method that relies on the multiplicity of amino acid sequence homology within GenBank records. Records are examined through a function-ordered browser or keyword queries, with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and an alert service. PipeOnline is available at http://stress-genomics.org.

14.
Correct phosphorylation site assignment is a critical aspect of phosphoproteomic analysis. Large-scale phosphopeptide data sets that are generated through liquid chromatography-coupled tandem mass spectrometry (LC-MS/MS) analysis often contain hundreds or thousands of phosphorylation sites that require validation. To this end, we have created PhosphoScore, an open-source assignment program that is compatible with phosphopeptide data from multiple MS levels (MS(n)). The algorithm takes into account both the match quality and normalized intensity of observed spectral peaks compared to a theoretical spectrum. PhosphoScore produced >95% correct MS(2) assignments from known synthetic data, >98% agreement with an established MS(2) assignment algorithm (Ascore), and >92% agreement with visual inspection of MS(3) and MS(4) spectra.
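PhosphoScore's exact scoring function is not given in this summary; the sketch below only illustrates the general shape of such a score, combining the fraction of theoretical fragment peaks matched with the normalized intensities of the matched observed peaks. The m/z tolerance, combination rule and toy fragment lists are assumptions, not the published algorithm.

```python
import numpy as np

def site_score(theoretical_mz, observed_mz, observed_intensity, tol=0.5):
    """Score one candidate phosphosite assignment (illustrative, not PhosphoScore itself):
    match quality = fraction of theoretical fragments found within `tol` m/z,
    weighted by the summed normalized intensity of the matched observed peaks."""
    norm_int = observed_intensity / observed_intensity.max()
    matched, intensity_sum = 0, 0.0
    for mz in theoretical_mz:
        diffs = np.abs(observed_mz - mz)
        j = int(diffs.argmin())
        if diffs[j] <= tol:
            matched += 1
            intensity_sum += norm_int[j]
    return (matched / len(theoretical_mz)) * intensity_sum

# The candidate site whose theoretical fragment list scores highest would be assigned.
theo_site_a = np.array([204.1, 347.2, 458.3, 627.3])
theo_site_b = np.array([204.1, 427.2, 538.3, 627.3])
obs_mz = np.array([204.0, 347.3, 458.1, 627.5, 700.2])
obs_int = np.array([1200.0, 800.0, 450.0, 900.0, 100.0])
best = max([("site A", theo_site_a), ("site B", theo_site_b)],
           key=lambda s: site_score(s[1], obs_mz, obs_int))
print(best[0])
```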

15.
Myoglobin has the ability to react with hydrogen peroxide, generating high-valent complexes similar to those of peroxidases (compounds I and II), and in the presence of excess hydrogen peroxide a third intermediate, compound III, with an oxymyoglobin-type structure is generated from compound II. Compound III is, however, easily reduced by one electron to peroxymyoglobin by synchrotron radiation during crystallographic data collection. We have generated and solved the 1.30 Å (1 Å = 0.1 nm) resolution crystal structure of the peroxymyoglobin intermediate, which is isoelectric to compound 0 and has an Fe-O distance of 1.8 Å and an O-O bond of 1.3 Å, in accordance with an Fe(II)-O-O⁻ (or Fe(III)-O-O²⁻) structure. The generation of the peroxy intermediate through reduction of compound III by X-rays shows the importance of using single-crystal microspectrophotometry when performing crystallography on metalloproteins. After having collected crystallographic data on a peroxy-generated myoglobin crystal, we were able, by a short annealing, to break the O-O bond, leading to formation of compound II. These results indicate that the cryoradiolytically generated peroxymyoglobin is biologically relevant through its conversion into compound II upon heating. Additionally, we have observed that the Xe1 site is occupied by a water molecule, which might be the leaving group in the compound II to compound III reaction.

16.
Recent Hi-C technology enables more comprehensive chromosomal conformation research, including the detection of structural variations, especially translocations. In this paper, we formulate interchromosomal translocation detection as a problem of scan clustering in a spatial point process. We then develop TranScan, a new translocation detection method based on scan statistics with control of false discoveries. Simulation shows that TranScan is more powerful than an existing sophisticated scan clustering method, especially in strong-signal situations. Evaluation of TranScan against current translocation detection methods on realistic breakpoint simulations generated from real data suggests better discriminative power under the receiver-operating characteristic curve. Power analysis also highlights TranScan's consistent outperformance as sequencing depth and heterozygosity rate are varied. The Type I error rate is lowest when evaluated using a karyotypically normal cell line. Both the simulation and the real data analysis indicate that TranScan has great potential for interchromosomal translocation detection using Hi-C data.

17.
Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant, yet unexpected enrichments on promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina's Genome Analyser platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on input control as a normalizer is not generally appropriate due to sample-to-sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.
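BEADS itself is a three-step procedure (GC content, mappability, regional bias); the sketch below illustrates only a generic version of the GC-content step, binning genomic windows by GC fraction and rescaling counts toward the global mean. The bin count and toy data are assumptions, not the published algorithm.

```python
import numpy as np

def gc_correct(read_counts, gc_fraction, n_bins=20):
    """Illustrative GC-bias correction (not the BEADS implementation): group genomic
    windows into GC-content bins and rescale each window's read count by the ratio of
    the global mean coverage to its bin's mean coverage."""
    bins = np.minimum((gc_fraction * n_bins).astype(int), n_bins - 1)
    corrected = read_counts.astype(float)
    global_mean = read_counts.mean()
    for b in range(n_bins):
        mask = bins == b
        if mask.any() and read_counts[mask].mean() > 0:
            corrected[mask] *= global_mean / read_counts[mask].mean()
    return corrected

# Toy data: 1 kb windows whose coverage increases artificially with GC content.
rng = np.random.default_rng(0)
gc = rng.uniform(0.3, 0.7, 10_000)
counts = rng.poisson(10 + 40 * (gc - 0.3))
normalized = gc_correct(counts, gc)
```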

18.
19.
SUMMARY: GAAS (Gene Array Analyzer Software) supports efficient multi-user management and suitable analyses of large amounts of gene expression data across replicated experiments. Its management framework handles input data generated by different technologies. A multi-user environment allows each user to store his/her own data visualization scheme, the analysis parameters used, and the values and formats of the output data. The analysis engine performs background and spot quality evaluation, data normalization, and differential gene expression analyses in single- and multiple-replica experiments. Expression profile results can be interactively navigated through graphical interfaces and stored in output databases.

20.
Song X, Davidian M, Tsiatis AA. Biometrics 2002, 58(4):742-753
Joint models for a time-to-event (e.g., survival) and a longitudinal response have generated considerable recent interest. The longitudinal data are assumed to follow a mixed effects model, and a proportional hazards model depending on the longitudinal random effects and other covariates is assumed for the survival endpoint. Interest may focus on inference on the longitudinal data process, which is informatively censored, or on the hazard relationship. Several methods for fitting such models have been proposed, most requiring a parametric distributional assumption (normality) on the random effects. A natural concern is sensitivity to violation of this assumption; moreover, a restrictive distributional assumption may obscure key features in the data. We investigate these issues through our proposal of a likelihood-based approach that requires only the assumption that the random effects have a smooth density. Implementation via the EM algorithm is described, and performance and the benefits for uncovering noteworthy features are illustrated by application to data from an HIV clinical trial and by simulation.
