  Subjects -> COMPUTER SCIENCE (Total: 1992 journals)
    - ANIMATION AND SIMULATION (29 journals)
    - ARTIFICIAL INTELLIGENCE (98 journals)
    - AUTOMATION AND ROBOTICS (98 journals)
    - CLOUD COMPUTING AND NETWORKS (60 journals)
    - COMPUTER ARCHITECTURE (9 journals)
    - COMPUTER ENGINEERING (9 journals)
    - COMPUTER GAMES (16 journals)
    - COMPUTER PROGRAMMING (24 journals)
    - COMPUTER SCIENCE (1159 journals)
    - COMPUTER SECURITY (45 journals)
    - DATA BASE MANAGEMENT (13 journals)
    - DATA MINING (32 journals)
    - E-BUSINESS (22 journals)
    - E-LEARNING (29 journals)
    - ELECTRONIC DATA PROCESSING (21 journals)
    - IMAGE AND VIDEO PROCESSING (39 journals)
    - INFORMATION SYSTEMS (105 journals)
    - INTERNET (92 journals)
    - SOCIAL WEB (50 journals)
    - SOFTWARE (34 journals)
    - THEORY OF COMPUTING (8 journals)

COMPUTER SCIENCE (1159 journals)

Showing 1 - 200 of 872 Journals sorted alphabetically
3D Printing and Additive Manufacturing     Full-text available via subscription   (Followers: 13)
Abakós     Open Access   (Followers: 4)
ACM Computing Surveys     Hybrid Journal   (Followers: 23)
ACM Journal on Computing and Cultural Heritage     Hybrid Journal   (Followers: 9)
ACM Journal on Emerging Technologies in Computing Systems     Hybrid Journal   (Followers: 13)
ACM Transactions on Accessible Computing (TACCESS)     Hybrid Journal   (Followers: 3)
ACM Transactions on Algorithms (TALG)     Hybrid Journal   (Followers: 16)
ACM Transactions on Applied Perception (TAP)     Hybrid Journal   (Followers: 6)
ACM Transactions on Architecture and Code Optimization (TACO)     Hybrid Journal   (Followers: 9)
ACM Transactions on Autonomous and Adaptive Systems (TAAS)     Hybrid Journal   (Followers: 7)
ACM Transactions on Computation Theory (TOCT)     Hybrid Journal   (Followers: 12)
ACM Transactions on Computational Logic (TOCL)     Hybrid Journal   (Followers: 4)
ACM Transactions on Computer Systems (TOCS)     Hybrid Journal   (Followers: 18)
ACM Transactions on Computer-Human Interaction     Hybrid Journal   (Followers: 14)
ACM Transactions on Computing Education (TOCE)     Hybrid Journal   (Followers: 5)
ACM Transactions on Design Automation of Electronic Systems (TODAES)     Hybrid Journal   (Followers: 1)
ACM Transactions on Economics and Computation     Hybrid Journal  
ACM Transactions on Embedded Computing Systems (TECS)     Hybrid Journal   (Followers: 4)
ACM Transactions on Information Systems (TOIS)     Hybrid Journal   (Followers: 21)
ACM Transactions on Intelligent Systems and Technology (TIST)     Hybrid Journal   (Followers: 8)
ACM Transactions on Interactive Intelligent Systems (TiiS)     Hybrid Journal   (Followers: 3)
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)     Hybrid Journal   (Followers: 10)
ACM Transactions on Reconfigurable Technology and Systems (TRETS)     Hybrid Journal   (Followers: 7)
ACM Transactions on Sensor Networks (TOSN)     Hybrid Journal   (Followers: 9)
ACM Transactions on Speech and Language Processing (TSLP)     Hybrid Journal   (Followers: 11)
ACM Transactions on Storage     Hybrid Journal  
ACS Applied Materials & Interfaces     Full-text available via subscription   (Followers: 25)
Acta Automatica Sinica     Full-text available via subscription   (Followers: 3)
Acta Universitatis Cibiniensis. Technical Series     Open Access  
Ad Hoc Networks     Hybrid Journal   (Followers: 11)
Adaptive Behavior     Hybrid Journal   (Followers: 11)
Advanced Engineering Materials     Hybrid Journal   (Followers: 26)
Advanced Science Letters     Full-text available via subscription   (Followers: 9)
Advances in Adaptive Data Analysis     Hybrid Journal   (Followers: 8)
Advances in Artificial Intelligence     Open Access   (Followers: 16)
Advances in Calculus of Variations     Hybrid Journal   (Followers: 2)
Advances in Catalysis     Full-text available via subscription   (Followers: 5)
Advances in Computational Mathematics     Hybrid Journal   (Followers: 15)
Advances in Computer Science : an International Journal     Open Access   (Followers: 14)
Advances in Computing     Open Access   (Followers: 2)
Advances in Data Analysis and Classification     Hybrid Journal   (Followers: 51)
Advances in Engineering Software     Hybrid Journal   (Followers: 26)
Advances in Geosciences (ADGEO)     Open Access   (Followers: 10)
Advances in Human Factors/Ergonomics     Full-text available via subscription   (Followers: 26)
Advances in Human-Computer Interaction     Open Access   (Followers: 20)
Advances in Materials Sciences     Open Access   (Followers: 16)
Advances in Operations Research     Open Access   (Followers: 11)
Advances in Parallel Computing     Full-text available via subscription   (Followers: 7)
Advances in Porous Media     Full-text available via subscription   (Followers: 4)
Advances in Remote Sensing     Open Access   (Followers: 39)
Advances in Science and Research (ASR)     Open Access   (Followers: 6)
Advances in Technology Innovation     Open Access   (Followers: 2)
AEU - International Journal of Electronics and Communications     Hybrid Journal   (Followers: 8)
African Journal of Information and Communication     Open Access   (Followers: 8)
African Journal of Mathematics and Computer Science Research     Open Access   (Followers: 4)
Air, Soil & Water Research     Open Access   (Followers: 9)
AIS Transactions on Human-Computer Interaction     Open Access   (Followers: 6)
Algebras and Representation Theory     Hybrid Journal   (Followers: 1)
Algorithms     Open Access   (Followers: 11)
American Journal of Computational and Applied Mathematics     Open Access   (Followers: 4)
American Journal of Computational Mathematics     Open Access   (Followers: 4)
American Journal of Information Systems     Open Access   (Followers: 5)
American Journal of Sensor Technology     Open Access   (Followers: 4)
Anais da Academia Brasileira de Ciências     Open Access   (Followers: 2)
Analog Integrated Circuits and Signal Processing     Hybrid Journal   (Followers: 7)
Analysis in Theory and Applications     Hybrid Journal   (Followers: 1)
Animation Practice, Process & Production     Hybrid Journal   (Followers: 5)
Annals of Combinatorics     Hybrid Journal   (Followers: 3)
Annals of Data Science     Hybrid Journal   (Followers: 11)
Annals of Mathematics and Artificial Intelligence     Hybrid Journal   (Followers: 7)
Annals of Pure and Applied Logic     Open Access   (Followers: 2)
Annals of Software Engineering     Hybrid Journal   (Followers: 13)
Annual Reviews in Control     Hybrid Journal   (Followers: 6)
Anuario Americanista Europeo     Open Access  
Applicable Algebra in Engineering, Communication and Computing     Hybrid Journal   (Followers: 2)
Applied and Computational Harmonic Analysis     Full-text available via subscription   (Followers: 2)
Applied Artificial Intelligence: An International Journal     Hybrid Journal   (Followers: 14)
Applied Categorical Structures     Hybrid Journal   (Followers: 2)
Applied Clinical Informatics     Hybrid Journal   (Followers: 2)
Applied Computational Intelligence and Soft Computing     Open Access   (Followers: 12)
Applied Computer Systems     Open Access   (Followers: 1)
Applied Informatics     Open Access  
Applied Mathematics and Computation     Hybrid Journal   (Followers: 33)
Applied Medical Informatics     Open Access   (Followers: 11)
Applied Numerical Mathematics     Hybrid Journal   (Followers: 5)
Applied Soft Computing     Hybrid Journal   (Followers: 16)
Applied Spatial Analysis and Policy     Hybrid Journal   (Followers: 4)
Architectural Theory Review     Hybrid Journal   (Followers: 3)
Archive of Applied Mechanics     Hybrid Journal   (Followers: 5)
Archive of Numerical Software     Open Access  
Archives and Museum Informatics     Hybrid Journal   (Followers: 135)
Archives of Computational Methods in Engineering     Hybrid Journal   (Followers: 4)
Artifact     Hybrid Journal   (Followers: 2)
Artificial Life     Hybrid Journal   (Followers: 7)
Asia Pacific Journal on Computational Engineering     Open Access  
Asia-Pacific Journal of Information Technology and Multimedia     Open Access   (Followers: 1)
Asian Journal of Computer Science and Information Technology     Open Access  
Asian Journal of Control     Hybrid Journal  
Assembly Automation     Hybrid Journal   (Followers: 2)
at - Automatisierungstechnik     Hybrid Journal   (Followers: 1)
Australian Educational Computing     Open Access   (Followers: 1)
Automatic Control and Computer Sciences     Hybrid Journal   (Followers: 4)
Automatic Documentation and Mathematical Linguistics     Hybrid Journal   (Followers: 5)
Automatica     Hybrid Journal   (Followers: 11)
Automation in Construction     Hybrid Journal   (Followers: 6)
Autonomous Mental Development, IEEE Transactions on     Hybrid Journal   (Followers: 8)
Basin Research     Hybrid Journal   (Followers: 5)
Behaviour & Information Technology     Hybrid Journal   (Followers: 52)
Biodiversity Information Science and Standards     Open Access  
Bioinformatics     Hybrid Journal   (Followers: 279)
Biomedical Engineering     Hybrid Journal   (Followers: 16)
Biomedical Engineering and Computational Biology     Open Access   (Followers: 14)
Biomedical Engineering, IEEE Reviews in     Full-text available via subscription   (Followers: 17)
Biomedical Engineering, IEEE Transactions on     Hybrid Journal   (Followers: 33)
Briefings in Bioinformatics     Hybrid Journal   (Followers: 44)
British Journal of Educational Technology     Hybrid Journal   (Followers: 128)
Broadcasting, IEEE Transactions on     Hybrid Journal   (Followers: 10)
c't Magazin fuer Computertechnik     Full-text available via subscription   (Followers: 2)
CALCOLO     Hybrid Journal  
Calphad     Hybrid Journal  
Canadian Journal of Electrical and Computer Engineering     Full-text available via subscription   (Followers: 14)
Catalysis in Industry     Hybrid Journal   (Followers: 1)
CEAS Space Journal     Hybrid Journal   (Followers: 1)
Cell Communication and Signaling     Open Access   (Followers: 1)
Central European Journal of Computer Science     Hybrid Journal   (Followers: 5)
CERN IdeaSquare Journal of Experimental Innovation     Open Access  
Chaos, Solitons & Fractals     Hybrid Journal   (Followers: 3)
Chemometrics and Intelligent Laboratory Systems     Hybrid Journal   (Followers: 15)
ChemSusChem     Hybrid Journal   (Followers: 7)
China Communications     Full-text available via subscription   (Followers: 7)
Chinese Journal of Catalysis     Full-text available via subscription   (Followers: 2)
CIN Computers Informatics Nursing     Full-text available via subscription   (Followers: 12)
Circuits and Systems     Open Access   (Followers: 16)
Clean Air Journal     Full-text available via subscription   (Followers: 2)
CLEI Electronic Journal     Open Access  
Clin-Alert     Hybrid Journal   (Followers: 1)
Cluster Computing     Hybrid Journal   (Followers: 1)
Cognitive Computation     Hybrid Journal   (Followers: 4)
COMBINATORICA     Hybrid Journal  
Combustion Theory and Modelling     Hybrid Journal   (Followers: 13)
Communication Methods and Measures     Hybrid Journal   (Followers: 12)
Communication Theory     Hybrid Journal   (Followers: 20)
Communications Engineer     Hybrid Journal   (Followers: 1)
Communications in Algebra     Hybrid Journal   (Followers: 3)
Communications in Partial Differential Equations     Hybrid Journal   (Followers: 3)
Communications of the ACM     Full-text available via subscription   (Followers: 54)
Communications of the Association for Information Systems     Open Access   (Followers: 18)
COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering     Hybrid Journal   (Followers: 3)
Complex & Intelligent Systems     Open Access  
Complex Adaptive Systems Modeling     Open Access  
Complex Analysis and Operator Theory     Hybrid Journal   (Followers: 2)
Complexity     Hybrid Journal   (Followers: 6)
Complexus     Full-text available via subscription  
Composite Materials Series     Full-text available via subscription   (Followers: 9)
Computación y Sistemas     Open Access  
Computation     Open Access  
Computational and Applied Mathematics     Hybrid Journal   (Followers: 2)
Computational and Mathematical Methods in Medicine     Open Access   (Followers: 2)
Computational and Mathematical Organization Theory     Hybrid Journal   (Followers: 2)
Computational and Structural Biotechnology Journal     Open Access   (Followers: 2)
Computational and Theoretical Chemistry     Hybrid Journal   (Followers: 9)
Computational Astrophysics and Cosmology     Open Access   (Followers: 1)
Computational Biology and Chemistry     Hybrid Journal   (Followers: 12)
Computational Chemistry     Open Access   (Followers: 2)
Computational Cognitive Science     Open Access   (Followers: 2)
Computational Complexity     Hybrid Journal   (Followers: 4)
Computational Condensed Matter     Open Access  
Computational Ecology and Software     Open Access   (Followers: 9)
Computational Economics     Hybrid Journal   (Followers: 9)
Computational Geosciences     Hybrid Journal   (Followers: 15)
Computational Linguistics     Open Access   (Followers: 23)
Computational Management Science     Hybrid Journal  
Computational Mathematics and Modeling     Hybrid Journal   (Followers: 8)
Computational Mechanics     Hybrid Journal   (Followers: 4)
Computational Methods and Function Theory     Hybrid Journal  
Computational Molecular Bioscience     Open Access   (Followers: 2)
Computational Optimization and Applications     Hybrid Journal   (Followers: 7)
Computational Particle Mechanics     Hybrid Journal   (Followers: 1)
Computational Research     Open Access   (Followers: 1)
Computational Science and Discovery     Full-text available via subscription   (Followers: 2)
Computational Science and Techniques     Open Access  
Computational Statistics     Hybrid Journal   (Followers: 13)
Computational Statistics & Data Analysis     Hybrid Journal   (Followers: 31)
Computer     Full-text available via subscription   (Followers: 87)
Computer Aided Surgery     Hybrid Journal   (Followers: 3)
Computer Applications in Engineering Education     Hybrid Journal   (Followers: 7)
Computer Communications     Hybrid Journal   (Followers: 10)
Computer Engineering and Applications Journal     Open Access   (Followers: 5)
Computer Journal     Hybrid Journal   (Followers: 8)
Computer Methods in Applied Mechanics and Engineering     Hybrid Journal   (Followers: 21)
Computer Methods in Biomechanics and Biomedical Engineering     Hybrid Journal   (Followers: 10)
Computer Methods in the Geosciences     Full-text available via subscription   (Followers: 1)
Computer Music Journal     Hybrid Journal   (Followers: 16)
Computer Physics Communications     Hybrid Journal   (Followers: 6)
Computer Science - Research and Development     Hybrid Journal   (Followers: 7)
Computer Science and Engineering     Open Access   (Followers: 17)
Computer Science and Information Technology     Open Access   (Followers: 12)
Computer Science Education     Hybrid Journal   (Followers: 13)
Computer Science Journal     Open Access   (Followers: 20)
Computer Science Master Research     Open Access   (Followers: 10)

Bioinformatics
  [SJR: 4.643]   [H-I: 271]   [279 followers]
   Hybrid journal (it can contain Open Access articles)
   ISSN (Print) 1367-4803 - ISSN (Online) 1460-2059
   Published by Oxford University Press  [370 journals]
  • DISC: DISulfide linkage Characterization from tandem mass spectra
    • Authors: Liu Y; Sun W, Shan B, et al.
      Abstract:
        Motivation: Enzymatic digestion under appropriate reducing conditions followed by mass spectrometry analysis has emerged as the primary method for disulfide bond analysis. The large amount of mass spectral data collected in these experiments requires effective computational approaches to automate interpretation. Although several approaches have been developed for this purpose, they ignore the frequently observed internal ion fragments, and they lack a reasonable quality control strategy and a calibrated scoring scheme for the statistical validation and ranking of the reported results.
        Results: In this research, we present a new computational approach, DISC (DISulfide bond Characterization), for matching an input MS/MS spectrum against putative disulfide linkage structures hypothetically constructed from the protein database. More specifically, we consider different ion types, including a variety of internal ions frequently observed in mass spectra of disulfide-linked peptides, and introduce an effective two-layer scoring scheme to evaluate the significance of the match between spectrum and structure. Based on this scheme, we have also developed a target-decoy strategy that provides quality control and reports the false discovery rate in the final results. Systematic experiments on both low-complexity and high-complexity datasets demonstrated the efficiency of the proposed method for identifying disulfide bonds from MS/MS spectra, and showed its potential for characterizing disulfide bonds at the proteome scale rather than in just a single protein.
        Availability and implementation: Software is available for download at http://www.csd.uwo.ca/yliu766/.
        Contact: yliu766@uwo.ca
        Supplementary information: Supplementary data are available at Bioinformatics online.
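The target-decoy idea mentioned in the DISC abstract can be illustrated generically. The following is a minimal Python sketch, not DISC's implementation: it assumes each candidate match already carries a single score (DISC's two-layer scoring is not reproduced) and uses the common estimate FDR ≈ (decoy hits at or above a cutoff) / (target hits at or above that cutoff). The function names `estimate_fdr` and `threshold_for_fdr` are hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR at a score cutoff as (#decoys >= cutoff) / (#targets >= cutoff).

    Assumes false target matches score like decoy matches; this simple ratio
    is one common target-decoy estimator, not DISC's exact procedure.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing can be a false discovery
    return min(1.0, n_decoy / n_target)


def threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest observed target score whose estimated FDR is <= max_fdr."""
    for t in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None  # no cutoff meets the requested FDR
```

For example, with target scores [10, 9, 8, 7, 3, 2] and decoy scores [4, 2, 1], a cutoff of 3 keeps five targets and one decoy, for an estimated FDR of 0.2.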
      PubDate: 2017-10-23
       
  • MetaCache: context-aware classification of metagenomic reads using minhashing
    • Authors: Müller A; Hundt C, Hildebrandt A, et al.
      Abstract:
        Motivation: Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Because of the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, existing read classification tools suffer from long runtimes, large memory requirements or low accuracy.
        Results: We introduce MetaCache, a novel software tool for read classification that uses the big-data technique of minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both the probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory than the state-of-the-art read classifiers Kraken and CLARK, while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache's database consumes only 62 GB of memory, while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB of RAM. Our experimental results further show that classification accuracy continuously improves as the amount of reference genome data increases.
        Availability and implementation: MetaCache is open-source software written in C++ and can be downloaded at http://github.com/muellan/metacache.
        Contact: bertil.schmidt@uni-mainz.de
        Supplementary information: Supplementary data are available at Bioinformatics online.
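Minhashing, as named in the MetaCache abstract, can be sketched in its simplest bottom-s form. This is an illustrative assumption, not MetaCache's design: its windowing of reference genomes, database layout and classification logic are not reproduced, and the helper names below are hypothetical. The sketch keeps the s smallest hash values of a sequence's k-mers and estimates the Jaccard similarity of two k-mer sets from those sketches alone.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer):
    """Deterministic 64-bit hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hash values of seq."""
    return set(sorted(kmer_hash(km) for km in kmers(seq, k))[:s])


def jaccard_estimate(sk_a, sk_b, s):
    """Estimate |A & B| / |A | B| from two bottom-s sketches.

    Take the s smallest hashes of the union of both sketches and count how
    many of them occur in both sketches.
    """
    union_bottom = set(sorted(sk_a | sk_b)[:s])
    if not union_bottom:
        return 0.0
    shared = union_bottom & sk_a & sk_b
    return len(shared) / len(union_bottom)
```

Only s hash values are stored per sketched sequence, independent of its length; when s exceeds the number of distinct k-mers, the estimate equals the exact Jaccard similarity (barring hash collisions).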
      PubDate: 2017-08-17
       
  • Improving protein fold recognition by extracting fold-specific features from predicted residue–residue contacts
    • Authors: Zhu J; Zhang H, Li S, et al.
      Abstract:
        Motivation: Accurate recognition of protein fold types is a key step in template-based prediction of protein structures. Existing approaches to fold recognition mainly exploit features derived from alignments of the query protein against templates. These approaches have been shown to succeed at fold recognition at the family level, but they usually fail at the superfamily and fold levels. One key to overcoming this limitation is to explore more structurally informative features of proteins. Although residue–residue contacts carry abundant structural information, how to fully exploit this information for fold recognition remains a challenge.
        Results: In this study, we present an approach, called DeepFR, to improve fold recognition at the superfamily and fold levels. The basic idea is to extract fold-specific features from predicted residue–residue contacts of proteins using a deep convolutional neural network (DCNN). Based on these fold-specific features, we calculated the similarity between the query protein and the templates, and then assigned the query protein the fold type of the most similar template. DCNNs have shown excellent performance in image feature extraction and image recognition; the rationale for applying a DCNN to fold recognition is that contact likelihood maps are essentially analogous to images, as both display a compositional hierarchy. Experimental results on the LINDAHL dataset suggest that, even using the extracted fold-specific features alone, our approach achieves a success rate comparable to state-of-the-art approaches. When these features are further combined with traditional alignment-related features, the success rate of our approach increases to 92.3%, 82.5% and 78.8% at the family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at the fold level, 6% higher at the superfamily level and 1% higher at the family level. An independent assessment on the SCOP_TEST dataset showed a consistent performance improvement, indicating the robustness of our approach. Furthermore, bi-clustering results of the extracted features are compatible with the fold hierarchy of proteins, implying that these features are fold-specific. Together, these results suggest that the features extracted from predicted contacts are orthogonal to alignment-related features, and that combining them could greatly facilitate fold recognition at the superfamily and fold levels, as well as template-based prediction of protein structures.
        Availability and implementation: Source code of DeepFR is freely available at https://github.com/zhujianwei31415/deepfr, and a web server is available at http://protein.ict.ac.cn/deepfr.
        Contact: zheng@itp.ac.cn or dbu@ict.ac.cn
        Supplementary information: Supplementary data are available at Bioinformatics online.
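The final assignment step described for DeepFR (score the query against every template, then take the fold type of the most similar one) amounts to nearest-neighbour classification. A minimal sketch, assuming the fold-specific features are already available as plain numeric vectors and assuming cosine similarity as the measure; the abstract does not say which similarity DeepFR uses, the DCNN feature extractor is not shown, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def assign_fold(query, templates):
    """Return the fold label of the template most similar to the query.

    templates: list of (fold_label, feature_vector) pairs (assumed layout).
    """
    best_label, _ = max(templates, key=lambda t: cosine(query, t[1]))
    return best_label
```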
      PubDate: 2017-08-16
       
  • Prediction of delayed retention of antibodies in hydrophobic interaction chromatography from sequence using machine learning
    • Authors: Jain T; Boland T, Lilov A, et al.
      Abstract:
        Motivation: The hydrophobicity of a monoclonal antibody is an important biophysical property relevant to its developability into a therapeutic. In addition to characterizing heterogeneity, hydrophobic interaction chromatography (HIC) is an assay often used to quantify the hydrophobicity of an antibody in order to assess downstream risks. Earlier studies have shown that retention times in this assay can be correlated with amino-acid or atomic propensities weighted by the surface areas obtained from three-dimensional protein structures. The goal of this study is to develop models that predict delayed HIC retention times directly from sequence.
        Results: We use the random forest machine learning approach to estimate the surface exposure of amino-acid side chains in the variable region directly from the antibody sequence, obtaining a mean absolute error of 4.6% for the prediction of surface exposure. Using experimental HIC data together with the estimated surface areas, we derive an amino-acid propensity scale that enables prediction of antibodies likely to have delayed retention times in the assay. Our model achieves a cross-validated area under the receiver operating characteristic curve (AUC) of 0.85. The low computational expense and high accuracy of this approach enable real-time assessment of hydrophobic character, supporting prioritization of antibodies during the discovery process and rational engineering to reduce hydrophobic liabilities.
        Availability and implementation: Structure data, aligned sequences, experimental data and prediction scores for test cases, and the R scripts used in this work are provided as part of the Supplementary Material.
        Contact: tushar.jain@adimab.com
        Supplementary information: Supplementary data are available at Bioinformatics online.
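The reported AUC of 0.85 is the area under the ROC curve. One standard way to compute it is the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive example scores above a randomly chosen negative one, with ties counting one half. The sketch below is a generic Python illustration with a hypothetical function name; the authors' own analysis used R scripts, which are not reproduced here.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive (label 1) and negative (label 0) scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))
```

The double loop is O(n²) and meant only to make the definition explicit; a sort-based rank computation gives the same value in O(n log n).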
      PubDate: 2017-08-16
       
  • Genome-scale regression analysis reveals a linear relationship for promoters and enhancers after combinatorial drug treatment
    • Authors: Rapakoulia T; Gao X, Huang Y, et al.
      Abstract:
        Motivation: Drug combination therapy for the treatment of cancers and other multifactorial diseases has the potential to increase the therapeutic effect while reducing the likelihood of drug resistance. To reduce the time and cost of comprehensive screens, methods are needed that can model the additive effects of possible drug combinations.
        Results: We show that the transcriptional response to combinatorial drug treatment at promoters, as measured by single-molecule CAGE technology, is accurately described at a genome-wide scale by a linear combination of the responses to the individual drugs. We also find that the same linear relationship holds for transcription at enhancer elements. We conclude that the described approach is promising for eliciting the transcriptional response to multidrug treatment at promoters and enhancers in an unbiased, genome-wide way, which may minimize the need for exhaustive combinatorial screens.
        Availability and implementation: The CAGE sequence data used in this study are available in the DDBJ Sequence Read Archive (http://trace.ddbj.nig.ac.jp/index_e.html), accession number DRP001113.
        Contact: xin.gao@kaust.edu.sa or erik.arner@riken.jp
        Supplementary information: Supplementary data are available at Bioinformatics online.
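The central claim, that the combined-treatment response is a linear combination of the single-drug responses, can be checked with ordinary least squares. A minimal sketch under stated assumptions: responses are given as per-promoter values (for example log fold changes), the model is y_AB ≈ α·y_A + β·y_B with no intercept, and `fit_linear_combination` is a hypothetical name; the paper's actual regression setup may differ.

```python
def fit_linear_combination(resp_a, resp_b, resp_ab):
    """Least-squares fit of resp_ab ~ alpha * resp_a + beta * resp_b (no intercept).

    Solves the 2x2 normal equations
        [saa sab] [alpha]   [say]
        [sab sbb] [beta ] = [sby]
    with Cramer's rule.
    """
    saa = sum(a * a for a in resp_a)
    sbb = sum(b * b for b in resp_b)
    sab = sum(a * b for a, b in zip(resp_a, resp_b))
    say = sum(a * y for a, y in zip(resp_a, resp_ab))
    sby = sum(b * y for b, y in zip(resp_b, resp_ab))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("single-drug responses are collinear; coefficients not identifiable")
    alpha = (say * sbb - sab * sby) / det
    beta = (saa * sby - sab * say) / det
    return alpha, beta
```

If the combination were purely additive, one would expect α and β near 1; other values still describe a linear relationship, just with different weights per drug.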
      PubDate: 2017-08-14
       
  • Conditional asymptotic inference for the kernel association test
    • Authors: Wang K.
      Abstract:
        Motivation: The kernel association test (KAT) is popular in biological studies for its ability to combine weak effects that are potentially of opposite direction. Its P-value is typically assessed via its (unconditional) asymptotic distribution. However, this asymptotic distribution is known only for continuous and dichotomous traits. Furthermore, the derived P-values are known to be conservative when the sample size is small, especially in the important case of dichotomous traits. One alternative is the permutation test, a widely accepted approximation to exact finite-sample conditional inference, but it is time-consuming in practice because of the stringent significance criteria common in these analyses.
        Results: Based on a previous theoretical result, a conditional asymptotic distribution for the KAT is introduced. This distribution provides an alternative approximation to the exact distribution of the KAT. An explicit expression for this distribution is provided, from which P-values can be computed easily. The method applies to any type of trait. Its usefulness is demonstrated via extensive simulation studies using real genotype data and an analysis of genetic data from the Ocular Hypertension Treatment Study. Numerical results showed that the new method can control the type I error rate and is slightly conservative compared with the permutation method. Nevertheless, the proposed method may be used as a fast screening method; the time-consuming permutation procedure can then be conducted at locations that show signals of association.
        Availability and implementation: An implementation of the proposed method is provided in the R package iGasso.
        Contact: kai-wang@uiowa.edu
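For context, the permutation baseline that the new method is compared against can be sketched as follows. This is not the proposed conditional asymptotic method (the abstract points to the R package iGasso for that). It assumes a linear kernel K = G Gᵀ, a common KAT choice, so that the statistic Q = rᵀ K r with centred phenotype r reduces to ‖Gᵀ r‖²; the function names are hypothetical.

```python
import random


def kat_statistic(y, G):
    """Linear-kernel statistic Q = r^T (G G^T) r = ||G^T r||^2, with r = y - mean(y).

    G is a list of per-sample genotype rows (samples x variants).
    """
    ybar = sum(y) / len(y)
    r = [yi - ybar for yi in y]
    q = 0.0
    for j in range(len(G[0])):
        # Projection of the centred phenotype onto variant j.
        s = sum(G[i][j] * r[i] for i in range(len(y)))
        q += s * s
    return q


def permutation_pvalue(y, G, n_perm=999, seed=0):
    """Shuffle phenotypes, recompute Q, and count permuted Q >= observed Q."""
    rng = random.Random(seed)
    observed = kat_statistic(y, G)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if kat_statistic(y_perm, G) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)
```

The smallest attainable P-value is 1/(n_perm + 1), so resolving P-values at a stringent level such as 5 × 10⁻⁸ needs at least about 2 × 10⁷ permutations per test, which illustrates the cost the abstract refers to.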
      PubDate: 2017-08-14
       
  • MAPseq: highly efficient k-mer search with confidence estimates, for rRNA sequence analysis
    • Authors: Matias Rodrigues J; Schmidt T, Tackmann J, et al.
      Abstract:
        Motivation: Ribosomal RNA profiling has become crucial to studying microbial communities, but meaningful taxonomic analysis and inter-comparison of such data are still hampered by technical limitations, between-study design variability and inconsistencies between the taxonomies used.
        Results: Here we present MAPseq, a framework for reference-based rRNA sequence analysis that is up to 30% more accurate (F½ score) and up to one hundred times faster than existing solutions. In a single run, it provides multiple taxonomy classifications and hierarchical operational taxonomic unit mappings for rRNA sequences from both amplicon and shotgun sequencing strategies, and for datasets of virtually any size.
        Availability and implementation: Source code and binaries are freely available at https://github.com/jfmrod/mapseq
        Contact: mering@imls.uzh.ch
        Supplementary information: Supplementary data are available at Bioinformatics online.
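As a rough illustration of reference-based k-mer search (not MAPseq's algorithm or its confidence model), one can index every k-mer of each reference sequence and rank references by the number of distinct query k-mers they share. The names `build_index` and `best_hit` are hypothetical, and the fraction returned is only a crude support measure, not a calibrated confidence estimate.

```python
from collections import Counter, defaultdict


def build_index(refs, k):
    """Map each k-mer to the set of reference IDs that contain it."""
    index = defaultdict(set)
    for ref_id, seq in refs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(ref_id)
    return index


def best_hit(query, index, k):
    """Return (ref_id, shared_kmers, fraction_of_query_kmers) for the top reference, or None."""
    q_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    counts = Counter()
    for km in q_kmers:
        # .get avoids inserting empty entries into the defaultdict
        for ref_id in index.get(km, ()):
            counts[ref_id] += 1
    if not counts:
        return None
    ref_id, shared = counts.most_common(1)[0]
    return ref_id, shared, shared / len(q_kmers)
```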
      PubDate: 2017-08-14
       
  • CRISPR-DAV: CRISPR NGS data analysis and visualization pipeline
    • Authors: Wang X; Tilford C, Neuhaus I, et al.
      Abstract:
        Summary: The simplicity and precision of the CRISPR/Cas9 system have ushered in a new era of gene editing. Next-generation sequencing (NGS), thanks to its multiplexing capability, makes it possible to screen a large number of samples for clones carrying the desired CRISPR-mediated genomic edits. Here we present the CRISPR-DAV (CRISPR Data Analysis and Visualization) pipeline for analyzing CRISPR NGS data in a high-throughput manner. In the pipeline, Burrows-Wheeler Aligner and Assembly Based ReAlignment are used for small and large indel detection, and the results are presented in a comprehensive set of charts and an interactive alignment view.
        Availability and implementation: CRISPR-DAV is available from the GitHub and Docker Hub repositories: https://github.com/pinetree1/crispr-dav.git and https://hub.docker.com/r/pinetree1/crispr-dav/.
        Contact: xuning.wang@bms.com
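Small indels reported by an aligner such as BWA are encoded in the CIGAR string of each SAM record (I marks an insertion, D a deletion). A minimal sketch of extracting them, assuming the CIGAR string and the leftmost reference position of the alignment are already in hand; the function name is hypothetical and this is not CRISPR-DAV's code.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def indels_from_cigar(cigar, ref_start):
    """List (kind, ref_pos, length) for each insertion/deletion in a CIGAR string.

    For a deletion, ref_pos is the first deleted reference base; for an
    insertion, it is the reference base immediately after the inserted bases.
    Positions use the same convention (0- or 1-based) as ref_start.
    """
    events = []
    pos = ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "I":
            events.append(("insertion", pos, length))
        elif op == "D":
            events.append(("deletion", pos, length))
        if op in REF_CONSUMING:
            pos += length
    return events
```

Soft clips (S) and insertions do not advance the reference coordinate, which is why they are excluded from `REF_CONSUMING`.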
      PubDate: 2017-08-14
       
  • PyLasso: a PyMOL plugin to identify lassos
    • Authors: Gierut A; Niemyska W, Dabrowski-Tumanski P, et al.
      Abstract:
        Summary: Entanglement in macromolecules is an important phenomenon and a subject of multidisciplinary research. As recently discovered, around 4% of proteins form new entangled motifs called lassos. Here we present PyLasso, a PyMOL plugin to identify lassos and analyse their properties in proteins and other (bio)polymers, as well as in other biological, physical and mathematical systems. PyLasso is a useful tool for all researchers working on the modeling of macromolecules, structure prediction, properties of polymers, entanglement in fluids and fields, etc.
        Availability and implementation: PyLasso and tutorial videos are available at http://pylasso.cent.uw.edu.pl
        Contact: jsulkowska@cent.uw.edu.pl
      PubDate: 2017-08-14
       
  • Metagene projection characterizes GEN2.2 and CAL-1 as relevant human plasmacytoid dendritic cell models
    • Authors: Carmona-Sáez P; Varela N, Luque M, et al.
      Abstract:
        Motivation: Plasmacytoid dendritic cells (pDC) play a major role in the regulation of adaptive and innate immunity. Human pDC are difficult to isolate from peripheral blood and do not survive in culture, making the study of their biology challenging. Recently, two leukemic counterparts of pDC, CAL-1 and GEN2.2, have been proposed as representative models of human pDC. Nevertheless, their relationship with pDC has been established only by means of particular functional and phenotypic similarities. To characterize GEN2.2 and CAL-1 in the context of the main circulating immune cell populations, we performed microarray gene expression profiling of GEN2.2 and carried out an integrated analysis using publicly available gene expression datasets of CAL-1 and the main circulating primary leukocyte lineages.
        Results: Our results show that GEN2.2 and CAL-1 share common gene expression programs with primary pDC, clustering apart from the other circulating hematopoietic lineages. We have also identified common differentially expressed genes that may be relevant to pDC biology. In addition, we have revealed the common and differential pathways activated in primary pDC and in the cell lines upon CpG stimulation.
        Availability and implementation: R code and data are available in the supplementary material.
        Contact: pedro.carmona@genyo.es or concepcion.maranon@genyo.es
        Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-08-07
       
  • Holmes: a graphical tool for development, simulation and analysis of Petri
           net based models of complex biological systems
    • Authors: Radom M; Rybarczyk A, Szawulak B, et al.
      Abstract: Summary: Model development and analysis are fundamental steps in systems biology, and the theory of Petri nets offers a tool for such tasks. With the rapid development of computer science, a variety of Petri net tools have emerged, offering various analytical algorithms. This creates the problem of having to use different programs to analyse a single model: many file formats and different representations of results make the analysis much harder. Especially for larger nets, the ability to visualize the results in a suitable form greatly helps in understanding their significance. We present Holmes, a new tool for Petri net development and analysis. Our program contains algorithms for model analysis based on different types of Petri nets, e.g. an invariant generator, Maximum Common Transitions (MCT) sets and cluster modules, simulation algorithms, and knockout analysis tools. A very important feature is the ability to visualize the results of almost all analytical modules. Integrating these modules into one graphical environment allows researchers to devote their time fully to model building and analysis.
      Availability and implementation: Available at http://www.cs.put.poznan.pl/mradom/Holmes/holmes.html
      Contact: piotr@cs.put.poznan.pl
      PubDate: 2017-08-07
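Holmes's simulation and invariant modules rest on two standard Petri net notions: the firing rule and T-invariants. The following minimal Python sketch illustrates both. It is not taken from Holmes, and the net and its place and transition names are hypothetical. A transition is enabled when each of its input places holds at least as many tokens as the arc weight. A transition-count vector x is a T-invariant when C·x = 0, where C is the incidence matrix (output weights minus input weights).

```python
from typing import Dict

# A Petri net as arc weights: transition -> {place: weight}.
Arcs = Dict[str, Dict[str, int]]


def enabled(marking: Dict[str, int], pre: Arcs, t: str) -> bool:
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(p, 0) >= w for p, w in pre[t].items())


def fire(marking: Dict[str, int], pre: Arcs, post: Arcs, t: str) -> Dict[str, int]:
    """Fire transition t: consume tokens from its inputs, produce on its outputs."""
    if not enabled(marking, pre, t):
        raise ValueError(f"transition {t} is not enabled")
    new = dict(marking)
    for p, w in pre[t].items():
        new[p] = new.get(p, 0) - w
    for p, w in post[t].items():
        new[p] = new.get(p, 0) + w
    return new


def is_t_invariant(x: Dict[str, int], pre: Arcs, post: Arcs) -> bool:
    """Check C·x = 0, where C[p][t] = post[t][p] - pre[t][p].
    Assumes pre and post list the same transitions."""
    places = {p for arcs in (pre, post) for t in arcs for p in arcs[t]}
    for p in places:
        net = sum(x.get(t, 0) * (post[t].get(p, 0) - pre[t].get(p, 0)) for t in pre)
        if net != 0:
            return False
    return True
```

Take the two-transition cycle A to B to A. Firing t1 and then t2 returns the net to its initial marking, which is exactly what the invariant (t1: 1, t2: 1) expresses.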
       
  • EMHP: an accurate automated hole masking algorithm for single-particle
           cryo-EM image processing
    • Authors: Berndsen Z; Bowman C, Jang H, et al.
      Abstract: Summary: The Electron Microscopy Hole Punch (EMHP) is a streamlined suite of tools for quick assessment, sorting and hole masking of electron micrographs. Recent advances in single-particle electron cryo-microscopy (cryo-EM) data processing allow the rapid determination of protein structures with a smaller computational footprint. We therefore saw the need for a fast and simple data pre-processing tool that could run independently of existing high-performance computing (HPC) infrastructures. EMHP provides a data pre-processing platform in a small package that requires only minimal Python dependencies.
      Availability and implementation: https://www.bitbucket.org/chazbot/emhp (Apache 2.0 License)
      Contact: bowman@scripps.edu
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-08-07
       
  • Lep-MAP3: robust linkage mapping even for low-coverage whole genome
           sequencing data
    • Authors: Rastas P.
      Abstract: Motivation: Accurate and dense linkage maps are useful in family-based linkage and association studies, quantitative trait locus mapping, analysis of genome synteny and other genomic data analyses. Moreover, linkage mapping is one of the best ways to detect errors in de novo genome assemblies, as well as to orient and place assembly contigs within chromosomes. A small mapping cross of tens of individuals will detect many errors where distant parts of the genome are erroneously joined together. With more individuals and markers, even more local errors can be detected and more contigs can be oriented. However, the tools currently available for constructing linkage maps are not well suited to large, possibly low-coverage, whole genome sequencing datasets.
      Results: Here we present the linkage mapping software Lep-MAP3, capable of mapping high-throughput whole genome sequencing datasets. Such data allow cost-efficient genotyping of millions of single nucleotide polymorphisms (SNPs) for thousands of individual samples, enabling, among other analyses, comprehensive validation and refinement of de novo genome assemblies. The algorithms of Lep-MAP3 can analyse low-coverage datasets and reduce the need for data filtering and curation on any data. This yields more markers in the final maps with less manual work, even on problematic datasets. We demonstrate that Lep-MAP3 already performs very well at 5x sequencing coverage and, on simulated data, outperforms the fastest available software in accuracy and often in speed. We also construct de novo linkage maps from 7-12x whole-genome data on the Red postman butterfly (Heliconius erato) with almost 3 million markers.
      Availability and implementation: Lep-MAP3 is available with source code under the GNU General Public License from http://sourceforge.net/projects/lep-map3.
      Contact: pasi.rastas@helsinki.fi
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-08-03
       
  • RealityConvert: a tool for preparing 3D models of biochemical structures
           for augmented and virtual reality
    • Authors: Borrel A; Fourches D.
      Abstract: Motivation: There is growing interest in the broad use of Augmented Reality (AR) and Virtual Reality (VR) in bioinformatics and cheminformatics to visualize complex biological and chemical structures. AR and VR technologies allow for stunning and immersive experiences, offering untapped opportunities for both research and education. However, preparing 3D models ready for use in AR and VR is time-consuming and requires technical expertise. This severely limits the development of new content of potential interest to structural biologists, medicinal chemists, molecular modellers and teachers.
      Results: Herein we present the RealityConvert software tool and associated website, which allow users to easily convert molecular objects into high-quality 3D models directly compatible with AR and VR applications. For chemical structures, RealityConvert also generates image trackers in addition to the 3D model. These trackers are useful for universally calling up and anchoring that particular 3D model in AR applications. The ultimate goal of RealityConvert is to facilitate and boost the development and accessibility of AR and VR content for bioinformatics and cheminformatics applications.
      Availability and implementation: http://www.realityconvert.com
      Contact: dfourch@ncsu.edu
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-08-02
       
  • Cloud-based interactive analytics for terabytes of genomic variants data
    • Authors: Pan C; McInnes G, Deflaux N, et al.
      Abstract: Motivation: Large-scale genomic sequencing is now widely used to address questions in diverse realms such as biological function, human diseases, evolution, ecosystems and agriculture. Given the quantity and diversity of these data, a robust and scalable data handling and analysis solution is needed.
      Results: We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls and biological information retrieval in large volumes of genomic data. We demonstrate that such Big Data computing paradigms can provide orders-of-magnitude faster turnaround for common genomic analyses. Long-running batch jobs submitted via a Linux shell become questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information.
      Availability and implementation: Our analysis framework is implemented in Google Cloud Platform and BigQuery. Code is available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs.
      Contact: cuiping@stanford.edu or ptsao@stanford.edu
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-07-26
       
  • Using uncertainty to link and rank evidence from biomedical literature for
           model curation
    • Authors: Zerva C; Batista-Navarro R, Day P, et al.
      Abstract: Motivation: In recent years, there has been great progress in the automated curation of biomedical networks and models, aided by text mining methods that provide evidence from the literature. Such methods must not only extract snippets of text that relate to model interactions, but also contextualize the evidence and provide additional confidence scores for the interaction in question. Various approaches to calculating confidence scores have focused primarily on the quality of the extracted information, and there has been little work on exploring the textual uncertainty conveyed by the author. Textual uncertainty is acknowledged in biomedical text mining as an attribute of text-mined interactions (events). Even so, it is significantly understudied as a means of providing a confidence measure for interactions in pathways or other biomedical models. In this work, we focus on improving the identification of textual uncertainty for events and explore how it can be used as an additional measure of confidence for biomedical models.
      Results: We present a novel method for extracting uncertainty from the literature using a hybrid approach that combines rule induction and machine learning. Variations of this hybrid approach are then discussed, alongside their advantages and disadvantages. We use subjective logic theory to combine multiple uncertainty values extracted from different sources for the same interaction. Our approach achieves F-scores of 0.76 and 0.88 on the BioNLP-ST and Genia-MK corpora, respectively, considerably improving on previously published work. Moreover, we evaluate our proposed system on pathways from two different areas, namely leukemia and melanoma cancer research.
      Availability and implementation: The leukemia pathway model used is available in Pathway Studio, while the Ras model is available via PathwayCommons. An online demonstration of the uncertainty extraction system is available for research purposes at http://argo.nactem.ac.uk/test. The related code is available at https://github.com/c-zrv/uncertainty_components.git. Details are available in the Supplementary Material.
      Contact: sophia.ananiadou@manchester.ac.uk
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-07-24
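The abstract says subjective logic is used to combine uncertainty values from different sources for the same interaction, but it does not name the operator. As an assumption for illustration only, the sketch below uses Jøsang's cumulative fusion operator for binomial opinions (belief, disbelief, uncertainty). The paper may use a different operator or include base rates, which are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Binomial subjective-logic opinion with belief + disbelief + uncertainty = 1."""
    belief: float
    disbelief: float
    uncertainty: float


def cumulative_fuse(a: Opinion, b: Opinion) -> Opinion:
    """Cumulative fusion of two independent opinions (assumed operator; base rate omitted)."""
    denom = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if denom == 0:
        # Both opinions are dogmatic (uncertainty 0); this sketch does not handle that case.
        raise ValueError("cannot fuse two dogmatic opinions")
    return Opinion(
        belief=(a.belief * b.uncertainty + b.belief * a.uncertainty) / denom,
        disbelief=(a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / denom,
        uncertainty=(a.uncertainty * b.uncertainty) / denom,
    )
```

Fusing two agreeing sources lowers the uncertainty. A vacuous opinion (0, 0, 1) leaves the other opinion unchanged. For more than two sources, the pairwise operator can be applied repeatedly, for example with `functools.reduce`.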
       
  • wft4galaxy: a workflow testing tool for galaxy
    • Authors: Piras M; Pireddu L, Zanetti G.
      Abstract: Motivation: Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager is a prime example of this type of platform. As compositions of simpler tools, workflows are effectively specialized computer programs that often implement very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature.
      Results: With wft4galaxy we offer a tool that brings automated testing to Galaxy workflows. This makes it feasible to apply continuous integration to their development and ensures that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, the latter reducing installation effort to a minimum.
      Availability and implementation: Available at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0.
      Contact: marcoenrico.piras@crs4.it
      PubDate: 2017-07-24
       
  • Mechanisms to protect the privacy of families when using the transmission
           disequilibrium test in genome-wide association studies
    • Authors: Wang M; Ji Z, Wang S, et al.
      Abstract: Motivation: Inappropriate disclosure of human genomes may put the privacy of study subjects and of their family members at risk. Existing privacy-preserving mechanisms for Genome-Wide Association Studies (GWAS) mainly focus on protecting individual information in case–control studies. Protecting privacy in family-based studies is more difficult. The transmission disequilibrium test (TDT) is a powerful family-based association test employed in many rare disease studies. It gathers information about families (most frequently involving parents, affected children and their siblings). It is important to develop privacy-preserving approaches for disclosing TDT statistics with a guarantee that the risk of family ‘re-identification’ stays below a pre-specified risk threshold. ‘Re-identification’ in this context means that an attacker can infer the presence of a family in a study.
      Methods: In the context of protecting family-level privacy, we developed and evaluated a suite of differentially private (DP) mechanisms for TDT. They include Laplace mechanisms based on the TDT test statistic, P-values and projected P-values. They also include exponential mechanisms based on the TDT test statistic and the shortest Hamming distance (SHD) score.
      Results: Using simulation studies with a small cohort and a large one, we showed that the exponential mechanism based on the SHD score preserves the highest utility and privacy among all proposed DP methods. We provide guidelines for applying our DP TDT to a real Kawasaki disease dataset with 187 families and 906 SNPs. There are some limitations: (1) our implementation is too slow for generating results in real time and (2) handling missing data is still challenging.
      Availability and implementation: The software dpTDT is available at https://github.com/mwgrassgreen/dpTDT.
      Contact: mengw1@stanford.edu
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-07-21
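For readers unfamiliar with the TDT, the statistic that the Laplace mechanisms operate on is the McNemar-type quantity (b − c)² / (b + c). Here b and c count the heterozygous parents who transmit the test allele and the other allele, respectively, to an affected child. The sketch below computes this statistic and releases a noisy version using the generic Laplace mechanism, with noise scale = sensitivity / ε. The sensitivity is taken as an input: this sketch does not derive the correct bound under the paper's family-level privacy definition. The SHD-based exponential mechanism that the authors found best is not shown.

```python
import random


def tdt_statistic(b: int, c: int) -> float:
    """TDT chi-square statistic: b / c count heterozygous parents transmitting
    the test allele / the other allele to affected children."""
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_tdt(b: int, c: int, epsilon: float, sensitivity: float,
                rng: random.Random) -> float:
    """Generic Laplace mechanism: true statistic plus noise of scale sensitivity / epsilon.

    `sensitivity` is an assumed input; a valid bound for family-level
    neighbouring datasets must be derived separately.
    """
    return tdt_statistic(b, c) + laplace_noise(sensitivity / epsilon, rng)
```

A smaller ε (stronger privacy) or a larger sensitivity widens the noise, which is the utility cost the abstract's comparison of mechanisms is about.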
       
  • PLACNETw: a web-based tool for plasmid reconstruction from bacterial
           genomes
    • Authors: Vielva L; de Toro M, Lanza V, et al.
      Abstract: Summary: PLACNET is a graph-based tool for reconstructing plasmids from next-generation sequencing paired-end datasets. PLACNET graphs contain two types of nodes (assembled contigs and reference genomes) and two types of edges (scaffold links and homology to references). Manual pruning of the graphs is required in PLACNET, but this is difficult for users without a solid bioinformatics background. PLACNETw, a web tool based on PLACNET, provides an interactive graphical interface, automates BLAST searches and extracts the relevant information for decision making. It allows a user with domain expertise to visualize the scaffold graphs and related information on contigs as well as reference sequences. Pruning operations can therefore be done interactively from a personal computer without the need for additional tools. After successful pruning, each plasmid becomes a separate connected-component subgraph. The resulting data are automatically downloaded by the user.
      Availability and implementation: PLACNETw is freely available at https://castillo.dicom.unican.es/upload/.
      Contact: delacruz@unican.es
      Supplementary information: A tutorial video and several solved examples are available at https://castillo.dicom.unican.es/placnetw_video/ and https://castillo.dicom.unican.es/examples/.
      PubDate: 2017-07-21
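The abstract's end condition, that each plasmid becomes a separate connected component after pruning, is easy to check programmatically. This is a minimal sketch, assuming the pruned graph is available as a list of node IDs and undirected edges; the contig names used to exercise it are hypothetical. A breadth-first search collects the components.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(nodes: Iterable[str],
                         edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Return the connected components of an undirected graph using BFS."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in nodes:
        if start in seen:
            continue
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components
```

Suppose the graph still contained an extra edge such as ("c2", "c3"). That edge would merge two candidate plasmids into one component. Spurious links of this kind are what the interactive pruning step removes.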
       
  • Detect differentially methylated regions using non-homogeneous hidden
           Markov model for methylation array data
    • Authors: Shen L; Zhu J, Robert Li S, et al.
      Abstract: Motivation: DNA methylation is an important epigenetic mechanism in gene regulation, and the detection of differentially methylated regions (DMRs) is of great interest in many disease studies. Existing DMR detection methods can be improved in several respects. (i) Methylation statuses of nearby CpG sites are highly correlated, but this fact has seldom been modelled rigorously because of their uneven spacing. (ii) It is practically important to be able to handle both paired and unpaired samples. (iii) The ability to detect DMRs from a single pair of samples is in demand.
      Results: We present DMRMark (DMR detection based on a non-homogeneous hidden Markov model), a novel Bayesian framework for detecting DMRs from methylation array data. It combines a constrained Gaussian mixture model that incorporates biological knowledge with a non-homogeneous hidden Markov model that models spatial correlation. Unlike existing methods, our DMR detection is achieved without predefined boundaries or decision windows. Furthermore, our method can detect DMRs from a single pair of samples and can also incorporate unpaired samples. Both simulation studies and real datasets from The Cancer Genome Atlas showed significant improvements of DMRMark over other methods.
      Availability and implementation: DMRMark is freely available as an R package from the CRAN R package repository.
      Contact: xfan@cuhk.edu.hk
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-07-20
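The key idea of a non-homogeneous HMM here is that transition probabilities depend on the uneven distance between neighbouring CpG sites. The sketch below shows that idea with a two-state model (non-DMR, DMR) and scaled forward-backward inference. It is a simplification under stated assumptions, not DMRMark. The emissions are single Gaussians on a per-site methylation difference rather than the paper's constrained Gaussian mixture. The transition form A(d) = w·I + (1 − w)·1·π^T with w = exp(−d / L) is a hypothetical choice made for illustration.

```python
import math
from typing import List, Sequence


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Normal density N(x; mean, sd^2)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def transition(d: float, length: float, pi: Sequence[float]) -> List[List[float]]:
    """Hypothetical distance-dependent 2x2 transition matrix:
    A(d) = w*I + (1 - w) * 1 pi^T with w = exp(-d / length).
    Nearby sites tend to keep their state; distant sites revert to pi."""
    w = math.exp(-d / length)
    return [[w * (1.0 if i == j else 0.0) + (1 - w) * pi[j] for j in range(2)]
            for i in range(2)]


def dmr_posterior(obs: Sequence[float], pos: Sequence[int],
                  means: Sequence[float], sds: Sequence[float],
                  pi: Sequence[float], length: float) -> List[float]:
    """Scaled forward-backward for states 0 = non-DMR, 1 = DMR.
    Returns P(DMR | all observations) at each CpG site."""
    n = len(obs)
    emit = [[gaussian_pdf(x, means[k], sds[k]) for k in range(2)] for x in obs]
    # mats[t] is the transition from site t to site t + 1.
    mats = [transition(pos[t + 1] - pos[t], length, pi) for t in range(n - 1)]

    # Forward pass, normalised at every site to avoid underflow.
    row = [pi[k] * emit[0][k] for k in range(2)]
    alpha = [[r / sum(row) for r in row]]
    for t in range(1, n):
        A = mats[t - 1]
        row = [emit[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in range(2))
               for j in range(2)]
        alpha.append([r / sum(row) for r in row])

    # Backward pass; per-site scaling cancels when the posterior is normalised.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        A = mats[t]
        row = [sum(A[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(2))
               for i in range(2)]
        beta[t] = [r / sum(row) for r in row]

    posterior = []
    for t in range(n):
        g = [alpha[t][k] * beta[t][k] for k in range(2)]
        posterior.append(g[1] / sum(g))
    return posterior
```

Because the segmentation comes from the posterior state sequence rather than from fixed windows, no predefined region boundaries are needed. This matches the property the abstract highlights.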
       
  • proFIA: a data preprocessing workflow for flow injection analysis coupled
           to high-resolution mass spectrometry
    • Authors: Delabrière A; Hohenester U, Colsch B, et al.
      Abstract: Motivation: Flow Injection Analysis coupled to High-Resolution Mass Spectrometry (FIA-HRMS) is a promising approach for high-throughput metabolomics. However, FIA-HRMS data cannot be preprocessed with current software tools, which either rely on liquid chromatography separation or handle only low-resolution data.
      Results: We therefore developed the proFIA package, which implements a suite of innovative algorithms to preprocess raw FIA-HRMS files and generate the table of peak intensities. The workflow consists of 3 steps: (i) noise estimation, peak detection and quantification, (ii) peak grouping across samples and (iii) missing value imputation. In addition, we have implemented a new indicator to quantify the potential alteration of the feature peak shape due to matrix effect. The preprocessing is fast (less than 15 s per file), and the values of the main parameters (ppm and dmz) can easily be inferred from the mass resolution of the instrument. Application to two metabolomics datasets (including spiked serum samples) showed high precision (96%) and recall (98%) compared with manual integration. These results demonstrate that proFIA achieves very efficient and robust detection and quantification of FIA-HRMS data, and opens new opportunities for high-throughput phenotyping.
      Availability and implementation: The proFIA software (as well as the plasFIA dataset) is available as an R package on the Bioconductor repository (http://bioconductor.org/packages/proFIA), as a Galaxy module on the Main Toolshed (https://toolshed.g2.bx.psu.edu), and on the Workflow4Metabolomics online infrastructure (http://workflow4metabolomics.org).
      Contact: alexis.delabriere@cea.fr or etienne.thevenot@cea.fr
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-07-14
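The ppm parameter mentioned above is a relative m/z tolerance: at an m/z value m, a tolerance of p ppm corresponds to an absolute window of m × p × 10⁻⁶. The following sketch shows that conversion. It also shows a deliberately simple greedy grouping of m/z values across samples, in the spirit of step (ii). It is an illustration only: proFIA's actual grouping algorithm and its dmz parameter are not reproduced.

```python
from typing import List


def ppm_tolerance(mz: float, ppm: float) -> float:
    """Absolute m/z tolerance corresponding to a relative error given in ppm."""
    return mz * ppm * 1e-6


def group_by_mz(mzs: List[float], ppm: float) -> List[List[float]]:
    """Simplified greedy grouping (not proFIA's method): after sorting, a value
    joins the current group if it lies within the ppm tolerance of the previous value."""
    groups: List[List[float]] = []
    for mz in sorted(mzs):
        if groups and mz - groups[-1][-1] <= ppm_tolerance(groups[-1][-1], ppm):
            groups[-1].append(mz)
        else:
            groups.append([mz])
    return groups
```

Because the window scales with m/z, the same 5 ppm setting allows 0.0005 at m/z 100 but 0.001 at m/z 200. This is why the parameter can be tied to the instrument's mass resolution.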
       
  • GLASS: assisted and standardized assessment of gene variations from Sanger
           sequence trace data
    • Authors: Pal K; Bystry V, Reigl T, et al.
      Abstract: Motivation: Sanger sequencing is still employed for sequence variant detection by many laboratories, especially in a clinical setting. However, chromatogram interpretation often requires manual inspection and, in some cases, considerable expertise.
      Results: We present GLASS, a web-based Sanger sequence trace viewer, editor, aligner and variant caller, built to assist with the assessment of variations in ‘curated’ or user-provided genes. Critically, it produces a standardized variant output as recommended by the Human Genome Variation Society.
      Availability and implementation: GLASS is freely available at http://bat.infspire.org/genomepd/glass/ with source code at https://github.com/infspiredBAT/GLASS.
      Contact: nikos.darzentas@gmail.com or malcikova.jitka@fnbrno.cz
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-07-13
       
  • cgHeliParm: analysis of dsDNA helical parameters for coarse-grained
           MARTINI molecular dynamics simulations
    • Authors: Faustino I; Marrink S.
      Abstract: Summary: We introduce cgHeliParm, a Python program that provides conformational analysis of Martini-based coarse-grained double-stranded DNA molecules. The software calculates helical parameters such as base, base-pair and base-pair step parameters. cgHeliParm can be used to analyse coarse-grained Martini molecular dynamics trajectories without transforming them into atomistic models.
      Availability and implementation: This package works with Python 2.7 on MacOS and Linux. The program is freely available for download from https://github.com/ifaust83/cgheliparm. Together with the main script, the base reference files CG_X_std.lib, a number of examples and R scripts are also available from the same website. A tutorial on its use and application is also available at http://cgmartini.nl/index.php/tutorials-general-introduction/tutorial-martini-dna.
      Contact: i.faustino@rug.nl
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-07-13
       
  • HUGIn: Hi-C Unifying Genomic Interrogator
    • Authors: Martin J; Xu Z, Reiner A, et al.
      Abstract: Motivation: High-throughput chromatin conformation capture (3C) technologies, such as Hi-C and ChIA-PET, have the potential to elucidate the functional roles of non-coding variants. However, most published genome-wide unbiased chromatin organization studies have used cultured cell lines, limiting their generalizability.
      Results: We developed a web browser, HUGIn, to visualize Hi-C data generated from 21 human primary tissues and cell lines. HUGIn enables the assessment of chromatin contacts at any genomic locus, including GWAS SNPs, eQTLs and cis-regulatory elements. It covers both contacts constitutive across tissues and cell lines and those specific to particular tissue(s) and/or cell line(s), facilitating the understanding of GWAS and eQTL results and functional genomics data.
      Availability and implementation: HUGIn is available at http://yunliweb.its.unc.edu/HUGIn
      Contact: yunli@med.unc.edu or hum@ccf.org
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-06-05
       
  • Spresso: an ultrafast compound pre-screening method based on compound
           decomposition
    • Authors: Yanagisawa K; Komine S, Suzuki S, et al.
      Abstract: Motivation: Recently, the number of available protein tertiary structures and compounds has increased. However, structure-based virtual screening is computationally expensive owing to docking simulations. Thus, methods that filter out obviously unnecessary compounds prior to computationally expensive docking simulations have been proposed. However, these methods are not fast enough to evaluate ≥10 million compounds.
      Results: In this article, we propose a novel, docking-based pre-screening protocol named Spresso (Speedy PRE-Screening method with Segmented cOmpounds). Many compounds share partial structures (fragments); therefore, the number of distinct fragments that must be evaluated is smaller than the number of compounds. Our method increases calculation speed by ∼200-fold compared with conventional methods.
      Availability and implementation: Spresso is written in C++ and Python, and is available as open-source code (http://www.bi.cs.titech.ac.jp/spresso/) under the GPLv3 license.
      Contact: akiyama@c.titech.ac.jp
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-03-30
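The speed-up argument is that fragments recur across compounds. Scoring each distinct fragment once therefore replaces one expensive docking run per compound. The sketch below captures only that caching idea, and it makes three assumptions. Fragment decomposition is precomputed. `score_fragment` is a hypothetical stand-in for a fragment docking call. Taking the best (lowest) fragment score as the compound score is an assumption, not Spresso's published scoring rule.

```python
from typing import Callable, Dict, List, Tuple


def prescreen(compounds: Dict[str, List[str]],
              score_fragment: Callable[[str], float]) -> Tuple[Dict[str, float], int]:
    """Score compounds from their fragments, evaluating each unique fragment once.

    `compounds` maps a compound ID to its non-empty, precomputed fragment list;
    `score_fragment` is a hypothetical stand-in for an expensive docking run.
    Returns the compound scores and the number of docking calls made.
    """
    cache: Dict[str, float] = {}
    for frags in compounds.values():
        for f in frags:
            if f not in cache:
                cache[f] = score_fragment(f)  # expensive call, once per distinct fragment
    # Assumed aggregation rule: a compound scores as its best (lowest) fragment.
    scores = {cid: min(cache[f] for f in frags) for cid, frags in compounds.items()}
    return scores, len(cache)
```

In the check, three compounds contain five fragment occurrences but only three distinct fragments, so only three docking calls are made. The savings grow as libraries get larger and fragment reuse increases.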
       
  • MapReduce for accurate error correction of next-generation sequencing data
    • Authors: Zhao L; Chen Q, Li W, et al.
      Abstract: Motivation: Next-generation sequencing platforms have produced huge amounts of sequence data, revolutionizing every aspect of genetic and genomic research. However, these sequence datasets contain a considerable number of machine-induced errors; for example, the substitution error rate can be as high as 2.5%. Existing error-correction methods are still far from perfect. In fact, they sometimes introduce more errors than they correct, especially the prevalent k-mer based methods. Existing methods have also made limited use of on-demand cloud computing.
      Results: We introduce an error-correction method named MEC, which uses a two-layered MapReduce technique to achieve high correction performance. In the first layer, all the input sequences are mapped to groups to identify candidate erroneous bases in parallel. In the second layer, the erroneous bases at the same position are linked together from all the groups to make statistically reliable corrections. Experiments on real and simulated datasets show that our method remarkably outperforms existing methods. Its per-position error rate is consistently the lowest, and its correction gain is always the highest.
      Availability and implementation: The source code is available at bioinformatics.gxu.edu.cn/ngs/mec.
      Contact: wongls@comp.nus.edu.sg or jinyan.li@uts.edu.au
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-02-16
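The two-layer structure can be illustrated with an in-memory map/reduce sketch. This is a simplification, not MEC, and it rests on several assumptions:

- Reads are already placed at 0-based positions on a common coordinate system.
- Groups are formed round-robin.
- Layer 1 counts the bases each group observes at each position.
- Layer 2 merges the counts for each position across groups. It replaces a base when a single base reaches a hypothetical `min_frac` majority.

MEC's actual statistical test is not reproduced. In a real MapReduce job, the groups would be processed on separate workers.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Read = Tuple[int, str]  # assumed input: (0-based start position, sequence)


def layer1_map(group: List[Read]) -> Dict[int, Counter]:
    """Layer 1 (per group, parallelisable): count bases observed at each position."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for start, seq in group:
        for offset, base in enumerate(seq):
            counts[start + offset][base] += 1
    return counts


def layer2_reduce(partials: List[Dict[int, Counter]], min_frac: float) -> Dict[int, str]:
    """Layer 2: merge counts for the same position across all groups and emit
    a consensus base where one base reaches the (hypothetical) min_frac majority."""
    merged: Dict[int, Counter] = defaultdict(Counter)
    for part in partials:
        for pos, c in part.items():
            merged[pos].update(c)
    consensus: Dict[int, str] = {}
    for pos, c in merged.items():
        base, n = c.most_common(1)[0]
        if n / sum(c.values()) >= min_frac:
            consensus[pos] = base
    return consensus


def correct(reads: List[Read], n_groups: int = 2, min_frac: float = 0.7) -> List[str]:
    """Split reads into groups, run both layers, and rewrite each read."""
    groups = [reads[i::n_groups] for i in range(n_groups)]
    consensus = layer2_reduce([layer1_map(g) for g in groups], min_frac)
    # Positions without a confident consensus keep the original base.
    return ["".join(consensus.get(start + i, b) for i, b in enumerate(seq))
            for start, seq in reads]
```

Merging counts from all groups before deciding is what lets the second layer see enough evidence to make a statistically reliable call. No single group has that much evidence on its own.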
       
  • An efficient concordant integrative analysis of multiple large-scale
           two-sample expression data sets
    • Authors: Lai Y; Zhang F, Nayak T, et al.
      Abstract: Motivation: We have proposed a mixture-model-based approach to the concordant integrative analysis of multiple large-scale two-sample expression datasets. The mixture model is based on the transformed differential expression test P-values (z-scores), so it is generally applicable to expression data generated by either microarray or RNA-seq platforms. The mixture model is simple, with three normal distribution components for each dataset representing down-regulation, up-regulation and no differential expression. However, as the number of datasets increases, the model parameter space grows exponentially because of the combinations of components from different datasets.
      Results: In this study, we draw on the well-known generalized estimating equations (GEEs) for longitudinal data analysis. We focus on the concordant components and assume that the proportions of non-concordant components follow a special structure. We discuss the exchangeable, multiset coefficient and autoregressive structures for model reduction, and their related expectation-maximization (EM) algorithms. The parameter space then grows linearly with the number of datasets. In our previous study, we applied the general mixture model to three microarray datasets for lung cancer studies. We show that more gene sets (or pathways) can be detected by the reduced mixture model with the exchangeable structure. Furthermore, we show that more genes can also be detected by the reduced model. Data from The Cancer Genome Atlas (TCGA) are being collected in increasing volume. The advantage of incorporating the concordance feature has also been clearly demonstrated on TCGA RNA sequencing data for studying two closely related types of cancer.
      Availability and implementation: Additional results are included in a supplemental file. R functions are freely available at http://home.gwu.edu/~ylai/research/Concordance.
      Contact: ylai@gwu.edu
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-02-08
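The model can be written out to show why its parameter space explodes. The notation is ours, as is the assumption that each dataset k has three normal components c ∈ {−, 0, +} with means μ_{k,c} and variances σ²_{k,c}. Here z_k is a gene's z-score in dataset k and φ is the normal density.

```latex
\begin{aligned}
f_k(z) &= \sum_{c \in \{-,0,+\}} \pi_{k,c}\,\phi\!\left(z;\,\mu_{k,c},\,\sigma^2_{k,c}\right),\\
f(z_1,\dots,z_K) &= \sum_{\mathbf{c} \in \{-,0,+\}^K} \pi_{\mathbf{c}} \prod_{k=1}^{K} \phi\!\left(z_k;\,\mu_{k,c_k},\,\sigma^2_{k,c_k}\right),
\qquad \sum_{\mathbf{c}} \pi_{\mathbf{c}} = 1 .
\end{aligned}
```

With K datasets the joint model has 3^K component proportions π_c, for example 27 when K = 3. Only three of these, (−,…,−), (0,…,0) and (+,…,+), are concordant. The reduced structures described above keep those free and constrain the rest.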
       
  • TimesVector: a vectorized clustering approach to the analysis of time
           series transcriptome data from multiple phenotypes
    • Authors: Jung I; Jo K, Kang H, et al.
      Abstract: Motivation: Identifying biologically meaningful gene expression patterns from time series gene expression data is important for understanding the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of both time and sample dimensions. Thus, the analysis of such data seeks gene sets that exhibit similar or different expression patterns between two or more sample conditions. The data are therefore three-dimensional, i.e. gene-time-condition. The computational complexity of analyzing such data is very high, even compared with two-dimensional biclustering, which is already NP-hard. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression patterns in two sample conditions.
      Results: We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters to detect similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. We used four sets of time series gene expression data, generated by both microarray and high-throughput sequencing platforms. On these, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved clustering quality compared with existing triclustering tools, and only TimesVector successfully detected clusters with differential expression patterns across conditions.
      Availability and implementation: The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/.
      Contact: sunkim.bioinfo@snu.ac.kr
      Supplementary information: Supplementary data are available at Bioinformatics online.
      PubDate: 2017-01-17
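Step (i) works on time-condition concatenated vectors. The following is a minimal sketch of that representation, assuming each gene has a time course of equal length in every condition. The per-condition series are concatenated in a fixed condition order, scaled to unit length and grouped with plain k-means. TimesVector's dimension-reduction, post-processing and gene-rescue steps are omitted, and the gene profiles in the check are made up.

```python
import math
from typing import Dict, List, Sequence


def concat_vector(profile: Dict[str, Sequence[float]], conditions: Sequence[str]) -> List[float]:
    """Concatenate a gene's per-condition time courses in a fixed condition order
    and scale to unit length so clustering compares pattern shape, not magnitude."""
    v = [x for cond in conditions for x in profile[cond]]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def kmeans(vectors: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means, initialised with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
                  for v in vectors]
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Because the vectors are normalised, a gene rising in both conditions clusters with a scaled copy of itself. It separates from a gene that rises in one condition but falls in the other, which is the kind of cross-condition difference the method targets.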
       
 
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
 

JournalTOCs © 2009-2016