  Subjects -> COMPUTER SCIENCE (Total: 1991 journals)
    - ANIMATION AND SIMULATION (29 journals)
    - ARTIFICIAL INTELLIGENCE (98 journals)
    - AUTOMATION AND ROBOTICS (98 journals)
    - CLOUD COMPUTING AND NETWORKS (61 journals)
    - COMPUTER ARCHITECTURE (9 journals)
    - COMPUTER ENGINEERING (9 journals)
    - COMPUTER GAMES (16 journals)
    - COMPUTER PROGRAMMING (24 journals)
    - COMPUTER SCIENCE (1157 journals)
    - COMPUTER SECURITY (45 journals)
    - DATA BASE MANAGEMENT (13 journals)
    - DATA MINING (32 journals)
    - E-BUSINESS (22 journals)
    - E-LEARNING (29 journals)
    - ELECTRONIC DATA PROCESSING (21 journals)
    - IMAGE AND VIDEO PROCESSING (39 journals)
    - INFORMATION SYSTEMS (105 journals)
    - INTERNET (92 journals)
    - SOCIAL WEB (50 journals)
    - SOFTWARE (34 journals)
    - THEORY OF COMPUTING (8 journals)

COMPUTER SCIENCE (1157 journals)

Showing 1 - 200 of 872 Journals sorted alphabetically
3D Printing and Additive Manufacturing     Full-text available via subscription   (Followers: 13)
Abakós     Open Access   (Followers: 4)
ACM Computing Surveys     Hybrid Journal   (Followers: 22)
ACM Journal on Computing and Cultural Heritage     Hybrid Journal   (Followers: 9)
ACM Journal on Emerging Technologies in Computing Systems     Hybrid Journal   (Followers: 13)
ACM Transactions on Accessible Computing (TACCESS)     Hybrid Journal   (Followers: 3)
ACM Transactions on Algorithms (TALG)     Hybrid Journal   (Followers: 16)
ACM Transactions on Applied Perception (TAP)     Hybrid Journal   (Followers: 6)
ACM Transactions on Architecture and Code Optimization (TACO)     Hybrid Journal   (Followers: 9)
ACM Transactions on Autonomous and Adaptive Systems (TAAS)     Hybrid Journal   (Followers: 7)
ACM Transactions on Computation Theory (TOCT)     Hybrid Journal   (Followers: 12)
ACM Transactions on Computational Logic (TOCL)     Hybrid Journal   (Followers: 4)
ACM Transactions on Computer Systems (TOCS)     Hybrid Journal   (Followers: 18)
ACM Transactions on Computer-Human Interaction     Hybrid Journal   (Followers: 13)
ACM Transactions on Computing Education (TOCE)     Hybrid Journal   (Followers: 4)
ACM Transactions on Design Automation of Electronic Systems (TODAES)     Hybrid Journal   (Followers: 1)
ACM Transactions on Economics and Computation     Hybrid Journal  
ACM Transactions on Embedded Computing Systems (TECS)     Hybrid Journal   (Followers: 4)
ACM Transactions on Information Systems (TOIS)     Hybrid Journal   (Followers: 20)
ACM Transactions on Intelligent Systems and Technology (TIST)     Hybrid Journal   (Followers: 8)
ACM Transactions on Interactive Intelligent Systems (TiiS)     Hybrid Journal   (Followers: 3)
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)     Hybrid Journal   (Followers: 10)
ACM Transactions on Reconfigurable Technology and Systems (TRETS)     Hybrid Journal   (Followers: 7)
ACM Transactions on Sensor Networks (TOSN)     Hybrid Journal   (Followers: 8)
ACM Transactions on Speech and Language Processing (TSLP)     Hybrid Journal   (Followers: 11)
ACM Transactions on Storage     Hybrid Journal  
ACS Applied Materials & Interfaces     Full-text available via subscription   (Followers: 25)
Acta Automatica Sinica     Full-text available via subscription   (Followers: 3)
Acta Universitatis Cibiniensis. Technical Series     Open Access  
Ad Hoc Networks     Hybrid Journal   (Followers: 11)
Adaptive Behavior     Hybrid Journal   (Followers: 11)
Advanced Engineering Materials     Hybrid Journal   (Followers: 26)
Advanced Science Letters     Full-text available via subscription   (Followers: 8)
Advances in Adaptive Data Analysis     Hybrid Journal   (Followers: 8)
Advances in Artificial Intelligence     Open Access   (Followers: 16)
Advances in Calculus of Variations     Hybrid Journal   (Followers: 2)
Advances in Catalysis     Full-text available via subscription   (Followers: 5)
Advances in Computational Mathematics     Hybrid Journal   (Followers: 15)
Advances in Computer Science : an International Journal     Open Access   (Followers: 14)
Advances in Computing     Open Access   (Followers: 2)
Advances in Data Analysis and Classification     Hybrid Journal   (Followers: 51)
Advances in Engineering Software     Hybrid Journal   (Followers: 25)
Advances in Geosciences (ADGEO)     Open Access   (Followers: 10)
Advances in Human Factors/Ergonomics     Full-text available via subscription   (Followers: 26)
Advances in Human-Computer Interaction     Open Access   (Followers: 20)
Advances in Materials Sciences     Open Access   (Followers: 16)
Advances in Operations Research     Open Access   (Followers: 11)
Advances in Parallel Computing     Full-text available via subscription   (Followers: 7)
Advances in Porous Media     Full-text available via subscription   (Followers: 4)
Advances in Remote Sensing     Open Access   (Followers: 38)
Advances in Science and Research (ASR)     Open Access   (Followers: 6)
Advances in Technology Innovation     Open Access   (Followers: 2)
AEU - International Journal of Electronics and Communications     Hybrid Journal   (Followers: 8)
African Journal of Information and Communication     Open Access   (Followers: 8)
African Journal of Mathematics and Computer Science Research     Open Access   (Followers: 4)
Air, Soil & Water Research     Open Access   (Followers: 9)
AIS Transactions on Human-Computer Interaction     Open Access   (Followers: 6)
Algebras and Representation Theory     Hybrid Journal   (Followers: 1)
Algorithms     Open Access   (Followers: 11)
American Journal of Computational and Applied Mathematics     Open Access   (Followers: 4)
American Journal of Computational Mathematics     Open Access   (Followers: 4)
American Journal of Information Systems     Open Access   (Followers: 7)
American Journal of Sensor Technology     Open Access   (Followers: 4)
Anais da Academia Brasileira de Ciências     Open Access   (Followers: 2)
Analog Integrated Circuits and Signal Processing     Hybrid Journal   (Followers: 7)
Analysis in Theory and Applications     Hybrid Journal   (Followers: 1)
Animation Practice, Process & Production     Hybrid Journal   (Followers: 5)
Annals of Combinatorics     Hybrid Journal   (Followers: 3)
Annals of Data Science     Hybrid Journal   (Followers: 11)
Annals of Mathematics and Artificial Intelligence     Hybrid Journal   (Followers: 7)
Annals of Pure and Applied Logic     Open Access   (Followers: 2)
Annals of Software Engineering     Hybrid Journal   (Followers: 12)
Annual Reviews in Control     Hybrid Journal   (Followers: 6)
Anuario Americanista Europeo     Open Access  
Applicable Algebra in Engineering, Communication and Computing     Hybrid Journal   (Followers: 2)
Applied and Computational Harmonic Analysis     Full-text available via subscription   (Followers: 2)
Applied Artificial Intelligence: An International Journal     Hybrid Journal   (Followers: 14)
Applied Categorical Structures     Hybrid Journal   (Followers: 2)
Applied Clinical Informatics     Hybrid Journal   (Followers: 2)
Applied Computational Intelligence and Soft Computing     Open Access   (Followers: 12)
Applied Computer Systems     Open Access   (Followers: 1)
Applied Informatics     Open Access  
Applied Mathematics and Computation     Hybrid Journal   (Followers: 33)
Applied Medical Informatics     Open Access   (Followers: 11)
Applied Numerical Mathematics     Hybrid Journal   (Followers: 5)
Applied Soft Computing     Hybrid Journal   (Followers: 16)
Applied Spatial Analysis and Policy     Hybrid Journal   (Followers: 4)
Architectural Theory Review     Hybrid Journal   (Followers: 3)
Archive of Applied Mechanics     Hybrid Journal   (Followers: 5)
Archive of Numerical Software     Open Access  
Archives and Museum Informatics     Hybrid Journal   (Followers: 132)
Archives of Computational Methods in Engineering     Hybrid Journal   (Followers: 4)
Artifact     Hybrid Journal   (Followers: 2)
Artificial Life     Hybrid Journal   (Followers: 6)
Asia Pacific Journal on Computational Engineering     Open Access  
Asia-Pacific Journal of Information Technology and Multimedia     Open Access   (Followers: 1)
Asian Journal of Computer Science and Information Technology     Open Access  
Asian Journal of Control     Hybrid Journal  
Assembly Automation     Hybrid Journal   (Followers: 2)
at - Automatisierungstechnik     Hybrid Journal   (Followers: 1)
Australian Educational Computing     Open Access   (Followers: 1)
Automatic Control and Computer Sciences     Hybrid Journal   (Followers: 4)
Automatic Documentation and Mathematical Linguistics     Hybrid Journal   (Followers: 5)
Automatica     Hybrid Journal   (Followers: 11)
Automation in Construction     Hybrid Journal   (Followers: 6)
Autonomous Mental Development, IEEE Transactions on     Hybrid Journal   (Followers: 8)
Basin Research     Hybrid Journal   (Followers: 5)
Behaviour & Information Technology     Hybrid Journal   (Followers: 52)
Bioinformatics     Hybrid Journal   (Followers: 306)
Biomedical Engineering     Hybrid Journal   (Followers: 16)
Biomedical Engineering and Computational Biology     Open Access   (Followers: 13)
Biomedical Engineering, IEEE Reviews in     Full-text available via subscription   (Followers: 17)
Biomedical Engineering, IEEE Transactions on     Hybrid Journal   (Followers: 32)
Briefings in Bioinformatics     Hybrid Journal   (Followers: 44)
British Journal of Educational Technology     Hybrid Journal   (Followers: 129)
Broadcasting, IEEE Transactions on     Hybrid Journal   (Followers: 10)
c't Magazin fuer Computertechnik     Full-text available via subscription   (Followers: 2)
CALCOLO     Hybrid Journal  
Calphad     Hybrid Journal  
Canadian Journal of Electrical and Computer Engineering     Full-text available via subscription   (Followers: 14)
Catalysis in Industry     Hybrid Journal   (Followers: 1)
CEAS Space Journal     Hybrid Journal  
Cell Communication and Signaling     Open Access   (Followers: 1)
Central European Journal of Computer Science     Hybrid Journal   (Followers: 5)
CERN IdeaSquare Journal of Experimental Innovation     Open Access  
Chaos, Solitons & Fractals     Hybrid Journal   (Followers: 3)
Chemometrics and Intelligent Laboratory Systems     Hybrid Journal   (Followers: 15)
ChemSusChem     Hybrid Journal   (Followers: 7)
China Communications     Full-text available via subscription   (Followers: 7)
Chinese Journal of Catalysis     Full-text available via subscription   (Followers: 2)
CIN Computers Informatics Nursing     Full-text available via subscription   (Followers: 12)
Circuits and Systems     Open Access   (Followers: 16)
Clean Air Journal     Full-text available via subscription   (Followers: 2)
CLEI Electronic Journal     Open Access  
Clin-Alert     Hybrid Journal   (Followers: 1)
Cluster Computing     Hybrid Journal   (Followers: 1)
Cognitive Computation     Hybrid Journal   (Followers: 4)
COMBINATORICA     Hybrid Journal  
Combustion Theory and Modelling     Hybrid Journal   (Followers: 13)
Communication Methods and Measures     Hybrid Journal   (Followers: 12)
Communication Theory     Hybrid Journal   (Followers: 20)
Communications Engineer     Hybrid Journal   (Followers: 1)
Communications in Algebra     Hybrid Journal   (Followers: 3)
Communications in Partial Differential Equations     Hybrid Journal   (Followers: 3)
Communications of the ACM     Full-text available via subscription   (Followers: 53)
Communications of the Association for Information Systems     Open Access   (Followers: 18)
COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering     Hybrid Journal   (Followers: 3)
Complex & Intelligent Systems     Open Access  
Complex Adaptive Systems Modeling     Open Access  
Complex Analysis and Operator Theory     Hybrid Journal   (Followers: 2)
Complexity     Hybrid Journal   (Followers: 6)
Complexus     Full-text available via subscription  
Composite Materials Series     Full-text available via subscription   (Followers: 9)
Computación y Sistemas     Open Access  
Computation     Open Access  
Computational and Applied Mathematics     Hybrid Journal   (Followers: 2)
Computational and Mathematical Methods in Medicine     Open Access   (Followers: 2)
Computational and Mathematical Organization Theory     Hybrid Journal   (Followers: 2)
Computational and Structural Biotechnology Journal     Open Access   (Followers: 2)
Computational and Theoretical Chemistry     Hybrid Journal   (Followers: 9)
Computational Astrophysics and Cosmology     Open Access   (Followers: 1)
Computational Biology and Chemistry     Hybrid Journal   (Followers: 12)
Computational Chemistry     Open Access   (Followers: 2)
Computational Cognitive Science     Open Access   (Followers: 2)
Computational Complexity     Hybrid Journal   (Followers: 4)
Computational Condensed Matter     Open Access  
Computational Ecology and Software     Open Access   (Followers: 9)
Computational Economics     Hybrid Journal   (Followers: 9)
Computational Geosciences     Hybrid Journal   (Followers: 14)
Computational Linguistics     Open Access   (Followers: 23)
Computational Management Science     Hybrid Journal  
Computational Mathematics and Modeling     Hybrid Journal   (Followers: 8)
Computational Mechanics     Hybrid Journal   (Followers: 4)
Computational Methods and Function Theory     Hybrid Journal  
Computational Molecular Bioscience     Open Access   (Followers: 2)
Computational Optimization and Applications     Hybrid Journal   (Followers: 7)
Computational Particle Mechanics     Hybrid Journal   (Followers: 1)
Computational Research     Open Access   (Followers: 1)
Computational Science and Discovery     Full-text available via subscription   (Followers: 2)
Computational Science and Techniques     Open Access  
Computational Statistics     Hybrid Journal   (Followers: 13)
Computational Statistics & Data Analysis     Hybrid Journal   (Followers: 31)
Computer     Full-text available via subscription   (Followers: 87)
Computer Aided Surgery     Hybrid Journal   (Followers: 3)
Computer Applications in Engineering Education     Hybrid Journal   (Followers: 7)
Computer Communications     Hybrid Journal   (Followers: 10)
Computer Engineering and Applications Journal     Open Access   (Followers: 5)
Computer Journal     Hybrid Journal   (Followers: 7)
Computer Methods in Applied Mechanics and Engineering     Hybrid Journal   (Followers: 22)
Computer Methods in Biomechanics and Biomedical Engineering     Hybrid Journal   (Followers: 10)
Computer Methods in the Geosciences     Full-text available via subscription   (Followers: 1)
Computer Music Journal     Hybrid Journal   (Followers: 16)
Computer Physics Communications     Hybrid Journal   (Followers: 6)
Computer Science - Research and Development     Hybrid Journal   (Followers: 7)
Computer Science and Engineering     Open Access   (Followers: 17)
Computer Science and Information Technology     Open Access   (Followers: 11)
Computer Science Education     Hybrid Journal   (Followers: 12)
Computer Science Journal     Open Access   (Followers: 20)
Computer Science Master Research     Open Access   (Followers: 10)
Computer Science Review     Hybrid Journal   (Followers: 10)


Chemometrics and Intelligent Laboratory Systems
  [SJR: 0.697]   [H-I: 92]   [15 followers]

   Hybrid Journal (it can contain Open Access articles)
   ISSN (Print) 0169-7439
   Published by Elsevier  [3048 journals]
  • PRFFECT: A versatile tool for spectroscopists
    • Abstract: Publication date: Available online 11 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Benjamin R. Smith, Matthew J. Baker, David S. Palmer
      PRFFECT is a computer program to aid with spectral preprocessing and the development of classification models. Via a simple text interface, PRFFECT allows users to select wavenumber ranges, perform spectral preprocessing, carry out data partitioning (into training and testing datasets), run a Random Forest classification, compute statistical results, and identify important descriptors for the classification. The preprocessing options provided fall into four categories: binning, smoothing, normalisation, and baseline correction. The program outputs a wide variety of useful data, including classification metrics and graphs showing the importance of individual wavenumbers to the classification models. As a proof of concept, PRFFECT has been benchmarked on preprocessing and classification of four food analysis datasets. Sensitivities and specificities above 0.92 were obtained in all cases. The results show that different preprocessing procedures are optimal for different datasets. The PRFFECT software is freely available to the community via GitHub. Link: https://github.com/Palmer-Lab/PRFFECT.

      PubDate: 2017-11-16T00:40:41Z
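
The four preprocessing categories named in the abstract above (binning, smoothing, normalisation, baseline correction) can be illustrated with a minimal sketch. This is not PRFFECT's code: the function names, and the specific choice of one simple method per category (mean binning, moving average, min-max scaling, straight-line baseline), are assumptions made only for illustration.

```python
from statistics import mean


def bin_spectrum(intensities, width):
    """Binning: average consecutive groups of `width` points.

    A trailing partial group is averaged on its own.
    """
    return [mean(intensities[i:i + width])
            for i in range(0, len(intensities), width)]


def moving_average(intensities, window):
    """Smoothing: centred moving average; the window shrinks at the edges."""
    half = window // 2
    n = len(intensities)
    return [mean(intensities[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def min_max_normalise(intensities):
    """Normalisation: rescale to the range [0, 1]."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:  # flat spectrum: avoid division by zero
        return [0.0 for _ in intensities]
    return [(y - lo) / (hi - lo) for y in intensities]


def subtract_linear_baseline(intensities):
    """Baseline correction: subtract the line joining the first and last points."""
    n = len(intensities)
    if n < 2:
        return [0.0 for _ in intensities]
    first, last = intensities[0], intensities[-1]
    return [y - (first + (last - first) * i / (n - 1))
            for i, y in enumerate(intensities)]
```

The steps compose as plain function calls, for example `min_max_normalise(moving_average(spectrum, 5))`, which mirrors the abstract's point that different preprocessing combinations suit different datasets.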
       
  • Chemometrics-enhanced high performance liquid chromatography strategy for simultaneous determination on seven nitroaromatic compounds in environmental water
    • Abstract: Publication date: Available online 11 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Yihuan Zhao, Yuan Yuan, Jianfang Chen, Menglong Li, Xuemei Pu
      Nitroaromatic compounds pose a significant risk to human health and other living organisms. However, conventional high-performance liquid chromatography (HPLC) methods are complicated and time-consuming for complete separation of multiple nitroaromatic compounds because of their high structural similarity. In this work, a facile yet effective strategy, which combines HPLC with diode array detection (HPLC-DAD) and multivariate curve resolution-alternating least squares (MCR-ALS), is explored to improve the HPLC analysis. Based on this strategy, seven nitroaromatic compounds with similar structures could be rapidly quantified under a simple isocratic elution condition (acetonitrile/water: 65:35, v/v) within 10 min. With the aid of the second-order advantage of MCR-ALS, acceptable quantification results are still achieved for the real water samples, despite significantly overlapping peaks resulting from the similar structures of the analytes and the unexpected interferences from the real water samples. The error analysis and the elliptical joint confidence region (EJCR) test further confirm the reliability of the predicted results. Thus, the HPLC method coupled with the second-order calibration algorithm could provide a simple, quick and accurate strategy for simultaneous determination of multiple nitroaromatic compounds in environmental monitoring.

      PubDate: 2017-11-16T00:40:41Z
       
  • Optimal design of experiments for excipient compatibility studies
    • Abstract: Publication date: 15 December 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 171
      Author(s): Wannes G.M. Akkermans, Hans Coppenolle, Peter Goos
      A crucial stage in the development of medical drugs is to study which additives, usually called excipients, impact the active ingredient stability. This type of study is generally named an excipient compatibility study and requires a mixture experiment. Subsequently, the effect of the storage conditions, more specifically the relative humidity and temperature, on the stability is investigated. This so-called accelerated life test involves a factorial type of experiment. It has become, however, customary to include the storage conditions in the compatibility study. This provides valuable information concerning potential interactions between excipient combinations and storage conditions. Experiments that combine a mixture experiment with a factorial experiment are generally named mixture-process variable experiments. A limited number of designs for mixture-process variable experiments are available in the literature. One problem is that the proposed designs offer little flexibility. Another is that the required number of runs becomes prohibitively large for large numbers of mixture components. In this paper, we examine flexible, optimal designs for realistic mixture-process variable experiments. Our motivation is to provide guidance to pharmaceutical formulation scientists concerning state-of-the-art models and designs for excipient compatibility studies. Using several proof-of-concept examples, we demonstrate that I-optimal designs offer both flexibility and small variances of prediction. We also discuss a real-life example, which could be used as a blueprint for future studies. Because many excipient compatibility studies are not completely randomized, we pay special attention to their logistics and to the resulting randomization restrictions, which lead to split-plot and strip-plot experiments.

      PubDate: 2017-11-08T12:51:16Z
       
  • Steel surface defects recognition based on multi-type statistical features and enhanced twin support vector machine
    • Abstract: Publication date: 15 December 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 171
      Author(s): Maoxiang Chu, Rongfen Gong, Song Gao, Jie Zhao
      For steel surface defect recognition, feature extraction and classification are very important steps. In this paper, multi-type statistical features and an enhanced twin support vector machine classifier are formulated and applied. Firstly, four types of statistical features for different attributes of the defect region are proposed. They are insensitive to affine transformation in scale and rotation. These attributes include shape distance and local binary pattern operators with sign and magnitude. Then, dummy boundary samples and representative samples are extracted from the steel surface defect dataset. Dummy boundary samples include the sparse boundary information of the dataset. They can reduce the adverse impact of noise samples. Representative samples with local and global properties are used to replace samples with quadratic loss. They can exclude noise samples. Based on dummy boundary samples and representative samples, an enhanced twin support vector machine is formulated. On the one hand, it can solve multi-class classification problems. On the other hand, it has anti-noise ability and high classification efficiency. Finally, the enhanced twin support vector machine classifier and multi-type statistical features are applied to recognize five types of steel surface defects. The experimental results show that the proposed multi-class classifier has perfect performance in efficiency and accuracy. The multi-type statistical features also help to improve classification performance.

      PubDate: 2017-11-08T12:51:16Z
       
  • An effective high-quality prediction intervals construction method based on parallel bootstrapped RVM for complex chemical processes
    • Abstract: Publication date: 15 December 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 171
      Author(s): Yuan Xu, Chuan Mi, Qun-Xiong Zhu, Jing-Yang Gao, Yan-Lin He
      Data-driven techniques have become increasingly popular and widely used for prediction in complex chemical processes. In general, prediction results are provided as point estimates. However, point estimates cannot meet the accuracy requirement because process data are high-dimensional, highly nonlinear, and noisy. In order to deal with the trend and the uncertainty of process data, an effective prediction interval (PI) method based on the bootstrap and the relevance vector machine (Bootstrapped RVM) is proposed in this paper. In the proposed method, the bootstrap is adopted to obtain PIs and RVM is used as the regression tool. In order to accelerate the training and testing phases, a parallel algorithm is utilized in the proposed Bootstrapped RVM method. In addition, to better evaluate the quality of PIs, some performance indicators are improved. Finally, the proposed method is validated using a standard function and High Density Polyethylene (HDPE) data. Compared with some other PI methods, the simulation results show that the proposed method can achieve better performance in terms of prediction accuracy and training time.

      PubDate: 2017-11-08T12:51:16Z
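
The bootstrap idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: a plain least-squares line is swapped in for the RVM regressor, the function names and the percentile rule are hypothetical, and the code runs serially rather than in parallel. The interval it returns reflects only the spread of the resampled models; a full prediction interval would also need a noise-variance term, which is omitted here.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_interval(xs, ys, x_new, n_boot=500, alpha=0.1, seed=0):
    """Percentile bootstrap interval for the model prediction at x_new.

    Each replicate resamples the training pairs with replacement, refits
    the stand-in regressor, and records its prediction at x_new.
    """
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [xs[i] for i in idx]
        boot_y = [ys[i] for i in idx]
        if len(set(boot_x)) < 2:  # degenerate resample: a line cannot be fitted
            continue
        slope, intercept = fit_line(boot_x, boot_y)
        predictions.append(slope * x_new + intercept)
    if not predictions:
        raise ValueError("no usable bootstrap replicates")
    predictions.sort()
    last = len(predictions) - 1
    lower = predictions[int((alpha / 2) * last)]
    upper = predictions[int((1 - alpha / 2) * last)]
    return lower, upper
```

Because every replicate is fitted independently, the loop body is the natural unit to distribute across workers, which is presumably where a parallel variant would gain its speed-up.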
       
  • Biomass concentration prediction via an input-weighed model based on artificial neural network and peer-learning cuckoo search
    • Abstract: Publication date: 15 December 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 171
      Author(s): Qiangda Yang, Hongbo Gao, Weijun Zhang
      Biomass concentration (BC) is considered one of the most important biochemical parameters. Its reliable on-line estimation is crucial in the real-time status monitoring and quality control of fermentation processes. Considering that each input variable may have a different influence on BC in actual fermentation processes, a novel input-weighted empirical model based on the radial basis function neural network (RBFN) and a new peer-learning cuckoo search (PLCS) algorithm is proposed in this paper to predict BC. The determination of input variable weights and RBFN parameters for the proposed BC prediction model is framed as one and the same optimization problem. Inspired by the common social phenomenon that mutual learning between team members (peers) helps a team to accomplish a task efficiently, a PLCS algorithm is proposed to solve the resulting optimization (RO) problem, and thereby complete the development of the proposed BC prediction model. The effectiveness and superiority of this new prediction model are validated using the production data from a lab-scale nosiheptide fermentation process. Moreover, the performance of PLCS is also demonstrated on the RO problem with these data and some benchmark functions.

      PubDate: 2017-11-08T12:51:16Z
       
  • Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model
    • Abstract: Publication date: Available online 8 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Triet H.M. Le, Tung T. Tran, Lam K. Huynh
      Thermodynamic properties are essential to understand and describe many chemical/biological processes in the real environment. To obtain correct thermodynamic data of chemical species for a wide range of temperatures, a rigorous Hindered Internal Rotation (HIR) treatment must be considered. Such a treatment requires detailed information about the internal rotation (i.e., rotational axis, group, frequency, symmetry and hindrance potential). However, it is very tedious, and even error-prone, for chemists to prepare the input parameters for such a treatment. Among the HIR parameters, the rotational frequency (or mode) is the most difficult element to determine because of the complex molecular structure and mixed vibrational modes of chemical species. Recently, a rule-based framework has been proposed to help chemists with this tedious process (Le et al., Comput. Theor. Chem., 2017, 61). This approach has been demonstrated to work well for simple species; however, it still lacked the ability to handle more complex cases. Therefore, in this study, a data mining approach is proposed to overcome the challenges of the previous algorithm. Within this framework, the HIR pattern was found using the features extracted from existing data provided by chemists. More specifically, multivariate logistic regression was implemented to analyze the chemical data to better predict the rotational frequency (mode) of chemical species as well as to highlight the effect of each attribute of the rotation. The experimental results were shown to improve on the previous study in terms of both accuracy and completeness. The approach also gives meaningful insights into the HIR itself. The proposed approach will be integrated into MSMC-GUI (https://sites.google.com/site/msmccode/manual/gui-1) to provide chemists with an interactive and robust tool to prepare the data for their thermodynamic calculations on-the-fly.

      PubDate: 2017-11-08T12:51:16Z
       
  • Dealing with three-way data containing missing values by new weighted method for second-order calibration
    • Abstract: Publication date: Available online 7 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Yong Li, Hai-long Wu, Xiang-yang Yu
      Multi-way data arrays contain missing values for several reasons, such as various malfunctions of instruments, responses being outside instrument ranges, irregular measurement intervals between samples and data postprocessing. In the present study, a new method, weighted penalty alternating trilinear decomposition (W-APTLD), based on the weighted trilinear model and the idea of alternating trilinear decomposition, is presented to analyze three-way data arrays containing missing values. In addition, an improved core consistency diagnostic method (W-CORCONDIA) is proposed to estimate the chemical ranks of three-way data arrays containing missing values. The results of one simulation and two real data sets demonstrate that the new method W-APTLD could be used to deal with missing values and preserves the second-order advantage. With excess factors, W-APTLD could give more accurate results than weighted PARAFAC (W-PARAFAC), PARAFAC with single imputation (PARAFAC-SI) and incomplete data PARAFAC (INDAFAC). The convergence rate of W-APTLD was much faster than that of W-PARAFAC and PARAFAC-SI but slower than that of INDAFAC. W-APTLD could also overcome the problem due to severe collinearity better than W-PARAFAC and PARAFAC-SI. In addition, this new method could be extended to analyze higher-way data arrays containing missing values.

      PubDate: 2017-11-08T12:51:16Z
       
  • Robust analysis of spectra with strong background signals by First-Derivative Indirect Hard Modeling (FD-IHM)
    • Abstract: Publication date: Available online 7 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): P. Beumers, D. Engel, T. Brands, H.J. Koß, A. Bardow
      Spectral analysis of mixtures often faces challenges due to nonlinear effects such as peak shifts or strong background signals. Nonlinear mixture effects can be effectively treated by the Indirect Hard Modeling (IHM) Method. In IHM, mixture effects are captured by adapting hard models of pure component spectra when fitting a mixture model. However, IHM requires a suitable background treatment, which can become laborious. Background signals do not arise from the components of interest but often superimpose their spectra. In statistical methods for spectral analysis, background treatment is often conducted by derivatives of a spectrum. Derivatives effectively damp broad background signals. Standard IHM is not applicable to derivatives of spectra as the negative parts of a derivative spectrum cannot be modeled by pseudo-Voigt peaks, which are always positive. In this work, we propose First-Derivative Indirect Hard Modeling (FD-IHM). FD-IHM uses the analytical derivatives of the peak functions. The analytical derivatives are fitted to numerical derivatives of the spectra. Thereby, we combine background treatment by first derivatives with the IHM method to treat nonlinear effects. The presented FD-IHM is validated using Raman spectra of ethanol/acetone mixtures. To introduce a variety of background signals, we used fluorescence dye, scattering bodies (yeast) and various background light sources. Classical IHM allows us to predict the test sets with a root-mean-square error of prediction (RMSEP) ranging from 0.60 wt% to 2.06 wt%, but careful manual background treatment had to be applied. With FD-IHM, we reduce the RMSEP by 21%–73% without any background treatment. Thus, FD-IHM allows for both efficient and accurate analysis of spectra with large background signals.

      PubDate: 2017-11-08T12:51:16Z
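      The core idea of the abstract above, fitting analytical peak derivatives to numerical derivatives of a spectrum, can be illustrated with a minimal sketch. It is not the authors' implementation: the pseudo-Voigt parameterization (one shared FWHM, mixing weight `eta`), the parameter values and the constant background are assumptions chosen for illustration, and no fitting step is included.

```python
import math

def pseudo_voigt(x, amplitude, center, fwhm, eta):
    """Pseudo-Voigt peak: weighted sum of a Lorentzian and a Gaussian sharing one FWHM."""
    u = (x - center) / fwhm
    lorentz = 1.0 / (1.0 + 4.0 * u * u)
    gauss = math.exp(-4.0 * math.log(2.0) * u * u)
    return amplitude * (eta * lorentz + (1.0 - eta) * gauss)

def pseudo_voigt_derivative(x, amplitude, center, fwhm, eta):
    """Analytical first derivative of pseudo_voigt with respect to x."""
    u = (x - center) / fwhm
    lorentz = 1.0 / (1.0 + 4.0 * u * u)
    gauss = math.exp(-4.0 * math.log(2.0) * u * u)
    d_lorentz = -8.0 * u / fwhm * lorentz * lorentz
    d_gauss = -8.0 * math.log(2.0) * u / fwhm * gauss
    return amplitude * (eta * d_lorentz + (1.0 - eta) * d_gauss)

def numerical_derivative(values, step):
    """Central differences on the interior points of an evenly spaced signal."""
    return [(values[i + 1] - values[i - 1]) / (2.0 * step)
            for i in range(1, len(values) - 1)]

# Hypothetical spectrum: one peak sitting on a constant background offset.
step = 0.05
grid = [i * step for i in range(2001)]
params = dict(amplitude=1.0, center=50.0, fwhm=8.0, eta=0.4)
background = 5.0
spectrum = [pseudo_voigt(x, **params) + background for x in grid]

measured = numerical_derivative(spectrum, step)          # derivative of the data
modelled = [pseudo_voigt_derivative(x, **params) for x in grid[1:-1]]  # derivative of the model
max_gap = max(abs(m - n) for m, n in zip(modelled, measured))
```

      In this sketch the constant background drops out of the numerical derivative, so the derivative of the background-free peak model matches the derivative of the data; the modelled derivative also takes negative values, which a plain (always positive) pseudo-Voigt peak could not reproduce.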
       
  • Sampling Error Profile Analysis for calibration transfer in multivariate
           calibration
    • Abstract: Publication date: Available online 7 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Feiyu Zhang, Wanchao Chen, Ruoqiu Zhang, Boyang Ding, Heming Yao, Jiong Ge, Lei Ju, Wuye Yang, Yiping Du
      A new strategy named Sampling Error Profile Analysis (SEPA) is proposed for optimizing parameters of piecewise direct standardization (PDS), such as the number of principal components and the window size, and for evaluating the calibration transfer. Partial least squares (PLS) with mean-centering is used in PDS for calibration transfer. In SEPA, random re-sampling is carried out to obtain a series of subsets, and a sub-model is built on each subset, yielding one root mean square error (RMSE) per sub-model; the mean value and standard deviation of these RMSEs are then calculated. To take both accuracy and stability into account, the sum of the mean value and the standard deviation is used for parameter optimization and model evaluation. The performance of the proposed strategy has been tested on two data sets: a ternary mixture dataset and a corn dataset. Compared with PDS, SEPA-PDS obtained lower prediction errors, indicating that the transfer model is more robust and effective when using the parameters optimized by SEPA. Compared with two other commonly used calibration transfer methods, slope and bias correction (SBC) and spectral space transformation (SST), SEPA-PDS achieved more satisfactory results.

      PubDate: 2017-11-08T12:51:16Z
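      The selection criterion described above (mean plus standard deviation of the RMSEs of randomly re-sampled sub-models) can be sketched as follows. This is only an illustration under stated assumptions: a straight-line least-squares fit stands in for the PLS/PDS transfer model, and the function names, subset count and train fraction are hypothetical rather than taken from the paper.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the real sub-model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(model, xs, ys):
    """Root mean square error of a fitted line on the given samples."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5

def sepa_score(xs, ys, n_subsets=200, train_fraction=0.7, seed=0):
    """Return (mean + std, mean, std) of the RMSEs over random re-sampled sub-models."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_train = max(2, round(train_fraction * len(xs)))
    errors = []
    for _ in range(n_subsets):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(rmse(model, [xs[i] for i in test], [ys[i] for i in test]))
    mean_err = statistics.mean(errors)
    std_err = statistics.stdev(errors)
    return mean_err + std_err, mean_err, std_err

# Hypothetical data: a noise-free line and a noisy copy of it.
xs = [float(i) for i in range(20)]
noise = random.Random(1)
ys_clean = [2.0 * x + 1.0 for x in xs]
ys_noisy = [y + noise.gauss(0.0, 0.5) for y in ys_clean]
clean = sepa_score(xs, ys_clean)
noisy = sepa_score(xs, ys_noisy)
```

      In a parameter search, one would compute such a score for each candidate setting and keep the setting with the smallest value, so that a low average error is not preferred if it comes with large variability across subsets.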
       
  • Dynamic hypersphere based support vector data description for batch
           process monitoring
    • Abstract: Publication date: Available online 6 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Jianlin Wang, Weimin Liu, Kepeng Qiu, Tao Yu, Liqiang Zhao
      Support Vector Data Description (SVDD) is an efficient monitoring method that captures a spherically shaped boundary around the normal batch data and sets a control limit related to the support vectors (SVs) for online monitoring. Using nonlinear transformation functions, SVDD constructs an irregular hypersphere in high dimensional space. When the batch process is complicated, the monitoring accuracy decreases with the traditional control limit of SVDD. In this paper, dynamic hypersphere based support vector data description (DH-SVDD) is proposed for batch process monitoring. In the training process, a static hypersphere is built from the important SVs of the training dataset. In the testing process, a dynamic hypersphere is built from the important SVs of a combined dataset consisting of the current test sample and the training dataset. A significant change between these two hyperspheres indicates that the current test sample is an outlier. Thus, DH-SVDD has a relatively high monitoring accuracy because it fully considers the relationship between the current test sample and the historical training dataset in high dimensional space. The proposed DH-SVDD is compared with traditional methods such as K-chart-SVDD, max limit SVDD and validation limit SVDD. The effectiveness of DH-SVDD is also verified on a semiconductor etch process and a fed-batch penicillin fermentation process.

      PubDate: 2017-11-08T12:51:16Z
       
  • SRO_ANN: An integrated MatLab toolbox for multiple surface response
           optimization using radial basis functions
    • Abstract: Publication date: Available online 4 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Pablo C. Giordano, Héctor C. Goicoechea, Alejandro C. Olivieri
      SRO_ANN, a MatLab® toolbox for implementing multiple surface response optimization by artificial neural networks (SRO_ANN), is presented. Radial basis functions, a type of artificial neural network, are applied through an easily managed graphical user interface. A detailed description of the interface is provided, including one simulated example and two literature examples that illustrate the capabilities of the software. The discussed experimental examples correspond to: (1) the maximization of the research octane number (RON) of fuels, influenced by three factors (reaction temperature, operating pressure and low liquid hourly space velocity), and (2) the optimization of the calcification process for diced tomatoes, evaluated through three different responses (calcium content, firmness and pH), which are affected by three factors (calcium concentration, solution temperature and treatment time). The results show that the application of a nonparametric tool can enhance the performance of optimization modeling tasks.

      PubDate: 2017-11-08T12:51:16Z
       
  • HYPER-Tools. A graphical user-friendly interface for multivariate and
           hyperspectral image analysis
    • Abstract: Publication date: Available online 4 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): José Manuel Amigo, Nabiallah Mobaraki
      HYPER-Tools is a new user-friendly graphical interface (GUI) especially designed for the analysis of multivariate and hyperspectral images. This easy-to-use interface works in the Matlab environment and integrates fundamental types of spectral and spatial pre-processing methods as well as the main chemometric tools (exploratory data analysis, clustering, regression, and classification) for multivariate and hyperspectral image analysis. The main features of HYPER-Tools are its powerful visualization tools and the interaction of the user with the interface, meaning that the user needs hardly any Matlab skills to use it. Together with the GUI, several tutorials and videos are provided on the official website (https://www.hypertools.org/) showing the working procedure of HYPER-Tools step by step in different situations.

      PubDate: 2017-11-08T12:51:16Z
       
  • Industrial Mooney viscosity prediction using fast semi-supervised
           empirical model
    • Abstract: Publication date: 15 December 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 171
      Author(s): Wenjian Zheng, Xuejin Gao, Yi Liu, Limei Wang, Jianguo Yang, Zengliang Gao
      In industrial rubber mixing processes, the quality index (i.e., Mooney viscosity) cannot be measured online directly. Traditional data-driven empirical models for online prediction of the Mooney viscosity have not utilized the information hidden in the large amount of unlabeled data (e.g., process input variables during each mixing batch). A simple semi-supervised nonlinear soft sensor method for Mooney viscosity prediction is developed. It integrates the extreme learning machine (ELM) and graph Laplacian regularization into a unified modeling framework, so that the useful information in unlabeled data can be explored and introduced into the prediction model. Furthermore, a bagging-based ensemble strategy is combined with the semi-supervised ELM (SELM) to obtain more accurate predictions. Applied to Mooney viscosity prediction in an industrial internal mixer, the proposed method shows promising prediction performance by efficiently incorporating the information in unlabeled data.

      PubDate: 2017-11-02T12:45:23Z
       
  • Authenticity assessment and protection of high-quality Nebbiolo-based
           Italian wines through machine learning
    • Abstract: Publication date: Available online 31 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Luigi Portinale, Giorgio Leonardi, Marco Arlorio, Jean Daniel Coïsson, Fabiano Travaglia, Monica Locatelli
      This paper discusses an intelligent data analysis approach, based on machine learning techniques, aimed at defining chemical data analysis methods for assessing the authenticity of some of the highest-value Nebbiolo-based wines from Piedmont (Italy) and protecting them against fake versions. This is an important and very relevant issue in the wine market, where commercial frauds related to such products are estimated to be worth millions of Euros. The objective is twofold: to show that the problem can be addressed without expensive and hyper-specialized wine chemical analyses, and to demonstrate the actual usefulness of classification algorithms for data mining and machine learning on the resulting chemical profiles. Following Wagstaff's proposal for practical exploitation of machine learning approaches, we describe how data have been collected and prepared for the production of different datasets, how suitable classification models have been identified, and how the interpretation of the results suggests an active role for machine learning classification techniques, based on standard chemical profiling, in the assessment of the authenticity of the wines targeted by the study. Experiments have been performed both with datasets of real samples and with synthetic datasets artificially generated from real data.

      PubDate: 2017-11-02T12:45:23Z
       
  • Authentication and inference of seal stamps on Chinese traditional
           painting by using multivariate classification and near-infrared
           spectroscopy
    • Abstract: Publication date: Available online 31 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Zewei Chen, An Gu, Xin Zhang, Zhuoyong Zhang
      Chinese traditional paintings occupy an important position in Chinese cultural heritage, and it is very important for archeologists and artists to establish their authenticity, which is difficult to achieve. In this work, near-infrared spectroscopy (NIRS) coupled with multivariate models was used for authenticating the stamps of 12 seals on a Chinese traditional painting. The robustness of linear and nonlinear multivariate models, i.e. partial least squares-discriminant analysis (PLS-DA) and support vector machine (SVM), was evaluated by adding 5 different levels of noise (from 1% to 5%) to 3 original NIR spectra of each of the stamps. The noise-added spectral data were fused with the original spectra to establish identification models and then to evaluate the ability of the two models to tolerate noise disturbance. Accuracies of 92.6% and 100% were obtained by the linear PLS-DA and nonlinear SVM methods, respectively. The results demonstrate the feasibility of multivariate approaches for authenticating stamps of seals on the Chinese traditional painting. It is also important and necessary in archeological study to infer the approximate eras of seal stamps on Chinese traditional paintings. By comparing the Mahalanobis distances between the 12 stamps on the painting, hierarchical cluster analysis (HCA) was adopted to assist the inference of eras for the unknown seal stamps on the painting. This work demonstrates that NIR spectroscopy combined with multivariate models can be utilized as a non-destructive approach for the authentication of stamps on Chinese traditional paintings. HCA can also provide useful information for inferring the time period of the stamps of unknown seals on the painting.

      PubDate: 2017-11-02T12:45:23Z
       
  • An improved multi-kernel RVM integrated with CEEMD for high-quality
           intervals prediction construction and its intelligent modeling application
           
    • Abstract: Publication date: Available online 31 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Yuan Xu, Mingqing Zhang, Qunxiong Zhu, Yanlin He
      Most existing modeling methods are based on point prediction. However, the accuracy of point prediction cannot meet the actual demand due to the high noise, volatility, complexity and irregularity inherent in chemical process data. In order to solve this problem, a hybrid high-quality prediction intervals (PIs) method integrating complementary ensemble empirical mode decomposition (CEEMD), sample entropy (SE), and an improved multi-kernel relevance vector machine (RVM) is proposed in the paper. The proposed PIs method consists of three steps. First, CEEMD is adopted to decompose the original data into several independent intrinsic mode functions (IMFs), and SE is then used to analyze the complexity of the extracted IMFs to obtain recombinant components. Second, an improved multi-kernel RVM (MRVM), in which the linear kernel and the Gaussian kernel are combined, is presented to predict the recombinant components independently. Third, the predicted components are aggregated using another MRVM to obtain an ensemble result for constructing the high-quality PIs. To verify the performance of the proposed PIs method, a purified terephthalic acid (PTA) solvent system is selected. Comparative simulation results demonstrate that the proposed PIs method achieves markedly better coverage probability and sharpness in all the step predictions.

      PubDate: 2017-11-02T12:45:23Z
       
  • Comparing multiple statistical methods for inverse prediction in nuclear
           forensics applications
    • Abstract: Publication date: Available online 29 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): John R. Lewis, Adah Zhang, Christine Anderson-Cook
      Forensic science seeks to predict source characteristics using measured observables. Statistically, this objective can be thought of as an inverse problem where interest is in the unknown source characteristics or factors (X) of some underlying causal model producing the observables or responses ($Y = g(X) + \text{error}$). This paper reviews several statistical methods for use in inverse problems and demonstrates that comparing results from multiple methods can be used to assess predictive capability. Motivation for assessing inverse predictions comes from the desired application to historical and future experiments involving nuclear material production for forensics research in which inverse predictions, along with an assessment of predictive capability, are desired. Four methods are reviewed in this article. Two are forward modeling methods and two are direct inverse modeling methods. Forward modeling involves building a forward causal model of the responses (Y) as a function of the source characteristics (X) using content knowledge and data ideally obtained from a well-designed experiment. The model is then inverted to produce estimates of X given a new set of responses. Direct inverse modeling involves building prediction models of the source characteristics (X) as a function of the responses (Y), bypassing estimation of any underlying causal relationship. Through use of simulations and a data set from an actual plutonium production experiment, it is shown that agreement of predictions across methods is an indication of strong predictive capability, whereas disagreement indicates the current data are not conducive to making good predictions.

      PubDate: 2017-11-02T12:45:23Z
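      The distinction between forward modeling followed by inversion and direct inverse modeling can be made concrete with a one-factor, one-response sketch. It assumes a straight-line model fitted by ordinary least squares, which is far simpler than the methods reviewed in the paper; the function names and the data are hypothetical.

```python
def least_squares_line(u, v):
    """Fit v = intercept + slope * u by ordinary least squares."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suu = sum((a - mean_u) ** 2 for a in u)
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    slope = suv / suu
    return mean_v - slope * mean_u, slope

def forward_then_invert(x_train, y_train, y_new):
    """Forward modeling: fit Y as a function of X, then solve the fitted model for X."""
    intercept, slope = least_squares_line(x_train, y_train)
    return (y_new - intercept) / slope

def direct_inverse(x_train, y_train, y_new):
    """Direct inverse modeling: fit X as a function of Y and predict X directly."""
    intercept, slope = least_squares_line(y_train, x_train)
    return intercept + slope * y_new

# Hypothetical calibration data: source characteristic X and noisy response Y.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_exact = [3.0 + 2.0 * x for x in x_train]
y_noisy = [5.1, 6.8, 9.3, 10.9, 13.2]

x_forward = forward_then_invert(x_train, y_noisy, 15.0)
x_direct = direct_inverse(x_train, y_noisy, 15.0)
disagreement = abs(x_forward - x_direct)
```

      With noise-free data the two routes return the same estimate; with noisy data they differ, and the size of `disagreement` gives a rough, informal analogue of the cross-method comparison that the paper uses to judge predictive capability.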
       
  • Stable variable selection of class-imbalanced data with precision-recall
           criterion
    • Abstract: Publication date: Available online 26 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Guang-Hui Fu, Feng Xu, Bing-Yang Zhang, Lun-Zhao Yi
      Screening important variables for class-imbalanced data is still a challenging task. In this study, we propose an algorithm for stably selecting key variables on class-imbalanced data based on the precision-recall curve (PRC), where the PRC is utilized as the assessment criterion in the model building stage, and sparse regularized logistic regression combined with subsampling (SRLRS) is designed to perform stable variable selection. Considering the characteristic of class-imbalanced data, we also proposed classification-based partition for cross validation, as well as leaving half of majority observations out and leaving one minority observation out (LHO-LOO) for subsampling. Simulation results and real data showed that our algorithm is highly suitable for handling class-imbalanced data, and that the PRC can be an alternative evaluation criterion for model selection when handling class-imbalanced data.

      PubDate: 2017-11-02T12:45:23Z
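      One possible reading of the LHO-LOO subsampling step mentioned above is sketched below: each subsample drops a random half of the majority-class observations and a single minority-class observation. The exact procedure in the paper may differ; the function name, label coding and class sizes are assumptions for illustration.

```python
import random

def lho_loo_subsample(labels, minority_label, rng):
    """Return sorted indices kept after leaving out half of the majority class
    and one minority observation (an assumed interpretation of LHO-LOO)."""
    majority = [i for i, lab in enumerate(labels) if lab != minority_label]
    minority = [i for i, lab in enumerate(labels) if lab == minority_label]
    n_left_out = len(majority) // 2
    kept_majority = rng.sample(majority, len(majority) - n_left_out)
    left_out_minority = rng.choice(minority)
    kept_minority = [i for i in minority if i != left_out_minority]
    return sorted(kept_majority + kept_minority)

# Hypothetical class-imbalanced labels: 20 majority (0) and 4 minority (1) samples.
labels = [0] * 20 + [1] * 4
rng = random.Random(0)
subsample = lho_loo_subsample(labels, minority_label=1, rng=rng)
kept_majority_count = sum(1 for i in subsample if labels[i] == 0)
kept_minority_count = sum(1 for i in subsample if labels[i] == 1)
```

      Repeating the draw many times and refitting a sparse model on each subsample would give selection frequencies per variable, which is the usual way subsampling is used to stabilize variable selection.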
       
  • A strategy on the definition of applicability domain of model based on
           population analysis
    • Abstract: Publication date: 15 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 170
      Author(s): Yong-Huan Yun, Dong-Ming Wu, Guang-Yi Li, Qiao-Yan Zhang, Xia Yang, Qin-Fen Li, Dong-Sheng Cao, Qing-Song Xu
      In recent years, there have been growing concerns about evaluating the quality of predictions made by quantitative structure-activity relationship (QSAR) models. A well-defined applicability domain (AD) is crucial in the validation of QSAR models, as stated in the third principle of the Organization for Economic Co-operation and Development (OECD). In this study, a new perspective on defining the AD of a model based on a population analysis (PA) strategy, including model population analysis (MPA) and approach population analysis (APA), is proposed. MPA employs classical AD approaches to define the AD with a vast number of sub-datasets derived from the training set. On the basis of MPA, the classical AD approaches can distinguish some of the samples that cannot be distinguished using the full training set. APA is then used to take the union of all results generated by the AD approaches used, giving a consensus list of samples falling outside the AD. In order to investigate the performance of the PA strategy in defining the AD with the classical AD approaches, two QSAR datasets were used. The results show that implementing the PA strategy can help three classical AD approaches to distinguish additional samples that cannot be distinguished using the full training dataset. When these additional samples were excluded, the root mean square error of prediction of the test set decreased, suggesting that the PA strategy has the potential to distinguish samples that cannot be reliably predicted.

      PubDate: 2017-10-26T06:03:27Z
       
  • ADME properties evaluation in drug discovery: Prediction of plasma protein
           binding using NSGA-II combining PLS and consensus modeling
    • Abstract: Publication date: 15 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 170
      Author(s): Ning-Ning Wang, Zhen-Ke Deng, Chen Huang, Jie Dong, Min-Feng Zhu, Zhi-Jiang Yao, Alex F. Chen, Ai-Ping Lu, Qi Mi, Dong-Sheng Cao
      Plasma protein binding affinity of a drug compound has a strong influence on its pharmacodynamic behavior because it can affect the drug uptake and distribution. In this study, we collected a sizeable dataset consisting of 1830 drug compounds from several accessible sources. A descriptor pool composed of four different types of descriptors (2-D, 3-D, Estate and MACCS) was first built, and the non-dominated sorting genetic algorithm (NSGA-II) combined with partial least squares (PLS) regression was applied to select important descriptors for model building. Finally, we obtained a consensus model (for five-fold cross-validation: $Q^2 = 0.750$; RMSE = 16.151) based on five different predictive models built using random forest (RF), support vector machine (SVM), Cubist, Gaussian process (GP), and Boosting. Further, a test set and two external validation datasets were applied to validate its robustness and practicality. For the test set, $R_T^2 = 0.787$ and $\mathrm{RMSE}_T = 14.154$; when the two external datasets were applied, $R_{Ex}^2 = 0.704$ and 0.703, and $\mathrm{RMSE}_{Ex} = 18.194$ and 17.233, respectively. Additionally, according to OECD principles, y-randomization, Williams plot and scaffold analyses were used to validate the reliability and practical application domain of our predictive model. Overall, our consensus model shows good prediction performance and generalization ability in predicting plasma protein binding (PPB). After analyzing the important descriptors selected by NSGA-II and RF, we concluded that the PPB of a drug compound is mainly related to its lipophilicity, aromatic rings, and partial charge properties. In summary, this study developed a robust and practical consensus model for PPB prediction that could be used for distribution evaluation and risk assessment in the early stage of drug development.

      PubDate: 2017-10-26T06:03:27Z
       
  • Rapid identification of milk samples by high and low frequency unfolded
           partial least squares discriminant analysis combined with near-infrared
           spectroscopy
    • Abstract: Publication date: 15 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 170
      Author(s): Xihui Bian, Caixia Zhang, Peng Liu, Junfu Wei, Xiaoyao Tan, Ligang Lin, Na Chang, Yugao Guo
      A high and low frequency unfolded partial least squares discriminant analysis (HLFUPLS-DA) for building a pattern recognition model of near-infrared (NIR) spectra is proposed to identify milk samples. In the approach, the spectra are first decomposed into different frequency components by empirical mode decomposition (EMD). Second, the leading high frequency components are summed into a high frequency matrix, and the remaining low frequency components into a low frequency matrix. Third, the high and low frequency matrices are combined into an extended matrix along the variable dimension. Finally, a PLS-DA model is built between the extended matrix and the target vectors. Coupled with NIR spectroscopy, HLFUPLS-DA is applied to identify milk samples of different qualities. Compared with PLS-DA and other signal processing techniques combined with PLS-DA, the proposed method proves to be a promising tool for spectral qualitative analysis of complex samples.

      PubDate: 2017-10-26T06:03:27Z
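      The unfolding step described above (sum the high frequency components, sum the low frequency components, then join the two along the variable dimension) can be sketched for a single spectrum as follows. The decomposition itself is not implemented here: the components are assumed to come from an EMD routine ordered from high to low frequency, and `n_high`, the number of components counted as high frequency, is a hypothetical parameter.

```python
def unfold_high_low(components, n_high):
    """Sum the first n_high components and the remaining ones separately,
    then concatenate both sums along the variable dimension."""
    if not 0 < n_high < len(components):
        raise ValueError("n_high must leave at least one component on each side")
    high = [sum(values) for values in zip(*components[:n_high])]
    low = [sum(values) for values in zip(*components[n_high:])]
    return high + low

def extend_matrix(all_components, n_high):
    """Apply the unfolding to every sample; each row has twice the original variables."""
    return [unfold_high_low(components, n_high) for components in all_components]

# Hypothetical components of one four-variable spectrum, ordered high to low frequency.
components = [
    [1.0, -1.0, 1.0, -1.0],
    [0.5, 0.5, -0.5, -0.5],
    [2.0, 2.0, 2.0, 2.0],
]
row = unfold_high_low(components, n_high=2)
extended = extend_matrix([components, components], n_high=2)
```

      The extended rows would then serve as the predictor matrix for a PLS-DA model, which is left out of this sketch.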
       
  • A reliable multiclass classification model for identifying the subtypes of
           parotid neoplasms constructed with variable combination population
           analysis and partial least squares regression based on Raman spectra
    • Abstract: Publication date: 15 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 170
      Author(s): Yongning Yang, Fanfan Xie, Bing Yan, Yi Li, Junmei Xu, Yuan Liu, Zhining Wen, Menglong Li
      Pleomorphic adenoma (PA), Warthin's tumor (WT) and mucoepidermoid carcinoma (MEC) are three common subtypes of salivary gland tumors that occur in the parotid gland. Accurately diagnosing the subtypes of parotid tumors plays a vital role in surgical treatment. Unfortunately, current studies mainly focus on the binary classification of parotid tumors, and their preoperative multi-classification remains underexplored. To broaden the application area of the predictive models and facilitate clinical preoperative diagnosis, we propose a multi-classification model, constructed by combining the variable combination population analysis (VCPA) algorithm with partial least squares regression (PLSR), to simultaneously discriminate the three subtypes of parotid tumors as well as normal parotid gland tissue based on the Raman spectra of the tissue samples. In addition, we investigated the impact of generating Raman spectra from different sampling locations on the reliability of the predictive models. For the validation set, the overall accuracy in predicting the subtypes of parotid tumors and the normal parotid gland tissue was 0.867. Similarly, the accuracies achieved by the models constructed with the Raman spectra from two different sampling locations were 0.877 and 0.883, respectively, indicating that the sampling location has only a minor influence on the predictive models. Our findings can help establish a method for rapid preoperative diagnosis of salivary gland tumors in clinics. Moreover, the characteristic wavenumbers used in model construction were highly associated with variations in the structures and contents of nucleic acids, collagen, proteins, lipids and DNA/RNA in gland tissue, which revealed the main differences among the three types of parotid tumors and can be conducive to a better understanding of their molecular mechanisms.

      PubDate: 2017-10-26T06:03:27Z
       
  • Chemometric algorithms for analyzing high dimensional temperature
           dependent near infrared spectra
    • Abstract: Publication date: 15 November 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 170
      Author(s): Xiaoyu Cui, Jin Zhang, Wensheng Cai, Xueguang Shao
      High dimensional data analysis has gained widespread acceptance with the rapid development of analytical instruments and experimental techniques. Benefiting from the second–order advantage, high order chemometric algorithms have shown a great ability to match the nature of data and extract the latent components from the data. In this study, multiway principal component analysis (NPCA), parallel factor analysis (PARAFAC) and alternating trilinear decomposition (ATLD) were employed, respectively, to extract the information from temperature dependent near infrared (NIR) spectra of alcohol aqueous solutions. The variations of the structure induced by temperature and concentration in the solutions were analyzed by the three algorithms. Spectral features can be observed from the loadings obtained by NPCA, which explain the maximum variances. Spectral profiles computed by PARAFAC and ATLD contain the spectral information of the components. The former prefers to show the information of ethanol, water and ethanol–water cluster, while the latter opts for describing the information of the ethanol and different water clusters in the solution. However, all the three algorithms are able to capture the quantitative information from the spectra. Therefore, high order chemometric algorithms may provide powerful tools for analyzing temperature dependent NIR spectra to obtain the structural and quantitative information of the aqueous solutions.

      PubDate: 2017-10-26T06:03:27Z
       
  • Detection of formaldehyde oxidation catalysis by MCR-ALS analysis of
           multiset ToF-SIMS data in positive and negative modes
    • Abstract: Publication date: Available online 25 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Xin Zhang, Nicolas Nuns, Jean-François Lamonier, Romà Tauler, Ludovic Duponchel
      The toxicity of formaldehyde is extremely high even at very low concentrations in air. We have used ToF-SIMS (Time of Flight Secondary Ion Mass Spectroscopy) to explore formaldehyde oxidation catalysed by birnessite (K$_x$(Mn$^{4+}$,Mn$^{3+}$)$_2$O$_4$) in the temperature range from 25 to 200 °C. ToF-SIMS is a powerful tool for the generation of chemical maps because of its high spatial resolution, its sensitivity and its relatively low acquisition time. In this method, mass analysis is performed on the positive and negative ions sputtered out of the uppermost layers of the analyzed sample. ToF-SIMS produces large raw data sets that are rich in chemical information but rather complex to analyze and interpret. This is why the application of chemometric methods is proposed in this work for the exploration of complex ToF-SIMS data sets. MCR-ALS (Multivariate Curve Resolution-Alternating Least Squares) has been applied to resolve both the positive and negative ions present in ToF-SIMS data sets, which were analyzed simultaneously using a data matrix augmentation strategy. Birnessite without catalyzed formaldehyde was first analyzed to resolve background contributions, which were then used to implement a selectivity constraint for MCR-ALS that removes them in the analysis of the formaldehyde oxidation data sets. Results show that, as the temperature increased, the concentration of formaldehyde combined with manganese ions decreased whereas the concentration of manganese oxide increased. Conformation changes of the manganese formaldehyde metal complex were then inferred. It is concluded that a metal complex species formed between two manganese ions and only one formaldehyde molecule is very unlikely to exist.

      PubDate: 2017-10-26T06:03:27Z
       
  • Structure-aware enhancement of imaging mass spectrometry data for semantic
           segmentation
    • Abstract: Publication date: Available online 22 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Luming Liang, Zhi-min Zhang
      Mass Spectrometry Imaging data contain structural information, where similar mass spectra represent the same object. However, due to data contamination during the measurement, the structural information in the image is not apparent. We develop a new approach that enhances these structures and then semantically segments the given Mass Spectrometry Imaging data by following the enhanced structures. After the pipelined steps of image enhancement, raw segmentation and semantic clustering, a meaningful color-coded image segmentation is produced, which captures the main structure of the image and also suppresses pixel-wise variations introduced during the measurement. Comparisons show the effectiveness of our pipeline. A biological application based on our enhancement and segmentation shows that our method can be used to identify regions of tissue sections.

      PubDate: 2017-10-26T06:03:27Z
       
  • Kernel dynamic latent variable model for process monitoring with
           application to hot strip mill process
    • Abstract: Publication date: Available online 21 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Gang Li, Kaixiang Peng, Tao Yuan, Maiying Zhong
      Dynamic models are preferred over static models in the process monitoring of modern manufacturing. Compared with static models, dynamic models can reflect not only correlations but also causality among measurements and manipulated variables. Linear dynamic models are very common due to the simplicity of their representation and parameter estimation. However, because of the natural nonlinearity of a dynamic process, linear models are ineffective over long periods and under varying conditions. Nonlinear dynamic models are hence desired in such circumstances. In this paper, a kernel dynamic latent variable (KDLV) model is proposed to describe the nonlinearity between original measurements and dynamic latent variables. This model extends the dynamic latent variable (DLV) model to the nonlinear case and keeps all of its merits. To build such a model, a KDLV search algorithm is proposed to acquire key model parameters from data, and a KDLV modeling procedure is then derived to complete the whole model. After the KDLV model is trained from data, a corresponding detection strategy is developed to perform fault detection. The KDLV based fault detection is applied to the monitoring of a hot strip mill process, and a comparison study is conducted against both DLV and DKPCA models.

      PubDate: 2017-10-26T06:03:27Z
       
  • The construction of D- and I-optimal designs for mixture experiments with
           linear constraints on the components
    • Abstract: Publication date: Available online 20 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Roelof Coetzer, Linda M. Haines
      Mixture experiments in which linear constraints are imposed on the components of the mixture are used extensively in practice. The problem of constructing designs which are in some sense optimal for this experimental setting is not straightforward. More specifically, the design space is a polytope embedded in a regular simplex. In the present paper, a new approach to the problem, which builds on the fact that points in the polytope can be represented as convex combinations of the vertices of that polytope, is introduced. Some theory underpinning this idea and rooted in the notion of barycentric coordinates is developed and algorithms for the construction of exact and approximate D- and I-optimal designs for the Scheffé model are delineated. The methodology is illustrated by means of examples involving three- and four-component mixtures.

      PubDate: 2017-10-26T06:03:27Z
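      The entry above builds on representing every point of the constrained design region as a convex combination of the polytope's vertices (barycentric coordinates). A minimal sketch of that representation follows. The function name and the vertices are illustrative assumptions: the vertices correspond to a hypothetical three-component mixture with lower bounds x1 >= 0.2, x2 >= 0.3, x3 >= 0.1, and are not taken from the paper.

```python
def convex_combination(vertices, weights, tol=1e-9):
    """Return the point sum_i w_i * v_i for barycentric weights w."""
    if len(vertices) != len(weights):
        raise ValueError("need one weight per vertex")
    if any(w < -tol for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(vertices[0])
    return [sum(w * v[k] for w, v in zip(weights, vertices)) for k in range(dim)]


# Hypothetical vertices of the constrained region inside the 3-component simplex
VERTICES = [(0.2, 0.3, 0.5), (0.6, 0.3, 0.1), (0.2, 0.7, 0.1)]

# A candidate design point expressed through its barycentric weights
point = convex_combination(VERTICES, [0.5, 0.25, 0.25])
```

      Because each vertex is itself a valid mixture, any such combination automatically satisfies the mixture constraint (components summing to one) and the linear bounds, which is what makes the weights a convenient search space for design construction.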
       
  • ChemBCPP: A freely available web server for calculating commonly used
           physicochemical properties
    • Abstract: Publication date: 15 December 2017
      Source:Chemometrics and Intelligent Laboratory Systems, Volume 171
      Author(s): Jie Dong, Ning-Ning Wang, Ke-Yi Liu, Min-Feng Zhu, Yong-Huan Yun, Wen-Bin Zeng, Alex F. Chen, Dong-Sheng Cao
      The behavior of a chemical in humans or the environment depends mostly on several key physicochemical properties, such as aqueous solubility, octanol-water partition coefficient (logP), boiling point (BP), density, flash point (FP), viscosity, surface tension (ST), vapor pressure (VP) and melting point (MP). These properties are important in the environmental sciences and drug discovery, for example for the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of medicinal compounds and for the risk assessment of problematic chemicals. At present, quantitative structure-property relationship (QSPR) models are widely applied to save time and money in the early stages of chemical research. Although some satisfactory models have been obtained, most of them are not publicly available and thus cannot be directly applied to practical research projects. In this study, we developed a user-friendly web server named ChemBCPP that can be used to predict the aforementioned 8 important physicochemical properties and to calculate several other commonly used properties just by uploading a molecular structure or file. In addition, for a new chemical entity, users obtain not only its predicted value but also a leverage value (h value), which can be used to evaluate the reliability of the prediction. We believe that ChemBCPP could be widely applied in environmental science, chemical synthesis and drug ADMET fields, where high-quality chemical property data are in demand. ChemBCPP is freely available at http://chembcpp.scbdd.com.

      PubDate: 2017-10-17T23:25:58Z
       
  • A note on the calculation of reference change values for two consecutive
           normally distributed laboratory results
    • Abstract: Publication date: Available online 13 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): M. Regis, Th.A. Postma, E.R. van den Heuvel
      Population reference limits are inadequate for personalized analyses of medical laboratory results. Reference change values have been recommended as a valid alternative in assessing individual changes across sequential measurements. In this paper, we investigate the accuracy (type I error) and power (complement of type II error) of reference change values under three different statistical modeling scenarios and show that oversimplified hypotheses lead to misinterpretation of laboratory results. The power is strongly affected by the statistical modeling assumptions: it is shown that positive shifts in the individual average health condition are difficult to detect, while it is much easier to identify negative shifts.

      PubDate: 2017-10-17T23:25:58Z
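      For orientation on the entry above, the conventional textbook reference change value combines analytical and within-subject variation as RCV = sqrt(2) * z * sqrt(CV_A^2 + CV_I^2). The sketch below implements only that simplified symmetric formula. It is not one of the three modeling scenarios examined in the paper, and the function names and example numbers are assumptions.

```python
import math


def reference_change_value(cv_analytical, cv_within_subject, z=1.96):
    """Two-sided RCV in percent: sqrt(2) * z * sqrt(CV_A^2 + CV_I^2)."""
    return math.sqrt(2.0) * z * math.sqrt(cv_analytical ** 2 + cv_within_subject ** 2)


def is_significant_change(first, second, rcv_percent):
    """True if the relative change between two results exceeds the RCV."""
    change_percent = 100.0 * (second - first) / first
    return abs(change_percent) > rcv_percent


# Hypothetical analyte: 3 % analytical CV, 4 % within-subject CV
rcv = reference_change_value(3.0, 4.0)
```

      With these hypothetical CVs the RCV is about 13.9 %, so a change from 100 to 110 would not be flagged while a change from 100 to 120 would.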
       
  • Calibration of a chemometric model by using a mathematical process model
           instead of offline measurements in case of a H. polymorpha cultivation
    • Abstract: Publication date: Available online 13 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): O. Paquet-Durand, T. Ladner, J. Büchs, B. Hitzmann
      Data-driven regression models such as Principal Component Regression (PCR) or Partial Least Squares Regression (PLS), in combination with spectroscopic methods, are increasingly applied in bioprocess monitoring. However, as the name “data-driven” implies, the calibration of these regression models requires a large amount of predictor (X) and response (Y) data. The predictor data in this case mostly consist of spectroscopic data, which are easy to generate in large quantities, but the response data typically involve offline laboratory measurements that require much more effort to perform in large numbers. It will be shown that, for an H. polymorpha cultivation performed in microtiter plates, those tedious offline measurements for response data can be replaced by a mathematical process model. Here, an exponential growth model in an ideal stirred tank reactor with lag time is applied, which has three parameters (lag time, specific growth rate, and yield coefficient). Furthermore, it will be demonstrated that prior knowledge of the parameter values of this process model is not required, as these values can be determined from 2D fluorescence spectra alone. The only required information about the cultivation is the predictor data (2D fluorescence spectra in this case) and the initial state of at least three different cultivation runs, that is, the initial values of biomass and substrate (glycerol) concentration. The smallest prediction errors for biomass and glycerol obtained by the new calibration procedure are 0.19 g/L and 0.79 g/L, respectively, compared with 0.19 g/L and 1.12 g/L if a classical procedure using offline measurements is applied. The inherently calculated process parameters of lag time, specific growth rate and yield coefficient are 4.77 h, 0.154 h−1, and 0.457 g/g, which are similar to the values determined with offline measurements and a least-squares fit (4.48 h, 0.139 h−1, and 0.466 g/g, respectively).

      PubDate: 2017-10-17T23:25:58Z
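      The three-parameter process model named in the entry above (lag time, specific growth rate, yield coefficient) can be sketched as follows. This is a minimal illustration assuming a batch cultivation with a sharp lag phase and growth that stops when the substrate is exhausted. The function name and the initial concentrations are hypothetical, while the parameter values are the ones reported in the abstract.

```python
import math


def growth_with_lag(t, x0, s0, lag, mu, yield_xs):
    """Biomass and substrate (g/L) at time t (h) for exponential growth after a lag."""
    x_max = x0 + yield_xs * s0  # biomass reached when all substrate is consumed
    if t <= lag:
        x = x0  # no growth during the lag phase
    else:
        x = min(x0 * math.exp(mu * (t - lag)), x_max)
    s = s0 - (x - x0) / yield_xs  # substrate consumed in proportion to growth
    return x, max(s, 0.0)


# Parameters reported in the abstract: 4.77 h, 0.154 1/h, 0.457 g/g.
# Initial biomass (0.1 g/L) and glycerol (10 g/L) are assumed for illustration.
x_start, s_start = growth_with_lag(0.0, 0.1, 10.0, 4.77, 0.154, 0.457)
x_end, s_end = growth_with_lag(100.0, 0.1, 10.0, 4.77, 0.154, 0.457)
```

      In the calibration idea described in the abstract, a model of this kind stands in for the offline biomass and glycerol measurements that would otherwise serve as response data.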
       
  • Evaluation and assessment of homogeneity in images. Part 1: Unique
           homogeneity percentage for binary images
    • Abstract: Publication date: Available online 8 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Leandro de Moura França, José Manuel Amigo, Carlos Cairós, Manel Bautista, Maria Fernanda Pimentel
      Texture feature analysis is one of the most important approaches for assessing homogeneity in images. However, existing approaches either rely on comparison with a standardized set of images or require further multivariate models to predict or classify the images according to their features. In this first work, we propose an alternative and novel methodology to calculate a percentage of homogeneity using only the self-information contained in the image. This methodology is based on macropixel analysis theory and the generation of what is called the “homogeneity curve”. The homogeneity curve is explored in depth, and what could be considered the most homogeneous and the most inhomogeneous distribution is established for every case. This first work postulates the theory and demonstrates its usefulness with several examples applied to binary images. This will provide a theoretical framework to fully understand the homogeneity curve, postulating a mathematical model to parametrize homogeneity and its plausible deviations.

      PubDate: 2017-10-11T03:12:18Z
       
  • Optimized self-adaptive model for assessment of soil organic matter using
           Fourier transform mid-infrared photoacoustic spectroscopy
    • Abstract: Publication date: Available online 7 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Fei Ma, Changwen Du, Jianmin Zhou, Yazhen Shen
      Advanced technologies, such as infrared spectroscopy, have been applied to develop rapid, cheap but accurate methods for the analysis of soil organic matter (SOM). However, unsatisfactory prediction accuracy resulting from strong soil heterogeneity limits their practical application. In our previous work, a soil-identification-based self-adaptive partial least squares model (SAM), built using an identification algorithm and partial least squares regression (PLSR), made wider use possible. However, soil identification in the SAM needs further optimization. In this study, we designed an optimized self-adaptive partial least squares model (OPT-SAM), a more general model to predict SOM. A total of 597 soil samples from China with large variances were collected, and the soil spectra were recorded using Fourier transform mid-infrared photoacoustic spectroscopy (FTIR-PAS). Five typical algorithms (correlation coefficient (CC), Euclidean distance (ED), Mahalanobis distance (MD), angle cosine (AC), and k-medoids (KM)) were considered for the identification step in the SAM. The results demonstrated that the performance of CC-SAM, ED-SAM, MD-SAM and AC-SAM was significantly improved in comparison with the SAM without identification (NI-SAM), but KM-SAM showed poor prediction. ED-SAM (R2 = 0.8890, RMSEP = 7.00 g kg−1, RPD = 2.96) showed the highest accuracy and robustness of all algorithms, making it the optimal model for soil identification and prediction, and CC-SAM (R2 = 0.8572, RMSEP = 7.89 g kg−1, RPD = 2.44) was an alternative choice, especially for prediction with different soil types.

      PubDate: 2017-10-11T03:12:18Z
       
  • Fast pure ion chromatograms extraction method for LC-MS
    • Abstract: Publication date: Available online 7 October 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Rong Wang, Hongcao Ji, Pan Ma, Huitao Zeng, Yamei Xu, Zhi-Min Zhang, Hong-Mei Lu
      Liquid chromatography coupled with mass spectrometry (LC-MS) has shown great potential in the analysis of complex samples. However, informative feature extraction is still a challenge, since electrospray ionization in LC-MS tends to produce ninety percent or more ions that do not originate from the compounds of interest. The concept of the pure ion chromatogram (PIC) is effective for extracting informative ions, but traditional PIC methods are time-consuming because of their underlying theories and programming languages. In this study, we present a novel method, called Fast Pure Ions Chromatograms (FPIC), for extracting PICs from raw LC-MS datasets effectively and quickly. This method searches for the ions of a PIC bi-directionally and adaptively from its maximum, which improves stability and drastically reduces computation time. A further speedup has been achieved by exploiting modern software engineering techniques. FPIC was validated by analyzing four LC-MS datasets: the MM14, MM48, simulated MM48 and quantification (MTBLS234) datasets. Results show that FPIC outperformed traditional methods in recall, precision and F-score, and that it quantifies reliably. Furthermore, the method is very fast and has few adjustable parameters, giving an approximately 125-fold speedup over PITracer and an 18-fold speedup over XCMS. An open source implementation of the FPIC method is available at https://github.com/zmzhang/pymass.

      PubDate: 2017-10-11T03:12:18Z
       
  • Review on data-driven modeling and monitoring for plant-wide industrial
           processes
    • Abstract: Publication date: Available online 29 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Zhiqiang Ge
      Data-driven modeling and its applications in plant-wide processes have recently attracted much attention in both academia and industry. This paper provides a systematic review of data-driven modeling and monitoring for plant-wide processes. First, methodologies of commonly used data processing and modeling procedures for plant-wide processes are presented. The state of research since 2000 on various aspects of plant-wide process monitoring is then reviewed in detail. After that, extensions, opportunities, and challenges in data-driven modeling for plant-wide process monitoring are discussed and highlighted for future research.

      PubDate: 2017-10-04T06:42:34Z
       
  • A new measure of regression model accuracy that considers applicability
           domains
    • Abstract: Publication date: Available online 29 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Hiromasa Kaneko
      The coefficient of determination and the root-mean-squared error (RMSE) evaluate regression models for test samples without considering the applicability domains (ADs) of the models. In this study, we propose a new measure for evaluating the predictive performance of regression models that considers their ADs. The purpose is not selecting the best regression model among various competing models, but determining an appropriate model group corresponding to the AD of each model. The proposed measure is the area under coverage and RMSE curve for coverage less than p% (p%-AUCR). It is confirmed that some regression models have global predictive ability and others have local predictive ability, and p%-AUCR is an appropriate indicator for selecting between local and global regression models depending on the coverage and considering the AD. Selecting a regression model for each sample or each chemical structure using p%-AUCR can improve the prediction accuracy of data sets.

      PubDate: 2017-10-04T06:42:34Z
       
  • Penalized logistic regression for classification and feature selection
           with its application to detection of two official species of Ganoderma
    • Abstract: Publication date: Available online 28 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Ying Zhu, Tuck Lee Tan, Wai Kwong Cheang
      Two species of Ganoderma, Ganoderma lucidum (G. lucidum) and Ganoderma sinense (G. sinense) have been widely used as traditional Chinese herbal medicine for their high medicinal value. Recent studies show that the two species differ in levels of their main active compounds triterpenoids though both have antitumoral effects. An effective and simple analytical method using attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy to discriminate between the two species is of essential importance for its quality assurance and medicinal value estimation. In this study three penalized logistic regression models, weighted least absolute shrinkage and selection operator (Lasso), elastic net and weighted fusion, using ATR-FTIR spectroscopy have been explored for the purpose of classification and interpretation. The weighted fusion model incorporating spectral correlation structure allowed an automatic selection of a small number of spectral bands and achieved an excellent overall classification accuracy of 99% in discriminating spectra of G. lucidum from that of G. sinense. Its classification performance was superior to that of the weighted Lasso model and elastic net model. The automatic selection of informative spectral features results in substantial reduction in model complexity and improvement of classification accuracy, and it is particularly helpful for the quantitative interpretations of the major chemical constituents of Ganoderma regarding its anti-cancer effects.

      PubDate: 2017-10-04T06:42:34Z
       
  • Maximum likelihood unfolded principal component regression with residual
           bilinearization (MLU-PCR/RBL) for second-order multivariate calibration
    • Abstract: Publication date: Available online 28 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Jez Willian Batista Braga, Franco Allegrini, Alejandro C. Olivieri
      A maximum likelihood model is described for performing second-order multivariate calibration with unfolded principal component regression with residual bilinearization (MLU-PCR/RBL). It differs from the conventional RBL models based on U-PCR or U-PLS (unfolded partial least-squares) in the incorporation of the measurement error information into both the U-PCR calibration and the RBL model phases. The error information is represented by the instrumental error covariance matrix. Simulations were made by adding correlated and proportional noise to synthetic systems consisting of one analyte in the presence of a calibrated and unexpected interferent, under different conditions of overlapping profiles, noise levels and noise types (correlated and proportional). The results show that MLU-PCR/RBL outperforms conventional RBL methods in prediction ability, as confirmed by a detailed study on validation samples through the average prediction error as a convenient figure of merit. Results obtained in experimental data set based on flow injection analysis and UV detection for determination of acetylsalicylic and ascorbic acids in pharmaceutical products also support the theoretical conclusions.

      PubDate: 2017-10-04T06:42:34Z
       
  • Concurrent probabilistic PLS regression model and its applications in
           process monitoring
    • Abstract: Publication date: Available online 28 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Qinghua Li, Feng Pan, Zhonggai Zhao
      The probabilistic PLS (PPLS) algorithm derives the latent variables by maximizing the likelihood of input scores and quality scores, but imposes no constraint on the input residuals and the quality residuals, which implies that residuals may contain large information. Motivated by the concurrent PLS method, this paper proposes a concurrent PPLS (CPPLS) method to perform further decomposition of these residuals, and then two more subspaces are obtained. In this method, the maximum-likelihood method along with the expectation-maximization (EM) algorithm are employed to develop the model, in which the variance of each variable explained by latent variables is introduced to determine the number of latent variables. Based on the CPPLS model, five monitoring statistics all based on Mahalanobis norm are constructed for the evaluation of five subspaces decomposed by CPPLS, respectively.

      PubDate: 2017-10-04T06:42:34Z
       
  • In honor of Professor Yizeng Liang (May 1950–Oct. 2016): A scientist
           of passion and innovation
    • Abstract: Publication date: Available online 24 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Hong-Dong Li


      PubDate: 2017-09-26T05:56:54Z
       
  • Method for the comparison of complex matrix assisted laser desorption
           ionization-time of flight mass spectra. Stability of therapeutical
           monoclonal antibodies
    • Abstract: Publication date: Available online 22 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Raquel Pérez-Robles, Natalia Navas, Santiago Medina-Rodríguez, Luis Cuadros-Rodríguez
      This paper describes a straightforward and easy-to-implement method for comparing mass spectra using multivariate chemometric techniques in order to detect differences and obtain representative similarity metrics. For this purpose, we have programmed our own MATLAB function, called Protiago. Protiago transforms complex vectors of different lengths (i.e. intensities and m/z values from mass spectra) into binary vectors with the same number of elements. In addition, Protiago is able to read and process a set of spectra vectors in order to carry out a similarity analysis using four similarity metrics (i.e. the coefficient of determination, the cosine of the angle, the Bray-Curtis index and the nearness index). The last of these is a new similarity index proposed by the authors and applied for the first time in this study. It calculates a standardized Euclidean distance between two vectors in order to obtain a numerical value, ranging between 0 and 1, of the proximity of the two vectors. To supplement the similarity analysis, two multivariate exploratory methods were applied, i.e. principal component analysis (PCA) and multivariate analysis of variance (MANOVA). As an example of the proposed method, peptide mass fingerprints obtained using MALDI-TOF mass spectrometry from two therapeutic monoclonal antibodies, infliximab (INF) and rituximab (RTX), were compared. Using this method, it was possible to detect changes in the primary structure of the two proteins in order to study their chemical stability over 7 days under two storage conditions (refrigerated at 4 °C, and frozen at −20 °C).

      PubDate: 2017-09-26T05:56:54Z
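      Two of the four similarity metrics named in the entry above, the cosine of the angle and the Bray-Curtis index, have standard definitions and can be sketched directly. The binarization step is an assumed fixed-width m/z binning and is not taken from Protiago, whose internals the abstract does not describe. The nearness index is omitted because its formula is not given. All function names and peak lists are hypothetical.

```python
import math


def to_binary(mz_values, mz_min, mz_max, bin_width):
    """Map a peak list of any length onto a fixed-length 0/1 vector (assumed binning)."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    vec = [0] * n_bins
    for mz in mz_values:
        if mz_min <= mz < mz_max:
            index = min(int((mz - mz_min) // bin_width), n_bins - 1)
            vec[index] = 1
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def bray_curtis_similarity(a, b):
    """1 minus the Bray-Curtis dissimilarity, for non-negative vectors."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


# Hypothetical peak lists (m/z) of different lengths
spectrum_a = to_binary([500.2, 750.9, 1200.4], 500.0, 2500.0, 1.0)
spectrum_b = to_binary([500.4, 1200.1, 1800.7, 2100.0], 500.0, 2500.0, 1.0)
cos_ab = cosine_similarity(spectrum_a, spectrum_b)
bc_ab = bray_curtis_similarity(spectrum_a, spectrum_b)
```

      Both values lie between 0 and 1 for binary vectors, with 1 meaning that the two spectra occupy exactly the same bins.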
       
  • A sparse partial least squares algorithm based on sure independence
           screening method
    • Abstract: Publication date: Available online 21 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Xiangnan Xu, Kian-Kai Cheng, Lingli Deng, Jiyang Dong
      Partial least squares (PLS) regression is a dimension reduction method used in many areas of scientific discovery. However, it has been shown that the consistency property of the PLS algorithm does not extend to cases with a very large number of variables p and a small number of samples n (i.e., p ≫ n). To overcome this issue, sparsity can be imposed on the dimension reduction step of the PLS algorithm. This leads to a sparse version of the PLS (SPLS) algorithm, which can achieve dimension reduction and variable selection simultaneously. Here, we present a new SPLS method, called the sure-independence-screening based sparse partial least squares (SIS-SPLS) algorithm, which incorporates both the SIS method and the extended Bayesian information criterion (BIC) into the PLS algorithm. The developed SIS-SPLS method was evaluated in a number of numerical studies, including simulated and real datasets. The results showed that the proposed SIS-SPLS method is efficient in variable selection. It offered low mean squared prediction errors with high sensitivity and specificity. The SIS-SPLS algorithm proposed in the current work may serve as an alternative SPLS method for the analysis of modern biological data.

      PubDate: 2017-09-26T05:56:54Z
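      The screening step that the entry above builds on can be sketched in its basic form: rank the variables by the absolute value of their marginal correlation with the response and keep the top d. This shows only generic sure independence screening. The extended BIC and the PLS steps of SIS-SPLS are not reproduced, and the function names and toy data are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def sis_select(columns, y, d):
    """Indices of the d variables with the largest |marginal correlation| with y."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Toy data: variables 0 and 2 track the response, variable 1 does not
y = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = [
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, -1.0, 1.0, -1.0, 1.0],
    [5.0, 4.0, 3.0, 2.0, 1.5],
]
selected = sis_select(columns, y, d=2)
```

      Screening of this kind reduces p to a manageable number of candidates before a sparse model is fitted, which is the role it plays when p ≫ n.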
       
  • A novel nucleic acid sequence encoding strategy for high-performance
           aptamer identification and the aid of sequence design and optimization
    • Abstract: Publication date: Available online 20 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Qin Yang, Sui-Ping Wang, Xin-Liang Yu, Xiao-Hai Yang, Qiu-Ping Guo, Li-Juan Tang, Jian-Hui Jiang, Ru-Qin Yu
      Aptamers have exhibited a great potential for research, clinical and industrial purposes. A critical step to realize these applications is to gain high-affinity aptamers specific to interested targets. To facilitate the selection of aptamers generated in systematic evolution of ligands by exponential enrichment (SELEX) process, we propose a novel nucleic acid sequence encoding strategy of Apta-LoopEnc for secondary structural feature extraction of candidate sequences by analyzing their delicate substructures in loop regions. Since the unique loop structures of aptamers determine their interaction with targets, encoding their central loop structures directly enables featuring aptamer binding affinity related properties. Additionally, the nucleotide composition of a sequence is also used as descriptors in Apta-LoopEnc to further decrease the description similarity between sequences. The feasibility of Apta-LoopEnc for sequence encoding has been demonstrated by the study of high-affinity aptamer identification against human hepatocellular carcinoma cells. The results indicate the developed Apta-LoopEnc is able to significantly improve the performance of different pattern recognition models. Using the Apta-LoopEnc based support vector machine (SVM) to predict a set of newly designed candidate sequences beyond SELEX has further demonstrated the potential of the developed sequence encoding and prediction strategy in aid of high-performance aptamer design and optimization in an easy, time-saving and cost-effective way via computation, thus, promoting the development of aptamer-related studies and applications.

      PubDate: 2017-09-26T05:56:54Z
       
  • Batch process monitoring based on self-adaptive subspace support vector
           data description
    • Abstract: Publication date: Available online 18 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Lv Zhaomin, Yan Xuefeng, Jiang Qingchao
      Inherent time-varying dynamics, which is a general characteristic of batch processing, causes two problems in data-driven batch process monitoring methods: (1) changes in data trajectory and (2) changes in correlation between variables along time. These problems can be solved by employing monitoring methods based on moving time window technology. However, correlation behaviors between variables in dynamic batch processing are complex. As a consequence, traditional monitoring methods may fail to detect faults. Complex correlation behaviors of batch processing can be learned by placing variables with similar variation information in the same subspace and faults may be detected. In this study, a self-adaptive subspace support vector data description (SASSVDD) is proposed. Two-time unfolding three-dimensional data technology and moving time window technology are used to obtain modeling data. An online subspace is then constructed by using sensitive variables, which may highly yield variation information, and non-sensitive variables, which likely contain variation information and exhibit a higher correlation with sensitive variables. Support vector data description is applied as the subspace monitoring method. The availability of SASSVDD is verified through the fed-batch penicillin fermentation.

      PubDate: 2017-09-20T05:10:27Z
       
  • Comparing unfolded and two-dimensional discriminant analysis and support
           vector machines for classification of EEM data
    • Abstract: Publication date: Available online 8 September 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Camilo L.M. Morais, Kássio M.G. Lima
      Three-way data have been increasingly used in chemical applications. However, few algorithms are capable of properly classifying this type of data while maintaining its original dimensions. Unfolding procedures are commonly employed to reduce the data dimension and enable classification using first-order algorithms. In this paper, modified versions of two-dimensional principal component analysis with linear discriminant analysis (2D-PCA-LDA), quadratic discriminant analysis (2D-PCA-QDA), and support vector machines (2D-PCA-SVM) are proposed to classify three-way chemical data. The algorithms were applied to two-category classification using fluorescence excitation-emission matrices (EEM) from one simulated and three real data sets, in which the performance of the proposed algorithms was compared with that of regular PCA-LDA, PCA-QDA and PCA-SVM using unfolding procedures. The results show that the 2D algorithms had equal or superior classification performance in the four data sets analyzed, indicating their ability to classify this type of data.

      PubDate: 2017-09-14T04:54:50Z
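      The unfolding procedure that the entry above uses as its baseline can be illustrated in a few lines: each sample's excitation-by-emission matrix is flattened into a single row so that first-order classifiers can be applied. The function name and the tiny data set are assumptions for illustration only.

```python
def unfold(samples):
    """Flatten each sample's J x K matrix into one row of length J * K."""
    return [[value for row in sample for value in row] for sample in samples]


# Two hypothetical samples, each a 2 (excitation) x 3 (emission) matrix
eem_data = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
unfolded = unfold(eem_data)
```

      The unfolded rows lose the information about which values shared an excitation or emission wavelength, which is the structure the two-dimensional variants described in the abstract aim to keep.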
       
  • A similarity elastic window based approach to process dynamic time delay
           analysis
    • Abstract: Publication date: Available online 26 August 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Bo Yang, Hongguang Li
      Due to a large number of correlated process variables involved in industrial processes, dynamic characteristics of time delays between correlated process variables are generally major concerns. Traditionally, the time delay is approximately estimated by static sliding time windows, which could not better deal with the dynamics of time delays. In response to this problem, this paper proposes a dynamic time delay analysis (e-DTA, dynamic time delay analysis by elastic windows) method based on similarity elastic windows, which is aiming at effectively estimating the transfer time delay between process variables. According to contrast similarities between correlated variables, the size of the elastic window is self-tuned and the dynamic delay time can be estimated offline. Subsequently, through an additional correlation analysis for time series of the time delay estimated from historical data, main variables influencing the time delay can be obtained. By providing relevant trend variables, an improved fuzzy interpolation prediction method is suggested to estimate the transfer time delay between process correlated variables online. In addition, an e-DTA dynamic directed time graph is created by combining dynamic transfer time delays of mutually dependent variables. Finally, performances of the e-DTA method are tested through a numerical study and a distillation column simulation.

      PubDate: 2017-09-02T08:34:10Z
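      The static-window baseline that the entry above contrasts with can be sketched as a lagged-correlation search: within one fixed window, the delay is taken as the lag at which the downstream variable correlates best with the upstream one. This is not the e-DTA method itself, whose elastic-window details the abstract does not specify. The function names and the signals are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def estimate_delay(x, y, max_lag):
    """Lag (in samples) at which y correlates best with the delayed x."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Align x[t] with y[t + lag] over the overlapping part of the window
        corr = pearson(x[:len(x) - lag], y[lag:])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


# Hypothetical upstream signal and a downstream copy delayed by 3 samples
upstream = [0, 1, 4, 2, 7, 3, 8, 5, 9, 6, 2, 1, 5, 0, 3, 7]
downstream = [0, 0, 0] + upstream[:-3]
delay = estimate_delay(upstream, downstream, max_lag=5)
```

      A fixed window of this kind returns a single delay per window, which is why it copes poorly with delays that drift during operation.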
       
  • Slow feature analysis based on online feature reordering and feature
           selection for dynamic chemical process monitoring
    • Abstract: Publication date: Available online 1 August 2017
      Source:Chemometrics and Intelligent Laboratory Systems
      Author(s): Jian Huang, Okan K. Ersoy, Xuefeng Yan
      This study considers the insufficiency of traditional monitoring methods to eliminate dynamics, and proposes a novel online feature reordering- and feature selection-based slow feature analysis (SFA) algorithm. The SFA algorithm explores the process dynamics from the view of inner variation of data to extract the slowly varying features. The extracted SFs are considered as the representations of steady- and dynamic-state processes. Online feature reordering and feature selection strategies maximize online fault information and can be used to perform fault detection operation. The proposed method is applied to two simulated processes. Monitoring results show that the proposed method has better monitoring results than those of traditional methods.

      PubDate: 2017-08-03T06:45:39Z
       
 
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
 

JournalTOCs © 2009-2016