Institute for Integrative Genome Biology


Rakesh Kaundal

Director, Bioinformatics Facility

Mailing Address:

Botany and Plant Sciences
Genomics Building / 1207G
University of California
Riverside, CA 92521

Phone: (951) 888-9835
Fax: (951) 827-5155 
Email: rkaundal@ucr.edu

Bioinformatics Core Website


PhD Plant Breeding & Genetics  2006  Dr. B.R. Ambedkar University, India
PGD Bioinformatics  2004  Sikkim Manipal University, India
MS Plant Breeding & Genetics  1998  CSK Himachal Pradesh Agricultural University, India

College/Division Affiliation:

College of Natural and Agricultural Sciences

Center/Inst Affiliation(s):

Center for Plant Cell Biology

Areas Of Expertise:

Next-Generation Sequencing Data Analysis; Algorithms for Protein-Protein Interactions, Intra- and Inter-Species (Host-Pathogen Interactions); Protein Function Prediction; Computational Modeling Using Supervised (Machine Learning) and Unsupervised (Bayesian-based) Learning Approaches; Data Mining/Bioinformatics Software / Tools

Awards / Honors:

2013  President’s Cup, Creative Interdisciplinarity in Bioinformatics and Computational Biology, 1st Prize, Oklahoma State University
2012  President’s Cup for Creative Interdisciplinarity on the Project "EDNA: Powerful New Technology for Electronic Diagnostic Nucleic acid Analysis", 3rd Prize, Oklahoma State University
2002  Best Poster Award, International Conference on Challenges and Options for Sustainable Development of the Himalayas-Beyond 2002

Research Summary:

In addition to directing the IIGB bioinformatics facility, I am actively engaged in research on computationally mining large, diverse, multi-dimensional -omics datasets. This work integrates informatics techniques such as statistical pattern recognition, artificial intelligence, and supervised and unsupervised learning to develop novel computational tools and algorithms, and applies the resulting knowledge to organismal improvement.

The beginning of the 21st century has seen growing interest in approaches to data analysis in scientific computing, as essentially every field faces an exponential increase in data volume. In the biological sciences, with the ever-increasing efficiency of 'omics' technologies (genomics, transcriptomics, proteomics, metabolomics), including Next Generation Sequencing (NGS), we are now poised to make huge advancements. One of the challenges is to develop high-end computational infrastructure and knowledge-based products, and to contribute towards building a viable and sustainable bioeconomy. In this context, my interest is to develop bioinformatics applications, focusing particularly on multi-omics data, to discover systems-level insights into traits of economic, environmental and nutritional value.

Some of the major scientific areas of interest are: Systems-based understanding of complex genetic traits (e.g. modeling of gene regulatory networks, visualization); Predicting intra- and inter-species protein interaction networks (host-pathogen interactions); Protein function prediction (subcellular localization, predicting pathways related to lignin degradation/synthesis, classification of other protein functions); Metagenomics (e.g. rhizosphere microbiome interacting with host); and Next-generation sequencing data analysis (develop packages for assembly, alignment, annotation, etc.).

Some Current /Past Projects:
  • Systems Approaches to Understanding Plant-Microbe Interactions

Pathogenic organisms affect not only humans; every year they also cause billions of dollars' worth of damage to crops and livestock. Studies of the role of effector proteins in plants are still in their infancy. One way to study the role of effectors in disease development is to identify the plant target proteins that effectors interact with. To date, there is no automated system to predict genome-wide plant-pathogen interactions, nor a mechanism to visualize these networks in a user-friendly way.

Under an ongoing project on predicting Protein Interaction Networks (PINs) in the model plant host-pathogen system, Arabidopsis-Pseudomonas syringae, we are using unsupervised learning (Bayesian-based) and supervised learning (machine learning) techniques to develop novel algorithms and predict genome-scale PINs, including visualization of these networks in a Cytoscape environment. We integrate diverse data types into a single computational framework: Plant-Associated Microbe Gene Ontology (PAMGO) annotations, protein domain interactions such as intra-species protein-domain profiles, topology or proximity in intra-species Protein-Protein Interaction (PPI) networks, protein sequence similarity (interologs and orthologs), correlated gene expression, phylogenetic relationship, and the available experimental evidence. With this framework, our models predict host-pathogen interactions with more than 95% accuracy. The models have been further implemented as a web-based resource, AP-iNET, freely accessible at http://apinet.bioinfo.ucr.edu/. Users can analyse their 'query' host/pathogen sequence data with this tool to predict interactions. The predicted positive interaction pairs can be visualized in AP-iNET, implemented as a Cytoscape plug-in.
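As a rough illustration of just one of the evidence types listed above (interologs), the sketch below transfers a known interaction from a template species to a host-pathogen pair through ortholog mappings. It is not the AP-iNET implementation; the function name, the dictionary layout, and all protein identifiers are hypothetical and chosen only for the example.

```python
def predict_interologs(template_ppis, host_orthologs, pathogen_orthologs):
    """Infer candidate host-pathogen pairs from known template interactions.

    template_ppis:      iterable of (protein_a, protein_b) known to interact
                        in some template species (hypothetical input format).
    host_orthologs:     dict mapping a template protein to its host orthologs.
    pathogen_orthologs: dict mapping a template protein to its pathogen orthologs.
    """
    predicted = set()
    for a, b in template_ppis:
        # An interaction is undirected, so try both assignments of partners.
        for host_side, pathogen_side in ((a, b), (b, a)):
            for h in host_orthologs.get(host_side, ()):
                for p in pathogen_orthologs.get(pathogen_side, ()):
                    predicted.add((h, p))
    return predicted


# Hypothetical toy data: one template interaction, one ortholog on each side.
example_pairs = predict_interologs(
    template_ppis=[("T1", "T2")],
    host_orthologs={"T1": ["HOST_PROT_1"]},
    pathogen_orthologs={"T2": ["PATH_PROT_1"]},
)
```

In a full framework such as the one described above, a candidate pair like this would be only one feature among several (domain profiles, expression correlation, and so on) passed to a classifier.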

Further, we are experimentally validating some novel interaction pairs in vivo using Yeast two-hybrid and BiFC (Bimolecular Fluorescence Complementation) techniques, in collaboration with the Noble Foundation (http://www.noble.org/). Our interest is to apply this knowledge to develop such computational models in agriculturally relevant crop systems, and to help the plant science community by guiding cost-effective experimental strategies to detect host-pathogen PPIs and drive research on how pathogens infect host cells.

  • Novel Algorithms and Tools for Subcellular Localization Prediction

Determining subcellular localization is important for understanding protein function and is a critical step in any genome annotation. In the past, the trend has been to develop 'general' prediction tools (e.g., TargetP, LOCtree, PA-SUB, MultiLoc, WoLF PSORT, Plant-Ploc) applicable to all organisms. In my earlier studies on individual proteomes (Arabidopsis, Rice), I found that there are unique genome-specific signals for subcellular localization, and thus organism-specific prediction tools perform better than the general ones. My techniques integrate empirical biological knowledge with machine learning methods for automated decision making on localizations. Two online tools were developed: one for Arabidopsis (AtSubP, http://bioinfo3.noble.org/AtSubP/) and the other for Rice (RSLpred, http://www.imtech.res.in/raghava/rslpred/).

These subcellular predictors are being actively used by various researchers. For example, AtSubP has been integrated into the TAIR database (http://www.arabidopsis.org/), a comprehensive resource for Arabidopsis thaliana, to provide genome-scale subcellular annotations. My finding that species-specific predictors are better than generalist predictors has been confirmed experimentally by various other groups (New Phytologist 2013, 200(4): 1022-33). Further, these predictions have been tested experimentally in the lab through Green Fluorescent Protein (GFP) fusions. About 25 'previously unknown' proteins were randomly picked from the AtSubP predictions and their localizations confirmed in planta (unpublished).

My interest is to develop such novel algorithms and tools for other important species of interest, including classifiers for dual- and multi-targeted proteins within a cell.

  • Bioinformatics for Bioenergy

This is another area where we are employing integrated and innovative computational approaches to generate new and recombined metabolism in organisms that may lead to useful products such as biofuels. To aid in the discovery of novel biomass degrading enzymes, we have recently developed a comprehensive prediction system for the identification and classification of organism-wide biomass degrading enzymes. Support Vector Machines (SVM), a state-of-the-art Artificial Intelligence (AI) technique, were trained on a known set of ligninase classes (~27 enzyme categories) to develop computational models and predict novel lignin degrading enzymes in (meta)genomes. Our results indicate a high degree of prediction performance: an overall accuracy of 98% with a Matthews Correlation Coefficient (MCC) of 0.84. A web-based prediction tool has also been developed and is available at http://pred.bioinfo.ucr.edu/ligpred/.
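To make the two reported measures concrete, here is a minimal sketch of how accuracy and the Matthews Correlation Coefficient are computed from confusion-matrix counts. It assumes a simple binary (enzyme vs. non-enzyme) setting, which is a simplification of the multi-class system described above, and the counts used are invented for illustration, not results from LigPred.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal total is zero,
    a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented, imbalanced example: 10 true enzymes among 100 sequences.
example_acc = accuracy(tp=8, tn=88, fp=2, fn=2)
example_mcc = mcc(tp=8, tn=88, fp=2, fn=2)
```

With imbalanced classes, accuracy can sit well above MCC because the many correctly rejected negatives dominate the accuracy figure, which is why both measures are usually reported together.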

In our similar study on one of the most important lignin-related enzymes, laccases, we recently have developed a two-phase classification system to characterize various laccase subtypes using unsupervised and supervised learning approaches. Laccases (E.C. are multi-copper oxidases that have gained importance in many industries such as biofuels, pulp production, textile dye bleaching, bioremediation, and food production. Our online tool, LacSubPred (http://lacsubpred.bioinfo.ucr.edu/) has been specifically designed to characterize novel laccase subtypes from their physicochemical properties.

We are interested in applying other computational approaches, e.g. Self-Organizing Maps (SOM) and k-means clustering, to refine the models so that they are applicable to full-length as well as metagenomic sequences. We are also interested in classifying enzymes based on the pathways they are involved in, and we seek collaborations for validation of these models in lab and field conditions.

  • Metagenomics for Crop Improvement

The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. One such area of interest is relating microbial communities to plant productivity and/or disease associations. However, characterizing the microbiome that contributes to plant phenotypic traits is challenging because of the difficulty of working in the soil environment and the numerical and functional complexity of the microbial community. In addition, most microorganisms cannot be isolated easily.

In one of our collaborative projects, we are developing bioinformatics approaches to understand the associations between wheat productivity and its microbiome. Using statistical approaches, our collaborators identified ~800 Operational Taxonomic Units (OTUs) that are positively or negatively associated with wheat productivity. From a bioinformatics perspective, our interest is to develop computational algorithms using machine learning techniques to identify patterns of positively and negatively associated OTUs, and then develop a web-based database and prediction software to further identify and classify OTUs from 'unknown' data. The overall goal is to computationally optimize the productivity potential of a given agricultural soil system based on the microbial community structure and soil characteristics.

  1. AP-iNET (http://apinet.bioinfo.ucr.edu/): a bioinformatics system for predicting and visualizing genome-wide Protein Interaction Networks (PINs) in the Arabidopsis-Pseudomonas syringae model interaction system.
  2. AtSubP (http://bioinfo3.noble.org/AtSubP/): a highly accurate Arabidopsis Subcellular Localization predictor.
  3. DoBlast (http://bioinfo.okstate.edu:8080/doblast/): a parallelized BLAST server for genome-scale annotations; large-scale sequence data analysis could be finished in minutes using automated parallel computing.
  4. LacSubPred (http://lacsubpred.bioinfo.ucr.edu/): a two-phase classification system to characterize various laccase subtypes using unsupervised and supervised learning approaches, a useful resource to the biofuel community.
  5. LigPred (http://pred.bioinfo.ucr.edu/ligpred/): a comprehensive prediction system for the identification and classification of enzymes related to the synthesis and degradation of lignin.
  6. PLpred (http://pred.bioinfo.ucr.edu/PLpred/): this online tool first identifies a query protein to be a plastid or non-plastid one and then, classifies the identified plastid proteins further into four categories viz. Chloroplast, Chromoplast, Amyloplast or Etioplast proteins.
  7. RSLpred (www.imtech.res.in/raghava/rslpred/): a highly accurate Rice Subcellular Localization predictor.
  8. RB-Pred (www.imtech.res.in/raghava/rbpred/): the first server of its kind worldwide; it forecasts rice leaf blast severity from weather parameters, for general use by plant pathologists and the farming community.
  9. Project (http://www.imtech.res.in/raghava/rslpred/project.html): Given a protein sequence or accession number, this tool searches the query sequence for high-hydrophobicity windows matching a user-supplied pattern (e.g. AL???LW). A high-hydrophobicity window is defined using the Kyte-Doolittle score schema, with a user-customizable search pattern, window size and score threshold value.
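The pattern-plus-window search described in the last item can be sketched as follows. This is an illustrative reading, not the tool's actual code: it assumes that `?` matches any single residue, that the window starts at the first residue of each pattern match, and that a window qualifies when its mean Kyte-Doolittle score meets the threshold. The function names and the test sequence are hypothetical; the hydropathy values are the standard Kyte-Doolittle scale.

```python
import re

# Standard Kyte-Doolittle hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pattern_to_regex(pattern):
    """Turn a pattern such as 'AL???LW' into a regex; '?' is a wildcard.

    The lookahead wrapper lets finditer report overlapping matches.
    """
    body = "".join("." if c == "?" else re.escape(c) for c in pattern)
    return re.compile("(?=(" + body + "))")


def find_hydrophobic_hits(seq, pattern, window_size, threshold):
    """Return (start, window, mean_score) for each qualifying pattern match."""
    seq = seq.upper()
    hits = []
    for match in pattern_to_regex(pattern.upper()).finditer(seq):
        start = match.start()
        window = seq[start:start + window_size]
        if len(window) < window_size:
            continue  # window would run past the end of the sequence
        # Residues outside the 20 standard amino acids score 0.0 (assumption).
        mean_score = sum(KD.get(r, 0.0) for r in window) / window_size
        if mean_score >= threshold:
            hits.append((start, window, round(mean_score, 3)))
    return hits


# Hypothetical query sequence containing one AL???LW match at index 3.
example_hits = find_hydrophobic_hits("MKKALIVALWGGSD", "AL???LW", 7, 1.6)
```

Positions are reported 0-based. Using the mean rather than the sum keeps the threshold comparable across window sizes; a tool could equally well centre the window on the match, which is why the placement is flagged above as an assumption.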

Selected Publications:

List of publications from PubMed

Lab Personnel: 

Hayes, Jordan
Systems Administrator — High-performance computing, Linux cluster / storage administration
Katiyar, Neerja
Bioinformatics Analyst — Computational biology, next-generation sequencing, bioinformatics tools and database development
Pham, Viet
Graduate Student —  The RNPome Project: Identification and characterization of RNA binding proteins that sequester mRNAs during low oxygen stress
