Institute for Integrative Genome Biology

Bioinformatics Core


The following state-of-the-art computational resources and expertise are available through the Bioinformatics Core Facility:


For programming and workshops, the bioinformatics laboratory has 8 triple-bootable Windows/Apple/Linux workstations and one molecular modeling workstation from Silicon Graphics. A 16-CPU production server system with 3 TB of storage space manages large databases, data storage and multi-user activity. Network-intensive web services, such as public databases, are managed by a separate 8-CPU web/database server with an attached 12 TB storage area network (SAN). All hardware components on this high-availability SAN system are duplicated in order to avoid downtime during hardware failures. A 64-CPU Linux cluster for large-scale parallel computing is fully integrated into this hardware infrastructure. The data from all workstations and servers are automatically replicated every night onto a 12 TB backup server system that is located in a geographically separated server room. A strong focus on remote access systems maximizes the availability of all hardware and software resources for many simultaneous users from any networked location on or off campus. Information on this hardware infrastructure, including its usage, is provided online.

Software & Network Architecture

The Bioinformatics Core Facility is strongly committed to maintaining a comprehensive open-source and open-access software infrastructure for Linux- and Unix-based operating systems. This approach offers access to the widest spectrum of software tools with the most advanced algorithms, and it maximizes the freedom to operate in a highly diverse and multidisciplinary academic research environment. The Debian Linux distribution is used on all workstations and servers to synchronize and automate software and OS updates on all machines of the facility. In addition, a centralized file system makes data access more efficient and secure. Currently, the facility maintains over 300 open-source bioinformatics software packages for sequence/genome analysis, data mining, molecular modeling, cheminformatics, evolutionary biology, ecology, statistical analysis, etc. In areas that depend on industrial software applications, the facility also owns various commercial licenses. The most important commercial tools are:

  1. GCG package for traditional sequence analyses
  2. Insight II for protein modeling and ligand docking
  3. Catalyst software for pharmacophore modeling and pharmacophore-based database searching
  4. Cerius2 package for large-scale small molecule mining and QSAR analyses

Information on this software infrastructure, including its usage, is available on the Bioinformatics web portal. Wherever possible, the facility provides access to web-based data analysis tools (e.g. EMBOSS, local BLAST server). Some of these tools were developed by the staff or students of the facility (e.g. online miRNA prediction or compound clustering). Available online tools can also be accessed through this site.

Development of Research Databases

To serve the campus community as efficiently as possible and to provide public access to large-scale research data, bioinformatics personnel have developed a wide spectrum of research databases. The most important database projects of the facility are:

  1. ChemMine is a compound mining database to facilitate drug/agrochemical discovery and chemical genomics screens (Girke et al., 2005).
  2. The Compound Screening and Bioactivity Database is a versatile publication and management system for diverse compound bioactivity and screening data from chemical genomics screens that allows external and internal users to upload their screening data via a web browser (manuscript in preparation).
  3. The Genome Cluster Database (GCD) is a research tool for genome-wide sequence family mining in Arabidopsis and rice (Horan et al., 2005).
  4. The Plant Gene Expression Database (PED) contains pre-analyzed public GeneChip expression data to identify differentially expressed and co-regulated genes using modern statistical and clustering techniques (Horan et al., 2007).
  5. The Bioassay and Phenotype Database (BAP DB) is an information resource for exploring the biological and molecular functions of genes based on available phenotype and screening data from mutant, transgenic and wild-type organisms. An online upload utility allows internal and external users to upload their own assay and phenotype data to BAP DB (manuscript in preparation).
  6. The Plant Unknown-eome Database (POND) is an online service that facilitates the functional characterization of genes of unknown function in Arabidopsis and rice (Horan et al., 2007).
  7. Cell Wall Navigator (CWN) is an integrated database and mining tool for protein families involved in plant cell wall metabolism (Girke et al., 2004).

Consulting Services

The personnel of the facility provide consulting services to help scientists with their day-to-day bioinformatics questions and problems. These services include basic advice on data analysis strategies, coding of data analysis scripts and writing of bioinformatics sections for publications or grant applications.

More Information

General Campus Information

University of California, Riverside
900 University Ave.
Riverside, CA 92521
Tel: (951) 827-1012

Genomics Information

Institute for Integrative Genome Biology
2150 Batchelor Hall

Tel: (951) 827-7177
E-mail: Aurelia Espinoza, Managing Director