PartiGeneDB can currently be searched in one of four main modes (click on the selections to the left). First, the user may select an individual organism dataset and view its list of clusters ordered by relative abundance. This is especially useful as an overview of the most commonly expressed genes in the cDNA libraries used to generate the sequences for a particular organism. Second, the user may search for particular patterns of annotation associated with a single species or a user-specified group of species. This is useful for identifying potential groups of paralogous genes (e.g. retrieving all genes from chordates whose annotation includes the term 'spectrin'). Third, the user may search the partial genomes for sequences that show similarity to a gene of interest. Again, this may be useful for identifying putative paralogs of a previously uncharacterised gene. Finally, we have incorporated SimiTri into PartiGeneDB. SimiTri is a tool that simultaneously displays the relative sequence similarity relationships between a single organism dataset and three user-selected datasets (each of which may consist of one or more phylogenetically related organisms; see below for more details). We are continuing to develop ways of mining the database to make the data more accessible to the user community.
Viewing Entire Organism Datasets
Selecting this mode takes you to a page where you can select an individual organism and view its associated clusters ordered by relative abundance or by cluster ID. On the "Datasets Search" page, click the 'select organism' button. A new window will pop up in which you can navigate the phylogenetic trees (by clicking on the '+' and '-' icons) and select an individual organism (tick the button next to it). Once you have selected the organism, click the 'OK' button and the organism will appear in the 'Selected Organism' text box. Now choose whether to order the list by abundance or by cluster ID and click the 'Search' button. You will be presented with a list of clusters, their abundance (number of sequences) and a brief annotation taken from a BLASTX search of each cluster's sequence against the non-redundant protein database. Click on a cluster ID to see more details.
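To illustrate the two orderings offered on this page, here is a minimal Python sketch. The `Cluster` record, its field names and the cluster IDs are hypothetical and are not PartiGeneDB's actual schema or code.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical record for one row of the cluster list."""
    cluster_id: str
    abundance: int   # number of sequences in the cluster
    annotation: str  # brief BLASTX-derived description


def order_clusters(clusters, by="abundance"):
    """Return clusters ordered by abundance (largest first) or by cluster ID."""
    if by == "abundance":
        # Most abundant first; ties are broken by cluster ID so the order is deterministic.
        return sorted(clusters, key=lambda c: (-c.abundance, c.cluster_id))
    if by == "id":
        return sorted(clusters, key=lambda c: c.cluster_id)
    raise ValueError(f"unknown ordering: {by!r}")


# Hypothetical example data (not taken from PartiGeneDB).
example = [
    Cluster("XXC00001", 5, "actin"),
    Cluster("XXC00002", 40, "tubulin alpha chain"),
    Cluster("XXC00003", 12, "elongation factor 1-alpha"),
    Cluster("XXC00004", 12, "ubiquitin"),
]

for c in order_clusters(example):
    print(c.cluster_id, c.abundance, c.annotation)
```

Ordering by abundance puts XXC00002 first; ordering by ID simply lists the clusters in ID order.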
Searching by Annotation
On this search page, a user may select one or more species and identify clusters that have been annotated (via BLASTX homology to a known protein) with a specified term. First click on the "Select Organisms" box. A new window will pop up with a hierarchical view of the organisms available in PartiGeneDB. Navigate this window using the '+' and '-' icons and select either a single organism or groups of organisms by checking the boxes next to the relevant nodes in the tree. Click the 'OK' button and the organism(s) you selected will appear in the 'Selected Organisms' box. Now enter the text you wish to search for in the 'Enter your search against BLAST annotation' box. You may restrict your search on the basis of BLAST expectation values: simply enter the two-digit exponent of the expect score that you wish to filter for in the 'Minimum E-Score' field.
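The filtering logic of this search can be sketched in Python as below. This is an illustration only: the tuple layout of `hits` is hypothetical, and it assumes that entering an exponent n in the 'Minimum E-Score' field keeps hits with an E-value of at most 10^-n, which is our reading of the field rather than a documented rule.

```python
def annotation_search(hits, organisms, term, min_exponent=None):
    """Return IDs of clusters from `organisms` whose annotation contains `term`.

    hits: iterable of (organism, cluster_id, annotation, evalue) tuples
          (a hypothetical layout, not PartiGeneDB's schema).
    min_exponent: if given, keep only hits with E-value <= 10**-min_exponent
                  (an assumed reading of the 'Minimum E-Score' field).
    """
    needle = term.casefold()  # case-insensitive substring match
    cutoff = 10.0 ** -min_exponent if min_exponent is not None else None
    results = []
    for organism, cluster_id, annotation, evalue in hits:
        if organism not in organisms:
            continue  # not one of the selected organisms
        if needle not in annotation.casefold():
            continue  # annotation does not mention the search term
        if cutoff is not None and evalue > cutoff:
            continue  # hit is weaker than the requested E-value threshold
        results.append(cluster_id)
    return results


# Hypothetical example data.
hits = [
    ("species_a", "A0001", "spectrin alpha chain", 1e-40),
    ("species_a", "A0002", "beta-spectrin", 1e-5),
    ("species_b", "B0001", "Spectrin repeat protein", 1e-20),
    ("species_c", "C0001", "spectrin", 1e-50),
    ("species_a", "A0003", "actin", 1e-60),
]
print(annotation_search(hits, {"species_a", "species_b"}, "spectrin", min_exponent=10))
# ['A0001', 'B0001']
```

Here A0002 is dropped by the E-value filter, C0001 because its organism was not selected, and A0003 because its annotation lacks the term.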
Searching by Sequence Similarity
This option allows you to enter your own sequence of interest and perform a BLAST search against our datasets. The page and its options are very similar to other standard BLAST pages. First choose the database of interest; you may choose from any of the available boxes at most levels (e.g. Eukaryotes / Alveolates / Tetrahymena). Next, paste your sequence into the box, choose the type of BLAST search to perform (BLASTN or TBLASTX for nucleotide sequences, TBLASTN for protein sequences) and set the available options (Cutoff / Gapped or Non-Gapped / Filtered / Type of Output / Descriptions and Alignments). Click 'Search' and wait a few seconds while your search is performed. You will be presented with a typical BLAST results page, from which you can click on the cluster names to retrieve more information on the clusters whose sequences match yours.
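The choice of BLAST program depends only on whether the query is nucleotide or protein. The Python sketch below guesses the query type from its residue composition and returns the matching programs from the list above. The 90% nucleotide-character threshold and the function names are assumptions for illustration; the search page itself asks you to choose the program.

```python
# Residues treated as nucleotide characters (N = ambiguous base, U = RNA).
NUCLEOTIDE_CHARS = set("ACGTUN")

# Program choices as listed on the search page.
PROGRAMS = {
    "nucleotide": ("BLASTN", "TBLASTX"),
    "protein": ("TBLASTN",),
}


def guess_sequence_type(sequence, threshold=0.9):
    """Heuristically classify a (possibly FASTA-formatted) sequence as nucleotide or protein."""
    # Drop FASTA header lines, then keep only letters.
    body = "".join(line for line in sequence.splitlines() if not line.startswith(">"))
    residues = [ch for ch in body.upper() if ch.isalpha()]
    if not residues:
        raise ValueError("no residues found in sequence")
    fraction = sum(ch in NUCLEOTIDE_CHARS for ch in residues) / len(residues)
    return "nucleotide" if fraction >= threshold else "protein"


def allowed_programs(sequence):
    """Return the BLAST programs that apply to this query type."""
    return PROGRAMS[guess_sequence_type(sequence)]


print(allowed_programs(">query\nATGGCGTTAACG"))   # ('BLASTN', 'TBLASTX')
print(allowed_programs("MKTAYIAKQRQISFVKSHFSRQ"))  # ('TBLASTN',)
```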
SimiTri
SimiTri is a tool previously developed as part of NEMBASE to display sequence similarity relationships for a large number of clusters on a single graphic (see examples below). We have made it available here to give users an additional way to mine the data.
How it works
For each cluster, a TBLASTX search was performed against each of the other partial genomes. BLAST scores in excess of 50 (our 'significance cutoff') were extracted, together with the highest E-value, and imported into PartiGeneDB. SimiTri uses these values to draw a graphic in the form of a three-node graph.
To launch the SimiTri application, select the SimiTri search option. First select a dataset of interest, again using the taxonomic browser pop-up window. After selecting a single organism dataset, select three datasets to compare it against: click the 'Select Comparators' button and, as before, navigate the tree using the '+' and '-' buttons. Select three datasets by checking the boxes next to three nodes of interest. (NB you may select anything from individual species to groups of related species; for example, you might wish to examine the similarity relationships of a fungal species to Apicomplexa, Arthropods and Nematodes.) After selecting the datasets, click the 'Search' button and wait while the results are generated (depending on the search, this may take up to 30 seconds). You will be presented with a Java window containing the SimiTri graphic (see the Info section for further details), followed by a list of clusters that showed similarity to none or just one of the comparator datasets.
To navigate the Java window, slide the bottom bar to zoom in, and click and hold the left mouse button while moving the mouse to move around the window. Clicking the left mouse button over a coloured tile selects that cluster; if the Control key is held down at the same time, the cluster page will open. You may also move the slider bars on the E-value cutoffs to change which coloured tiles are visible.
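The exact formula SimiTri uses to place clusters is not given here, so the following Python sketch shows only one plausible approach, under stated assumptions. Each comparator dataset is assigned a corner of an equilateral triangle, scores at or below the cutoff of 50 are treated as zero, and each cluster is placed at the score-weighted average of the corners. Clusters with significant scores against fewer than two comparators are set aside, mirroring the list shown below the graphic. The vertex layout, function names and data are all hypothetical.

```python
# Score threshold from the text: BLAST scores above 50 are treated as significant.
SIGNIFICANCE_CUTOFF = 50.0

# Corners of an equilateral triangle, one per comparator dataset (assumed layout).
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2))


def significant(scores):
    """Keep scores above the cutoff; weaker scores count as no similarity (0)."""
    return [s if s > SIGNIFICANCE_CUTOFF else 0.0 for s in scores]


def triangle_position(scores):
    """Place a cluster at the score-weighted average of the three vertices.

    Only called for clusters with at least two significant scores, so total > 0.
    """
    weights = significant(scores)
    total = sum(weights)
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES)) / total
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES)) / total
    return x, y


def partition(clusters):
    """Split clusters into plotted points (hits to >= 2 comparators) and a plain list."""
    plotted, listed = {}, []
    for cluster_id, scores in clusters.items():
        hits = sum(1 for w in significant(scores) if w > 0)
        if hits >= 2:
            plotted[cluster_id] = triangle_position(scores)
        else:
            listed.append(cluster_id)
    return plotted, listed


# Hypothetical best TBLASTX scores of each cluster against comparators A, B and C.
clusters = {
    "c1": (120.0, 120.0, 10.0),   # similar to A and B only
    "c2": (80.0, 0.0, 0.0),       # similar to A only
    "c3": (30.0, 20.0, 0.0),      # nothing above the cutoff
    "c4": (100.0, 100.0, 100.0),  # equally similar to all three
}
plotted, listed = partition(clusters)
print(plotted["c1"], plotted["c4"], listed)
```

With this weighting, a cluster equally similar to all three datasets sits at the centre of the triangle, while one similar to only two datasets lies on the edge between their corners.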
NB Not all organism datasets are available for SimiTri comparisons; to date, profiles have been generated for only 201 organisms (these are shown in the taxonomic selection tool).