NCG collects 2,000 cancer genes from different sources:
  • Cancer gene census (CGC)
  • High-throughput mutational screenings (HTMS) of selected genes
  • Whole-exome sequencing (WES) of cancer samples
  • Whole-genome sequencing (WGS) of cancer samples
For each cancer gene, NCG provides information on:
  • duplicability
  • orthology
  • evolutionary appearance
  • protein-protein interactions
  • miRNA-gene interactions
  • functional properties
  • gene expression

Search

From the homepage the user may retrieve the information from the database in several ways:


Gene Search

The user may search with a single gene identifier or a list of gene identifiers, or by genomic region, choosing among five options:

  1. Gene symbol: to query for a list of genes, use * as a wildcard (e.g. MDM* will display 3 genes: MDM2, MDM4, MDM1);
  2. Entrez identifier (e.g. 5728);
  3. RefSeq protein identifier (e.g. NP_000305);
  4. Ensembl protein identifier (e.g. ENSP00000418960);
  5. All cancer genes within a genomic region (it is possible to either select a chromosome or the genomic coordinates, human genome hg19).
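
The wildcard query on gene symbols (option 1) can be illustrated with a minimal Python sketch. It is not NCG's implementation: the function name match_symbols and the symbol list are hypothetical, and the standard-library fnmatch module is used as a stand-in. It matches case-sensitively and also treats ? and [...] as wildcards, which NCG may not do.

```python
from fnmatch import fnmatchcase


def match_symbols(pattern, symbols):
    """Return the symbols that match a pattern in which * stands for any run of characters."""
    return [s for s in symbols if fnmatchcase(s, pattern)]


# Hypothetical symbol list, for illustration only.
symbols = ["MDM1", "MDM2", "MDM4", "TP53", "PTEN"]
print(match_symbols("MDM*", symbols))
```

A pattern without * behaves as an exact match, so the same function covers both single identifiers and wildcard queries.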

Screenings

The user may choose to retrieve the list of genes for any one of the studies in the following sources:

  1. Known Cancer Genes;
  2. High-throughput mutational screenings (HTMS) of selected genes;
  3. Whole-exome sequencing (WES) of cancer samples;
  4. Whole-genome sequencing (WGS) of cancer samples.

Advanced Search

The advanced search allows the user to analyze lists of cancer genes with similar properties based on user-defined filters.

The filters are based on the following properties:

  1. Cancer Genes: Based on the different sources: CGC, HTMS, WES and WGS;
  2. Cancer Types: Defined by the type of cancer (organ / tissue) and the dominance of the genes. Default selection is all cancer types;
  3. Function: Based on the various processes the cancer genes are involved in. This filter is not applied by default;
  4. Appearance in Evolution: Based on the origin of the genes. This filter is not applied by default;
  5. Duplicability: Duplicated or singleton genes. This filter is not applied by default;
  6. PIN: Hubs or non-hubs; hubs are defined as the top 25% most connected nodes in the human protein interaction network: in our case, hubs correspond to genes with degree higher than 15. This filter is not applied by default.
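
The hub definition used by the PIN filter can be sketched as follows. This is a minimal illustration, not NCG's code: the function names and the toy edge list are hypothetical, and only the degree threshold of 15 comes from the text above.

```python
def node_degrees(edges):
    """Count the distinct interaction partners of each node in an undirected edge list."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # assumption: self-interactions do not add to the degree
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def find_hubs(degrees, threshold=15):
    """Return the nodes whose degree is higher than the threshold."""
    return {node for node, degree in degrees.items() if degree > threshold}


# Toy network: one protein with 16 partners, plus one extra interaction.
edges = [("HUB", f"P{i}") for i in range(16)] + [("P0", "P1")]
degrees = node_degrees(edges)
print(find_hubs(degrees))
```

Collecting partners in sets means that an interaction reported twice is counted once.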

Possible False Positives

The list of 60 possible false positives has been compiled based on:

  1. Functional irrelevance (i.e. olfactory receptor genes);
  2. Gene length (i.e. long exons and/or introns);
  3. Literature evidence (as reported in Lawrence, Nature 2013).

Candidates with No Statistics

These genes were identified as possible cancer genes although no statistical method was applied in the original study.


Search for miRNA-Cancer Gene Interactions

NCG 4.0 allows the user to analyze miRNAs that have cancer genes as targets or cancer genes that are targets of miRNAs.

In the section Browse List, the user may also retrieve the lists of cancer genes that are targets or hosts of miRNAs.


OncomiR Search

NCG 4.0 also collects miRNAs that have been identified as OncomiRs, i.e. miRNAs that play a role in cancer. The user may search for an OncomiR by its miRNA identifier or browse the list of OncomiRs.


Results for cancer genes

The results page contains eight sections for each gene:


Gene Description

This section includes the general information about the queried gene: symbol, description and links to external databases, such as Entrez, COSMIC, OMIM, RefSeq, Ensembl.


Duplicability

Duplicability is defined as in Rambaldi D et al. (2008): it is measured by aligning the corresponding protein sequences directly to the human genome, using the BLAST-like Alignment Tool (BLAT). We define as duplicates all additional genomic matches covering at least 60% of the query length. Singletons are genes that do not have any additional hit covering at least 60% of the query length.

The button Duplicability opens a new page, which describes all the duplicated loci related to the studied gene.


Orthology

The appearance of a gene is defined as the deepest taxonomic branch of the tree of life where an ortholog can be detected. Orthology relationships are retrieved from eggNOG 3.0.

Seven branches of the tree of life are defined:

  1. Last Universal Common Ancestor (LUCA)
  2. Eukaryotes
  3. Opisthokonts
  4. Metazoans
  5. Vertebrates
  6. Mammals
  7. Primates

The button Orthology opens a new page, which describes all the orthology relationships of the gene of interest in detail.
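
The assignment of a gene to its branch of appearance can be sketched as below. This is a minimal illustration under stated assumptions: the seven branches are ordered from oldest to youngest as in the list above, "deepest" is read as "oldest", and the function name and the None return value for genes without detected orthologs are hypothetical.

```python
# Branches ordered from the oldest to the youngest, as listed above.
BRANCHES = [
    "LUCA",
    "Eukaryotes",
    "Opisthokonts",
    "Metazoans",
    "Vertebrates",
    "Mammals",
    "Primates",
]


def appearance(branches_with_orthologs):
    """Return the oldest branch in which an ortholog was detected, or None if there is none."""
    for branch in BRANCHES:
        if branch in branches_with_orthologs:
            return branch
    return None  # assumption: no ortholog detected in any branch


print(appearance({"Primates", "Mammals", "Vertebrates"}))
```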


Network Properties

The number of proteins interacting with the gene of interest is listed.

The button Network Properties opens a new page, which describes all the network properties of the gene of interest in detail.

The network properties are derived from several databases of Protein Interaction Networks:

Dataset | Version               | Nodes | Interactions | Publications
BioGRID | 3.2.96 (Jan 1st 2013) | 12920 | 97491        | 18877
IntAct  | 159 (Dec 14th 2012)   | 9905  | 42398        | 3261
MINT    | Oct 26th 2012         | 6446  | 17455        | 3459
DIP     | Oct 10th 2010         | 4656  | 12373        | 2239
HPRD    | 9 (Apr 13th 2010)     | 8897  | 37026        | 18851
Total   |                       | 16241 | 164008       | 33497
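
The Total row is smaller than the sum of the individual rows (for example, the five interaction counts add up to 206743), which is consistent with entries shared between databases being counted once. A minimal sketch of such a merge is given below. It is an assumption about the procedure, not NCG's documented method; the function name, database labels and protein names are hypothetical.

```python
def merge_networks(sources):
    """Merge undirected edge lists, recording which sources report each interaction."""
    merged = {}
    for source_name, edges in sources.items():
        for a, b in edges:
            key = frozenset((a, b))  # A-B and B-A are the same interaction
            merged.setdefault(key, set()).add(source_name)
    return merged


# Toy data with hypothetical proteins.
sources = {
    "db1": [("A", "B"), ("B", "C")],
    "db2": [("B", "A"), ("C", "D")],
}
merged = merge_networks(sources)
nodes = set().union(*merged)
print(len(nodes), len(merged))
```

Keeping the set of sources for each interaction makes it possible to report how many databases support it.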

miRNA Information

The number of miRNAs regulating the gene is reported.

The button miRNA Information opens a new page, which shows the graphical representation of all the miRNAs regulating the queried gene, along with the other genes regulated by the same miRNAs.

If the gene hosts a miRNA, the name of the miRNA contained within the gene is reported.


Protein Function

The functional classes of the gene are listed.

The button Protein Function opens a new page, which describes gene functions along with GO ids and terms.


Gene Expression

The number of tissues where the gene is expressed is shown.

The button Gene Expression opens a new page, which shows the barplots of the expression for the gene.

Cancer studies

Four types of cancer experiments are defined:

  • Cancer Gene Census (CGC);
  • High-throughput mutational screenings (HTMS) of selected genes;
  • Whole-exome sequencing (WES) of cancer samples;
  • Whole-genome sequencing (WGS) of cancer samples.

Amplified genes are classified based on the type of supporting data as reported in Santarius, Nature 2010:

  • Class I collects amplified genes whose therapeutic targeting improves patient clinical outcome in multiple Phase II or Phase III trials.
  • Class II collects amplified genes with 3 or more points (see Table below), which indicates substantial evidence of involvement in cancer.
  • Class III collects amplified genes with 1 or 2 points (see Table below), which indicates significant evidence of involvement in cancer.
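
The three classes can be expressed as a simple decision rule. The sketch below is illustrative only: the function and parameter names are hypothetical, the scoring that produces the points is not reproduced here, and giving Class I precedence over the point-based classes is an assumption.

```python
def amplified_gene_class(points, improves_outcome_in_trials=False):
    """Assign an amplified gene to Class I, II or III from its supporting evidence."""
    if improves_outcome_in_trials:
        # Therapeutic targeting improves clinical outcome in multiple Phase II/III trials.
        return "Class I"
    if points >= 3:
        return "Class II"
    if points >= 1:
        return "Class III"
    return None  # assumption: genes with no points are not classified


print(amplified_gene_class(4))
print(amplified_gene_class(2))
print(amplified_gene_class(2, improves_outcome_in_trials=True))
```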

If a gene was found mutated in one of these studies, a table lists all the experiments in which the mutation was found.

Duplicability

Duplicability is defined as in Rambaldi D et al. (2008): it is measured by aligning the corresponding protein sequences directly to the human genome, using the BLAST-like Alignment Tool (BLAT). We define as duplicates all additional genomic matches covering at least 60% of the query length. Singletons are genes that do not have any additional hit covering at least 60% of the query length.

Three types of Hit are defined, depending on the genomic location of the duplicated locus:

  • Best Hit, which corresponds to the original gene locus;
  • Other Gene Hits, which include other gene loci where the gene of interest is duplicated;
  • Genomic, which includes loci with no known genes mapped (no genes are defined by the UCSC Genome Browser, but mRNAs or ESTs may be present).

The default cutoff to display genomic hits is 60% of the original length, but the user is allowed to choose different cutoffs from the widget next to the table. The range of choice varies from 10% of the query length to 100%.
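
The duplicate/singleton rule and the display cutoff can be sketched as below. This is a minimal illustration, not NCG's code: the function names and the hit records are hypothetical, coverage is expressed as a percentage of the query length, and applying the display cutoff to every hit type is an assumption.

```python
def duplicability(additional_hit_coverages, cutoff_percent=60):
    """Return 'duplicated' if any additional hit covers at least cutoff_percent of the query."""
    if any(coverage >= cutoff_percent for coverage in additional_hit_coverages):
        return "duplicated"
    return "singleton"


def hits_to_display(hits, cutoff_percent=60):
    """Keep the hits whose coverage reaches the user-selected cutoff (10 to 100 percent)."""
    if not 10 <= cutoff_percent <= 100:
        raise ValueError("cutoff must be between 10 and 100 percent")
    return [hit for hit in hits if hit["coverage"] >= cutoff_percent]


# Hypothetical hits for one query protein.
hits = [
    {"type": "Best Hit", "coverage": 100},
    {"type": "Other Gene Hits", "coverage": 72},
    {"type": "Genomic", "coverage": 35},
]
additional = [hit["coverage"] for hit in hits if hit["type"] != "Best Hit"]
print(duplicability(additional))
print([hit["type"] for hit in hits_to_display(hits, cutoff_percent=30)])
```

The Best Hit is excluded before classification because it corresponds to the original gene locus rather than to an additional match.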

Orthology

The orthology relationships are derived from eggNOG 3.0.


Tree Of Life

The Tree of Life provides a visualization of the origin and the orthologs of the gene of interest. The origin of the gene is shown in red and the presence of orthologs in yellow. Nodes that do not have any orthologs of the gene of interest are depicted in white.

Clicking on the node of interest displays a short description of the node in the Orthology Information section above the legend.

The Orthology Information describes the number of orthologous genes found and the number of species containing the orthologs.

The user can view detailed information about the orthologs by clicking on the link in the Orthology Information section, or can scroll down to see the results for all the nodes.


Orthology Table

The table describes all the species and the corresponding orthologs. If the node branches further into nodes with orthologous genes, the species from the lower nodes are also shown. For example, in the table below, Mammals have two branches, Primates and Rodents; the orthologs from these nodes are also reported in the table.

Protein-Protein Interactions

The network is displayed using Cytoscape Web v1.0.3.


Network visualization

The first-level network for the gene of interest (which is in the center of the image) is displayed: the primary interactions (i.e. the interactions between the gene of interest and the other genes) are colored in green, while the secondary interactions (i.e. the interactions among the interactors of the gene of interest) are colored in pink.

The thickness of the lines representing the interactions is based on the number of experiments which support the interaction: the thinner lines represent single experiments, while the thicker ones represent interactions found in more than one experiment.

Singleton genes are colored in red, while Duplicated genes are colored in black.

The shape of the nodes denotes the category of the gene:

  1. Triangle represents known cancer genes,
  2. Diamond represents candidate cancer genes,
  3. Circle represents genes that are not associated with cancer.

The color of the node defines the origin of the gene:

  1. Cyan for young genes originating in Metazoans, Vertebrates, Mammals or Primates,
  2. Blue for old genes originating in Last Universal Common Ancestor, Eukaryotes or Opisthokonts.
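
The mapping from gene properties to node shape, node color and edge thickness can be summarized in a short sketch. It only restates the rules given above; the function names, the category labels ("known", "candidate", "none") and the "thin"/"thick" return values are hypothetical and do not correspond to Cytoscape Web settings.

```python
YOUNG_ORIGINS = {"Metazoans", "Vertebrates", "Mammals", "Primates"}
OLD_ORIGINS = {"Last Universal Common Ancestor", "Eukaryotes", "Opisthokonts"}
SHAPES = {"known": "triangle", "candidate": "diamond", "none": "circle"}


def node_style(category, origin):
    """Return the shape and color of a node from its cancer category and origin."""
    if origin in YOUNG_ORIGINS:
        color = "cyan"
    elif origin in OLD_ORIGINS:
        color = "blue"
    else:
        raise ValueError(f"unknown origin: {origin}")
    return {"shape": SHAPES[category], "color": color}


def edge_thickness(n_experiments):
    """Thin edges for a single supporting experiment, thick edges for more than one."""
    return "thin" if n_experiments <= 1 else "thick"


print(node_style("known", "Metazoans"))
print(edge_thickness(3))
```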

In the example above, PRF1 is a known cancer gene (denoted by a triangle) that interacts with a candidate cancer gene, CALR (denoted by a diamond). It is a recent gene (represented in cyan), which appeared with Metazoans, and has 5 primary interactions (displayed in green), supported by one PMID (thin edges). This gene is a singleton (name displayed in black).

Clicking on an edge displays the relationship between the two genes.


Clicking on a node displays information about the gene.


By default, the node and edge information section displays information for the gene of interest.


Network table

The table lists the properties of the genes with which the gene of interest interacts. The properties are:

  1. Cancer Gene: Whether the gene is present in any cancer study and, if so, whether it is a known or a candidate cancer gene, depending on the study in which it was found.
  2. Duplicated: States whether the gene is duplicated in the genome.
  3. Appearance In Evolution: Lists the origin of the gene in evolution.
  4. Protein interaction network properties: The network properties are described using three measures.
  5. PubMed ID(s) Supporting the Interactions: Evidence of the interaction as reported in the studies identified by the PubMed IDs.

miRNA-Gene interactions

The network of miRNA-target interaction is composed of either cancer genes that are regulated by miRNAs or OncomiRs and their target genes. The network displays only interactions that are supported by experimental validations. The miRNA data are derived from TarBase v.5.0 (Papadopoulos et al., 2009), miRecords v.4.0 (Xiao F et al., 2009) and miRTarBase v.4.4 (Hsu SD et al., 2011).


Network visualization

Two types of nodes compose the network: miRNAs (octagons) and target genes. OncomiRs are visualized as orange octagons. The properties of the genes are described in the protein interaction network page. The thickness of the edge lines is based on the number of publications that support the interaction: the thinner lines represent a single publication, while the thicker ones represent interactions supported by more than one publication.


Clicking on a miRNA or gene node displays information about that miRNA or gene.

By default the node and edge information section displays the information for the gene or the miRNA of interest.


Network table

The table includes all miRNAs and target genes visualized in the network. For each interaction, we specify the mature form of the miRNA that regulates the target gene. Each row describes whether the target gene belongs to the list of cancer genes, its appearance in evolution and its duplicability. The PubMed IDs column links to the publication that reported the interaction, while the last column describes the experimental support for the interaction. Three categories of experimental support are distinguished: microarray, mass spectrometry and single gene experiment (i.e. non-high-throughput experiments).

Protein Function

The table lists all functional classes as defined in D'Antonio M and Ciccarelli FD (2011) for the gene of interest.

We decided to focus on the Biological process branch of Gene Ontology (GO) levels 5 and 6, in order to have a good compromise between specificity of the terms and the number of genes.

These GO terms are divided into 12 categories:

  • Development,
  • Regulation of transcription,
  • Signal transduction,
  • Cell motility and interactions,
  • Multicellular activities,
  • Immune system response,
  • Cell response to stimuli,
  • Cell cycle,
  • Cellular processes,
  • Regulation of intracellular processes and metabolism,
  • Cellular metabolism,
  • DNA/RNA metabolism and transcription.

 

Gene expression

Expression levels for 109 human tissues are derived from two microarray experiments (Su et al. (2004) and Ge et al. (2005)): For each gene i, expression levels are normalized over the median level in each tissue, following the formula:

Exp_{norm,i} = (Exp_i - Exp_{median}) / (Exp_i + Exp_{median})

Therefore, genes that are not expressed in a tissue (Exp_i = 0) have a normalized expression value of -1.
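
The normalization can be checked with a few values. The sketch below implements the formula as written; the function name and the example numbers are hypothetical, and raising an error when both values are zero is an assumption, since the formula is undefined in that case.

```python
def normalized_expression(expression, tissue_median):
    """Normalize an expression level over the median level of the tissue."""
    total = expression + tissue_median
    if total == 0:
        raise ValueError("expression and median are both zero; the ratio is undefined")
    return (expression - tissue_median) / total


print(normalized_expression(0, 50))    # gene not expressed in the tissue
print(normalized_expression(50, 50))   # gene expressed at the median level
print(normalized_expression(150, 50))  # gene expressed above the median level
```

For non-negative inputs the result lies between -1 and 1, with 0 marking expression at the tissue median.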

The results are plotted separately for each experiment:

[Figures: expression barplots for the Ge and Su experiments]

The scatterplot shows the correspondence of the expression values between the two experiments, for all the tissues where the gene is expressed. The linear regression line is plotted. The scatterplot is shown if the gene is expressed in at least one tissue that was investigated in both experiments.

[Figure: expression scatterplot]
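
The regression line drawn on the scatterplot can be illustrated with an ordinary least-squares fit over the tissues measured in both experiments. This is a sketch under stated assumptions: the fitting method is not specified in this document, the function names and expression values are hypothetical, the additional restriction to tissues where the gene is expressed is left out, and at least two shared tissues are needed to fit a line.

```python
def shared_points(ge_values, su_values):
    """Pair the expression values of the tissues present in both experiments."""
    tissues = sorted(set(ge_values) & set(su_values))
    return [ge_values[t] for t in tissues], [su_values[t] for t in tissues]


def fit_line(xs, ys):
    """Return the slope and intercept of the least-squares line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical normalized expression values per tissue.
ge = {"liver": 0.0, "lung": 1.0, "brain": 2.0}
su = {"liver": 1.0, "lung": 3.0, "brain": 5.0, "heart": 9.0}
xs, ys = shared_points(ge, su)
print(fit_line(xs, ys))
```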

Results for OncomiRs

The list of 64 OncomiRs in NCG 4.0 was derived from:

Reference                             | OncomiRs | PubMed ID
Spizzo, Cell 2009                     | 57       | 19410551
Kent, Oncogene 2006                   | 26       | 17028598
Lujambio, Nature 2012                 | 29       | 22337054
Esquela-Kerscher, Nat Rev Cancer 2006 | 25       | 16557279
Manikandan, Bioinformation 2008       | 31       | 18685719
Yang, Cell 2013                       | 1        | 23410973

The results page contains four sections for each OncomiR:

miRNA Description

This section includes the general information about the queried miRNA: symbol, description, information about cancer and links to miRBase and to the UCSC Genome Browser. For intragenic OncomiRs, we also report information on the host gene.


Duplicability

Duplicability is defined for miRNAs that share the same seed sequence with at least one other human miRNA. For each mature miRNA we identify the seed as the nucleotides at positions 2-9 starting from the 5' end of the miRNA.

The button of this section opens a new page, which shows the sequences of the mature forms of the OncomiR and, if present, the sequences of the miRNA duplications.


Orthology

The appearance of a miRNA is defined as the deepest taxonomic branch of the tree of life where an ortholog can be detected. Orthology relationships are retrieved from the family classification of miRNA hairpin sequences in miRBase (Release 17).

Eight branches of the tree of life are defined:

  1. Metazoans
  2. Bilaterians
  3. Deuterostomes
  4. Chordates
  5. Vertebrates
  6. Mammals
  7. Primates
  8. Hominids

The button Orthologs opens a new page, which describes all the orthology relationships of the miRNA of interest in detail.


miRNA-Target Interactions

The number of genes regulated by the OncomiR is reported.

The button Targets opens a new page which shows the graphical representation of all the genes regulated by the queried OncomiR.

Duplicability of OncomiRs

Duplicability is defined for miRNAs that share the same seed sequence with at least one other human miRNA. For each mature miRNA we identify the seed as the nucleotides at positions 2-9 starting from the 5' end of the miRNA.

For duplicated miRNAs, the sequences of the mature forms of the miRNA and of their related duplications are reported.

For singleton miRNAs, the sequences of the mature forms of the miRNA of interest are reported.
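
The seed-based definition of duplicability can be sketched as follows. This is a minimal illustration, not NCG's code: the function names are hypothetical, and the miRNA names and sequences are invented toy data rather than miRBase entries.

```python
from collections import defaultdict


def seed(mature_sequence):
    """Return the nucleotides at positions 2-9, counted from the 5' end (1-based)."""
    if len(mature_sequence) < 9:
        raise ValueError("sequence is shorter than 9 nucleotides")
    return mature_sequence[1:9]


def duplicated_mirnas(mature_sequences):
    """Return the miRNAs that share their seed with at least one other miRNA."""
    by_seed = defaultdict(list)
    for name, sequence in mature_sequences.items():
        by_seed[seed(sequence)].append(name)
    return {name for names in by_seed.values() if len(names) > 1 for name in names}


# Invented sequences: the first two share the seed AGGCAUCA.
toy = {
    "toy-miR-1": "UAGGCAUCAGGAUCCGAUUAGC",
    "toy-miR-2": "AAGGCAUCAGCCUUAGGCAUAA",
    "toy-miR-3": "UCCGAUAGCUUAGGCAUCGAUA",
}
print(sorted(duplicated_mirnas(toy)))
```

miRNAs that are not returned by duplicated_mirnas correspond to the singleton miRNAs described above.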

Orthology of OncomiRs

The orthology relationships are derived from the family classification of miRNA hairpin sequences from miRBase (Release 17).


Tree Of Life

The Tree of Life provides a visualization of the origin of the miRNA of interest. The origin of the miRNA is shown in red and the presence of orthologs in yellow. Nodes that do not have any orthologs of the miRNA of interest are depicted in white.

Clicking on the node of interest displays the number of orthologous miRNAs found and the number of species containing the orthologs.

The user can look at the detailed information about the orthologs by scrolling down to get all the results for all the nodes.


Orthology Table

The table lists the species with the orthologs of the miRNA of interest.