Division of Safety Information on Drug, Food and Chemicals, National Institute of Health Sciences, Tokyo, 158-8501, Japan
Received date: October 31, 2015 Accepted date: November 17, 2015 Published date: November 19, 2015
Citation: Tanabe S. Overview of Gene Regulation in Stem Cell Network to Identify Therapeutic Targets Utilizing Genome Databases. Insights Stem Cells. 2015, 1:1.
Title: Overview of gene regulation in stem cell network to identify therapeutic targets utilizing genome databases.
Background: Recent major progress in bioinformatics has enabled the collection and accumulation of so-called big data in medical fields and cell biology. It is important to analyze and interpret these abundant data for appropriate application in therapeutics and the treatment of diseases.
Methods and Findings: Several databases have been introduced worldwide, and their utility is discussed with reference to the literature. The databases are useful for analyzing genome mutations, gene expression, epigenetic regulation, gene ontology, stem cell phenotype alteration, risk prediction, species differences, and so on. Gene network analysis using these databases may identify therapeutic targets and treatments related to stem cells.
Conclusions: Molecular network regulation is critical for understanding disease and treatment mechanisms.
Bioinformatics, Database, Gene regulation, Stem cell
AHR: Aryl Hydrocarbon Receptor; BRCA1: Breast Cancer 1, Early Onset; BRCA2: Breast Cancer 2, Early Onset; BSP: Bone Sialoprotein; CCLE: Cancer Cell Line Encyclopedia; CDH1: Cadherin 1, Type 1; CDH2: Cadherin 2, Type 1, N-Cadherin (Neuronal); CEBP: CCAAT-Enhancer-Binding Protein; CGC: Cancer Gene Census; ChiBE: Chisio BioPAX Editor; COSMIC: Catalogue of Somatic Mutations in Cancer; DAVID: Database for Annotation, Visualization and Integrated Discovery; dbGaP: Database of Genotypes and Phenotypes; dbHiMo: Database for Histone-Modifying Enzymes; DDBJ: DNA Data Bank of Japan; DGV: Database of Genomic Variants; DLX5: Distal-Less Homeobox 5; EGA: European Genome-Phenome Archive; EMT: Epithelial-Mesenchymal Transition; ERG: Epigenetic Regulator Genes; ETS: E26 Transformation-Specific; FGFR2: Fibroblast Growth Factor Receptor 2; GABP: GA-Binding Protein; GeMDBJ: Genome Medicine Database of Japan; GEO: Gene Expression Omnibus; GTEx: Genotype-Tissue Expression; GWAS: Genome-Wide Association Study; HME: Histone-Modifying Enzymes; Hopx: Homeodomain-Only Protein Gene; HSPC: Hematopoietic Stem/Progenitor Cell; IBD: Inflammatory Bowel Disease; ICGC: International Cancer Genome Consortium; ILC3: Group 3 Innate Lymphoid Cell; JSNP: Japanese Single Nucleotide Polymorphisms; KEGG: Kyoto Encyclopedia of Genes and Genomes; lncRNA: Long Noncoding RNA; MDCK: Madin-Darby Canine Kidney; MHC: Major Histocompatibility Complex; MHCII: MHC Class II; MSC: Mesenchymal Stem Cell; OB-BMST: Osteoblastic Bone Metastasis-Associated Stroma Transcriptome; PGC: Primordial Germ Cell; 15-PGDH: 15-Hydroxyprostaglandin Dehydrogenase; PP2A: Protein Serine/Threonine Phosphatase Type 2A; PPI: Protein-Protein Interaction; PSCA: Prostate Stem Cell Antigen; SNP: Single Nucleotide Polymorphism; TCGA: The Cancer Genome Atlas; TERT: Telomerase Reverse Transcriptase; Th17: T Helper 17; UniProtKB: UniProt Knowledgebase
Recent advances in bioinformatics have made it possible to predict cancer risk from gene alterations. In breast and ovarian cancer, risk is estimated based on mutations in breast cancer 1, early onset (BRCA1) and breast cancer 2, early onset (BRCA2). Increasing amounts of data have recently revealed many candidate genes to be examined for application in targeted therapies. Variations in the human transcriptome have been revealed using RNA sequencing data generated by the Genotype-Tissue Expression (GTEx) project. Comprehensive analyses have shown interindividual differences in gene expression and splicing variability. The existence of tissue-specific transcriptional regulation has also been demonstrated using data from the GTEx project. These interindividual differences in gene alterations highlight the importance of prognostic biomarkers for treatment sensitivity in disease. A novel method based on OncoFinder pathway activation strength revealed that the JNK pathway (insulin signaling) and the mitochondrial apoptosis pathway are significantly correlated with the response to cetuximab, a monoclonal antibody against the epidermal growth factor receptor (EGFR) used in colorectal cancer patients with wild-type K-ras.
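The pathway activation strength (PAS) idea can be illustrated with a short sketch. As we understand the published OncoFinder formulation, PAS sums, over the genes of a pathway, each gene's activator/repressor role (ARR) multiplied by a flag for significant deregulation (BTIF) and the logarithm of its case-to-normal expression ratio (CNR). The gene names and expression values below are hypothetical, and BTIF is approximated by a simple fold-change cutoff (`fold_threshold`, an assumption), whereas the original method derives it from a statistical tolerance interval over normal samples.

```python
import math

def pathway_activation_strength(genes, fold_threshold=1.5):
    """Simplified OncoFinder-style PAS for one pathway in one sample.

    genes: iterable of (name, arr, case_expr, normal_expr) tuples, where arr is
    the activator/repressor role (e.g. 1 activator, -1 repressor, 0 unknown).
    BTIF is approximated by a fold-change cutoff (assumption): a gene counts
    only if its case-to-normal ratio is >= fold_threshold or <= 1/fold_threshold.
    """
    pas = 0.0
    for _name, arr, case_expr, normal_expr in genes:
        cnr = case_expr / normal_expr  # case-to-normal ratio (CNR)
        btif = 1 if cnr >= fold_threshold or cnr <= 1 / fold_threshold else 0
        pas += arr * btif * math.log10(cnr)
    return pas

# Hypothetical pathway: one up-regulated activator, one down-regulated
# repressor, and one barely changed activator that the BTIF flag filters out.
example_pathway = [
    ("GENE_A", 1.0, 200.0, 20.0),
    ("GENE_B", -1.0, 5.0, 50.0),
    ("GENE_C", 0.5, 11.0, 10.0),
]
print(pathway_activation_strength(example_pathway))  # approximately 2.0
```

A positive PAS suggests net activation of the pathway in the sample and a negative PAS suggests net suppression; relating per-patient PAS values to treatment outcome is the kind of comparison that links pathways to drug response.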
Genetic variation in prostate stem cell antigen (PSCA) has been found to be associated with susceptibility to diffuse-type gastric cancer. Comparison of single nucleotide polymorphisms (SNPs) in PSCA between diffuse-type gastric cancer cases and control subjects has revealed SNPs with statistically significant associations. Furthermore, SNPs in PSCA have been found to have a greater effect in diffuse-type than in intestinal-type gastric cancer. The allele and genotype frequencies of the rs2976392 SNP differ between the two types of gastric cancer. The degree of polymorphism may also differ among cell types.
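A case-control SNP comparison of this kind is often summarized with an allelic chi-square test. The sketch below converts genotype counts to allele counts and computes a Pearson chi-square statistic on the resulting 2x2 table; the counts are hypothetical and are not the PSCA data, and the original studies may have used different tests (for example genotype-based or trend tests, or logistic regression with covariates).

```python
import math

def genotypes_to_alleles(n_aa, n_ab, n_bb):
    """Convert genotype counts (AA, AB, BB) into allele counts (A, B)."""
    return 2 * n_aa + n_ab, n_ab + 2 * n_bb

def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square test on a 2x2 allele-count table (1 degree of freedom).

    Returns (chi2, p_value); no continuity correction is applied.
    """
    table = [list(case_alleles), list(control_alleles)]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts for one SNP (not real PSCA data).
cases = genotypes_to_alleles(20, 20, 10)     # (60, 40)
controls = genotypes_to_alleles(10, 20, 20)  # (40, 60)
chi2, p = allelic_chi_square(cases, controls)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

With small expected counts, Fisher's exact test is usually preferred over the chi-square approximation.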
Gene expression in diffuse-type gastric cancer and mesenchymal stem cells (MSCs) has been profiled, revealing that the combination of cadherin 1, type 1 (CDH1) and cadherin 2, type 1, N-cadherin (neuronal) (CDH2) distinguishes cancer cell phenotypes and providing insights into gene regulation in epithelial-mesenchymal transition (EMT) [5-8]. In this review, genome and gene regulation analyses performed with several databases, which are briefly introduced in the text, are overviewed, focusing mainly on stem cell and cancer networks.
Databases for Gene and Molecular Regulation
Recently, it has become possible to analyze gene regulation using numerous databases (Table 1). One such useful tool is the GTEx database. This database provides a comprehensive overview and assessment of gene regulation in human tissues based on genome-wide association studies (GWASs). According to the project, all data releases are also available to the public through the Database of Genotypes and Phenotypes (dbGaP), a repository charged with archiving, curating and distributing information related to genotype-phenotype interactions. Epigenetic regulation, including that by histone-modifying enzymes (HMEs), can be analyzed using the database for histone-modifying enzymes (dbHiMo). In dbHiMo, HMEs are identified using a hidden Markov model-based pipeline, which should be a useful tool for revealing epigenetic and epigenomic regulation.
| Name | Full Name or Source | Reference | URL |
|---|---|---|---|
| dbGaP | Database of Genotypes and Phenotypes | 10 | http://www.ncbi.nlm.nih.gov/gap |
| dbHiMo | Database for Histone-modifying Enzymes | 11 | http://hme.riceblast.snu.ac.kr/ |
| KEGG | Kyoto Encyclopedia of Genes and Genomes | 14-16 | www.genome.jp/kegg |
| DDBJ | DNA Data Bank of Japan | 17 | www.ddbj.nig.ac.jp |
| JSNP | Japanese Single Nucleotide Polymorphisms | 18 | http://snp.ims.u-tokyo.ac.jp/ |
| ERGO | Integrated Genomics, Inc. | 19, 20 | www.igenbio.com/ergo |
| DGV | Database of Genomic Variants | 21-23 | http://projects.tcag.ca/variation |
| COSMIC | Catalogue of Somatic Mutations in Cancer | 24, 25 | http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/ |
| GeMDBJ | Genome Medicine Database of Japan | 26 | http://gemdbj.nibio.go.jp/ |
| ICGC | International Cancer Genome Consortium | 27 | http://dcc.icgc.org |
| TCGA | The Cancer Genome Atlas | 29, 30 | http://cancergenome.nih.gov/dataportal |
| EGA | European Genome-phenome Archive | 31 | https://www.ebi.ac.uk/ega/ |
| GEO | Gene Expression Omnibus | 32, 33 | http://www.ncbi.nlm.nih.gov/geo |
| DAVID | Database for Annotation, Visualization and Integrated Discovery | 34 | http://david.abcc.ncifcrf.gov |
| cBioPortal | cBioPortal for Cancer Genomics | 35 | http://cbioportal.org |
| CCLE | Cancer Cell Line Encyclopedia | 38 | http://www.broadinstitute.org/ccle/home |
Table 1: Databases for analyzing gene regulation.
De novo transcriptome assembly databases for the central nervous system have been developed, and functional analyses as well as analyses of evolutionary relationships can be performed using BLASTX. Trinity (for de novo assembly) and the Illumina HiSeq 2000 (for RNA-Seq) were used to generate an annotation set of 22,604 transcripts, which were aligned against the Swiss-Prot database using BLASTX. Genomic signatures undergo evolutionary transitions in response to environmental changes, which may be a target for analyses using these databases. The promoter regions of 5865 single-copy orthologs among 10 bee species were analyzed to calculate motif scores for 188 Drosophila melanogaster transcription factors with at least one ortholog in each of the 10 species, and the motif score was found to correlate with social complexity using phylogenetically independent contrasts. These results showed that the complexity of the gene network was involved in the evolution of eusociality, indicating that changes in gene regulation were critical for the evolutionary transition in biological organization.
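Phylogenetically independent contrasts correct for shared ancestry before two traits are correlated, because closely related species are not statistically independent data points. The sketch below implements Felsenstein's contrast algorithm on a small binary tree and then computes the correlation of contrasts through the origin. The tree, branch lengths, and (motif score, social complexity) values are hypothetical; real analyses typically use established tools such as the `pic` function of the R package ape.

```python
import math

def independent_contrasts(tree, traits):
    """Felsenstein's phylogenetically independent contrasts (binary trees).

    tree: a leaf is (name, branch_length); an internal node is
          (left_subtree, right_subtree, branch_length). Branch lengths below
          the root must be > 0.
    traits: dict mapping leaf name -> tuple of trait values.
    Returns (estimated_node_values, adjusted_branch_length, contrasts).
    """
    if len(tree) == 2:  # leaf
        name, length = tree
        return traits[name], length, []
    left, right, length = tree
    x_l, v_l, c_l = independent_contrasts(left, traits)
    x_r, v_r, c_r = independent_contrasts(right, traits)
    scale = math.sqrt(v_l + v_r)
    contrast = tuple((a - b) / scale for a, b in zip(x_l, x_r))
    # Ancestral estimate: branch-length-weighted mean of the two children.
    node = tuple((a / v_l + b / v_r) / (1 / v_l + 1 / v_r) for a, b in zip(x_l, x_r))
    # Lengthen this node's branch to reflect uncertainty in the estimate.
    extended = length + v_l * v_r / (v_l + v_r)
    return node, extended, c_l + c_r + [contrast]

def correlation_through_origin(contrasts):
    """Correlation between two traits' contrasts, computed through the origin."""
    sxy = sum(c[0] * c[1] for c in contrasts)
    sxx = sum(c[0] ** 2 for c in contrasts)
    syy = sum(c[1] ** 2 for c in contrasts)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 4-species tree ((A,B),(C,D)) with unit branch lengths and
# (motif score, social complexity) values per species.
tree = ((("A", 1.0), ("B", 1.0), 1.0), (("C", 1.0), ("D", 1.0), 1.0), 0.0)
traits = {"A": (0.2, 1.0), "B": (0.3, 1.0), "C": (0.6, 3.0), "D": (0.8, 4.0)}
_, _, pics = independent_contrasts(tree, traits)
print(len(pics), round(correlation_through_origin(pics), 3))
```

A tree with n tips yields n - 1 contrasts, and the correlation is taken through the origin because the sign of each contrast is arbitrary.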
The Kyoto Encyclopedia of Genes and Genomes (KEGG), which aims to compile knowledge from genomic information, is a useful tool for analyzing the regulation of genes and genomes [14,15]. KEGG pathway maps mainly serve the biological interpretation of genome sequences and other high-throughput data [14,15]. A functional analysis and visualization tool for omics data, FuncTree, has been developed using the KEGG database. FuncTree maps omics data onto the Functional Tree map, a circular dendrogram showing the relationships among biological functions in the KEGG database.
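KEGG pathway gene sets are commonly used in over-representation analysis, which asks whether a list of genes of interest contains more members of a pathway than expected by chance. The sketch below uses the one-sided hypergeometric test, a generic approach rather than the FuncTree algorithm; the gene identifiers are hypothetical placeholders, and in practice the p-values for many pathways would need multiple-testing correction.

```python
from math import comb

def hypergeometric_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def pathway_enrichment(query_genes, pathway_genes, background_genes):
    """Over-representation of a query gene list in one pathway gene set.

    Returns (overlap_count, p_value). Genes outside the background are ignored.
    """
    background = set(background_genes)
    query = set(query_genes) & background
    pathway = set(pathway_genes) & background
    k = len(query & pathway)
    p = hypergeometric_upper_tail(k, len(query), len(pathway), len(background))
    return k, p

# Hypothetical 10-gene background, a 5-gene "pathway", and a 5-gene query list.
background = [f"G{i}" for i in range(1, 11)]
pathway = ["G1", "G2", "G3", "G4", "G5"]
query = ["G1", "G2", "G3", "G4", "G5"]
print(pathway_enrichment(query, pathway, background))
```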
The DNA Data Bank of Japan (DDBJ) provides databases and analytical services for biological information. The DDBJ sequence databases release datasets including genome, genome survey sequence (GSS), transcriptome shotgun assembly (TSA), high-throughput cDNA sequence (HTC), expressed sequence tag (EST), and transcriptome data for various species. The Japanese Single Nucleotide Polymorphisms (JSNP) database includes data on 150,000 SNPs from the Japanese population. The ERGO genome analysis and discovery suite contains biological data from genomics, biochemistry, and high-throughput expression profiling analyses. Metabolic networks and antimicrobial drug targets for Category A-designated bioterrorism agents, including Bacillus anthracis (anthrax), Francisella tularensis (tularemia) and Yersinia pestis (bubonic plague), have been analyzed with ERGO and KEGG. Human genomic variation has been identified and collated in the Genome Variation Database, now known as the Database of Genomic Variants (DGV) [21,22]. DGV includes information on copy number variations, SNPs, and other genomic variations [21,22]. Using DGV, par-3 family cell polarity regulator (PARD3) was found to be microdeleted in squamous carcinomas and glioblastoma.
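One common way to use a catalogue such as DGV is to check whether a deletion found in tumor data overlaps variants already seen in healthy individuals; a deletion with no substantial overlap is a stronger candidate for a disease-related event. The sketch below applies a reciprocal-overlap rule (the 50% threshold is a frequently used convention and an assumption here). The coordinates and variant IDs are hypothetical and are not real DGV entries or the PARD3 locus, and this is not the pipeline of the cited study.

```python
def overlap_fraction(a_start, a_end, b_start, b_end):
    """Fraction of interval a covered by interval b (half-open coordinates)."""
    overlap = max(0, min(a_end, b_end) - max(a_start, b_start))
    return overlap / (a_end - a_start)

def matching_known_variants(candidate, known_variants, min_reciprocal=0.5):
    """IDs of catalogued variants with at least min_reciprocal reciprocal overlap.

    candidate: (chrom, start, end); known_variants: list of
    (variant_id, chrom, start, end) tuples.
    """
    chrom, start, end = candidate
    hits = []
    for var_id, v_chrom, v_start, v_end in known_variants:
        if v_chrom != chrom:
            continue
        if (overlap_fraction(start, end, v_start, v_end) >= min_reciprocal
                and overlap_fraction(v_start, v_end, start, end) >= min_reciprocal):
            hits.append(var_id)
    return hits

# Hypothetical catalogue entries and a hypothetical tumor deletion.
known = [
    ("var1", "chr10", 1200, 2100),
    ("var2", "chr10", 1900, 5000),
    ("var3", "chr1", 1000, 2000),
]
print(matching_known_variants(("chr10", 1000, 2000), known))
```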
For analyzing somatic mutations in human cancer, the Catalogue of Somatic Mutations in Cancer (COSMIC) is a comprehensive database. Tumor antigens can be predicted in silico using COSMIC and the Cancer Gene Census (CGC). A database of missense mutation-derived peptides was assembled from COSMIC, followed by the identification of candidate peptides that may act as tumor rejection antigens. Information on mutant genes was collected from CGC, and mutated peptides were analyzed for predicted binding to major histocompatibility complex (MHC) class I molecules. The Genome Medicine Database of Japan (GeMDBJ) enables the sharing of GWAS data related to cancer. GeMDBJ contains genome-wide SNP typing data related to Alzheimer’s disease, gastric cancer, type 2 diabetes, hypertension, and asthma. The International Cancer Genome Consortium (ICGC) is a database for characterizing genomic abnormalities in different cancer types. The ICGC Data Portal includes data from ICGC, The Cancer Genome Atlas (TCGA), Johns Hopkins University, and the Tumor Sequencing Project [27,28]. TCGA is based on comprehensive sequencing of all protein-coding genes and transcripts in tumors. The involvement of epigenetic regulator genes (ERGs) in human cancer was analyzed using mutation, copy number and expression data from 5943 tumors across 13 TCGA cancer types, which revealed that multiple ERGs, including enhancer of zeste 2 polycomb repressive complex 2 subunit (EZH2), are co-regulated in the cell cycle network. The European Genome-phenome Archive (EGA) is a database for sharing biomolecular, genetic and phenotypic data collected from human subjects. EGA differs from dbGaP in its policy of distributed access granting.
Authorization decisions for dbGaP’s datasets are made by the US National Institutes of Health (NIH), whereas data submitted to EGA are required to be consistent with the original consent agreements, national laws and applicable regulations.
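The COSMIC-based tumor-antigen prediction described above starts from peptides that span a missense mutation. The sketch below generates every peptide of a given length that contains the mutated residue; 9-mers are used by default because MHC class I ligands are typically 8 to 11 residues long, with 9-mers most common. The protein sequence and mutation are hypothetical, MHC binding prediction itself is not shown, and this is not the pipeline of the cited studies.

```python
def mutant_peptides(protein_seq, mutation, length=9):
    """Return all peptides of `length` residues that contain a missense change.

    mutation: string such as 'R175H' (reference residue, 1-based position,
    alternate residue).
    """
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos - 1
    if protein_seq[idx] != ref:
        raise ValueError(f"reference mismatch at {pos}: {protein_seq[idx]} != {ref}")
    mutant = protein_seq[:idx] + alt + protein_seq[idx + 1:]
    # Windows run from the one ending at the mutation to the one starting there,
    # clipped to the ends of the sequence.
    first = max(0, idx - length + 1)
    last = min(idx, len(mutant) - length)
    return [mutant[start:start + length] for start in range(first, last + 1)]

# Hypothetical 20-residue sequence with a K9R substitution.
seq = "ACDEFGHIKLMNPQRSTVWY"
peps = mutant_peptides(seq, "K9R")
print(len(peps), peps[0], peps[-1])
```

Each resulting peptide would then be passed to an MHC class I binding predictor, and peptides predicted to bind become candidate neoantigens.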
Gene expression data collected worldwide are publicly available in the Gene Expression Omnibus (GEO) database [32,33]. GEO provides genomic data generated using microarrays and next-generation sequencing, which enables researchers to share and analyze the data to explore targets for disease treatment [32,33]. The Database for Annotation, Visualization and Integrated Discovery (DAVID) includes a Gene Functional Classification Tool with a novel agglomeration algorithm that classifies genes by function. The DAVID Gene Functional Classification Tool has shifted functional annotation analysis from a term- or gene-centric approach to a biological module-centric approach.
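As we understand the published description of the DAVID Gene Functional Classification Tool, similarity between two genes is measured with kappa statistics over their shared annotation terms before genes are grouped into functional modules. The sketch below shows only that similarity step, computing Cohen's kappa between two genes' binary annotation profiles; the term list and gene annotations are hypothetical, and the subsequent clustering step is not reproduced.

```python
def annotation_kappa(terms_a, terms_b, all_terms):
    """Cohen's kappa between two genes' presence/absence annotation profiles."""
    a = [t in terms_a for t in all_terms]
    b = [t in terms_b for t in all_terms]
    n = len(all_terms)
    both = sum(x and y for x, y in zip(a, b))
    neither = sum((not x) and (not y) for x, y in zip(a, b))
    observed = (both + neither) / n
    pa, pb = sum(a) / n, sum(b) / n
    chance = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if chance == 1:
        return 1.0  # both profiles are identical and constant
    return (observed - chance) / (1 - chance)

# Hypothetical annotation terms and genes.
terms = ["T1", "T2", "T3", "T4", "T5", "T6"]
gene_a = {"T1", "T2", "T3"}
gene_b = {"T1", "T2", "T4"}
print(round(annotation_kappa(gene_a, gene_b, terms), 3))
```

Kappa discounts agreement expected by chance, so two genes that share annotations only because both are heavily annotated score lower than raw overlap would suggest.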
Integrative analysis of cancer genomic and clinical profiles can be performed with the cBioPortal for Cancer Genomics. The portal contains genomic data on somatic mutations, DNA copy-number alterations, mRNA and microRNA expression, DNA methylation, protein abundance, and phosphoprotein abundance in cancer, and networks can be analyzed in association with the altered genes.
The UniProt consortium has completed the first draft of the complete human proteome in the UniProt Knowledgebase (UniProtKB)/Swiss-Prot [36,37]. UniProtKB/Swiss-Prot contains manually annotated data on protein functions, enzyme-specific information, biologically relevant domains and sites, post-translational modifications, subcellular locations, tissue specificity, developmental stage-specific expression, structures, interactions, splicing isoforms and associated diseases [36,37]. The Broad Institute provides a collection of data from various species for bioinformatic analysis. One of the Broad Institute's collaborative studies is the Cancer Cell Line Encyclopedia (CCLE) project, which compiles gene expression, chromosomal copy number and massively parallel sequencing data from human cancer cell lines. The comprehensive NextBio database is also useful for analyzing therapeutic targets in diseases such as cancer.
Gene expression profiles in hepatocellular carcinoma have been analyzed through gene ontology analysis and KEGG and BioCarta pathway enrichment analysis, followed by a protein-protein interaction (PPI) network analysis with Cytoscape software. Genes that are differentially regulated in alcoholic hepatitis were analyzed in a PPI network using Cytoscape. A network analysis of genes regulated in bladder transitional cell carcinoma led to the identification of the central nodes in the network and the selection of potential cancer markers.
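Identifying central nodes usually starts from a centrality measure computed on the PPI network. Cytoscape and its plug-ins offer several such measures; the sketch below uses the simplest one, degree (number of distinct interaction partners), to rank candidate hub genes. The interaction list is hypothetical, and the cited studies may have used other centrality measures.

```python
from collections import defaultdict

def hub_genes(interactions, top_n=3):
    """Rank genes by degree in an undirected PPI network.

    interactions: iterable of (gene_a, gene_b) pairs; duplicate pairs and
    self-interactions are ignored. Ties are broken alphabetically.
    """
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {gene: len(partners) for gene, partners in neighbors.items()}
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

# Hypothetical interactions (G2-G1 duplicates G1-G2 and is counted once).
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G2", "G3"), ("G5", "G1"), ("G2", "G1")]
print(hub_genes(edges))
```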
The cBioPortal for Cancer Genomics database was used to analyze the regulation of protein serine/threonine phosphatase type 2A (PP2A), revealing that the PP2A complex is deregulated in 59.6% of basal breast tumors. A tool for analyzing human biological pathways, Chisio BioPAX Editor (ChiBE), has been developed that can incorporate profiling data from the cBioPortal for Cancer Genomics and expression data from GEO. ChiBE links to DAVID for further annotation and gene set analysis. A candidate prognostic gene signature in advanced prostate cancer was analyzed with cBioPortal by interrogating the Memorial Sloan Kettering Cancer Center Prostate Oncogenome Project dataset for changes in the expression of genes enriched under the KEGG term “Cell Cycle” in clinical prostate cancer. A seven-gene biomarker panel has been identified as a prognostic gene signature in high-risk prostate cancer.
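Figures such as "deregulated in 59.6% of tumors" summarize, for a gene set, the fraction of samples carrying at least one alteration in any member. The sketch below shows that calculation on a hypothetical per-sample alteration table; it is not cBioPortal code, and in cBioPortal the result depends on which alteration types (mutations, copy number changes, expression thresholds) are counted. PPP2R1A and PPP2R2A are real PP2A subunit genes, but the sample data are invented.

```python
def altered_sample_fraction(alterations, gene_set, samples):
    """Fraction of samples with at least one alteration in any gene of gene_set.

    alterations: dict sample_id -> set of altered gene symbols; samples absent
    from the dict are treated as unaltered.
    """
    gene_set = set(gene_set)
    altered = sum(1 for s in samples if alterations.get(s, set()) & gene_set)
    return altered / len(samples)

# Hypothetical alteration calls for five tumors.
samples = ["S1", "S2", "S3", "S4", "S5"]
alterations = {
    "S1": {"PPP2R1A"},
    "S2": {"TP53"},
    "S3": {"PPP2R2A", "TP53"},
    "S4": set(),
}
pp2a_genes = {"PPP2R1A", "PPP2R2A"}
print(f"{100 * altered_sample_fraction(alterations, pp2a_genes, samples):.1f}%")
```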
Gene Regulation in the Human Body
Many studies use databases and bioinformatics approaches to identify gene regulation in the human body. Intestinal stem cells exhibit variation in their ground states, which has been analyzed using genome bioinformatics techniques such as microarray analysis, gene set enrichment analysis and exome capture sequencing. The differentiated progeny of intestinal stem cells can be identified as goblet (Muc2+), endocrine (chromogranin A+) and Paneth cells, or by polarized villin expression, whereas intestinal stem cells can be distinguished from tracheobronchial stem cells using SRY (sex determining region Y)-box 9 (SOX9), olfactomedin 4 (OLFM4), prominin 1 (PROM1, also known as CD133) and leucine-rich repeat-containing G protein-coupled receptor 5 (LGR5).
In normal human skin, mutations in 74 cancer genes were analyzed through ultradeep sequencing. NOTCH1, a member of the NOTCH receptor family (key regulators of stem cell biology and targets of inactivating mutations in epithelial cancers), was found to be mutated in approximately 20% of normal human skin and to carry driver mutations in 60% of cutaneous squamous cell carcinomas. The main difference between normal and cancer cells was found to be the number of driver mutations per cell. The methylation patterns of runt-related transcription factor 2 (RUNX2), Sp7 transcription factor (SP7) [also known as osterix (OSX)], distal-less homeobox 5 (DLX5) and integrin-binding sialoprotein (IBSP) [also known as bone sialoprotein (BSP)] differ during the osteoblastic differentiation of MSCs. The epigenetic regulation of genes may thus affect the direction of cell differentiation. The direction of differentiation in MSCs might be regulated by gene alterations and by paracrine effects of the cells [49,50]. Based on GTEx data, protein-truncating variants have been analyzed to support the clinical interpretation of the genome. Tissue-specific protein truncations have been shown to exist, although further data collection and analysis will be needed to predict the molecular consequences of these variants.
Commensal bacteria-specific CD4+ T cells in the intestine have been shown to be regulated by group 3 innate lymphoid cells (ILC3s). ILC3 expression of MHC class II (MHCII) is controlled by transcriptional pathways. MHCII expression in colonic ILC3s is reduced in pediatric inflammatory bowel disease (IBD), which highlights the importance of analyzing the mechanism of MHCII regulation in ILC3s. The selection of commensal bacteria-specific CD4+ T cells in the intestine may be critical in the pathogenesis of IBD.
The aryl hydrocarbon receptor (AHR), a type I nuclear receptor and cytosolic transcription factor, regulates T helper 17 (Th17)/T regulatory (Treg) pathways. Th17 cell differentiation is regulated by AHR ligands, raising the possibility of targeting autoimmune and chronic inflammatory diseases associated with environmental toxins. The pluripotency of primordial germ cells (PGCs) is regulated by Blimp-1 and Akt. Blimp-1 has been found to suppress the downstream targets of pluripotency network genes such as Myc, Klf4, Nanog, Oct3/4 and Sox2, thereby suppressing the pluripotency of PGCs. In MSCs, the differentiation status can be modulated through epigenetic regulation. Cells reverted from osteogenically differentiated cells exhibit increased expression of Nanog, Oct4 and Sox2. It has been reported that mechanical forces, which regulate the development, organization, and function of multicellular tissues, induce E-cadherin (Cdh1)-dependent activation of Yap1, a transcriptional coactivator of the Hippo pathway, and of β-catenin to drive cell cycle entry. When mechanical strain is applied to Madin-Darby canine kidney (MDCK) cells, transient Yap1 activation and subsequent cell cycle reentry occur, and nuclear localization and transcriptional activation of β-catenin are induced, leading to progression into S phase. The Yap1 nuclear exclusion and β-catenin activation induced by mechanical strain require interactions with the extracellular domain of E-cadherin, suggesting a role for the cell adhesion molecule E-cadherin as a mechanical signal transducer.
Long noncoding RNAs (lncRNAs) play important roles in cellular programming and reprogramming. In adult tissue stem cells, lncRNAs regulate differentiation and self-renewal together with other molecules, in a cell fate-dependent manner. In the nervous system, the differentiation of embryonic stem cells into neuronal tissues is regulated by TUNA/megamind, a lncRNA required for the maintenance of pluripotency, together with PTBP1, nucleolin (NCL) and hnRNP-K. The regulators that induce pluripotency differ among dedifferentiating cell types. Reprogramming is controlled by Oct4, Sox2, Klf4 and c-Myc in fibroblasts, whereas mps1, plk1 and cdc2 dedifferentiate zebrafish cardiomyocytes into a pluripotent state. Immune B cells transdifferentiate into macrophages via CCAAT-enhancer-binding protein-α (CEBPα) and CEBPβ, and fibroblasts transdifferentiate into cardiomyocytes via GATA binding protein 4 (GATA4), T-box 5 (Tbx5) and myocyte enhancer factor 2C (Mef2C) [58-60]. EMT is regulated by Snail, and Snail transcription is suppressed by Oct4 during reprogramming. In tissue regeneration, prostaglandin E2 (PGE2) supports stem cell expansion, an effect thought to be suppressed by 15-hydroxyprostaglandin dehydrogenase (15-PGDH). Inhibiting 15-PGDH with the small molecule SW033291 has been reported to promote tissue repair.
Prediction of Risk from Gene Alterations
Gene alterations, such as changes in Snail expression, affect the cellular phenotype: Snail induces EMT and tumor progression. The risk of diseases such as cancer may therefore be predicted from gene alterations. Importantly, gene regulation occurs within molecular networks, so combinations of genes must be considered. Gene alterations in prostate cancer-induced osteoblastic bone metastasis have been analyzed through network analysis. By exploiting interspecies differences, an osteoblastic bone metastasis-associated stroma transcriptome (OB-BMST) was generated by comparing mouse and human transcripts in the implanted cancer and the stroma, an elegant method for analyzing stromal genes. From the OB-BMST, pleiotrophin (PTN), EPH receptor A3 (EPHA3) and fascin actin-bundling protein 1 (FSCN1) were extracted as components of the bone-specific response to prostate cancer-induced osteoblastic bone metastasis. Interspecies differences are interesting and important topics for investigation. A pluripotent state (i.e., the ability to differentiate into any cell type) can allow interspecies chimeric-competent cells to be obtained from region-selective pluripotent stem cells. Interspecies cell type transitions may be a future direction to be explored.
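Separating transcripts by species of origin is the step that makes an interspecies approach like OB-BMST possible. A minimal sketch, assuming human cancer cells implanted into a mouse host so that stromal reads are mouse-derived, is to align each read to both references and assign it to the species with the clearly better score. The alignment scores and the `min_margin` cutoff are hypothetical, and this is a generic read-sorting idea rather than the method used to build the OB-BMST.

```python
from collections import Counter

def assign_species(read_scores, min_margin=5):
    """Label each read 'human', 'mouse' or 'ambiguous' from alignment scores.

    read_scores: dict read_id -> (human_score, mouse_score), e.g. the best
    alignment score against each reference (hypothetical values here).
    """
    calls = {}
    for read_id, (human_score, mouse_score) in read_scores.items():
        if human_score - mouse_score >= min_margin:
            calls[read_id] = "human"      # assumed: implanted human cancer
        elif mouse_score - human_score >= min_margin:
            calls[read_id] = "mouse"      # assumed: host (stromal) tissue
        else:
            calls[read_id] = "ambiguous"  # conserved sequence, left unassigned
    return calls

scores = {"r1": (60, 40), "r2": (30, 58), "r3": (50, 48)}
calls = assign_species(scores)
print(Counter(calls.values()))
```

Reads from highly conserved regions score similarly against both genomes, which is why an explicit ambiguous category is kept rather than forcing every read into one species.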
Insights Regarding Genomic Data and Diseases
Genomic alterations such as BRCA1 and BRCA2 mutations occur in breast cancer patients and have been identified as risk factors for breast cancer. Even in Sardinian breast cancer patients with wild-type BRCA1/2, TOX high mobility group box family member 3 (TOX3) and fibroblast growth factor receptor 2 (FGFR2), including SNPs in FGFR2, have been identified as susceptibility genes.
Optogenetic stimulation has been shown to activate skeletal muscle via the light-sensitive channel channelrhodopsin-2. Optogenetic stimulation allows cell-type-specific activation of force generation in skeletal muscle, which may offer a therapeutic approach for laryngeal paralysis or other locomotive syndromes.
The genetic landscape and genetic determinants of hematopoietic stem/progenitor cell (HSPC) frequency were analyzed in a GWAS. A homeodomain-only protein gene (Hopx) locus was found to be associated with the frequency of HSPCs, such as lineage (Lin)- Sca-1+ c-Kit+ (LSK) HSPCs, LSK CD150-CD48- multipotent progenitors (MPPs), and LSK CD150+CD48- cells, which are the most primitive long-term HSCs, among mouse bone marrow mononuclear cells. In the LSK CD150-CD48- population, Hopx+/+ cells predominate over Hopx-/- cells; that is, the percentage of LSK CD150-CD48- cells among mononuclear cells is lower in Hopx-/- mice. This finding indicates that Hopx plays a role in determining HSPC frequency. The multimeric GA-binding protein (GABP) transcription factor has been found to selectively bind and activate the mutant promoter of telomerase reverse transcriptase (TERT), which encodes the catalytic subunit of telomerase, in cancer. GABP was identified as the critical E26 transformation-specific (ETS) transcription factor activating TERT expression in the context of highly recurrent promoter mutations. These findings may indicate that native tandem flanking ETS motifs act together with the promoter mutations in cancer to activate TERT, which enables cells to overcome replicative senescence.
Genomic data have provided abundant information about mutations in cancer, revealing interesting insights into stem cells and cancer. Investigations of cellular networks at the level of genes, genomes and proteins will be needed to reveal the whole picture of stem cells [70,71].
Our knowledge has increased owing to recent advances in bioinformatics and computational capacity. How to use these data and this knowledge efficiently is an important issue for the future of the big-data era. One useful direction for utilizing gene information may be the identification of treatment targets for diseases, together with appropriate risk prediction.
Because stem cell phenotypes change under disease conditions, it is very important to analyze and understand gene regulation in stem cells. One example is the exponential clonal expansion of stem cells in cancer, which emphasizes the importance of analyzing stem cell alterations in terms of combinations of genes and molecules.
The author would like to thank the members of the National Institute of Health Sciences and researchers for helpful discussions.
All published work is licensed under a Creative Commons Attribution 4.0 International License
Copyright © 2019 All rights reserved. iMedPub LTD Last revised : May 21, 2019