Validation of Biomarkers in Gene Expression Datasets of Inflammatory Bowel Disease: IL13RA2, PTGS2 and WNT5A as Predictors of Responsiveness to Infliximab Therapy

Copyright: © 2014 Győrffy A, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Background: Some patients with inflammatory bowel disease (IBD) do not respond to infliximab (IFX) therapy. Gene expression studies have identified genes that may help to predict which IBD patients will not respond. Our purpose was to validate the discriminating power of these published genes.


Introduction
Infliximab (IFX) therapy effectively controls the signs and symptoms of inflammatory bowel disease (IBD) in the majority of patients who fail to respond to conventional therapy. About 40% of IBD patients, however, do not respond and are exposed to the risks of IFX therapy without clinical benefit [1]. Identifying non-responding patients before therapy is, therefore, a major challenge for clinicians. Clinical factors that determine pharmacokinetics [2], such as baseline laboratory data, medication history, smoking and dosing schedule, are implicated in therapeutic failure. However, patients with comparable clinical characteristics still differ widely in their responsiveness to IFX therapy. Biomarkers are therefore needed to classify IBD patients as candidates or non-candidates for IFX treatment.
Biomarkers may vary in terms of their nature, physiologic function and sample of origin. While antibodies against IFX are associated with loss of response [3], they are not suitable for predicting response in IFX-naive patients. Recently, in addition to routine laboratory indicators and immune markers [4], much research has focused on measuring gene expression in colon biopsy samples and peripheral blood to predict the individual response to IFX in IFX-naive IBD patients. However, study designs, including the methods used to determine gene expression and the patient populations enrolled, were quite heterogeneous. In addition, most of these studies investigated specific elements or pathways of disease pathogenesis and did not analyze gene expression patterns as a whole. The information from these studies is therefore unavoidably skewed toward these factors and may miss important details beyond the scope of the research. These limitations make comparison and validation of the individual study results difficult.
The availability of recent gene expression microarray data from IBD patients who later did or did not respond to IFX therapy provides an opportunity to re-analyze how the published genes perform in another IBD population.
In this study we cross-validated published biomarkers against datasets from the Gene Expression Omnibus (GEO). We also analyzed whether biomarkers identified in blood or biopsy samples reflect gene expression patterns in biopsies and peripheral blood mononuclear cells, respectively. As a result of this work, we identified genes that consistently discriminated between IBD patients responding and not responding to IFX therapy in all analyzed datasets of colon biopsy specimens.

Literature search
We designed the present meta-analysis according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines published in 2009 [5]. Our pipeline required that both the markers to be validated and the data used for validation be identified by a search of available publications and datasets. The design of the cross-validation process (including identification of transcriptome datasets and published biomarkers) is presented in Figure 1.

Construction of GEO-based microarray database: the transcriptome arm
We searched GEO (http://www.ncbi.nlm.nih.gov/gds) using the keywords "infliximab" and "response" to identify transcriptomic datasets reporting response to infliximab treatment. Only datasets including at least five patients in each cohort (responders and non-responders) were considered. Patients receiving placebo were excluded from the analysis. The series matrix file for each eligible dataset was downloaded, and each dataset was then processed separately.
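The two inclusion rules above (placebo-treated patients removed, at least five patients in each response cohort) can be sketched as a small filter. This is a minimal Python illustration, not the authors' R pipeline; the `Sample` record and its `response`/`treatment` coding are hypothetical stand-ins for the annotations parsed from a series matrix file.

```python
from dataclasses import dataclass
from typing import List

MIN_PER_COHORT = 5  # inclusion threshold stated in the text


@dataclass
class Sample:
    sample_id: str
    response: str   # hypothetical coding: "responder" or "non-responder"
    treatment: str  # hypothetical coding: e.g. "infliximab" or "placebo"


def drop_placebo(samples: List[Sample]) -> List[Sample]:
    """Remove placebo-treated patients before any counting."""
    return [s for s in samples if s.treatment != "placebo"]


def dataset_is_eligible(samples: List[Sample], min_per_cohort: int = MIN_PER_COHORT) -> bool:
    """True if both response cohorts contain at least `min_per_cohort` IFX-treated patients."""
    kept = drop_placebo(samples)
    n_resp = sum(s.response == "responder" for s in kept)
    n_nonresp = sum(s.response == "non-responder" for s in kept)
    return n_resp >= min_per_cohort and n_nonresp >= min_per_cohort


if __name__ == "__main__":
    toy = ([Sample(f"R{i}", "responder", "infliximab") for i in range(5)]
           + [Sample(f"N{i}", "non-responder", "infliximab") for i in range(5)]
           + [Sample(f"P{i}", "responder", "placebo") for i in range(3)])
    print(dataset_is_eligible(toy))      # True: 5 vs 5 after dropping placebo
    print(dataset_is_eligible(toy[1:]))  # False: only 4 IFX-treated responders remain
```

Placebo samples are removed before counting, so a placebo arm cannot make an otherwise too-small dataset appear eligible.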

Genetic biomarkers of IFX responsiveness: the 'biomarker arm'
A PubMed search was performed using combinations of the keywords "infliximab", "response", "marker" and "gene expression". Only publications in English were considered. Repeated publications were identified and excluded so that each gene was included only once in the final database. In addition, only studies reporting gene expression biomarkers for inflammatory bowel disease, Crohn's disease or ulcerative colitis were included. We also recorded whether each biomarker was measured in blood or in a biopsy sample. The unique gene symbol and name of each gene were identified by querying the online repository of the HUGO Gene Nomenclature Committee (http://www.genenames.org).
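The deduplication and sample-type bookkeeping described above can be illustrated with a short Python sketch. The alias table is a hypothetical, hand-written stand-in for a lookup against the HGNC repository (the code does not call any HGNC service), and the `(reported_name, sample_type)` record format is assumed for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical alias table; in practice approved symbols would be looked up at genenames.org.
ALIASES: Dict[str, str] = {
    "COX-2": "PTGS2",
    "IL-13RALPHA2": "IL13RA2",
}


def to_symbol(reported_name: str) -> str:
    """Map a reported gene name to an approved symbol (upper-cased name as fallback)."""
    key = reported_name.strip().upper()
    return ALIASES.get(key, key)


def build_biomarker_db(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """records: (reported_name, sample_type) pairs, sample_type 'blood' or 'biopsy'.

    Each gene is stored once per sample type, so repeated reports collapse."""
    db: Dict[str, Set[str]] = {"blood": set(), "biopsy": set()}
    for name, sample_type in records:
        db[sample_type].add(to_symbol(name))
    return db


if __name__ == "__main__":
    reports = [("COX-2", "biopsy"), ("PTGS2", "biopsy"), ("Wnt5a", "biopsy"), ("S100A8", "blood")]
    db = build_biomarker_db(reports)
    print(sorted(db["biopsy"]), sorted(db["blood"]))
```

Keeping one set per sample type mirrors the separate counts of biopsy-based and blood-based markers used later in the analysis.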

Statistical analyses
Receiver Operating Characteristic (ROC) analysis was performed in the R statistical environment (http://www.r-project.org) using Biobase Bioconductor libraries (http://www.bioconductor.org/). In a ROC analysis, the true positive rate (the sensitivity of the marker) is plotted against the false positive rate (1 minus the specificity of the marker) for each available cut-off point of a parameter. Each point of the ROC curve thus represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve (AUC) is then computed to measure how well the parameter distinguishes between the two groups (responders and non-responders). The AUC of a perfect biomarker is 1.0, whereas an AUC of 0.5 corresponds to discrimination no better than chance.
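The threshold sweep and AUC described above can be sketched in a few lines of Python. This illustrates the standard technique and is not the authors' Bioconductor code; it assumes non-responders are coded as the positive class (label 1), that a higher expression value means "predicted non-responder", and it integrates the curve with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (fpr, tpr) points, one per distinct cut-off, starting at (0, 0).

    A sample is called positive when its score is >= the cut-off.
    labels: 1 = positive class (assumed: non-responder), 0 = negative class."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        tpr.append(tp / n_pos)  # sensitivity
        fpr.append(fp / n_neg)  # 1 - specificity
    return fpr, tpr


def auc_trapezoid(fpr: Sequence[float], tpr: Sequence[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr)))


if __name__ == "__main__":
    expression = [1.0, 3.0, 2.0, 4.0]  # toy expression values
    non_responder = [0, 0, 1, 1]
    print(auc_trapezoid(*roc_curve(expression, non_responder)))  # 0.75
```

The trapezoidal AUC equals the probability that a randomly chosen non-responder has a higher value than a randomly chosen responder (ties counting one half), which is why an AUC of 0.5 corresponds to chance.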

Figure 1: Design of the cross-validation process (biomarker arm and transcriptomic arm). Labels: IBD, n=9; other disease (RA, JIA, depression), n=9; genes in these studies, n=63 and n=52; measured in at least one transcriptomic study, n=104; blood-based markers, n=39; biopsy-based markers, n=65.

We also wrote an R function that enables instant reproduction of all results published in the present study. An R script for ROC analysis is included in the supplemental material (Supplemental R script 1.R), together with an additional input file assigning the samples to responder and non-responder cohorts (Supplemental Table 1); these data were originally published in the GEO series matrix files but could not be downloaded in an automated way. The ROC_stats function in the supplemental script calculates p-values testing the null hypothesis that the AUC equals 0.5. First, the script calculates the standard error of the AUC under the null hypothesis, from which the function derives a z-score. The p-value is then obtained by transforming the z-score using the normal distribution. In addition, the function downloads and parses each of the original series matrix files using the GEOquery package (http://watson.nci.nih.gov/~sdavis/), so these files need not be included in the supplemental material. The ROC analysis script not only enables users to validate the results of the present study, but also provides an easy-to-use tool for independent validation of new biomarker candidates.
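The three steps of the ROC_stats test (null standard error, z-score, normal p-value) can be sketched in Python. The supplemental R code is not reproduced here, so two choices are assumptions: the null standard error uses the Hanley-McNeil formula evaluated at AUC = 0.5, which reduces to sqrt((n1 + n2 + 1) / (12 n1 n2)), and the p-value is two-sided. The function name `auc_z_test` is hypothetical.

```python
import math


def auc_null_se(n_pos: int, n_neg: int) -> float:
    """Standard error of the AUC under H0: AUC = 0.5.

    Assumed: Hanley-McNeil SE evaluated at AUC = 0.5, which simplifies to
    sqrt((n_pos + n_neg + 1) / (12 * n_pos * n_neg))."""
    return math.sqrt((n_pos + n_neg + 1) / (12.0 * n_pos * n_neg))


def auc_z_test(auc: float, n_pos: int, n_neg: int) -> tuple:
    """Return (z, p) for the test of AUC = 0.5 (two-sided, normal approximation)."""
    z = (auc - 0.5) / auc_null_se(n_pos, n_neg)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    z, p = auc_z_test(0.80, n_pos=12, n_neg=11)  # toy numbers
    print(round(z, 2), round(p, 4))
```

Under this null the standard error depends only on the two cohort sizes, which is why small cohorts need a large deviation of the AUC from 0.5 to reach significance.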
With this approach, we re-tested the discrimination performance of each of the 104 published genes in the four GEO datasets. Statistical significance was set at p<0.05, and a gene was considered significant when its p value averaged across the datasets was below this threshold.
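The significance rule stated above, averaging a gene's p values over the datasets and comparing the mean with 0.05, can be written as a small Python helper. It is a sketch of the criterion as described; a plain mean of p values is a heuristic rather than a formal combination method such as Fisher's, and the gene names in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, List

ALPHA = 0.05  # significance level stated in the text


def significant_by_mean_p(p_values: List[float], alpha: float = ALPHA) -> bool:
    """True if the p values averaged across datasets fall below alpha."""
    return mean(p_values) < alpha


def filter_genes(p_by_gene: Dict[str, List[float]], alpha: float = ALPHA) -> List[str]:
    """Keep genes whose mean p value across datasets is below alpha."""
    return [gene for gene, ps in p_by_gene.items() if significant_by_mean_p(ps, alpha)]


if __name__ == "__main__":
    toy = {"GENE_A": [0.01, 0.02, 0.03], "GENE_B": [0.01, 0.20, 0.01]}  # hypothetical
    print(filter_genes(toy))  # ['GENE_A']
```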
An independent analysis of all genes in the GEO datasets, irrespective of the published results, was also performed using the same statistical approach.

Identification of transcriptomic databases
The GEO search yielded 13 datasets altogether, four of which were related to IBD. Three of these originated from colon biopsies and one from peripheral blood mononuclear cells (PBMC). These datasets included 59 IFX responder and 40 IFX non-responder patients. Table 1 presents the characteristics of these four datasets; clinical characteristics of each patient are listed in Supplemental Table 2. The entire database contains 20,168 genes; note, however, that the gene lists measured in the individual datasets are not fully consistent. Genes from the biomarker studies were tested in each of the four independent datasets.

Biomarkers
We identified 292 publications investigating candidate biomarkers of non-responsiveness to IFX therapy. Thirty-one papers re-tested biomarkers from previous studies, and 98 studies investigated biomarkers other than gene expression. Of the remaining 163 publications, 28 enrolled IBD patients. These studies tested a total of 139 genes as possible biomarkers of IFX responsiveness, of which 104 were present in at least one of the four GEO datasets (Figure 1).

Published biomarkers of response to infliximab therapy
To avoid batch effects, the datasets were not combined into one set; instead, we performed the entire analysis in each dataset separately. The 65 gene expression biomarkers identified in biopsy samples and the 39 identified in blood were evaluated in the GEO datasets of biopsy samples and PBMC, respectively. The complete results for each of the 104 candidate genes in each of the four datasets, with links to the original publications in PubMed, are listed in Supplemental Table 3.
Of the 65 candidate genes reported as possible biomarkers in biopsy specimens, only 25 consistently discriminated between infliximab responders and non-responders (p<0.05) in the three biopsy datasets. The discriminative power of these biomarkers, expressed as ROC AUC (area under the curve) values, is listed in Table 2A. Of note, only three of these biomarkers (S100A8, SELP and CD86) were also significant in the PBMC dataset.
Of the 39 candidate genes reported as possible biomarkers in peripheral blood, only 9 provided significant discrimination (p<0.05) between infliximab responders and non-responders in the PBMC dataset. The discriminative power of these biomarkers, expressed as ROC AUC values, is listed in Table 2B; only three of them (WARS, MAP1LC3B and ODC1) were also significant in biopsies.

Analysis of datasets irrespective of published results
The complete results of the analysis, including AUC and p values for all 20,168 available genes in each of the four datasets, are presented in Supplemental Table 4. To identify the most robust markers, we considered only those genes that were significant in all three independent analyses of colon biopsy samples and ranked them by their average ROC AUC values. The top five of these markers are listed in Table 3; three of them (IL13RA2, PTGS2 and WNT5A) were also identified in earlier studies.
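The selection just described (significant in every biopsy dataset, then ranked by mean AUC, top five kept) can be sketched in Python. The input layout, one (AUC, p) pair per biopsy dataset for each gene, and the gene names in the example are hypothetical; ranking by raw mean AUC favors genes with higher expression in the positive class, in line with the increased expression reported for the top genes in non-responders.

```python
from statistics import mean
from typing import Dict, List, Tuple

Result = Tuple[float, float]  # (AUC, p value) for one dataset


def top_consistent_genes(results: Dict[str, List[Result]], alpha: float = 0.05, k: int = 5) -> List[str]:
    """Keep genes with p < alpha in every dataset, rank by mean AUC, return the top k."""
    consistent = {
        gene: res for gene, res in results.items()
        if all(p < alpha for _, p in res)
    }
    ranked = sorted(consistent, key=lambda g: mean(auc for auc, _ in consistent[g]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    toy = {  # hypothetical genes, three biopsy datasets each
        "GENE_A": [(0.90, 0.001), (0.85, 0.010), (0.80, 0.020)],
        "GENE_B": [(0.95, 0.001), (0.90, 0.001), (0.70, 0.200)],  # fails in one dataset
        "GENE_C": [(0.70, 0.040), (0.75, 0.030), (0.72, 0.040)],
    }
    print(top_consistent_genes(toy))  # ['GENE_A', 'GENE_C']
```

GENE_B has the highest single AUC but is dropped because it is not significant in every dataset, which is the robustness requirement stated above.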

Discussion
The discrimination of IBD patients as likely responders or non-responders before the initiation of IFX therapy is still an unmet clinical need. Previous attempts to predict therapeutic response identified several genes in colon biopsy samples and peripheral blood.
Before investigating the clinical usefulness of a biomarker reported to predict therapeutic response in a larger population, the published data are worth re-validating in independent cohorts. In our study, the cross-validation of the discriminating biomarkers was performed with ROC analysis. ROC AUC values and associated p values were generated using an R script (available as supplemental material). With this approach, we demonstrated that merely 17 and 23 per cent of the published biomarkers provided consistently significant discrimination between responding and non-responding patients in transcriptomic datasets of the same kind of sample (colon and PBMC, respectively). Of note, these biomarkers were largely non-overlapping: most biomarkers identified in colon biopsy samples differed from those identified in peripheral blood. Therefore, datasets of PBMC and biopsy samples are not comparable, and their information content cannot be pooled. Additionally, to confirm the possible relevance of some published genes, we also performed a de novo analysis of the biopsy datasets. Of the top five genes identified as very efficient discriminators in the microarray datasets, two were novel, while three previously published genes (IL13RA2, PTGS2 and WNT5A) were validated in our study. All of these genes exhibited significantly increased expression in colon biopsy samples of IBD patients not responding to IFX therapy. (… used; however, the relatively small sample size does not allow firm conclusions to be drawn.)
In addition to their use as biomarkers to discriminate between IBD patients responding and not responding to IFX treatment, these genes may provide novel perspectives in IBD pharmacotherapy. Although our in silico study is limited by the absence of protein-level data, one may assume that the transcriptome is representative of the proteome. One may therefore speculate that the proteins encoded by these genes, PTGS2, IL13RA2 and Wnt5a, may predict and, theoretically, contribute to responsiveness to IFX in IBD.
PTGS2 encodes prostaglandin-endoperoxide synthase 2, also known as cyclooxygenase-2 (COX-2), the enzyme responsible for producing prostaglandin H2, an inflammatory mediator, from arachidonic acid. Previous observations indicate higher than normal PTGS2 expression in colonic epithelial cells in IBS [6] and an association between PTGS2 expression and disease activity [7]. In addition to these data, our analysis indicates that patients not responding to IFX therapy may exhibit even higher PTGS2 expression than those with an appropriate response. This finding is in line with the demonstrated contribution of inflammatory prostaglandins to the pathogenesis of IBD.
On the other hand, COX-2-derived prostaglandins also have a role in mucosal defense in the small intestine and colon [8]. Hence, inhibition of COX-2 is a double-edged sword, as reflected by the conflicting results of controlled trials and experimental colitis models with COX-2 inhibitors [9]. COX-2 has been cited as a possible contributor to the integrity of the intestinal mucosa, to the healing of gastrointestinal ulcers and to the modulation of IBD [10], but exacerbation of IBD has also been reported in several studies of COX-2 inhibition [11].
Interleukin (IL)-13 is an important regulator of epithelial apoptosis in immune-mediated disorders [12]. It plays a role in inflammation, mucus production, tissue remodeling and fibrosis [13]. Disturbed IL-13 expression may contribute to immune dysregulation [14]. IL-13 signaling is mediated by the type 2 IL-4 receptor. The IL-13 receptor alpha 2 (IL13RA2) inhibits the activity of IL-13 and contributes to the down-regulation of chronic, pathogenic Th2-mediated immune responses [15]. These published data support the results of our analysis indicating an association between abnormal IL13RA2 expression and infliximab unresponsiveness.
Published data on the Wnt5a gene product are also promising. Members of the Wnt family are glycoproteins actively secreted by cells; along with other pathways (e.g., FGF, Notch, BMP and Hedgehog signaling), they contribute to intestinal stem cell biology and the maintenance of homeostasis and, when malfunctioning, to pathological conditions including inflammation and cancer [16]. Several possible functions of Wnt5a in the gastrointestinal tract have been reported. In animal studies, Wnt5a (a noncanonical Wnt ligand) potentiated colonic crypt regeneration after tissue injury [17]. On the other hand, high Wnt5a levels were associated with poor prognosis in colorectal cancer patients and promoted human colon cancer cell migration in animals [18]. A contribution of Wnt5a to polyp formation in the colon has also been suggested [19]. Wnt5a is also involved in the induction of epithelial-to-mesenchymal transition [20], a common phenomenon in chronic inflammatory conditions.
Data suggest that the relative absence of an endogenous Wnt inhibitor in colonic myofibroblasts (and, consequently, increased Wnt levels in crypts) is associated with an increased risk of cancer in ulcerative colitis [21]. Although no data are available on whether IBD patients not responding to IFX therapy are at increased risk of colon cancer, an animal study indicated that IBD animals treated with IFX are protected from colon cancer [22][23][24][25][26][27][28][29]. One may therefore speculate that patients with high Wnt5a gene expression may be prone to IFX non-responsiveness and also be at higher risk of cancer. Taken together, these data suggest that a possible causative role of Wnt5a in resistance to IFX therapy is worth investigating further.
The major limitation of our in silico analysis is the lack of verification of our results in independent patient cohorts. The lack of an available biobank of biopsy specimens from IFX-responding and non-responding patients prevented prompt investigation of this issue, as several years are required to complete this task. This analysis should therefore be regarded as hypothesis-generating.
In conclusion, our analysis revealed that IL13RA2, PTGS2 and WNT5A, genes expressed in the colonic tissue of IBD patients, are suitable for discriminating between patients responding and not responding to IFX therapy. As all three genes encode proteins implicated in intestinal homeostasis and pathology, the difference in their expression between responding and non-responding patients may indicate important diagnostic targets in IBD therapy.
Table 2: Detailed results for previously published biomarker candidates, including characteristics of the original studies describing each marker and the performance of each gene in discriminating between responders and non-responders to infliximab therapy in each available dataset. (A) Biomarkers identified in colon biopsy specimens; genes with a p value <0.05 averaged over the three biopsy-based datasets are listed. (B) Biomarkers identified in blood samples. (C) Number of genes with consistent discriminator performance in each dataset of the same type of sample. Ref #17 includes more patients than Table 1 because additional samples were included in the discovery set of the original study. *Genes identified in GSE12251 were also described in the paper in which GSE12251 was published; they are included because their average p value is significant in the two other datasets.
Table 2C:
Validated in the same tissue type: biopsy-based markers, 25; blood-based markers, 9
Validated in the other tissue type: biopsy-based markers, 3; blood-based markers, 3
Table 3: Top five genes ranked by AUC with consistently significant predictive values in all three GEO datasets of colon biopsy samples. Bold: genes overlapping with Table 2.