Abstract:
Systemic therapy of breast cancer can include chemotherapy, hormonal therapy, and targeted therapy. Prognostic biomarkers predict survival, whereas predictive biomarkers predict response to therapy. In this report, we describe the initial release of the first available online tool capable of identifying gene expression-based predictive biomarkers using transcriptomic data from a large set of breast cancer patients. Published gene expression data from 36 publicly available datasets were integrated with treatment data into a unified database. Response to therapy was determined using either author-reported pathological complete response data (n=1,775) or relapse-free survival status at five years (n=1,329). The treatment data cover chemotherapy (n=2,108), endocrine therapy (n=971), and anti-HER2 therapy (n=267). The transcriptomic database includes 20,089 unique genes and 54,675 probe sets. Gene expression and therapy response are compared using receiver operating characteristic (ROC) analysis and Mann-Whitney tests. We demonstrate the utility of the pipeline by cross-validating 23 paclitaxel resistance-associated genes in different molecular subtypes of breast cancer. We also confirm a set of established biomarkers, including TP53 for chemotherapy in Luminal breast cancer (p=1.01e-19, AUC=0.769), HER2 for trastuzumab therapy (p=8.4e-04, AUC=0.629), and PGR for hormonal therapy (p=8.6e-05, AUC=0.7). The tool is designed to validate and rank new predictive biomarker candidates in real time. By testing selected genes in a large set of independent patients, users can identify the most robust candidates and quickly eliminate those most likely to fail in a clinical setting. The analysis tool is accessible at www.rocplot.org.
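The abstract compares gene expression with therapy response using ROC analysis and Mann-Whitney tests. The two are closely linked: the ROC area under the curve (AUC) equals the Mann-Whitney U statistic divided by the product of the two group sizes. The minimal Python sketch below is not ROCplot's implementation. It computes both quantities for a single gene from two hypothetical lists of expression values (responders vs. non-responders). The function names, the example values, and the choice of a tie-corrected normal-approximation p-value without continuity correction are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc_and_mann_whitney(
    responders: Sequence[float], nonresponders: Sequence[float]
) -> Tuple[float, float, float]:
    """Hypothetical helper: return (AUC, U, two-sided p) for one gene.

    AUC = U / (n1 * n0): the probability that a random responder has higher
    expression than a random non-responder (ties count as 1/2).
    The p-value uses a tie-corrected normal approximation (assumption:
    no continuity correction, no exact small-sample test).
    """
    n1, n0 = len(responders), len(nonresponders)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups need at least one sample")
    pooled = list(responders) + list(nonresponders)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for responders
    auc = u / (n1 * n0)

    # Variance of U under the null hypothesis, corrected for ties.
    n = n1 + n0
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n0 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a difference
        return auc, u, 1.0
    z = (u - n1 * n0 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc, u, p


if __name__ == "__main__":
    # Hypothetical expression values for one gene.
    responders = [8.1, 7.4, 9.0, 6.9]
    nonresponders = [5.2, 6.1, 7.0, 4.8, 5.9]
    auc, u, p = auc_and_mann_whitney(responders, nonresponders)
    print(f"AUC={auc:.3f}, U={u:.0f}, p={p:.3g}")
```

With these example values, 19 of the 20 responder/non-responder pairs are ordered with the responder higher, so the AUC is 0.95. An AUC above 0.5 indicates higher expression in responders, and an AUC below 0.5 indicates higher expression in non-responders. For small groups, an exact test would give somewhat different p-values than this normal approximation.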