Big Data and Cancer

Cancer is one of the major issues of our time from a human, socioeconomic, medical and scientific perspective. Cancers are among the leading causes of morbidity and mortality worldwide; in 2012, there were approximately 14 million new cases [1].

The growing economic impact of cancer is considerable. In 2010, the total annual cost of the disease was estimated at about US$1,160 billion [2]. Only 1 in 5 low- and middle-income countries have the data needed to drive a cancer control policy [3].

First of all, tumor progression is a dynamic process that tends to select a cell clone with one or more genetic alterations favoring its survival and expansion. The development of a tumor is a complex process that cannot be attributed to the alteration of a single, isolated gene.

Understanding this process and characterizing a given tumor therefore requires identifying the genetic and epigenetic alterations in that tumor, deciphering intra- and intercellular signaling networks, and understanding their biological consequences at both the cellular and tissue levels [4].

At the end of the 1990s, technological progress in the analysis of DNA and RNA (with the emergence of microarrays) led to the production of a large amount of data, on the order of several tens of petabytes [5,6].

Progress has been greatest in DNA sequencing (DNA-Seq) and RNA sequencing (RNA-Seq), with the rise of next-generation sequencing (NGS) techniques. In cancer, the two most important initiatives are The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). Both aim to characterize a large number of cancer samples exhaustively and systematically.

In parallel with the various clinical and molecular data collection programs, data generated by targeted sequencing for diagnostic purposes can represent a technical and human challenge:

  • Technical: storing and managing ever larger volumes of data in laboratories of modest size.
  • Human: recruiting staff competent in bioinformatics analysis [4].

In addition, the main challenge in "clinical routine" is to generate this data and use it to personalize the patient’s therapeutic strategy.

Initially restricted to research programs, the production of large volumes of data can now be considered in clinical settings, thanks to advances in technology and reduced sequencing costs, for patients whose disease has relapsed and is resistant to conventional treatments. This holds the promise of a medicine based on an analysis of each patient’s tumor characteristics and genetic makeup [4].

In conclusion, the development of dynamic mathematical models that describe the temporal evolution of the tumor, with personalized parameters supplied by a molecular characterization specific to each patient, will be an essential key.
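
To make the idea of a dynamic model with personalized parameters more concrete, here is a minimal sketch in Python. It uses logistic growth, a classic textbook model chosen purely for illustration; the text above does not name a specific model. The class PatientParameters, the function tumor_volume and all numeric values are hypothetical, and in practice the parameters would have to be estimated from each patient's clinical and molecular data.

```python
import math
from dataclasses import dataclass


@dataclass
class PatientParameters:
    """Hypothetical per-patient parameters (illustrative values only)."""
    initial_volume: float     # tumor volume at t = 0, arbitrary units
    growth_rate: float        # intrinsic growth rate r, per day
    carrying_capacity: float  # maximum sustainable volume K, same units


def tumor_volume(params: PatientParameters, t_days: float) -> float:
    """Closed-form solution of logistic growth dV/dt = r * V * (1 - V / K)."""
    v0 = params.initial_volume
    r = params.growth_rate
    k = params.carrying_capacity
    return k / (1.0 + (k / v0 - 1.0) * math.exp(-r * t_days))


if __name__ == "__main__":
    # Two hypothetical patients who differ only in growth rate.
    patient_a = PatientParameters(initial_volume=1.0, growth_rate=0.05, carrying_capacity=100.0)
    patient_b = PatientParameters(initial_volume=1.0, growth_rate=0.10, carrying_capacity=100.0)
    for day in (0, 30, 60, 90):
        print(day, round(tumor_volume(patient_a, day), 2), round(tumor_volume(patient_b, day), 2))
```

The same equation produces a different trajectory for each patient once the parameters differ, which is the sense in which such a model is personalized. A realistic model would likely need additional terms, for example for treatment effects.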

Building such personalized models reveals a major challenge: establishing dialogue and orchestrating exchanges between the different actors, namely doctors, biologists, bioinformaticians and biomathematicians.



