Big Data and Cancer

Cancer is one of the major issues of our time from a human, socio-economic, medical and scientific perspective. Cancers are among the leading causes of morbidity and mortality worldwide; in 2012, there were approximately 14 million new cases [1].

The economic impact of cancer is considerable and growing. In 2010, the total annual cost of the disease was estimated at about US$ 1,160 billion [2]. Among low- and middle-income countries, only 1 in 5 has the data needed to drive a cancer control policy [3].

Tumor progression is a dynamic process that tends to select a cell clone carrying one or more genetic alterations that favor its survival and expansion. The development of a tumor is a complex process that cannot be reduced to the alteration of a single, isolated gene.

Understanding this process and characterizing a given tumor therefore requires identifying the genetic and epigenetic alterations in that tumor, deciphering intra- and intercellular signaling networks, and understanding their biological consequences at both the cellular and tissue levels [4].

At the end of the 1990s, technological progress in the analysis of DNA and RNA (emergence of microarrays) led to the production of a large amount of data (on the order of several tens of petabytes) [5,6].

It is in DNA sequencing (DNASeq) and RNA sequencing (RNASeq) that progress has been greatest, with the rise of next-generation sequencing (NGS) techniques. In cancer, the two most important initiatives are The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). Their aim is to characterize a large number of cancer samples exhaustively and systematically.

In parallel with the various clinical and molecular data collection programs, the data generated by targeted sequencing for diagnostic purposes can represent both a technical and a human challenge:

  • Technical: storing and managing ever larger volumes of data in laboratories of modest size.
  • Human: recruiting staff who are competent in bioinformatics analysis [4].

In addition, the main challenge in "clinical routine" is to generate these data and use them to personalize the patient’s therapeutic strategy.

Large-scale sequencing was initially restricted to research programs. Advances in technology and falling sequencing costs now make it possible to consider producing large volumes of data in clinical settings, for patients whose disease has relapsed and is resistant to conventional treatments. This holds the promise of a medicine based on an analysis of each patient’s tumor characteristics and genetic makeup [4].

In conclusion, the development of dynamic mathematical models that describe the temporal evolution of the tumor, with personalized parameters supplied by a molecular characterization specific to each patient, will be an essential key.

This reveals a major challenge: establishing dialogue and orchestrating exchanges between the different actors involved, namely doctors, biologists, bioinformaticians and biomathematicians.
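To give a concrete idea of what a dynamic model with personalized parameters might look like, here is a minimal sketch in Python. It uses logistic growth, a classic textbook model chosen here purely for illustration; the article does not name any particular model. The class `PatientParameters`, the function `logistic_tumor_volume` and all numeric values are hypothetical, and in a real setting the parameters would have to be estimated from each patient's clinical and molecular data.

```python
import math
from dataclasses import dataclass


@dataclass
class PatientParameters:
    """Hypothetical per-patient parameters (illustrative values only)."""
    initial_volume: float     # tumor volume at t = 0, arbitrary units
    growth_rate: float        # intrinsic growth rate r, per day
    carrying_capacity: float  # maximum attainable volume K, same units


def logistic_tumor_volume(params: PatientParameters, t_days: float) -> float:
    """Closed-form solution of the logistic equation dV/dt = r * V * (1 - V / K)."""
    v0 = params.initial_volume
    r = params.growth_rate
    k = params.carrying_capacity
    return k / (1.0 + ((k - v0) / v0) * math.exp(-r * t_days))


# Two invented patients whose tumors differ only in growth rate.
patient_a = PatientParameters(initial_volume=1.0, growth_rate=0.05, carrying_capacity=100.0)
patient_b = PatientParameters(initial_volume=1.0, growth_rate=0.10, carrying_capacity=100.0)

for day in (0, 30, 60, 90):
    volume_a = logistic_tumor_volume(patient_a, day)
    volume_b = logistic_tumor_volume(patient_b, day)
    print(f"day {day:3d}: patient A = {volume_a:6.2f}, patient B = {volume_b:6.2f}")
```

The same equation produces different trajectories once the parameters are set per patient, which is the sense in which such a model is "personalized". Realistic models would also need to account for treatment effects and clonal heterogeneity, which this sketch deliberately leaves out.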



[1] Ferlay J, Soerjomataram I, Ervik M, Dikshit R, Eser S, Mathers C, et al. GLOBOCAN 2012 v1.0, Cancer Incidence and Mortality Worldwide: IARC CancerBase No. 11. Lyon, France: International Agency for Research on Cancer; 2013.

[2] Stewart BW, Wild CP, editors. World Cancer Report 2014. Lyon: International Agency for Research on Cancer; 2014.

[3] Global Initiative for Cancer Registry Development. Lyon: International Agency for Research on Cancer.

[4] Saintigny P, et al. Apport et défis des Big Data en cancérologie. Bull Cancer (2016)

[5] Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets-update. Nucleic Acids Res 2013;41(Database issue):D991-5.

[6] Kolesnikov N, Hastings E, Keays M, Melnichuk O, Tang YA, Williams E, et al. ArrayExpress update - simplifying data submissions. Nucleic Acids Res 2015;43(Database issue):D1113-16.
