Data Analytics

CERN's know-how and experience with "big data" analysis for high-energy physics and for the control of systems used in the LHC.

CERN’s Know-How

  • CERN's experiments probing the fundamental nature of the universe create 1 PB/s of data, roughly four times the amount held in the US Library of Congress
  • About 1 million CPU cores worldwide are used to process and analyse all data from the LHC, using advanced data analytics
  • Online and offline analysis is also performed on the data acquired from each of the 20,000 devices that monitor and control the CERN complex

Facts & Figures

  • >10 PB/month of data selected by trigger mechanisms and stored in the CERN Data Center
  • 170 data centers worldwide at which LHC data analysis is performed
  • >250 PB held in the CERN Data Center, which stores all physics data for analysis
  • >1000 PB of ROOT data
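
To put the raw rate and the stored volume quoted above side by side, the following back-of-the-envelope sketch estimates what fraction of the raw data is kept after trigger selection. It assumes, purely for illustration, that the 1 PB/s rate is sustained around the clock for a 30-day month; actual running time is lower, so the real fraction will differ. All variable and function names are hypothetical.

```python
# Back-of-the-envelope comparison of raw and stored data volumes.
# Assumption (not from the source): the 1 PB/s raw rate is sustained
# continuously for a 30-day month. Real running time is lower.

RAW_RATE_PB_PER_S = 1.0             # raw rate quoted above
STORED_PB_PER_MONTH = 10.0          # lower bound quoted above (">10 PB/month")
SECONDS_PER_MONTH = 30 * 24 * 3600  # hypothetical 30-day month


def stored_fraction(raw_rate_pb_s, stored_pb_month, seconds=SECONDS_PER_MONTH):
    """Fraction of the raw data volume that ends up in storage."""
    raw_pb_month = raw_rate_pb_s * seconds
    return stored_pb_month / raw_pb_month


fraction = stored_fraction(RAW_RATE_PB_PER_S, STORED_PB_PER_MONTH)
print(f"raw volume per month: {RAW_RATE_PB_PER_S * SECONDS_PER_MONTH:,.0f} PB")
print(f"stored fraction: {fraction:.2e} (about 1 part in {1 / fraction:,.0f})")
```

Under these assumptions only a few parts per million of the raw data reach storage, which illustrates why the trigger mechanisms are central to the data flow.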

Key Competences

Designing Data Analytics Infrastructure

To process and analyse the vast amounts of data generated by the experiments at CERN, a data infrastructure was designed for distributed analytics. This infrastructure is made up of several layers and allows 1000 clients to access the data for analysis, handling >5 million data transactions per day. With its unique know-how in structuring big data sets, CERN can design efficient analyses.
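
As a rough sense of scale, the figures above can be converted into average rates. This is a minimal sketch that assumes the load is spread evenly over the day and over all clients, which is an idealisation and not a statement about the real traffic pattern; the names are hypothetical.

```python
# Average load implied by the figures quoted above.
# Assumption (not from the source): transactions are spread evenly
# over 24 hours and over all clients.

TRANSACTIONS_PER_DAY = 5_000_000  # lower bound quoted above (">5 million")
CLIENTS = 1000                    # number of clients quoted above
SECONDS_PER_DAY = 24 * 3600

avg_per_second = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY
avg_per_client_per_day = TRANSACTIONS_PER_DAY / CLIENTS

print(f"average rate: {avg_per_second:.1f} transactions/s")
print(f"average per client: {avg_per_client_per_day:.0f} transactions/day")
```

Peak rates are likely to be well above these averages, so they should be read as lower bounds for sizing purposes only.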

Components used for big data and related analytics

  • User Interface: Notebooks, SWAN (developed by CERN)
  • Data analysis: ROOT / TMVA (developed by CERN)
  • Apache Hadoop clusters with YARN and HDFS (also HBase, Impala, Hive,…)
  • Apache Spark for analytics and Apache Kafka for streaming

Data Analysis for Control Systems

CERN analyses data from its large industrial infrastructure for monitoring, control and predictive maintenance purposes. This includes data from accelerators, detectors, cryogenic systems and data centers, as well as log files from the Worldwide LHC Computing Grid and other sources.

Specifications

  • Online monitoring (analysis of logs, alarms, loads)
  • Fault analysis (root cause analysis / fault detection)
  • Predictive maintenance
  • Safety
  • Input for new engineering designs