CHEP submissions

To upload an abstract:

  • Log in to this site
  • Click on the pencil (top-right corner) and then 'Create Content' tab
  • Click on 'CHEP', and fill in the form. Don't forget to save.

CHEP Submissions

Contact: Luca Menichetti
Author(s):
Primary author: Marco Meoni (Universita di Pisa & INFN (IT)). Co-authors: Tommaso Boccali (Universita di Pisa & INFN (IT)), Nicolo Magini (Fermi National Accelerator Lab. (US)), Luca Menichetti (CERN), Domenico Giordano (CERN)
Abstract:


The CMS experiment has implemented a computing model in which distributed monitoring infrastructures collect a wide variety of data and metadata about the performance of computing operations. These data can be probed further with Big Data analytics approaches to discover patterns and correlations that can improve the throughput and efficiency of the computing model.

CMS has already begun to store a large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - in a Hadoop cluster. This makes it possible to run fast, arbitrary queries on the data and to test several MapReduce-based computing frameworks.

In this work we analyze the XRootD logs collected in Hadoop through Gled and Flume, and we benchmark their aggregation at the dataset level for popularity monitoring, showing how dashboards and monitoring systems can benefit from Hadoop parallelism. The processing time of XRootD time-series logs on the existing Oracle DBMS does not scale linearly with data volume. Big Data architectures, by contrast, do scale, and make it efficient to re-process any user-defined time interval. The entire set of existing Oracle queries is replicated in the Hadoop data store and the results are validated against the Oracle ones.
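
As a rough illustration of the dataset-level aggregation described above, a minimal Spark (PySpark) sketch could look as follows; the input path, field names and time window are hypothetical, since the abstract does not specify the actual log schema.

    # Hypothetical sketch: aggregate per-file XRootD access records into
    # per-dataset popularity metrics over a user-defined time interval.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("xrootd-popularity").getOrCreate()

    logs = spark.read.parquet("/path/to/xrootd_access_logs")  # placeholder path

    popularity = (logs
        .filter(F.col("timestamp").between("2016-01-01", "2016-01-31"))
        .groupBy("dataset")
        .agg(F.count("*").alias("n_accesses"),
             F.countDistinct("user").alias("n_users"),
             F.sum("read_bytes").alias("bytes_read")))

    popularity.orderBy(F.desc("n_accesses")).show(20)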

These results constitute the set of features on top of which a mining platform is designed to predict the popularity of a new dataset, the best location for replicas, or the appropriate amount of CPU and storage in future timeframes. Machine learning techniques applied to Big Data architectures are extensively explored to study correlations in the aggregated data and to search for patterns in the CMS computing ecosystem. Typical examples are operational information such as file access statistics or dataset attributes, which are organised into samples suitable for feeding several classifiers.

(CMS abstract, submitted here because two authors are from IT)

Submission type:
Contact: Alberto Aimar
Author(s):
Alberto Aimar, Pedro Andrade, Borja Garrido Bear, Maria-Varvara Georgiou, Edward Karavakis, Luca Magnoni, Rocio Rama Ballesteros, Hassen Riahi, Javier Rodriguez Martinez, Pablo Saiz, Daniel Zolnai
Abstract:

For over a decade, the LHC experiments have relied on advanced and specialised WLCG dashboards for monitoring, visualising and reporting the status and progress of job execution, data management transfers and site availability across the distributed WLCG grid resources.

In recent years, in order to cope with the increasing volume and variety of the grid resources, WLCG monitoring has started to evolve towards data analytics technologies such as Elasticsearch, Hadoop and Spark. At the end of 2015 it was therefore agreed to merge the WLCG monitoring services, resources and technologies with the internal CERN IT data centre monitoring services, which are based on the same solutions.

The overall mandate was to migrate the WLCG monitoring, in consultation with representatives of the LHC experiment users, to the same technologies used for the IT monitoring. The work started by merging the two small IT and WLCG monitoring teams, joining forces to review, rethink and optimise the IT and WLCG monitoring and dashboards within a single common architecture, using the same technologies and workflows as the CERN IT monitoring services.

In early 2016 this work resulted in the definition and development of a Unified Monitoring Architecture that satisfies the requirements to collect, transport, store, search, process and visualize both IT and WLCG monitoring data. The new architecture, relying on state-of-the-art open-source technologies and open data formats, will provide visualization and reporting solutions that can be extended or modified directly by the users according to their needs and their role. For instance, new dashboards for the shifters and new reports for the managers can be created, and additional notifications and new data aggregations can be implemented directly by the service managers, with the help of the monitoring support team but without any specific modification or development in the monitoring service.

This contribution provides an overview of the Unified Monitoring Architecture, currently based on technologies such as Flume, Elasticsearch, Hadoop, Spark, Kibana and Zeppelin, with insight into the lessons learned, and describes the work done to monitor both the CERN IT data centres and the WLCG jobs, data transfers, sites and services.

 

Submission type: Talk or poster
Contact: Gerhard Ferdinand Rzehorz
Author(s):
Gen Kawamura, Oliver Keeble, Arnulf Quadt and *Gerhard Rzehorz
Abstract:

This contribution reports on the feasibility of executing data-intensive workflows on Cloud infrastructures. In order to assess this, the metric ETC = Events/Time/Cost is formed, which quantifies the different workflow and infrastructure configurations that are tested against each other.
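
As a toy illustration of this figure of merit (reading it as events divided by time and by cost; the configuration names and numbers below are invented, not measured values), one could compare setups as follows.

    # Hypothetical sketch: compare the ETC figure of merit for a few invented
    # infrastructure configurations (events processed, wall time, cost).
    def etc(events, wall_time_hours, cost_units):
        # Events per unit time and per unit cost.
        return events / wall_time_hours / cost_units

    configurations = {
        "configuration A": (10000, 5.0, 2.0),
        "configuration B": (16000, 8.0, 2.0),
    }
    for name, (events, hours, cost) in configurations.items():
        print("%s: ETC = %.0f" % (name, etc(events, hours, cost)))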

In these tests ATLAS reconstruction jobs are run, examining the effects of overcommitting (more parallel processes running than CPU cores available), scheduling (staggered execution) and scaling (number of cores). The desirability of commissioning storage in the cloud is evaluated, in conjunction with a simple analytical model of the system, and correlated with questions about network bandwidth, caches and the kind of storage to utilise.

In the end a cost/benefit evaluation of the different infrastructure configurations and workflows is undertaken, with the goal of finding the configuration that maximises the ETC value.

Submission type: talk
Contact: Julia Andreeva
Author(s):
Maria Alandes Pradillo, Julia Andreeva, Alexey Anisenkov, Giuseppe Bagliesi, Stefano Belforte, Simone Campana, Maria Dimou, Alessandro Di Girolamo, Josep Flix, Alessandra Forti, Edward Karavakis, Stephan Lammel, Maarten Litmaath, Andrea Sciaba, Andrea Valassi
Abstract:

The Worldwide LHC Computing Grid infrastructure links about 200 participating computing centers affiliated with several partner projects. It is built by integrating heterogeneous computing and storage resources in diverse data centers all over the world and provides CPU and storage capacity to the LHC experiments to perform data processing and physics analysis. In order to be used by the experiments, these distributed resources must be well described, which implies easy service discovery and a detailed description of service configuration. Currently this information is scattered over multiple generic information sources such as GOCDB, OIM, BDII and experiment-specific information systems. Such a model does not allow topology and configuration information to be validated easily. Moreover, information in the various sources is not always consistent. Finally, the evolution of computing technologies introduces new challenges. Experiments rely more and more on opportunistic resources, which by their nature are more dynamic and should also be well described in the WLCG information system.

This contribution describes the new WLCG configuration service CRIC (Computing Resource Information Catalog), which collects information from various information providers, performs validation and provides a consistent set of UIs and APIs to the LHC VOs for service discovery and usage configuration. The main requirements for CRIC are simplicity, agility and robustness. CRIC should be quickly adaptable to new types of computing resources and new information sources, and should allow new data structures to be implemented easily, following the evolution of the computing models and operations of the experiments.

The implementation of CRIC was inspired by the successful experience with the ATLAS Grid Information System (AGIS). The first prototype was put in place in a short time thanks to the re-use of a substantial part of the AGIS code, though some refactoring was required in order to cleanly decouple it into two parts:

  • A core which describes all physical endpoints and provides a single entry point for WLCG service discovery.
  • Experiment-specific extensions (optional), implemented as plugins. They describe how the physical resources are used by the experiments and contain additional attributes and configuration required by the experiments for operations and for the organisation of their data and workflows.

CRIC not only provides a current view of the WLCG infrastructure, but also keeps track of the changes made, together with audit information. Its administration interface allows authorized users to make changes. Authentication and authorization are subject to experiment policies in terms of data access and update privileges.
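
The core/plugin split described above could be sketched roughly as follows; the class and attribute names are hypothetical, since the abstract does not expose CRIC's actual data model.

    # Hypothetical sketch of a core description plus optional experiment plugins.
    class CoreEndpoint:
        # Describes a physical service endpoint for WLCG-wide service discovery.
        def __init__(self, name, url, flavour):
            self.name = name        # e.g. a site service name
            self.url = url          # physical endpoint
            self.flavour = flavour  # e.g. "CE" or "SE"

    class ExperimentPlugin:
        # Optional experiment-specific view layered on top of the core resources.
        def __init__(self, core_endpoints):
            self.core = core_endpoints
            self.extra = {}         # experiment-specific attributes per endpoint

        def configure(self, endpoint_name, **attributes):
            # Attach experiment-level configuration (queues, shares, ...)
            # without modifying the core description.
            self.extra.setdefault(endpoint_name, {}).update(attributes)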

Submission type: talk
Contact: Maarten Litmaath
Author(s):
Maria Alandes Pradillo, Julia Andreeva, Alessandro Di Girolamo, Maria Dimou, Josep Flix, Alessandra Forti, Maarten Litmaath, Andrea Sciaba, Andrea Valassi
Abstract:

The Worldwide LHC Computing Grid (WLCG) infrastructure allows the use of resources from more than 150 sites. Until recently the setup of the resources and the middleware at a site was typically dictated by the partner grid project (EGI, OSG, NorduGrid) to which the site is affiliated. For some years now, however, changes in hardware, software, funding and experiment computing requirements have increasingly affected the way resources are shared and supported. At the WLCG level this implies a need for more flexible and lightweight methods of resource provisioning.

In the WLCG cost optimisation survey presented at CHEP 2015 the concept of lightweight sites was introduced, i.e. sites essentially providing only computing resources and aggregating around core sites that also provide storage. The efficient use of lightweight sites requires a fundamental reorganisation not only of the way jobs run, but also of the topology of the infrastructure, and the consolidation or elimination of some established site services.

This contribution gives an overview of the solutions being investigated through "demonstrators" of a variety of lightweight site setups, either already in use or planned to be tested in experiment frameworks.

 

Submission type: talk
Contact: Julia Andreeva
Author(s):
Maria Alandes Pradillo, Julia Andreeva, Alessandro Di Girolamo, Maria Dimou, Josep Flix, Alessandra Forti, Fabrizio Furano, Edward Karavakis, Oliver Keeble, Stephan Lammel, Maarten Litmaath, Nicolo Magini, Natalia Ratnikova, Stefan Roiser, Andrea Valassi
Abstract:

The WLCG computing infrastructure provides distributed storage capacity hosted at the geographically dispersed computing sites.

In order to effectively organize storage and processing of the LHC data, the LHC experiments require a reliable and complete overview of the storage capacity in terms of occupied and free space, the storage shares allocated to different computing activities, and the possibility to detect “dark” data that occupies space while being unknown to the experiment’s file catalog. The task of the WLCG space accounting activity is to provide such an overview and to assist the LHC experiments and WLCG operations in managing storage space and understanding future requirements.

Several space accounting solutions developed by the LHC experiments are currently based on the Storage Resource Manager (SRM). In the coming years SRM will become an optional service for sites which do not provide tape storage. Moreover, some storage implementations already do not provide an SRM interface. Therefore, the next generation of space accounting systems should not be based on SRM. This contribution gives an overview of various alternatives to SRM-based space accounting and presents a common approach to space accounting to be applied on the WLCG infrastructure.

Submission type: talk
Contact: Sima Baymani
Author(s):
Sima Baymani et al
Abstract:

RapidIO (http://rapidio.org/) technology is a packet-switched high-performance fabric which has been under active development since 1997. Originally meant to be a front-side bus, it developed into a system-level interconnect which is today used in all 4G/LTE base stations worldwide. RapidIO is often used in embedded systems that require high reliability, low latency and scalability in a heterogeneous environment - features that are highly interesting for several use cases, such as data analytics and data acquisition networks.

We will present the results of evaluating RapidIO in a Data Analytics environment, from setup to benchmark. Specifically, we will share the experience of running ROOT and Hadoop on top of RapidIO.

To demonstrate the multi-purpose characteristics of RapidIO, we will also present the results of investigating RapidIO as a technology for high-speed Data Acquisition networks using a generic multi-protocol event-building emulation tool.

In addition we will present lessons learned from implementing native ports of CERN applications to RapidIO.

Submission type: Talk
Contact: Massimo Lamanna
Author(s):
J. Moscicki et al.
Abstract:

A new approach to providing scientific computing services is currently being investigated at CERN. It combines solid existing components and services (EOS Storage, CERNBox Cloud Sync&Share layer, ROOT Analysis Framework) with rising new technologies (Jupyter Notebooks) to create a unique environment for Interactive Data Science, Scientific Computing and Education Applications.

EOS is the main disk storage system handling LHC data in the 100PB range. CERNBox offers a convenient sync&share layer and it is available everywhere: web, desktop and mobile. The Jupyter Notebook is a web application that allows users to create and share documents that contain live code, equations, visualizations and explanatory text. ROOT is a modular scientific software framework which provides the functionality to deal with big data processing, statistical analysis, visualisation and storage.

The system will be integrated into all major workflows for scientific computing and with existing scientific data repositories at CERN. File access will be provided using a range of access protocols and tools: physics data analysis applications access CERNBox via the xrootd protocol; Jupyter Notebooks interact with the storage via file-system interfaces provided by EOS FUSE mounts; Grid jobs use WebDAV access authenticated with Grid certificates, whereas batch jobs may use local Krb5 credentials for authentication. We report on early experience with this technology and applicable use cases, also in a broader scientific and research context.
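
The different access paths mentioned above can be illustrated with a short, purely hypothetical example (the server name and file paths are placeholders, not actual CERN endpoints):

    # Hypothetical sketch of two of the access paths described above.
    import ROOT

    # Physics analysis: open a ROOT file directly over the xrootd protocol.
    f = ROOT.TFile.Open("root://eos.example.cern.ch//eos/user/j/jdoe/data.root")

    # Notebook-style access: the same storage seen through an EOS FUSE mount
    # looks like an ordinary local filesystem path.
    with open("/eos/user/j/jdoe/notes.txt") as fh:
        print(fh.read())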

Submission type: TALK
Contact: Massimo Lamanna
Author(s):
D. van der Ster et al.
Abstract:

This work will present the status of Ceph-related operations and development within the CERN IT Storage Group: we summarise significant production experience at the petabyte scale as well as strategic developments to integrate with our core storage services. As our primary back-end for OpenStack Cinder and Glance, Ceph has provided reliable storage to thousands of VMs for more than 3 years; this functionality is used by the full range of IT services and experiment applications.

Ceph at the LHC scale (above tens of petabytes) has required novel contributions on both the development and the operational side. For this reason, we have performed scale testing in cooperation with the core Ceph team. This work has been incorporated into the latest Ceph releases and enables Ceph to operate with at least 7,200 OSDs (totaling 30 PB in our tests). CASTOR has been evolved with the possibility to use a Ceph cluster as an extensible high-performance data pool. The main advantages of this solution are the drastic reduction of the operational load and the possibility to deliver high single-stream performance to efficiently drive the CASTOR tape infrastructure. Ceph is currently our laboratory to explore S3 usage in HEP and to evolve other infrastructure services.

In this paper, we will highlight our Ceph-based services, the NFS Filer and CVMFS, both of which use virtual machines and Ceph block devices at their core. We will then discuss the experience in running Ceph at LHC scale (most notably early results with Ceph-CASTOR).

Submission type: TALK
Contact: Massimo Lamanna
Author(s):
J. Iven et al.
Abstract:

OpenAFS is the legacy solution for a variety of use cases at CERN, most notably home-directory services. OpenAFS has been used as the primary shared file-system for Linux (and other) clients for more than 20 years, but despite an excellent track record the project's age and architectural limitations are becoming more evident. We are now working to offer an alternative solution based on existing CERN storage services. The new solution will offer evolved functionality while reducing risk factors compared to the present status, and is expected to eventually benefit from operational synergies.

In this paper we will present CERN's usage and an analysis of our technical choices: we will focus on the alternatives chosen for the various use cases (among them EOS, CERNBox, CASTOR), on the implementation of the migration process over the coming years, and on the challenges expected during the migration.

Submission type: TALK
Contact: Helge Meinhard
Author(s):
Helge Meinhard, for the HNSciCloud Consortium
Abstract:

The HELIX NEBULA Science Cloud (HNSciCloud) project is run by a consortium of ten procurers and two other partners; it is funded partly by the European Commission, has a total volume of 5.5 MEUR and runs from January 2016 to June 2018. By its nature as a pre-commercial procurement (PCP) project, it addresses needs that are not yet covered by any commercially available solution. The contribution will explain the steps, including administrative and legal ones, needed to establish and conduct the project, and will describe CERN's experience as the lead procurer of the project.
 

Submission type: Poster
Contact: Prasanth Kothuri
Author(s):
Prasanth Kothuri, Zbigniew Baranowski, Kacper Surdy, Joeri Hermans, Daniel Lanza Garcia
Abstract:

This contribution shares our recent experience of building a Hadoop-based application. The Hadoop ecosystem now offers a myriad of tools, which can overwhelm new users, yet these tools can be successfully leveraged to solve problems. We look at the factors to consider when using Hadoop to model and store data, best practices for moving data in and out of the system, and common processing patterns, relating each stage to the real-world experience gained while developing such an application. We share many of the design choices, the tools developed, and how to profile a distributed application, all of which can be applied to other scenarios as well. The goal of the presentation is to provide guidance on architecting a Hadoop-based application and to share some of the reusable components developed in this process.

Submission type: talk
Contact: Luca Canali
Author(s):
Luca Canali, Zbigniew Baranowski, Prasanth Kothuri
Abstract:

This work reports on the activities of integrating Oracle and Hadoop technologies for CERN database services, and in particular on the development of solutions for offloading data and queries from Oracle databases into Hadoop-based systems. This is of interest for increasing the scalability and reducing the cost of some of our largest Oracle databases. These concepts have been applied, among others, to build offline copies of controls and logging databases, which allows reports to be run without affecting critical production systems and also reduces storage costs. Other use cases include making data stored in Hadoop/Hive available from Oracle SQL, which opens the possibility of building applications that integrate data from both sources.
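
As a minimal sketch of the offloading idea (assuming Spark as the transfer engine; the connection string, credentials and table names below are placeholders), an Oracle table could be copied into a Hive table roughly like this:

    # Hypothetical sketch: snapshot an Oracle table into Hive with Spark so that
    # reports can run against the offline copy instead of the production database.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("oracle-offload")
             .enableHiveSupport()
             .getOrCreate())

    source = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//dbhost.example.cern.ch:1521/SERVICE")
              .option("dbtable", "LOGGING.MEASUREMENTS")  # placeholder schema.table
              .option("user", "reader")
              .option("password", "secret")
              .load())

    source.write.mode("overwrite").saveAsTable("logging_offline.measurements")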

 

Submission type:
Contact: Massimo Lamanna
Author(s):
X. Espinal et al.
Abstract:

In the competitive 'market' for large-scale storage solutions, EOS has been showing its excellence in the multi-petabyte, high-concurrency regime. It has also shown disruptive potential in powering the CERNBox service, providing sync&share capabilities, and in supporting innovative analysis environments alongside the storage of LHC data. EOS has also generated interest as a generic storage solution, ranging from university systems to very large installations for non-HEP applications. While preserving EOS as an open software solution for our community, we teamed up with the Comtrade company (within the CERN openlab framework) to productise this HEP contribution and ease its adoption by interested parties, notably outside HEP.

In this paper we will deliver a status report of this collaboration and of EOS adoption.

Submission type: TALK
Contact: Massimo Lamanna
Author(s):
L. Mascetti et al.
Abstract:

EOS, the CERN open-source distributed disk storage system, provides the high-performance storage solution for HEP analysis and the back-end for various workflows. Recently EOS became the back-end of CERNBox, the cloud synchronisation service for CERN users. EOS can take advantage of wide-area distributed installations: for the last few years CERN EOS has used a common deployment across two computer centres (Geneva-Meyrin and Budapest-Wigner) about 1,000 km apart (~20-ms latency) with about 200 PB of disk (JBOD). In late 2015, the CERN IT Storage group and AARNET (Australia) set up a challenging R&D project: a single EOS instance between CERN and AARNET with more than 300 ms latency (16,500 km apart).

This paper will report on the successful deployment and operation of a distributed storage system spanning Europe (Geneva, Budapest), Australia (Melbourne) and later Asia (ASGC Taipei), allowing different types of data placement and data access across these four sites.

Submission type: TALK
Contact: Massimo Lamanna
Author(s):
FDO section
Abstract:

Dependability, resilience, adaptability and efficiency: growing requirements call for tailored storage services and novel solutions. Unprecedented volumes of data coming from the detectors need to be quickly available in a highly scalable way for large-scale processing and data distribution, while in parallel they are routed to tape for long-term archival. These activities are critical for the success of HEP experiments. Nowadays we operate at high incoming throughput (14 GB/s during the 2015 LHC Pb-Pb run) and with concurrent complex production workloads. In parallel our systems provide the platform for continuous user- and experiment-driven workloads for large-scale data analysis, including end-user access and sharing. The storage services at CERN cover the needs of our community: EOS and CASTOR for large-scale storage; CERNBox for end-user access and sharing; Ceph as the data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; AFS for legacy distributed-file-system services.

In this paper we will summarise the experience in supporting the LHC experiments and the transition of our infrastructure from static monolithic systems to flexible components providing a more coherent environment, with pluggable protocols, tunable QoS, sharing capabilities and fine-grained ACL management, while continuing to guarantee dependable and robust services.

Submission type: TALK
Contact: Helge Meinhard
Author(s):
Helge Meinhard, for the HNSciCloud consortium
Abstract:

HEP is only one of many sciences with sharply increasing compute requirements that cannot be met by profiting from Moore's law alone. Commercial clouds potentially allow for realising larger economies of scale. While some small-scale experience requiring dedicated effort has been collected, European science has not ramped up to significant scale yet; in addition, public cloud resources have not been integrated yet with the standard workflows of science organisations in their private data centres. The HELIX NEBULA Science Cloud project, partly funded by the European Commission, addresses these points. Ten organisations under CERN's leadership, covering particle physics, bioinformatics, photon science and other sciences, have joined to procure public cloud resources as well as dedicated development efforts towards this integration. The contribution will give an overview of the project, explain the findings so far, and provide an outlook into the future.

Submission type: Oral presentation
Contact: Omar Awile
Author(s):
Omar Awile, Aram Santogidis
Abstract:

Application performance is often assessed using the Performance Monitoring Unit (PMU) capabilities present in modern processors. One popular tool that can read the PMU's performance counters is Linux perf. pmu-tools is a toolkit built around Linux perf that provides a more powerful interface to the different PMU events and gives a more abstracted view of those events. Unfortunately pmu-tools reports results only in text form or as simple static graphs, limiting its usability.

 

We report on our efforts to develop a web-based front-end for pmu-tools, allowing application developers to more easily visualize, analyse and interpret performance monitoring results. Our contribution should boost programmer productivity and encourage continuous monitoring of an application's performance. Furthermore, we discuss our tool's capability to quickly construct and test new performance metrics for characterizing application performance. This will allow users to experiment with new high-level metrics that reflect the performance requirements of their applications more accurately.
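
The kind of derived, high-level metric meant here can be illustrated with a toy example (the counter values below are invented, not real pmu-tools output):

    # Hypothetical sketch: derive high-level metrics from raw PMU counter values.
    counters = {
        "instructions": 8.2e9,
        "cycles": 5.1e9,
        "cache-misses": 3.4e7,
    }

    def ipc(c):
        # Instructions retired per CPU cycle.
        return c["instructions"] / c["cycles"]

    def mpki(c):
        # Cache misses per thousand instructions.
        return 1000.0 * c["cache-misses"] / c["instructions"]

    print("IPC  = %.2f" % ipc(counters))
    print("MPKI = %.2f" % mpki(counters))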

Submission type: talk (paper/poster)
Contact: Emmanuel Ormancey
Author(s):
M. Domaracky et al.
Abstract:

For almost 10 years CERN has provided live webcasts of events using Adobe Flash technology. This year is finally the year that Flash died at CERN! At CERN we closely follow the broadcast industry and always try to provide our users with the same experience they have on commercial streaming services. With Flash being slowly phased out on most streaming platforms, we too moved from Flash to HTTP streaming. All our live streams are delivered via the HTTP Live Streaming (HLS) protocol, which is supported in all modern browsers on desktops and mobile devices. Thanks to HTML5 and the THEOplayer we are able to deliver the same experience as we did with Adobe Flash based players. Our users can still enjoy video of the speaker synchronised with video of the presentation, so they have the same experience as sitting in the auditorium.

For on-demand video, to reach our users on any device, we improved the process of publishing recorded lectures with the new release of the CERN Lecture Archiving system, Micala. We improved the lecture viewer, which gives our users the best possible experience of watching recorded lectures, with both video of the speaker and slides in high resolution. For mobile devices we improved quality and usability, so that any video from CDS can be watched even under low-bandwidth conditions.

We introduced DVR functionality for all our live webcasts. Users who arrive late on the webcast website now have the possibility to go back to the beginning of the webcast, or to seek back and watch again anything they missed. With DVR functionality we are also able to provide the recording right after the webcast has finished.

With 15 CERN rooms capable of webcast and recording, about 300 live webcasts and 1200 lectures recorded every year, we needed a tool for our operators to start webcasts and recordings easily. We developed a Central Encoding Interface, from which our operators see all the events for a given day and can start webcasting and/or recording with one click. With this new interface we have managed to almost eliminate issues where operators forget to start the webcast, and with an automatic stop we now support webcasts and recordings that finish outside standard working hours without additional expense.

 

Submission type: Talk
Contact: Emmanuel Ormancey
Author(s):
P. Ferreira et al.
Abstract:

The last two years have been atypical for the Indico community, as the development team undertook an extensive rewrite of the application and deployed no fewer than 9 major releases of the system. Users at CERN have had the opportunity to experience the results of this ambitious endeavor. They have only seen, however, the "tip of the iceberg".

 

Indico 2.0 employs a completely new stack, leveraging open source packages in order to provide a web application that is not only more feature-rich but, more importantly, builds on a solid foundation of modern technologies and patterns. But this milestone represents not only a complete change in technology - it is also an important step in terms of user experience and usability that opens the way to many potential improvements in the years to come.

 

In this article, we will describe the technology and all the different dimensions in which Indico 2.0 constitutes an evolution vis-à-vis its predecessor and what it can provide to users and server administrators alike. We will go over all major system features and explain what has changed, the reasoning behind the most significant modifications and the new possibilities that they pave the way for.

 

Submission type: Talk
Contact: Emmanuel Ormancey
Author(s):
P. Ferreira et al.
Abstract:

Over the last two years, a small team of developers worked on an extensive rewrite of the Indico application based on a new technology stack. The result, Indico 2.0, leverages open source packages in order to provide a web application that is not only more feature-rich but, more importantly, builds on a solid foundation of modern technologies and patterns.

 

Indico 2.0 has the peculiarity of looking like an evolution (in terms of user experience and design), while constituting a de facto revolution. An extensive amount of code (~75%) was rewritten, not to mention a complete change of database and some of the most basic components of the system.

 

In this article, we will explain the process by which, over a period of approximately two years, we have managed to deliver and deploy a completely new version of an application that is used on a daily basis by the CERN community and HEP at large, in a gradual way, with no major periods of unavailability and with virtually no impact on performance and stability. We will focus particularly on how such an endeavor would not have been possible without the use of Agile methodologies of software development. We will provide examples of practices and tools that we have adopted and show the evolution of development habits in the team over the period in question, as well as their impact on code quality and maintainability.

 

Submission type: talk
Contact: Emmanuel Ormancey
Author(s):
JY Le Meur
Abstract:

This talk reports on the status of the new CERN Digital Memory project.
From the repeated observation that a fraction of the content digitally produced by CERN has been lost, or is threatened with loss, a review of the information systems in use at CERN has been carried out, and actions have been proposed to move towards a common, standard OAIS-based approach for the long-term digital preservation of CERN assets.
The first part of the talk will explain the rationale of such a project, looking at worldwide evolution and current practices. The second part will describe the status of the implementation of digital preservation according to the ISO 14721 standard and what has been achieved so far. Finally, a third part will explore innovative processes to collect missing knowledge worth transmitting to future generations.

Submission type: Talk (poster)
Contact: Emmanuel Ormancey
Author(s):
Tibor Simko et al. (presenter TBD)
Abstract:

In this paper we present the new Invenio 3 digital library framework and demonstrate its application in the field of open research data repositories. The Invenio digital library framework is composed of more than sixty independently developed packages that share a set of common patterns and communicate together via well-established APIs. Digital repository managers can cherry-pick individual modules with the aim of building a customised digital repository solution targeting their individual needs and use cases. We present how the Invenio technology has been applied in two research data services: (1) the CERN Open Data portal, which provides access to the approved open datasets and software of the ALICE, ATLAS, CMS and LHCb collaborations; (2) the Zenodo service, which offers an open research data archiving solution to world-wide scientific communities in any research discipline. We discuss the role of the underlying technologies, such as JSON Schema for controlling metadata structure, Elasticsearch for information retrieval and the CERN EOS system for data storage, as well as the role of virtual environments (CernVM) and container-based solutions (Docker) that, together with the archived data analysis software (Jupyter notebooks, custom analysis code), aim at reproducing the research data analyses even many years after their publication.
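
The role of JSON Schema mentioned above can be illustrated with a small, hypothetical example (the schema and record are invented and much simpler than a real Invenio record schema):

    # Hypothetical sketch: validate a record against a JSON Schema to control
    # its metadata structure.
    from jsonschema import validate

    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "collaboration": {"type": "string"},
            "doi": {"type": "string"},
        },
        "required": ["title", "collaboration"],
    }

    record = {"title": "Example open dataset", "collaboration": "CMS"}
    validate(instance=record, schema=schema)  # raises ValidationError if invalid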

Submission type: talk (poster)
Contact: Emmanuel Ormancey
Author(s):
Ludmila Marian et al. (presenter TBD)
Abstract:

CERN Document Server (CDS) is the CERN Institutional Repository, playing a key role in the storage, dissemination and archival of all research material published at CERN, as well as of multimedia and some administrative documents. As CERN’s document hub, it brings together submission and publication workflows dedicated to the CERN experiments, but also to the video and photo teams, the administrative groups and outreach groups.

 

In the past year, Invenio, the software platform underlying CDS, has been undergoing major changes, transitioning from a digital library system to a digital library framework and moving to a new software stack (Invenio is now built on top of the Flask web development framework, using the Jinja2 template engine, the SQLAlchemy ORM, a JSON Schema data model, and Elasticsearch for information retrieval). In order to reflect these changes on CDS, we are launching a parallel service, CDSLabs, with the goal of offering our users a continuous view of the reshaping of CDS, as well as increasing the feedback from the community during the development phase, rather than after release.

 

The talk will provide a detailed view of the new and improved features of the next-generation CERN Document Server, as well as of its design and architecture. It will then cover how the new system is shaped to be more user-driven and to respond better to the different needs of different user communities (Library, Experiments, Video team, Photo team and others), and what mechanisms have been put in place to synchronise the data on the two parallel systems. As a showcase, the talk will present in more depth the architecture and development of a new workflow for submitting and disseminating CERN videos.

Submission type: Talk (poster)
Contact: Emmanuel Ormancey
Author(s):
Sebastian Bukowiec (TBD for presentation)
Abstract:

Windows Terminal Servers provide application gateways for various parts of the CERN accelerator complex, used by hundreds of CERN users every day. A combination of new tools such as Puppet, HAProxy and the Microsoft System Center suite enables the automation of provisioning workflows, providing a terminal server infrastructure that can scale up and down in an automated manner. The orchestration not only reduces the time and effort necessary to deploy new instances, but also facilitates operations such as patching and the analysis and recreation of compromised nodes, as well as catering for workload peaks.

Submission type: poster (paper / talk)
Contact: Emmanuel Ormancey
Author(s):
Vincent Bippus and Natalie Kane (TBD for presentation)
Abstract:

CERN Print Services include over 1000 printers and multi-function devices as well as a centralised print shop. Every year, some 12 million pages are printed. We will present the recent evolution of CERN print services, both from the technical perspective (automated web-based configuration of printers, Mail2Print) and the service management perspective.

Submission type: Poster (Paper / talk)
Contact: Emmanuel Ormancey
Author(s):
Andreas Wagner (Alexandre Lossent for presentation)
Abstract:

CERN’s enterprise Search solution “CERN Search” provides a central search solution for users and CERN service providers. A total of about 20 million public and protected documents from a wide range of document collections is indexed, including Indico, TWiki, Drupal, SharePoint, JACOW, E-group archives, EDMS, and CERN Web pages. 
In spring 2015, CERN Search was migrated to a new infrastructure based on SharePoint 2013. In the context of this upgrade, the document pre-processing and indexing process was redesigned and generalised. The new data feeding framework makes it possible to profit from new functionality and facilitates the long-term maintenance of the system.

Submission type: poster (paper / talk)
Contact: Emmanuel Ormancey
Author(s):
Andreas Wagner (Alexandre Lossent for presentation)
Abstract:

In October 2015, CERN’s core website was moved to a new address, http://home.cern, marking the launch of the brand new top-level domain .cern. In combination with a formal governance and registration policy, the IT infrastructure needed to be extended to accommodate the hosting of web sites in this new top-level domain. We will present the technical implementation in the framework of the CERN Web Services, which provides virtual hosting and a reverse-proxy solution and includes the provisioning of SSL server certificates for secure communications.

Submission type: poster (paper / talk)
Contact: Emmanuel Ormancey
Author(s):
Alexandre Lossent
Abstract:

The CERN Web Frameworks team has deployed OpenShift Origin to facilitate the deployment of web applications and improve resource efficiency. OpenShift leverages Docker containers and Kubernetes orchestration to provide a Platform-as-a-Service solution oriented towards web applications. We will review use cases and how OpenShift was integrated with other services such as source control, web site management and authentication services.

Submission type: talk (poster)
Contact: Fons Rademakers
Author(s):
Fons Rademakers for the CERN openlab team
Abstract:

CERN openlab is a unique public-private partnership between CERN and leading IT companies and research institutes. Several of the CERN openlab projects research technologies that have the potential to become game changers in HEP software development (like Intel Xeon-FPGA, Intel 3DXpoint memory, Micron Automata Processor, etc.). In this presentation I will highlight a number of these technologies in detail and describe in what way they might change current software development techniques and practices.

Submission type: Talk
Contact: Fons Rademakers
Author(s):
Fons Rademakers for the CERN openlab team
Abstract:

CERN openlab is a unique public-private partnership between CERN and leading IT companies and research institutes. In the fall of 2015 CERN openlab and its partner Intel organised a Code Modernization Competition. The goal of the competition was to speed up the performance of a given piece of code by making use of all possible features of the latest Intel CPUs and co-processors. The competition was a great success and the results were quite remarkable and unexpected. In this presentation I will highlight the competition's results and the lessons learned and, more importantly, the setting up of our own permanent CERN openlab coding competition environment, to be used to let students and professionals alike challenge each other's coding prowess.

Submission type: Talk
Contact: Fons Rademakers
Author(s):
Fons Rademakers for the CERN openlab team
Abstract:

CERN openlab is a unique public-private partnership between CERN and leading IT companies and research institutes. Having learned a lot from the close collaboration with industry in many different projects, we are now using this experience to transfer some of our knowledge to other scientific fields, specifically in the areas of code optimization for simulations of biological dynamics and the advanced usage of ROOT for the storage and processing of genomics data. In this presentation I will give an overview of the knowledge transfer projects we are currently engaged in: how they are relevant and beneficial for all parties involved, the interesting technologies that are being developed, and what the potential and exciting results will be.
 

Submission type: Talk
Contact: Manuel Martin Marquez
Author(s):
Andrea Apollonio, Volodimir Begy, Johannes Gutleber, Manuel Martin-Marquez, Niemi Arto, Jussi-Pekka Penttinen, Elena Rogova, Antonio Romero-Marin, Peter Solander
Abstract:

Following the 2013 update of the European Strategy for Particle Physics, the study explores different designs of circular colliders for the post-LHC era. Reaching unprecedented energies and luminosities requires an understanding of system reliability behaviour from the concept phase onwards, and designing for availability and sustainable operation. The study explores industrial approaches to model and simulate the reliability and availability of the entire particle collider complex. Estimates are based on an in-depth study of the CERN injector chain and the LHC collider, and are carried out as a cooperative effort with the HL-LHC project. The work so far has revealed that a major challenge is obtaining accelerator monitoring and operation data of sufficient quality, so as to automate the data quality annotation and the calculation of reliability distribution functions for systems, subsystems and components where needed. A flexible data management and analytics environment that permits integrating the heterogeneous data sources, the domain-specific data quality management algorithms and the reliability modelling and simulation suite is a key enabler for completing this accelerator operation study. This paper describes the Big Data infrastructure and analytics ecosystem that has been put in operation at CERN, serving as the foundation on which reliability and availability analysis and simulations can be built. This contribution focuses on data infrastructure and data management aspects and gives practical data analytics examples.

Submission type:
Contact: Dirk Duellmann
Author(s):
Michal Simon, Andrew Hanushevsky
Abstract:

XRootD is a distributed, scalable system for low-latency file access. It is the primary data access framework for the high-energy physics community. One of the latest developments in the project has been to incorporate metalink and segmented file transfer technologies.
We report on the implementation of metalink metadata format support within the XRootD client, covering both the CLI and the API semantics. Moreover, we give an overview of the employed segmented file transfer mechanism that exploits metalink-based data sources. Its aim is to provide multi-source file transmission (BitTorrent-like), which results in increased transfer rates.

The final abstract for this poster will be submitted upon Michal's return to CERN early April.

Submission type: poster
Contact: Dirk Duellmann
Author(s):
Dirk Duellmann, Luca Menichetti, Vineet Menon, Kacper Surdy
Abstract:

The statistical analysis of infrastructure metrics comes with several specific challenges, including the fairly large volume of unstructured metrics from a large set of independent data sources. Hadoop and Spark provide an ideal environment, in particular for the first steps of skimming rapidly through hundreds of TB of low-relevance data to find and extract the much smaller data volume that is relevant for statistical analysis and modelling.
This presentation will describe the new Hadoop service at CERN and the use of several of its components for high-throughput data aggregation and ad-hoc pattern searches. We will describe the hardware setup used, the service structure with a small set of decoupled clusters, and the first experience with co-hosting different applications and performing software upgrades. We will further detail the common infrastructure used for data extraction and preparation from continuous monitoring and database input sources.
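
The skimming step mentioned above could, under stated assumptions (invented paths and field names, with Spark as the processing engine), look roughly like this:

    # Hypothetical sketch: skim a large volume of raw monitoring records down to
    # the small subset that is relevant for further statistical analysis.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("metrics-skim").getOrCreate()

    raw = spark.read.json("/path/to/raw_monitoring/*.json")  # placeholder path

    skimmed = (raw
        .filter((F.col("metric") == "cpu_load") & (F.col("value") > 0.9))
        .select("timestamp", "hostname", "value"))

    # Persist the much smaller skimmed sample for interactive analysis.
    skimmed.write.parquet("/path/to/skimmed/cpu_load_high")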

Submission type: talk
Contact: Dirk Duellmann
Author(s):
Dirk Duellmann, Christian Nieke, Krystof Borkovec
Abstract:

The IT Analysis Working Group (AWG) has been formed at CERN across individual computing units and the experiments to attempt a cross-cutting analysis of computing infrastructure and application metrics. In this presentation we will describe the first results obtained using medium/long-term data (1 month - 1 year) correlating box-level metrics, job-level metrics from LSF and HTCondor, I/O metrics from the physics analysis disk pools (EOS), and networking and application-level metrics from the experiment dashboards.
We will cover in particular the measurement of hardware performance and prediction of job durations, the latency sensitivity of different job types and a search for bottlenecks with the production job mix in the current infrastructure. The presentation will conclude with the proposal of a small set of metrics to simplify drawing conclusions also in the more constrained environment of public cloud deployments.

Submission type: talk
Contact: Maria Girone
Author(s):
Maria Girone for the CERN openlab team
Abstract:

LHC Run 3 and Run 4 represent an unprecedented challenge for HEP computing in terms of both data volume and complexity. New approaches are needed to how data is collected and filtered, processed, moved, stored and analyzed if these challenges are to be met with a realistic budget. To develop innovative techniques we are fostering relationships with industry leaders. CERN openlab is a unique resource for public-private partnership between CERN and leading Information and Communication Technology (ICT) companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide HEP community. In 2015, CERN openlab started its phase V with a strong focus on tackling the upcoming LHC challenges. Several R&D programs are ongoing in the areas of data acquisition, networks and connectivity, data storage architectures, computing provisioning, computing platforms and code optimisation, and data analytics. In this presentation I will give an overview of the different and innovative technologies being explored by CERN openlab V and discuss the long-term strategies pursued by the LHC communities, with the help of industry, to close the technological gap in processing and storage needs expected in Run 3 and Run 4.

Submission type: talk
Contact: Nathalie Rauschmayr
Author(s):
Nathalie Rauschmayr, Sami Kama
Abstract:

HEP applications perform an excessive number of allocations/deallocations within short time intervals, which results in memory churn, poor locality and performance degradation. These issues have been known for a decade, but due to the complexity of the software frameworks and the large number of allocations (of the order of billions for a single job), until recently no efficient mechanism has been available to correlate these issues with source code lines. However, with the advent of the Big Data era, many tools and platforms are nowadays available for doing memory profiling at large scale. Therefore, a prototype program has been developed to track and identify each single de-/allocation. The CERN IT Hadoop cluster is used to compute key memory metrics, such as locality, variation, lifetime and density of allocations. The prototype further provides a web-based visualization back-end that allows the user to explore the results generated on the Hadoop cluster. Plotting these metrics for each single allocation over time gives new insight into an application's memory handling. For instance, it shows which algorithms cause which kind of memory allocation patterns, which function flows cause how many short-lived objects, what the most commonly allocated sizes are, etc. The paper will give an insight into the prototype and will show profiling examples for LHC reconstruction, digitization and simulation jobs.
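
A toy illustration of the kind of per-allocation metrics mentioned above (lifetime and allocation-size density), computed here from hypothetical (alloc_time, free_time, size) records rather than real instrumentation output, could look like this:

    # Hypothetical sketch: compute simple lifetime and size-density metrics
    # from a list of (alloc_time, free_time, size) allocation records.
    from collections import Counter

    allocations = [
        (0.001, 0.002, 32),    # short-lived small object
        (0.001, 5.000, 4096),  # long-lived buffer
        (0.003, 0.004, 32),
    ]

    lifetimes = [free - alloc for alloc, free, _ in allocations]
    size_density = Counter(size for _, _, size in allocations)

    print("mean lifetime [s]:", sum(lifetimes) / len(lifetimes))
    print("most common sizes:", size_density.most_common(3))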

Submission type:
Contact: Nathalie Rauschmayr
Author(s):
Nathalie Rauschmayr, Sami Kama
Abstract:

Memory has become a critical parameter for many HEP applications, and as a consequence some experiments have already had to move from single-core to multi-core jobs. However, in the case of LHC experiment software, benchmark studies have shown that many applications are able to run with a much lower memory footprint than what is actually allocated. In certain cases even half of the allocated memory being swapped out does not result in any runtime penalty. As a consequence, many allocated objects are kept in memory much longer than needed and therefore remain unused. In order to identify and quantify such unused (obsolete) memory, FOM-tools has been developed. The paper presents the functionalities of the tool and shows concrete examples of how FOM-tools helped to remove unused memory allocations in HEP software.

Submission type:
Contact: Fabrizio Furano
Author(s):
Laurence Field, Fabrizio Furano, Kwong-Tat Cheung
Abstract:

Volunteer computing has the potential to provide significant additional computing capacity for the LHC experiments. One of the challenges in exploiting volunteer computing is to support a global community of volunteers that provides heterogeneous resources. However, HEP applications require more data input and output than the CPU-intensive applications typically run by other volunteer computing projects. While the so-called "databridge" has already been successfully proposed as a method to span the untrusted and trusted domains of volunteer computing and Grid computing respectively, globally transferring data between potentially poor-performing public networks at home and CERN can be fragile and lead to wasted resources. The expectation is that by placing a storage endpoint closer to the volunteers, as part of a wider, flexible geographical databridge deployment, the transfer success rate and the overall performance can be improved. This contribution investigates the provision of a globally distributed databridge implemented on a commercial cloud provider.

 

Submission type: talk
Contact: Andreas Joachim Peters
Author(s):
Geoffray Adde, Andreas-Joachim Peters
Abstract:

Within the WLCG project, EOS is being evaluated as a platform to demonstrate efficient deployment of geographically distributed storage. The aim of distributed storage deployments is to reduce the number of individual end-points for the LHC experiments (more than 100 today) and to minimise the required effort for small storage sites. The split between the meta-data and data components in EOS makes it possible to operate one regional highly-available meta-data service (MGM) and to deploy the easier-to-operate file storage component (FST) at geographically distributed sites. EOS has built-in support for geolocation-aware access scheduling, file placement policies and replication workflows.

This contribution will introduce the various concepts and discuss demonstrator deployments for several LHC experiments.

Submission type:
Contact: Andreas Joachim Peters
Author(s):
Andreas-Joachim Peters, Elvin Alin Sindrilaru
Abstract:

CERN has been successfully developing and operating EOS as a disk storage solution for 5 years. The CERN deployment provides 135 PB and stores 1.2 billion replicas distributed over two computer centres. The deployment includes four LHC instances, a shared instance for smaller experiments and, since last year, an instance for individual user data as well. The user instance is the backbone of the CERNBox service for file sharing. New use cases such as synchronisation and sharing, the planned migration to reduce AFS usage at CERN and the continuous growth have brought new challenges for EOS.
Recent developments include the integration and evaluation of various technologies for the transition from a single active in-memory namespace to a scale-out implementation distributed over many meta-data servers. The new architecture aims to separate the data from the application logic and user interface code, thus providing flexibility and scalability to the namespace component.
Another important goal is to provide EOS as a CERN-wide mounted filesystem with strong authentication, making it a single storage repository accessible via various services and front-ends (the /eos initiative). This required new developments in the security infrastructure of the EOS FUSE implementation. Furthermore, there was a series of improvements targeting the end-user experience, such as tighter consistency and latency optimisations.
In collaboration with Seagate as an openlab partner, EOS has complete integration of the OpenKinetic object drive cluster as a high-throughput, high-availability, low-cost storage solution.
This contribution will discuss these three main development projects and present new performance metrics.

Submission type:
Contact: German Cancio Melia
Author(s):
Julien Leduc et al.
Abstract:

CERN has been archiving data on tape in its Computer Center for decades, and its archive system now holds more than 135 PB of HEP data on high-density tapes on its premises.

For the last 20 years, tape areal bit density has been doubling every 30 months, closely following HEP data growth trends. During this period, bits on the tape magnetic substrate have been shrinking exponentially; today's bits are now smaller than most airborne dust particles or even bacteria. Therefore tape media is now more sensitive to contamination from airborne dust particles that can land on the rollers, reels or heads.

These can cause scratches on the tape media as it is being mounted or wound on the tape drive resulting in the loss of significant amounts of data.

To mitigate this threat, CERN has prototyped and built custom environmental sensors that are hosted in the production tape libraries, sampling the same airflow as the surrounding drives. This paper will present the problems and challenges we are facing and the solutions we have deployed in production to better monitor the CERN Computer Center environment inside the tape libraries and to limit the impact of airborne particles on the LHC data.

Submission type:
Contact: German Cancio Melia
Author(s):
Stefanos Laskaridis, Vladimir Bahyl, Julien Leduc et al
Abstract:

CERN currently manages the largest data archive in the HEP domain; over 135 PB of custodial data is archived across 7 enterprise tape libraries containing more than 20,000 tapes and using over 80 tape drives. Archival storage at this scale requires a leading-edge monitoring infrastructure that acquires live and lifelong metrics from the hardware in order to assess and proactively identify potential drive and media level issues. In addition, protecting the privacy of sensitive archival data is becoming increasingly important, and with it the need for a scalable, compute-efficient and cost-effective solution for data encryption.

In this paper we first describe the implementation of acquiring tape-medium- and drive-related metrics reported by the SCSI interface, and its integration with our monitoring system. We then address the incorporation of real-time tape-drive encryption, using dedicated drive hardware, into the CASTOR hierarchical mass storage system.
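
As a concrete illustration of the metric acquisition, the hedged sketch below polls the TapeAlert log page of a drive with sg_logs from sg3_utils and extracts the active flags; the device path, the log-page option syntax and the output parsing are assumptions for the example and are not the production CASTOR code.

    import re
    import subprocess

    DRIVE = "/dev/nst0"  # assumed device node of the tape drive

    def read_tapealert_flags(device):
        """Return the numbers of TapeAlert flags reported as set by the drive."""
        # TapeAlert is SCSI log page 0x2e; the exact sg_logs option syntax may
        # differ between sg3_utils versions (see sg_logs(8)).
        out = subprocess.run(["sg_logs", "--page=0x2e", device],
                             capture_output=True, text=True, check=True).stdout
        # Assumed output format: lines such as "  Flag 04h: set" for active alerts.
        return [int(m.group(1), 16)
                for m in re.finditer(r"Flag\s+([0-9a-fA-F]+)h:\s*set", out)]

    if __name__ == "__main__":
        for flag in read_tapealert_flags(DRIVE):
            print(f"TapeAlert flag {flag:#04x} active on {DRIVE}")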

Submission type:
Contact: German Cancio Melia
Author(s):
Steven Murray, Eric Cano, Daniele Kruse et al.
Abstract:

The IT Storage group at CERN develops the software responsible for archiving to tape the custodial copy of the physics data generated by the LHC experiments. LHC Run 3 will start in 2021 and will introduce two major challenges for which the tape archive software must be evolved. Firstly, the software will need to make more efficient use of tape drives in order to sustain the predicted data rate of 100 petabytes per year, as opposed to the current 40 petabytes per year of Run 2. Secondly, the software will need to be seamlessly integrated with EOS, which has become the de facto disk storage system provided by the IT Storage group for physics data.
The tape storage software for LHC Run 3 is code-named CTA (the CERN Tape Archive). This paper describes how CTA will introduce a pre-emptive drive scheduler to use tape drives more efficiently, will encapsulate all tape software into a single module that sits behind one or more EOS systems, and will be simplified by dropping support for obsolete backwards compatibility.
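
To make the idea of pre-emptive drive scheduling concrete, here is a conceptual sketch (not CTA's actual implementation): requests carry a priority, and when all drives are busy an incoming, more urgent request displaces the least urgent running mount, which is put back into the queue.

    import heapq

    class DriveScheduler:
        """Toy pre-emptive scheduler: lower priority value = more urgent."""

        def __init__(self, n_drives):
            self.free_drives = list(range(n_drives))
            self.queue = []    # min-heap of (priority, request)
            self.running = {}  # drive -> (priority, request)

        def submit(self, priority, request):
            heapq.heappush(self.queue, (priority, request))
            self._dispatch()

        def _dispatch(self):
            while self.queue:
                prio, req = self.queue[0]
                if self.free_drives:
                    heapq.heappop(self.queue)
                    self.running[self.free_drives.pop()] = (prio, req)
                    continue
                # All drives busy: pre-empt the least urgent running mount
                # only if the waiting request is strictly more urgent.
                drive, (worst_prio, worst_req) = max(self.running.items(),
                                                     key=lambda kv: kv[1][0])
                if prio < worst_prio:
                    heapq.heappop(self.queue)
                    heapq.heappush(self.queue, (worst_prio, worst_req))
                    self.running[drive] = (prio, req)
                else:
                    break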

Submission type: Talk
Contact: Andrea Manzi
Author(s):
A.Manzi, V. De Notaris, O.Keeble, A. Kiryanov, H. Mikkonen, H. Short, P. Tedesco, R. Wartel
Abstract:

Access to WLCG resources is authenticated using an X.509-based PKI infrastructure. Even though HEP users have always been exposed to certificates directly, the development of modern web applications by the LHC experiments calls for simplified authentication processes that keep the underlying software unmodified.

In this work we will show an integrated web-oriented solution (code-named Kipper) with the goal of providing access to WLCG resources using the credentials of the user's home organisation, without the need for user-acquired X.509 certificates. In particular, we focus on identity providers within eduGAIN, which interconnects research and education organisations worldwide and enables the trustworthy exchange of identity-related information.

eduGAIN has been integrated into the CERN SSO infrastructure so that users can authenticate without the need for a CERN account.

This solution achieves “X.509-free” access to Grid resources with the help of two services: STS and an online CA. The STS (Security Token Service) allows credential translation from the SAML2 format used by Identity Federations to the VOMS-enabled X.509 used by most of the Grid. The IOTA (Identifier-Only Trust Assurance) CA is responsible for the automatic issuing of short-lived X.509 certificates.

The IOTA CA deployed at CERN has been accepted by EUGridPMA as the CERN LCG IOTA CA, included in the IGTF trust anchor distribution and installed by the sites in WLCG.

We will also describe the first example of Kipper enabling eduGAIN access to WLCG, namely the WebFTS interface to the FTS3 data transfer engine, made possible by integrating multiple services: WebFTS, CERN SSO, the CERN LCG IOTA CA, STS, and VOMS.
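
As a rough illustration of the credential-translation chain, the sketch below shows how a portal such as WebFTS might exchange a SAML2 assertion for a VOMS-enabled X.509 proxy; the endpoint URLs, request parameters and response fields are hypothetical and do not describe the real STS or CERN LCG IOTA CA APIs.

    import requests

    CA_URL = "https://iota-ca.example.cern.ch/issue"  # hypothetical online-CA endpoint
    STS_URL = "https://sts.example.cern.ch/token"     # hypothetical STS endpoint

    def saml_to_voms_proxy(saml_assertion, vo="atlas"):
        # 1) The IOTA CA turns the SAML2 assertion from the home-organisation IdP
        #    into a short-lived X.509 certificate.
        cert = requests.post(CA_URL, data={"assertion": saml_assertion})
        cert.raise_for_status()
        # 2) The STS adds VOMS attributes for the requested VO, producing a proxy
        #    that standard grid middleware (e.g. FTS3) can consume.
        proxy = requests.post(STS_URL, data={"certificate": cert.text, "vo": vo})
        proxy.raise_for_status()
        return proxy.content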

Submission type: talk
Contact: Sebastian Lopienski
Author(s):
Sebastian Lopienski
Abstract:

The CERN Computer Security Team is assisting teams and individuals at CERN who want to address security concerns related to their computing endeavours. For projects in the early stages, we help incorporate security in system architecture and design. For software that is already implemented, we do penetration testing. For particularly sensitive components, we perform code reviews. Finally, for everyone undertaking threat modelling or risk assessment, we provide input and expertise. After several years of these internal security consulting efforts, it seems a good moment to analyse experiences, recognise patterns and draw some conclusions. Additionally, it is worth mentioning two spin-off activities that emerged in the last year or so: White Hat training and the IT Consulting service.

Submission type:
Contact: Sebastian Lopienski
Author(s):
Sebastian Lopienski
Abstract:

In order to patch web servers and web applications in a timely manner, we first need to know which software packages are used, and where. However, a typical web stack is composed of multiple layers, including the operating system, web server, application server, programming platform and libraries, database server, web framework, content management system and so on, as well as client-side tools. Keeping track of all the technologies used, especially in a heterogeneous computing environment such as that found in research labs and academia, is particularly difficult. WAD, a tool developed at CERN and based on the Wappalyzer browser plugin, makes it possible to automate this task by detecting the technologies behind a given URL. It allows an inventory of web assets to be established and maintained, and consequently greatly improves the coverage of any vulnerability management activities.
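
For illustration, the toy sketch below shows the header/body fingerprinting idea behind Wappalyzer-style detection; the fingerprints are made up for the example and are not the rule set actually used by WAD.

    import requests

    FINGERPRINTS = {
        "Drupal":    lambda r: "drupal" in r.headers.get("X-Generator", "").lower(),
        "PHP":       lambda r: "php" in r.headers.get("X-Powered-By", "").lower(),
        "Apache":    lambda r: "apache" in r.headers.get("Server", "").lower(),
        "WordPress": lambda r: "wp-content" in r.text.lower(),
    }

    def detect_technologies(url):
        """Return the names of technologies whose fingerprint matches the response."""
        resp = requests.get(url, timeout=10)
        return [name for name, matches in FINGERPRINTS.items() if matches(resp)]

    if __name__ == "__main__":
        print(detect_technologies("https://example.org"))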

Submission type:
Contact: Oliver Keeble
Author(s):
Oliver Keeble, Maria Arsuaga Rios, Alejandro Ayllon, Georgios Bitzes, Fabrizio Furano, Andrea Manzi
Abstract:

Understanding how cloud storage can be effectively used, either standalone or in support of its associated compute, is now an important
consideration for WLCG.

We report on a suite of extensions to familiar tools, targeted at enabling the integration of cloud object stores into traditional grid infrastructures and workflows. Notable updates include support for a number of object store flavours in FTS3, Davix and gfal2, including mitigations for the lack of vector reads; the extension of Dynafed to operate as a bridge between grid and cloud domains; protocol translation in FTS3; and the implementation of extensions to DPM (also implemented by the dCache project) to allow third-party transfers over HTTP.

The result is a toolkit which facilitates data movement and access between grid and cloud infrastructures, broadening the range of
workflows suitable for cloud. We report on deployment scenarios and prototype experience, explaining how, for example, an Amazon S3
or Azure allocation can be exploited by grid workflows.
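
As a hedged example of what this unification looks like from a client, the sketch below uses the gfal2 Python bindings against an S3 endpoint; the configuration group and key names, the endpoint and the bucket are assumptions for the example, and the same context would equally accept grid URLs (srm://, root://, https://).

    import gfal2

    ctx = gfal2.creat_context()
    # Assumed configuration group/keys for the S3 plugin; placeholder credentials.
    ctx.set_opt_string("S3", "ACCESS_KEY", "AKIA...")
    ctx.set_opt_string("S3", "SECRET_KEY", "...")

    base = "s3://s3.example.com/mybucket/run2016"

    info = ctx.stat(base + "/file.root")
    print("size:", info.st_size)
    for entry in ctx.listdir(base):
        print(entry)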

Submission type: Oral.
Contact: Fabrizio Furano
Author(s):
Fabrizio Furano, Oliver Keeble, Andrea Manzi, Georgios Bitzes
Abstract:

The DPM (Disk Pool Manager) project is the most widely deployed solution for storage of large data repositories on grid sites, and is completing the most important upgrade in its history, with the aim of bringing important new features, better performance and easier long-term maintainability.
Work has been done to make the so-called "legacy stack" optional and to replace it with an advanced implementation based on FastCGI and RESTful technologies.
Besides the obvious gain of making several hard-to-maintain legacy components optional, this step brings important new features together with performance enhancements. Among the most important features we can cite the simplification of the configuration, the possibility of working in a totally SRM-free mode, the implementation of quotas and of free/used space reporting on directories, and the implementation of volatile pools that can pull files from external sources and can be used to deploy data caches.
Moreover, the communication with the new core, called DOME (Disk Operations Management Engine), now happens over secure HTTPS channels using an extensively documented, industry-compliant protocol.
For this leap, referred to by the codename "DPM Evolution", the help of the DPM collaboration has been very important in the beta-testing phases, and here we report on the technical choices and the first site experiences.
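
The sketch below is only meant to illustrate the flavour of such an HTTPS exchange with the head node; the command name, URL path and JSON fields are invented for the example and are not the documented DOME protocol.

    import requests

    HEAD = "https://dpmhead.example.org:1094"

    # Hypothetical command asking for quota and free/used space of a directory,
    # authenticated with the client host certificate.
    resp = requests.get(
        f"{HEAD}/domehead/command/dome_getdirspaces",
        params={"path": "/dpm/example.org/home/atlas"},
        cert=("/etc/grid-security/hostcert.pem", "/etc/grid-security/hostkey.pem"),
        verify="/etc/grid-security/certificates",
    )
    resp.raise_for_status()
    print(resp.json())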

Submission type: Talk
Contact: Andrea Manzi
Author(s):
A.Manzi, A. Ayllon, R. Rocha, M. Arsuaga.
Abstract:

The deployment of OpenStack Magnum at CERN has made it possible to manage container orchestration engines such as Docker Swarm and Kubernetes as first-class resources in OpenStack.

In this poster we will show the work done to exploit a Docker Swarm cluster deployed via Magnum to set up a Docker infrastructure running FTS (the WLCG file transfer service). FTS has been chosen as one of the pilots to validate the integration of Magnum and Docker with the rest of the CERN infrastructure tools. The FTS service has an architecture that is well suited to containers: the functionality now offered by a VM cluster can be decomposed into dedicated containers and scaled separately according to the load and user interactions.

The pilot is under evaluation with a view to a Docker-based FTS production deployment.
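
To illustrate the decomposition, here is a minimal sketch using the Docker SDK for Python; the image names, component split and replica counts are placeholders rather than the actual pilot configuration (which uses a Swarm cluster provisioned by Magnum).

    import docker

    client = docker.from_env()

    # Hypothetical decomposition of the FTS service into independently scaled parts.
    COMPONENTS = {
        "fts-rest":   {"image": "example/fts-rest:latest",       "replicas": 2},
        "fts-server": {"image": "example/fts-server:latest",     "replicas": 4},
        "fts-mon":    {"image": "example/fts-monitoring:latest", "replicas": 1},
    }

    for name, spec in COMPONENTS.items():
        for i in range(spec["replicas"]):
            client.containers.run(
                spec["image"],
                name=f"{name}-{i}",
                detach=True,
                restart_policy={"Name": "always"},
            )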

Submission type: poster
Contact: Gavin McCance
Author(s):
Ben Jones Gavin McCance
Abstract:

The current Tier-0 processing at CERN is done at two managed sites, the CERN Computer Centre and the Wigner Computer Centre. With the proliferation of public cloud resources at increasingly competitive prices, we have been investigating how to transparently increase our compute capacity to include these providers. The approach taken has been to integrate these resources using our existing deployment and computer management tools and to provide them in a way that exposes them to users as part of the same site. The paper will describe the architecture, the toolset and the current production experience of this model.

Submission type: Poster
Contact: Jan van Eldik
Author(s):
Manfred Alef, Alessandro De Salvo, Alessandro Di Girolamo, Cristovao Cordeiro, Domenico Giordano, Arne Wiebalck, and others
Abstract:

Performance measurement and monitoring are essential for the efficient use of computing resources. In a commercial cloud environment, exhaustive resource profiling brings additional benefits due to the intrinsic variability of the virtualised environment. In this context, resource profiling via synthetic benchmarking quickly allows issues to be identified and mitigated. Ultimately it provides information about the actual delivered performance of invoiced resources.

In the context of its commercial cloud initiatives, CERN has acquired extensive experience in benchmarking commercial cloud resources, including Amazon, Microsoft Azure, IBM, ATOS, T-Systems and the Deutsche Boerse Cloud Exchange. The CERN cloud procurement process has greatly profited from the benchmark measurements to assess the compliance of the bids with the requested technical specifications. During the cloud production activities, the job performance has been compared with the benchmark measurements.

In this report we will discuss the experience acquired and the results collected using several benchmark metrics. These benchmarks range from generic open-source benchmarks (encoding algorithms and kernel compilation) to experiment-specific benchmarks (ATLAS KitValidation) and fast benchmarks based on random number generation. The workflow put in place to collect and analyse performance metrics will also be described.
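
A minimal sketch of the last category, a fast benchmark based on random number generation: the workload size and the throughput-style score are arbitrary choices for the example, not the metric actually used in the procurements.

    import random
    import time

    def fast_benchmark(n=5_000_000, seed=1):
        """Time n pseudo-random draws and return a throughput score (draws/s)."""
        rng = random.Random(seed)
        start = time.perf_counter()
        total = 0.0
        for _ in range(n):
            total += rng.random()  # CPU-bound inner loop, independent of I/O
        elapsed = time.perf_counter() - start
        return n / elapsed

    if __name__ == "__main__":
        print(f"score: {fast_benchmark():.0f} draws/s")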

Submission type: Talk. Note that we are proposing a contribution with authors from other institutes (ie. collaborators from the Hepix Benchmarking working group)
Contact: Jan van Eldik
Author(s):
M. Adam, C. Cordeiro, D Giordano, L. Field, L. Magnoni
Abstract:

The ongoing integration of clouds into the WLCG raises the need for detailed health and performance monitoring of the virtual resources, in order to prevent degraded service and interruptions due to undetected failures. When working at scale, the existing monitoring diversity can lead to a metric overflow, whereby operators need to manually collect and correlate data from several monitoring tools and frameworks, resulting in tens of different metrics per virtual machine that must be constantly interpreted and analysed.

In this paper we present an Esper-based standalone application which is able to process complex monitoring events coming from various sources and automatically interpret the data in order to issue alarms on the status of the resources, without interfering with the actual resources and data sources. We will describe how this application has been used in both commercial and non-commercial cloud activities, allowing operators to be promptly alerted and to react to VMs and clusters running with low CPU load and low network traffic, among other anomalies, leading either to the recycling of the misbehaving VMs or to fixes in the submission of the LHC experiments' workflows. Finally, we will also present the pattern analysis mechanisms being used, as well as the surrounding Elastic and REST API interfaces where the alarms are collected and served to users.
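
The sketch below expresses one such rule in plain Python rather than in Esper's EPL, purely to illustrate the kind of correlation performed; the thresholds, window length and event layout are assumptions.

    from collections import deque
    from statistics import mean

    WINDOW = 12            # consecutive samples, e.g. one hour of 5-minute probes
    CPU_THRESHOLD = 0.05   # average load considered "idle"
    NET_THRESHOLD = 1e4    # bytes/s considered "no traffic"

    history = {}           # vm_id -> recent (cpu, net) samples

    def on_metric_event(vm_id, cpu_load, net_bytes_per_s):
        """Feed one monitoring event; return an alarm message when the rule fires."""
        samples = history.setdefault(vm_id, deque(maxlen=WINDOW))
        samples.append((cpu_load, net_bytes_per_s))
        if len(samples) == WINDOW:
            avg_cpu = mean(s[0] for s in samples)
            avg_net = mean(s[1] for s in samples)
            if avg_cpu < CPU_THRESHOLD and avg_net < NET_THRESHOLD:
                return (f"ALARM: {vm_id} looks idle over the last {WINDOW} samples "
                        f"(cpu={avg_cpu:.2f}, net={avg_net:.0f} B/s)")
        return None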

Submission type: Talk, or a poster to complement a talk on "CERN Computing in Commercial Clouds" (https://information-technology.web.cern.ch/CHEP/cern-computing-commercial-clouds)
Contact: Jan van Eldik
Author(s):
C. Cordeiro, D Giordano, L. Field, et al.
Abstract:

With the imminent upgrades to the LHC and the consequent increase in the amount and complexity of data collected by the experiments, CERN's computing infrastructure will be facing a large and challenging demand for computing resources. Within this scope, the adoption of cloud computing at CERN has been evaluated and has opened the door to procuring external cloud services from providers, which can supply the computing services needed to extend the current CERN infrastructure.
Over the past two years the CERN procurement initiatives and partnership agreements have led to several cloud computing activities between the CERN IT department and firms such as ATOS, Microsoft Azure, T-Systems, Deutsche Börse Cloud Exchange and IBM SoftLayer.
As of summer 2016, more than 10 million core-hours of computing resources will have been delivered by commercial cloud providers to the four LHC experiments to run their production workloads, from simulation to full-chain processing.
In this paper we describe the experience gained in procuring and exploiting commercial cloud resources for the computing needs of the LHC experiments. The mechanisms used for provisioning, monitoring, accounting, alarming and benchmarking will be discussed, as well as the feedback received from the LHC collaborations in terms of managing experiment workflows within a multi-cloud environment.

Submission type: Talk
Contact: Jan van Eldik
Author(s):
Ricardo Brito do Rocha et al. (CERN Private Cloud team)
Abstract:

Containers remain a hot topic in computing, with new use cases and tools appearing every day. Basic functionality such as spawning containers seems to have settled, but topics like volume support or networking are still evolving. Solutions like Docker Swarm, Kubernetes or Mesos provide similar functionality but target different use cases, exposing distinct interfaces and APIs.

The CERN private cloud comprises thousands of nodes and users, with many different use cases. A single solution for container deployment would not cover every one of them, and supporting multiple solutions involves repeating the same integration work with authentication services, storage services and networking several times over.

In this presentation we will describe OpenStack Magnum as the solution to offer container management in the CERN cloud. We will cover its main functionality and some advanced use cases using Docker Swarm and Kubernetes, highlighting some relevant differences between the two. We will describe the most common use cases in HEP and how we integrated popular services like CVMFS or AFS in the most transparent way possible, along with some limitations found. Finally we will look into ongoing work on advanced scheduling for both Swarm and Kubernetes, support for running batch-like workloads and the integration of container networking technologies with the CERN infrastructure.

 

Submission type: Talk
Contact: Juan Manuel Guijarro
Author(s):
Ignacio Reguero and Lorena Lobato, CERN IT-CM Linux & Configuration Support Section
Abstract:

Load balancing is one of the technologies enabling the deployment of large-scale applications on cloud resources. At CERN we have developed a DNS load balancer as a cost-effective way to do this for applications that accept DNS timing dynamics and do not require memory. We serve 378 load-balanced aliases with two small VMs acting as master and slave. These aliases are based on 'delegated' DNS zones that we manage with DYN-DNS, based on a load metric collected with SNMP from the alias members. In the last years we have made several improvements to the software, for instance support for IPv6 AAAA records, parallelisation of the SNMP requests, and a reimplementation of the client in Python allowing for multiple aliases with differentiated state on the same machine, support for Roger state and other new features. The configuration of the load balancer is built with a Puppet type that gets the alias members dynamically from PuppetDB and consumes the alias definitions from a REST service. We have produced a self-service GUI for the management of the LB aliases based on the REST service above, implementing a form of Load Balancing as a Service (LBaaS). Both the GUI and the REST API have authorisation based on hostgroups. All this is implemented with open-source software and with very little CERN-specific code. A simplified sketch of the member-selection loop is shown below.
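
The sketch makes several assumptions: the member list, SNMP OID and community string are placeholders, and the real service writes the chosen members into the delegated DNS zone via DYN-DNS rather than printing them.

    import subprocess

    MEMBERS = ["node1.example.cern.ch", "node2.example.cern.ch", "node3.example.cern.ch"]
    LOAD_OID = "1.3.6.1.4.1.96.255.1"   # placeholder OID exported by the LB client
    BEST_N = 2                          # members to publish in the alias

    def get_load(host):
        """Query the load metric over SNMP; unreachable or slow nodes sort last."""
        try:
            out = subprocess.run(
                ["snmpget", "-v2c", "-c", "public", "-Oqv", host, LOAD_OID],
                capture_output=True, text=True, timeout=5,
            )
            return int(out.stdout.strip()) if out.returncode == 0 else float("inf")
        except (subprocess.TimeoutExpired, ValueError):
            return float("inf")

    def select_members():
        return sorted(MEMBERS, key=get_load)[:BEST_N]

    if __name__ == "__main__":
        print("publishing alias members:", select_members())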

Submission type: Preferably talk but it could also be a poster.
Contact: Romain Wartel
Author(s):
Hannah Short
Abstract:

HEP has long been considered an exemplary field in Federated Computing; the benefit of this technology has been recognised by the thousands of researchers who have used the grid for nearly 15 years. Whilst the infrastructure is mature and highly successful, Federated Identity Management (FIM) is one area in which the HEP community should continue to evolve.

The ability for a researcher to use HEP resources with their existing account reflects the structure of a research team: team members continue to represent their own home organisation whilst collaborating. Through eduGAIN, the inter-federation service, an extensive suite of research services and a pool of international users have been unlocked within the scientific community. Establishing adequate trust between federation participants, as well as deploying the relevant technologies, is the key to enabling the effective adoption of FIM.

What is the current landscape of FIM for HEP? How do we see this landscape in the future? How do we get there? We will be addressing these questions in the context of CERN and WLCG.

Submission type: Oral presentation
Contact: Romain Wartel
Author(s):
Romain Wartel
Abstract:

This presentation offers an overview of the current security landscape: the threats, tools, techniques and procedures followed by attackers. These attackers range from cybercriminals aiming to make a profit to nation-states searching for valuable information. Threat vectors have evolved in recent years; the focus has shifted significantly from targeting computer services directly to targeting the people managing the computational, financial and strategic resources instead. The academic community is at a crucial time and must proactively manage the resulting risks. Today, high-quality threat intelligence is paramount, as it is the key means of responding and providing defensible computing services. Efforts are necessary not only to obtain actionable intelligence, but also to process it, match it with traceability information such as network traffic and service logs, and manage the findings appropriately. In order to achieve this, the community needs to take a three-fold approach: exploit its well-established international collaboration network, participate in vetted trust groups, and further liaise with the private sector and law enforcement.

Submission type: Oral presentation
Contact: Vincent Ducret
Author(s):
Vincent Ducret
Abstract:

Over the last few years, the number of mobile devices connected to the CERN internal network has increased from a handful in 2006 to more than 10,000 in 2015. Wireless access is no longer a “nice to have” or just for conference and meeting rooms; support for mobility is now expected by most, if not all, of the CERN community. In this context, a full renewal of the CERN Wi-Fi network has been launched in order to provide a state-of-the-art campus-wide Wi-Fi infrastructure. Which technologies can provide an end-user experience comparable, for most applications, to a wired connection? Which solution can cover more than 200 office buildings, representing a total surface of more than 400,000 m2, while keeping a single, simple, flexible and open management platform? The presentation will focus on the studies and tests performed at CERN to address these issues, as well as on some feedback about the overall project organisation.

Submission type:

Previous CHEPs

CHEP: 2015 (Okinawa, April 2015)

Contact: Fabrizio Furano
Abstract:

The DPM project offers an excellent opportunity for comparative testing of the HTTP and xroot protocols for data analysis:

  1. The DPM storage itself is multi-protocol, allowing comparisons to be performed on the same hardware
  2. The DPM has been instrumented to produce an i/o monitoring stream, familiar from the xrootd project,
  3. ...
Contact: Fabrizio Furano
Abstract:

In this contribution we describe the activities and the technical aspects that led to the construction of a public prototype for LHCb file access that is built on HTTP and WebDAV, supporting file access for distributed computing data management...

Contact: Aram Santogidis
Abstract:

ALFA is the common framework of the next generation software for ALICE and FAIR high energy physics experiments. It supports both offline and online processing which includes ALICE DAQ/HLT/Offline and the FairRoot project. The framework is designed based on a data-flow model with message-oriented middleware (MOM) serving as a trans-...

Contact: Helge Meinhard
Abstract:

We will present how CERN's services around Issue Tracking and Version Control have evolved, and what the plans for the future are. We will describe the services' main design, integration and structure, giving special attention to the new requirements from the community of users in terms of collaboration and integration...

Contact: Helge Meinhard
Abstract:

Using High-Level Synthesis (HLS) tools for Field-Programmable Logic Array (FPGA) programming is slowly emerging as alternative solution to well-established VHDL and Verilog languages. This contribution will examine whether HLS tools are applicable to designing FPGA-based data acquisition systems. We will present the implementation of the CMS ECAL Data Concentrator Card...

Contact: Helge Meinhard
Abstract:

In this paper we present our findings gathered during the evaluation and testing of Windows Server High Performance Computing (Windows HPC), in view of potentially using it as a production HPC system for Engineering HPC applications. The Windows HPC package comes as an extension of Microsoft's Windows Server product. The...

Contact: Helge Meinhard
Abstract:

Using virtualisation with CernVM has emerged as a de-facto standard among HEP experiments; it allows for running of HEP analysis and simulation programs in cloud environments. Following the integration of virtualisation with BOINC and CernVM, first pioneered for...

Contact: Helge Meinhard
Abstract:

As part of CERN's Agile Infrastructure project, large parts of the CERN batch farm have been moved to virtual machines running on CERNs private IaaS cloud. During this process a large fraction of the resources, which had previously...

Contact: Marian Babik
Abstract:

The Worldwide LHC Computing Grid relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion, traffic routing, etc. The WLCG Network and Transfer Metrics project aims to integrate...

Contact: Laurence Field
Abstract:

The use of Cloud technologies in WLCG is currently focused on the infrastructure as a service (IaaS) layer, more specifically ability to dynamically create virtual machines on demand. In adopting such an approach, it is not only necessary...

Contact: Laurence Field
Abstract:

Volunteer computing remains an untapped opportunistic resource for the LHC experiments. The use of virtualization in this domain was pioneered by the Test4theory project and enabled the running of high-energy particle physics simulations on home computers. This paper describes the model for CMS to run workloads using a similar volunteer...

Contact: Helge Meinhard
Abstract:

When CERN migrated its infrastructure away from home-grown fabric management tools to emerging industry-standard open-source solutions, the immediate technical challenges and motivation were clear. The move to a multi-site Cloud Computing model meant that the toolchains that were growing around this ecosystem would be a good choice, the challenge was...

Contact: Tim Smith
Abstract:

The talk will focus on the "lab version" of the CERN Document Server. The re-design of the collection pages, the easy personalization of the content preferred by the user, the convergence to the central monitoring tools (ES, Kibana), the new search engine and its underlying technology will be exposed.
...

Contact: Tim Smith
Abstract:

The talk will focus on how the CERN communication team and the experiments outreach committees are organizing their multimedia content to archive and disseminate it on CERN Document Server; how the front-end is evolving into a true multimedia platform, with peak of users and requesters many times a year, while...

Contact: Hassen Riahi
Abstract:

The overall success of LHC data processing depends heavily on stable, reliable and fast data distribution. The Worldwide LHC Computing Grid (WLCG) relies on the File Transfer Service (FTS) as the data movement middleware for moving sets of files from one site to another.

This paper describes the components of...

Contact: Tony Cass
Abstract:

The LHC Optical Private Network, linking CERN and the Tier1s and the LHC Open Network Environment linking these to the Tier2 community successfully supported the data transfer needs of the LHC community during Run 1 and have evolved to better serve the networking requirements of the new computing models for...

Contact: Tony Cass
Abstract:

The advent of mobile telephony and VoIP has significantly impacted the traditional telephone exchange industry---to such an extent that private branch exchanges are likely to disappear completely in the near future. For large organisations, such as CERN, it is important to be able to smooth this transition by implementing new...

Contact: Tony Cass
Abstract:

With the inexorable increase in the use of mobile devices, for both general communications and mission-critical applications, wireless connectivity is required anytime and anywhere. This requirement is addressed in office buildings through the use of Wi-Fi technology but Wi-Fi is ill adapted for use in large experiment halls and complex...

Contact: Massimo Lamanna
Abstract:

Cernbox is a cloud synchronisation service for end-users: it allows them to sync and share files on all major mobile and desktop platforms (Linux, Windows, MacOSX, Android, iOS), aiming to provide offline availability for any data stored in the CERN EOS infrastructure. The successful beta phase of the service confirmed the...

Contact: Massimo Lamanna
Abstract:

CERN IT DSS operates the main storage resources for data taking and physics analysis, mainly via three systems: AFS, CASTOR and EOS. The total amounts to about 100 PB (with relative ratios 1:10:30). EOS deploys disk resources across the two CERN computer centres (Meyrin and Wigner) with a ratio 60%...

Contact: Massimo Lamanna
Abstract:

In 2013, CERN IT evaluated then deployed a petabyte-scale Ceph cluster to support OpenStack use-cases in production. As of fall 2014, this cluster stores around 300 TB of data comprising more than a thousand VM images and a similar number of block device volumes. With more than a year of...

Contact: Tibor Simko
Abstract:

Leveraging on synergies between ALICE, ATLAS, CMS and LHCb experiments,
a joint CERN Data Analysis Preservation pilot was launched in order to
help structuring the knowledge capturing process throughout the data analysis
chain.  The project aims at preserving not only information about primary and
reduced datasets,...

Contact: Pablo Saiz
Abstract:

The WLCG monitoring system solves a challenging task of keeping track of the LHC computing activities on the WLCG infrastructure, ensuring health and performance of the distributed services at more than 160 sites. The current challenge consists of decreasing the effort needed to operate the monitoring service and to satisfy...

Contact: Fons Rademakers
Abstract:

CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide HEP community. Since January 2015 openlab phase V has started. To bring the openlab conducted research closer to the experiments, phase...

Contact: Artur Wiecek
Abstract:

The Java ecosystem has changed since CERN was honoured with the Duke's
Choice Award in 2008. Nowadays the developers look beyond the limits of
the application servers and want to exploit all of the capabilities of
"The Cloud". To respond this demand the CERN IT Department started...

Contact: Hassen Riahi
Abstract:

AsyncStageOut (ASO) is a new component of the distributed data analysis system of CMS, designed for managing users' data. It addresses a major weakness of the previous model, namely the low total job execution efficiency due to the high failure rate of the remote stage-out of the output files. The remote stage-out,...

Contact: Dirk Duellmann
Abstract:

Optimising a computing infrastructure on the scale of the LHC requires a quantitative understanding of a complex network of many different resources and services. For this purpose the CERN IT department and the LHC experiments are collecting a large number of logs and performance probes, which are already successfully used for...

Contact: Dirk Duellmann
Abstract:

Amazon S3 is a widely adopted protocol for scalable cloud storage that could also fulfill storage requirements of the high-energy physics community. CERN has been evaluating this option using some key HEP applications such as ROOT and the CernVM filesystem (CvmFS) with S3 back-ends. In this contribution we present our...

Contact: Dirk Duellmann
Abstract:

Cernbox is a cloud synchronisation service for end-users: it allows them to sync and share files on all major mobile and desktop platforms (Linux, Windows, MacOSX, Android, iOS), aiming to provide offline availability for any data stored in the CERN EOS infrastructure. The very successful beta phase of the service demonstrated...

Contact: Dirk Duellmann
Abstract:

Archiving data to tape is a critical operation for any storage system, especially for the EOS system at CERN, which holds production data from all major LHC experiments. Each collaboration has an allocated quota it can use at any given time; therefore, a mechanism for archiving "stale" data is needed...

Contact: Dirk Duellmann
Abstract:

The EOS storage software was designed to cover CERN disk-only storage use cases in the medium term, trading scalability against latency. To cover and prepare for long-term requirements, the CERN IT data and storage services group (DSS) is actively conducting R&D and open-source contributions to experiment with a next generation...

Contact: Dirk Duellmann
Abstract:

EOS is an open source distributed disk storage system in production since 2011 at CERN. Development focus has been on low-latency analysis use cases for LHC and non-LHC experiments and life-cycle management using JBOD hardware for multi PB storage installations. The EOS design implies a split of hot and cold...

Contact: Cristovao Cordeiro
Abstract:

The adoption of Cloud technologies by the LHC experiments places the fabric management burden of monitoring virtualized resources upon the VO. In addition to monitoring the status of the virtual machines and triaging the results, it must be understood if the resources actually provided match with any agreements relating to...

Contact: Domenico Giordano
Abstract:

Helix Nebula, the Science Cloud Initiative, is a public-private partnership between Europe's leading scientific research organisations (notably CERN, EMBL and ESA) and European IT cloud providers, which aims to establish a cloud-computing platform for data-intensive science within Europe.

Over the past two years, Helix Nebula has built a federated...

Contact: Thomas Baron
Abstract:

The CERN IT department has built over the years a performant and integrated ecosystem of collaboration tools, from videoconference and webcast services to event management software. These services have been designed and evolved in very close collaboration with the various communities surrounding the laboratory and have been massively adopted by...

Contact: Thomas Baron
Abstract:

We will present an overview of the current real-time video service offering for the LHC, in particular the operation of the CERN Vidyo service will be described in terms of consolidated performance and scale: The service is an increasingly critical part of the daily activity of the LHC collaborations, topping...

Contact: Thomas Baron
Abstract:

Indico has come a long way since it was first used to organize CHEP 2004.
More than ten years of development have brought new features and projects, widening the application's feature set and enabling event organizers to work even more efficiently. While this has boosted the tool's usage and...

Contact: German Cancio Melia
Abstract:

Data backup and archival are an important but often overlooked activity. They are necessary in order to minimize the impact of hardware failures, software bugs and user mistakes. Additionally, they are also necessary for legal compliance and data warehousing.
CERN's current backup and archive service hosts 8.5 PB of...

Contact: German Cancio Melia
Abstract:

CERN’s tape-based archive system has collected over 70 Petabytes of data during the first run of the LHC. The Long Shutdown is being used to migrate the complete 100-Petabyte data archive to higher-density tape media. During LHC Run 2, the archive will have to cope with yearly growth rates...

Contact: German Cancio Melia
Abstract:

CASTOR (the CERN Advanced STORage system) is used to store the custodial copy of all of the physics data collected from the CERN experiments, both past and present.  CASTOR is a hierarchical storage management system that has a disk-based front-end and a tape-based back-end.  The software responsible for controlling the...

Contact: Edward Karavakis
Abstract:

The ATLAS Experiment at the Large Hadron Collider has collected data during Run 1 and is ready to collect data in Run 2. The ATLAS data are distributed, processed and analysed at more than 130 grid and cloud sites across the world. At any given time, there are more than...

Contact: Pedro Andrade
Abstract:

Over the past two years, the operation of the CERN Data Centres went through significant changes with the introduction of new mechanisms for hardware procurement, new services for cloud infrastructure and configuration management, among other improvements. These changes resulted in an increase of resources being operated in a more dynamic...

Contact: Luca Magnoni
Abstract:

Monitoring the WLCG infrastructure requires gathering and analysing a high volume of heterogeneous data (e.g. data transfers, job monitoring, site tests) coming from different services and experiment-specific frameworks, in order to provide a uniform and flexible interface for scientists and sites. The current architecture, where relational database systems are used to...

Contact: Alejandro Alvarez Ayllon
Abstract:

FTS3 is the service responsible for the distribution of the LHC data across the WLCG Infrastructure. To facilitate its use outside the traditional grid environment we have provided a web application - known as WebFTS - fully oriented towards final users, and easily usable within a browser.

This web application is completely...

Contact: Romain Wartel
Abstract:

This presentation gives an overview of the current computer security landscape. It describes the main vectors of compromises in the academic community including lessons learnt, reveals inner mechanisms of the underground economy to expose how our computing resources are exploited by organised crime groups, and gives recommendations how to better...

Contact: Romain Wartel
Abstract:

Federated identity management (FIM) is an arrangement made among multiple organisations that lets subscribers use the same identification data, e.g. account names & credentials, to obtain access to the secured resources and computing services of all other organisations in the group. Specifically in the various research communities there is an...

Contact: Pawel Szostek
Abstract:

As Moore's Law drives the silicon industry towards higher transistor counts, processor designs are becoming more and more complex. The area of development includes core count, execution ports, vector units, uncore architecture and finally instruction sets. This increasing complexity leads us to a place where access to the shared memory...

Contact: Andrea Sciaba
Abstract:

The Worldwide LHC Computing Grid project (WLCG) provides the computing and storage resources required by the LHC collaborations to store, process and analyse the ~50 Petabytes of data annually generated by the LHC. The WLCG operations are coordinated by a distributed team of managers and experts and performed by people...

Contact: Tim Bell
Abstract:

While travelling, we expect to have access to the Internet, or to be able to check a mailbox. But until recently, it was difficult to maintain voice conversations while away from the workplace. In some cases we can use mobile phones, but the roaming charges are high when abroad. At CERN...

Contact: Tim Bell
Abstract:

The emergence of social media platforms in the consumer space unlocked new ways of interaction between individuals on the web. People now develop their social networks and relations based on common interests and activities, with the choice to opt in or opt out of content of their interest. This kind of platform...

Contact: Tim Bell
Abstract:

Cloud federation brings an old concept into new technology, allowing for the sharing of resources between independent cloud installations. Cloud computing is starting to play a major role in HEP and e-science, allowing resources to be obtained on demand. Cloud federation supports sharing between independent organisations and companies coming from the commercial world such...

Contact: Olof Barring
Abstract:

The Open Compute Project, OCP ( http://www.opencompute.org/), was launched by Facebook in 2011 with the objective of building efficient computing infrastructures at lowest possible cost. The technologies are released as open hardware, with the goal to develop servers and data centers following the model traditionally associated with open source...

Contact: Alessandro Di Girolamo
Abstract:

The ATLAS Distributed Computing infrastructure has evolved after the first period of LHC data taking in order to cope with the challenges of the upcoming LHC Run2. An increased data rate and computing demands of the Monte-Carlo simulation, as well as new approaches to ATLAS analysis, dictated a more dynamic...

Contact: Luca Canali
Abstract:

Data generation rates are expected to grow very fast for some database workloads going into LHC run 2 and beyond. In particular this is expected for data coming from controls, logging and monitoring systems. Storing, administering and accessing big data sets in a relational database system is in certain cases...

Contact: Luca Canali
Abstract:

During LHC run 1 ATLAS and LHCb databases have been using Oracle Streams replication technology for their use cases of data movement between online and offline Oracle databases. Moreover ATLAS has been using Streams to replicate conditions data from CERN to selected Tier 1s. GoldenGate is a new technology introduced...

Contact: Eva Dafonte Perez
Abstract:

Inspired by various Database-as-a-Service (DBaaS) providers, the database group at CERN has developed a platform to allow the CERN user community to run a database instance with database administrator privileges, providing a full toolkit that allows the instance owner to perform backup and point-in-time recoveries, monitoring specific...

Contact: Eva Dafonte Perez
Abstract:

The CERN IT-DB group is migrating its storage platform, mainly NetApp NAS appliances running in 7-mode but also SAN arrays, to a set of NetApp C-mode clusters. The largest one is made of 14 controllers and will hold a range of critical databases, from administration to accelerator control or experiment control...

Contact: Eva Dafonte Perez
Abstract:

Data science is about unlocking valuable insights and obtaining deep knowledge out of the data. Its application enables more efficient day-to-day operations and more intelligent decision-making processes. CERN has been very successful in developing custom data-driven control and monitoring systems. Several million control devices: sensors, front-end equipment, etc., make...

Contact: Eva Dafonte Perez
Abstract:

CERN’s accelerator complex is an extreme data generator: every single second a significant amount of highly heterogeneous data coming from control equipment and monitoring agents is persisted and needs to be analysed. Over the decades, CERN’s research and engineering teams have applied different approaches, techniques and technologies. This situation has...

Contact: Tim Bell
Abstract:

CERN has been running a production OpenStack cloud since July 2013 to support physics computing and infrastructure services for the site.

This talk will cover the different use cases for this service and experiences with this deployment in areas such as user management, deployment, metering and configuration of thousands of...

CHEP: 2013 (Amsterdam, Oct 2013)

Contact: Maria Dimou
Abstract:

In the Worldwide LHC Computing Grid (WLCG) project the Tier centres are of paramount importance for storing and accessing experiment data and for running the batch jobs necessary for experiment production activities. Although Tier-2 sites provide a significant fraction of the resources, a non-availability of resources at the Tier-0 or the Tier-1s can seriously harm not only WLCG operations but also the experiments' workflows and the storage of LHC data, which are very expensive to reproduce.

Contact: Andrzej Nowak
Abstract:

As Moore’s Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, memory, cache and interconnect technologies. We examine these moving trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel “Ivy Bridge-EP” and “Haswell” processor families.

Contact: Andrzej Nowak
Abstract:

This paper summarizes the five years of CERN openlab’s efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT.

Abstract:

The Hadoop framework has proven to be an effective and popular approach for dealing with “Big Data” and, thanks to its scaling ability and optimised storage access, Hadoop Distributed File System-based projects such as MapReduce or HBase are seen as candidates to replace traditional relational database management systems whenever scalable speed of data processing is a priority. But do these projects deliver in practice? Does migrating to Hadoop’s “shared nothing” architecture really improve data access throughput? And, if so, at what cost?

Contact: Maaike Limper
Abstract:

As part of the CERN Openlab collaboration, an investigation has been made into the use of an SQL-based approach for physics analysis with various up-to-date software and hardware options. Currently physics analysis is done using data stored in customised root-ntuples that contain only the variables needed for a specific analysis. Production of these ntuples is mainly done by accessing the centrally produced analysis data through the LHC computing grid and can take several days to complete.

Contact: Edward Karavakis
Abstract:

The Worldwide LHC Computing Grid (WLCG) today includes more than 170 computing centres where more than 2 million jobs are executed daily and petabytes of data are transferred between sites. Monitoring the computing activities of the LHC experiments over such a huge heterogeneous infrastructure is extremely demanding in terms of computation, performance and reliability. Furthermore, the generated monitoring flow is constantly increasing, which represents another challenge for the monitoring systems.

Contact: Lukasz Janyst
Abstract:

The Extended ROOT Daemon (XRootD) is a distributed, scalable system for low-latency clustered data access. XRootD is mature and widely used in HEP, both standalone and as core functionality for the EOS system at CERN, and hence requires extensive testing to ensure general stability. However, there are many difficulties posed by distributed testing, such as cluster initialization, synchronization, orchestration, inter-cluster communication and controlled failure handling.

Contact: Dirk Duellmann
Abstract:

Cloud storage is an emerging architecture aiming to provide increased scalability and access performance, compared to more traditional solutions. CERN is evaluating this promise using Huawei UDS and OpenStack storage deployments, focusing on the needs of high-energy physics. Both deployed setups implement S3, one of the protocols that are emerging as standard in the cloud storage market. A set of client machines has been used to generate I/O load patterns to evaluate the performance of both storage systems.

Contact: Edward Karavakis
Abstract:

The ATLAS Experiment at the Large Hadron Collider has been collecting data for three years. The ATLAS data are distributed, processed and analysed at more than 130 grid and cloud sites across the world. The total throughput of transfers is more than 5 GB/s and data occupies more than 120 PB on disk and tape storage. At any given time, there are more than 100,000 concurrent jobs running and more than a million jobs are submitted on a daily basis.

Abstract:

Using the framework of ITIL best practices, the service managers within CERN-IT have engaged in a continuous improvement process, mainly focusing on service operation. This implies an explicit effort to understand and improve all service management aspects in order to increase efficiency and effectiveness. We will present the requirements, how they were addressed and share our experiences. We will describe how we measure, report and use the data to continually improve both the processes and the services being provided.

Contact: Thomas Baron
Abstract:

In the last few years, we have witnessed an explosion of visual collaboration initiatives in the industry. Several advances in video services and also in their underlying infrastructure are currently improving the way people collaborate globally. These advances are creating new usage paradigms: any device in any network can be used to collaborate, in most cases with an overall high quality.
To keep apace with this technology progression, the CERN IT Department launched a service based on the Vidyo product.

Abstract:

For over a decade CERN's fabric management system has been based on home-grown solutions. Those solutions are not dynamic enough for CERN to face its new challenges such as significantly scaling out, multi-site management and the Cloud Computing model, without any additional staff.

Contact: Fabrizio Furano
Abstract:

In this contribution we present a vision for the use of the HTTP protocol for data management in the context of HEP, and we present demonstrations of the use of HTTP-based protocols for storage access & management, cataloguing, federation and transfer.

Contact: Gautam Botrel
Abstract:

This contribution describes how CERN has designed and integrated multiple essential tools for agile software development processes, ranging from a version control (Git) to issue tracking (Jira) and documentation (Wikis).

Contact: Thomas Baron
Title: Indico 1.0
Abstract:

Indico has evolved into the main event organization software, room booking tool and collaboration hub for CERN. The growth in its usage has only accelerated during the past 9 years, and today Indico holds more than 215,000 events and 1,100,000 files. The growth was also substantial in terms of functionalities and improvements. In the last year alone, Indico has matured considerably in 3 key areas: enhanced usability, optimized performance and additional features, especially those related to meeting collaboration.

Contact: Thomas Baron
Abstract:

For a long time HEP has been ahead of the curve in its usage of remote collaboration tools, like videoconference and webcast, while the local CERN collaboration facilities were somewhat behind the expected quality standards for various reasons. This time is now over with the creation by the CERN IT department in 2012 of an integrated conference room service which provides guidance and installation services for new rooms (either equipped for video-conference or not), as well as maintenance and local support.

Contact: Aurelie Pascal
Abstract:

CERN has recently renewed its obsolete VHF firemen’s radio network and replaced it by a digital one based on TETRA technology. TETRA already integrates an outdoor GPS localization system, but it appeared essential to look for a solution to also locate TETRA users in CERN’s underground facilities.

Abstract:

This contribution describes the evolution of the main CERN storage system, CASTOR, as it manages the bulk data stream of the LHC and other CERN experiments, achieving nearly 100 PB of stored data by the end of LHC Run 1.

Abstract:

Recent developments, including low power devices, cluster file systems and cloud storage, represent an explosion in the possibilities for deploying and managing grid storage. In this paper we present how different technologies can be leveraged to build a storage service with differing cost, power, performance, scalability and reliability profiles, using the popular DPM/dmlite storage solution as the enabling technology.

Abstract:

HammerCloud was designed and born under the needs of the grid community to test the resources and automate operations from a user perspective. The recent developments in the IT space propose a shift to the software defined data centers, in which every layer of the infrastructure can be offered as a service.

Abstract:

In order to ease the management of their infrastructure, most of the WLCG sites are adopting cloud-based strategies. CERN, the Tier-0 of the WLCG, is completely restructuring the resource and configuration management of its computing centre under the codename Agile Infrastructure. Its goal is to manage 15,000 virtual machines by means of an OpenStack middleware in order to unify all the resources in CERN's two data centres: the one in Meyrin and the new one in Wigner, Hungary.

Abstract:

The recent paradigm shift toward cloud computing in IT, and general interest in "Big Data" in particular, have demonstrated that the computing requirements of HEP are no longer globally unique. Indeed, the CERN IT department and LHC experiments have already made significant R&D investments in delivering and exploiting cloud computing resources.

Abstract:

The Data Storage and Services (DSS) group at CERN stores and provides access to the data coming from the LHC and other physics experiments. We implement specialised storage services to provide tools for optimal data management, based on the evolution of data volumes, the available technologies and the observed experiment and user usage patterns. Our current solutions are CASTOR, for highly reliable tape-backed storage for heavy-duty Tier-0 workflows, and EOS, for disk-only storage for full-scale analysis activities.

Contact: Jan Iven
Abstract:

After the strategic decision in 2011 to separate Tier-0 activity from analysis, CERN-IT developed EOS as a new petascale disk-only solution to address the fast-growing needs for high-performance low-latency data access. EOS currently holds around 22PB of usable space for the four big experiments (ALICE, ATLAS, CMS, LHCb), and we expect to grow to >30PB this year.

Contact: Jakub Moscicki
Abstract:

Individual users at CERN are attracted by external file hosting services such as Dropbox. This trend may lead to what is known as the "Dropbox Problem": sensitive organization data stored on servers outside of corporate control, outside of established policies, outside of enforceable SLAs and in unknown geographical locations.

Contact: Andrea Sciaba
Abstract:

The Worldwide LHC Computing Grid project (WLCG) provides the computing and storage resources required by the LHC collaborations to store, process and analyse their data. It includes almost 200,000 CPU cores, 200 PB of disk storage and 200 PB of tape storage distributed among more than 150 sites. The WLCG operations team is responsible for several essential tasks, such as the coordination of testing and deployment of Grid middleware and services, communication with the experiments and the sites, follow-up and resolution of operational issues and medium/long term planning.

Contact: Tim Bell
Abstract:

CERN's Infrastructure as a Service cloud is being deployed in production across the two data centres in Geneva and Budapest. This talk will describe the experiences of the first six months of production, the different uses within the organisation and the outlook for expansion to over 15,000 hypervisors by 2015. The open source toolset, accounting and scheduling approaches and scalability challenges will be covered.

Contact: Dan Van Der Ster
Abstract:

AFS is a mature and reliable storage service at CERN, having worked for more than 20 years as the provider of Linux home directories and application areas. Recently, our AFS service has been growing at unprecedented rates, thanks to innovations in both the hardware and software components of our file servers.

Contact: Dan Van Der Ster
Abstract:

Emerging storage requirements, such as the need for block storage for both OpenStack VMs and file services like AFS and NFS, have motivated the development of a generic backend storage service for CERN IT. The goals for such a service include (a) vendor neutrality, (b) horizontal scalability with commodity hardware, (c) fault tolerance at the disk, host, and network levels, and (d) support for geo-replication. Ceph is an attractive option due to its native block device layer RBD which is built upon its scalable, reliable, and performant object storage system, RADOS.

Abstract:

The large potential and flexibility of the ServiceNow infrastructure, based on "best practices" methods, is allowing the migration of some of the ticketing systems traditionally used for tracing the servers and services available at the CERN IT Computer Centre. This migration enables a standardisation and globalisation of the ticketing and control systems, implementing a generic system extensible to other departments and users.

Abstract:

The network infrastructure at CERN has evolved with the increasing service and bandwidth demands of the scientific community. Analysing the massive amounts of data gathered by the experiments requires more computational power and faster networks to carry the data. The new Data Centre in Wigner and the adoption of 100Gbps in the core of the network are the latest answers to these demands. In this presentation, the network architecture at CERN and the technologies deployed to support a reliable, manageable and scalable infrastructure will be described.

Contact: Olof Barring
Abstract:

In May 2012 CERN signed a contract with the Wigner Data Centre in Budapest for an extension of CERN’s central computing facility beyond its current boundaries set by the electrical power and cooling available for computing. The centre is operated as a remote co-location site providing rack space, electrical power and cooling for server, storage and networking equipment acquired by CERN. The contract includes a ‘remote-hands’ service for physical handling of hardware (rack mounting, cabling, pushing power buttons, …) and maintenance repairs (swapping disks, memory modules, …).

Abstract:

Administrating a large-scale, multi-protocol, hierarchical tape infrastructure like the one at CERN, which stores around 30 PB per year, requires an adequate monitoring system for quick spotting of malfunctions, easier debugging and on-demand report generation.
The main challenges for such a system are: coping with the diversity of log formats and with information scattered among several log files, the need for long-term information archival, the strict data consistency requirements and the group-based GUI visualisation.

Contact: Eric Cano
Abstract:

Disk access and tape migrations compete for network bandwidth in CASTOR’s disk servers, over various protocols: RFIO, Xroot, root and GridFTP. As there is a limited number of tape drives, it is important to keep them busy all the time, at their nominal speed. With potentially hundreds of user read streams per server, the bandwidth for the tape migrations has to be guaranteed at a controlled level, rather than left to the fair share the system gives by default.

Contact: Michail Salichos
Abstract:

FTS is the service responsible for distributing the majority of LHC data across the WLCG infrastructure. From the experiences of the last decade supporting and monitoring FTS, reliability, robustness and

Abstract:

The volume of multimedia material produced by CERN is growing rapidly, fed by the increase of dissemination activities carried out by the various outreach teams, such as the central CERN Communication unit and the Experiments Outreach committees. In order for this multimedia content to be stored digitally for the long term, to be made available to end-users in the best possible conditions and finally to be easily re-usable in various contexts e.g.

Abstract:

Physics data stored on CERN tapes is quickly reaching the 100 PB milestone. Tape is an ever-changing technology that is still following Moore's law in terms of capacity. This means we can store more and more data every year in the same number of tapes. However, this doesn't come for free: the first obvious cost is the new higher-capacity media. The second, less well-known cost is related to moving the data from the old tapes to the new ones. This activity is what we call repack.
