Computing Engineer (IT-PW)

Type of notice
Internal Mobility

Note: this position is available for a period of three years

As a Computing Engineer in the Platforms and Workflows (PW) group, you will help define the infrastructure for the Next Generation Triggers (NGT) project and lay out the future infrastructure for the CERN experiments.

In close collaboration with the CERN physics community, you will:

  • Lead the effort to design the new system, including specification of hardware resources and participation in tender processes;
  • Evaluate the performance of multiple reference use cases. Develop and run extensive benchmarks against different resource types and configurations, covering distributed training, hyperparameter optimization and inference;
  • Lead the research and development of a shared platform for machine learning (MLOps) and GPU-accelerated workloads, serving the different CERN teams involved. Iterate with end users on different prototype solutions and engage with industry leaders to ensure the long-term sustainability of our choices;
  • Supervise junior team members and coordinate tasks in the Next Generation Triggers project in the area of computing infrastructure and platforms;
  • Research, develop and deploy multiple prototypes for a scalable platform serving machine learning and other accelerated workloads. Report on aspects of performance, total cost of ownership and sustainability;  
  • Contribute to the efficient use of GPU and other accelerator technologies in both the project and the department, including on-premises and external resources (public cloud and HPC);
  • Ensure appropriate collaboration with vendors, research and industry partners, looking for opportunities for further optimization of our systems and platforms in a fast-moving environment.

We are looking for someone with the following demonstrated experience/skills:

  • Knowledge of operating systems
  • Knowledge of system configuration tools
  • Architecture and design of ICT systems
  • Identification and selection of relevant emerging ICT technologies
  • Knowledge and application of software life-cycle tools and procedures
  • Implementation and support of platforms and services for Machine Learning (ML) 
  • Knowledge of containers and container orchestration systems, in particular Kubernetes and other tools in the cloud-native ecosystem
  • Familiarity and previous experience with DevOps practices

Additional experience/skills in the following areas would be an asset:

  • Experience in operating and optimising large-scale infrastructures;
  • Previous experience deploying and managing infrastructure and services in public cloud providers.

Contact: Ricardo Rocha

Expiry date
Last modified
26 Apr 2024