HPC/AI Systems Engineer – AI4S

October 8, 2024


Job Description

The Barcelona Supercomputing Center – Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful supercomputers in Europe, was a founding and hosting member of the former European HPC infrastructure PRACE (Partnership for Advanced Computing in Europe), and is now a hosting entity for EuroHPC JU, the Joint Undertaking that leads large-scale investments and HPC provision in Europe. The mission of BSC is to research, develop and manage information technologies in order to facilitate scientific progress. BSC combines HPC service provision and R&D in both computer and computational science (life, earth and engineering sciences) under one roof, and currently has over 1000 staff from 60 countries.

For this role, we are particularly interested in the strengths and lived experiences of women and underrepresented groups, to help us avoid perpetuating biases and oversights in science and IT research. In cases of equal merit, the incorporation of the under-represented sex will be favoured.

We promote Equity, Diversity and Inclusion, fostering an environment where each and every one of us is appreciated for who we are, regardless of our differences.

If you feel that you do not meet all the requirements, we still encourage you to apply for this job offer. We value diversity of experiences and skills, and you could bring unique perspectives to our team.

Context And Mission

We are looking for candidates with a technical background who will become part of the Operations Department of the Centre.

The funding for these actions/fellowships and contracts comes from the European Union Recovery and Resilience Facility – Next Generation, within the framework of the General Invitation by the public business entity Red.es to participate in the talent attraction and retention programs within Investment 4 of Component 19 of the Recovery, Transformation, and Resilience Plan.
For more information, please check: https://www.bsc.es/join-us/excellence-career-opportunities/ai4s

Key Duties

  • Installation, maintenance, update and resolution of issues related to IT services of the centre (mail, web, databases, servers, etc.)
  • Configuration and administration of the different storage subsystems and backup system.
  • Configuration and administration of the BSC HPC supercomputing resources.
  • Configuration and administration of BSC cloud platforms (OpenStack, OpenNebula and oVirt).
  • Configuration and administration of BSC AI platforms.

Requirements

  • Education
    • Degree/Master’s degree in Computer Science or a similar field.
  • Essential Knowledge and Professional Experience
    • Knowledge and experience in system administration of HPC Linux platforms (4 years minimum)
    • Knowledge and experience in system administration of distributed file systems like GPFS (IBM Storage Scale) or Lustre
    • Knowledge and experience in system administration of cloud platforms like OpenStack/OpenNebula
  • Additional Knowledge and Professional Experience
    • Experience with tools like Kubernetes, Docker Swarm, or Apache Mesos for container orchestration and resource management
    • Experience with GPU clusters, including tools like NVIDIA Docker, CUDA, cuDNN, and managing NVIDIA GPUs in a clustered environment
    • Knowledge of AI/ML frameworks like TensorFlow, PyTorch, NVIDIA Megatron. Understanding of their deployment and management in a cluster environment
    • Familiarity with Docker and managing AI/ML containers with Docker
    • Knowledge of object storage systems like Amazon S3, MinIO, or similar technologies
  • Competences
    • Initiative, responsibility and good organizational skills
    • Analytical problem-solving skills
    • Availability to travel and assist with project events/workshops

Conditions

  • The position will be located at BSC within the Operations Department
  • We offer a full-time contract (37.5h/week), a good and highly stimulating working environment with state-of-the-art infrastructure, flexible working hours, an extensive training plan, restaurant tickets and private health insurance
  • Duration: 4 years
  • Holidays: 23 paid vacation days plus the 24th and 31st of December, per our collective agreement
  • Salary: 50.000,00€
  • Additional Expenses Grant: Each fellowship will be associated with a grant for additional expenses, such as IT equipment, travel, training, stays, etc.
  • Starting date: as soon as possible – the incorporation for this vacancy must take place before the 16th of December 2024

Application procedure and process

The selection will be carried out through a competitive examination system (“Concurso-Oposición”). The recruitment process consists of two phases:

  1. Curriculum Analysis: Evaluation of previous experience and/or scientific history, degree, training, and other professional information relevant to the position. – 40 points
  2. Interview phase: The highest-rated candidates at the curriculum level will be invited to the interview phase, conducted by the corresponding department and Human Resources. In this phase, technical competencies, knowledge, skills, and professional experience related to the position, as well as the required personal competencies, will be evaluated. – 60 points. A minimum of 30 points out of 60 must be obtained to be eligible for the position.

The recruitment panel will be composed of at least three people, ensuring at least 25% representation of women.

In accordance with OTM-R principles, a gender-balanced recruitment panel is formed for each vacancy at the beginning of the process. After reviewing the content of the applications, the panel will begin the interviews, with at least one technical and one administrative interview. At a minimum, a personality questionnaire as well as a technical exercise will be conducted during the process.

The panel will make a final decision, and all individuals who participated in the interview phase will receive feedback with details on the acceptance or rejection of their profile.