Site Reliability Engineer (m/f). Luxembourg

Job classification: Professionals › Information and communications technology professionals › Software and applications developers and analysts › Software and applications developers and analysts not elsewhere classified.

Job description:

Your role – Are you ready for a challenge?

The Site Reliability Engineer is mission-driven and experienced in the engineering practices needed to run safe and reliable production systems: designing for operability and security, and working with a breadth of tools and approaches to solve a broad spectrum of problems.

The SRE is responsible for building, running and maintaining the platform on which LIA relies. This means strong involvement in incident and problem resolution.

Core responsibilities:

    Incident response and resolution: when incidents occur, you are responsible for responding promptly, diagnosing the problem, debugging, and implementing appropriate solutions to minimize downtime and restore services;

    Collaboration with cross-functional teams: work closely with other teams to understand their requirements, provide support, and ensure smooth operation of the whole platform;

    Documentation and knowledge sharing: maintaining accurate documentation of configurations, troubleshooting procedures, and best practices is crucial. You collaborate with colleagues to share insights and enhance the overall team knowledge;

    Incident post-mortems and continuous improvement: after resolving incidents, conduct post-mortem reviews to identify root causes, document findings, and suggest improvements to prevent similar incidents in the future. Actively participate in continuous improvement efforts to enhance system reliability and resilience;

    Security and compliance management: collaborate with security teams to ensure security controls and compliance requirements are met. Implement security measures, apply patches, and perform vulnerability assessments to protect against potential threats.

Network management:

    Capacity planning and optimization: analyse network traffic patterns, anticipate growth, and plan for future capacity requirements. This involves optimizing network configurations, adjusting bandwidth allocations, and scaling network resources;

    Physical hardware management: you are responsible for cabling, maintaining and monitoring the physical hardware in all data centres, local offices and foreign offices;

    Network security management: you collaborate with security teams to ensure the network is adequately protected against potential threats. This includes implementing security measures, monitoring for anomalies and breaches, and conducting regular audits;

    Automation and tooling development: develop scripts, tools, and automation workflows to streamline network management tasks, such as device provisioning, configuration management, and monitoring.

Other duties

Infrastructure management:

    Infrastructure monitoring and alert management: monitor the infrastructure components, such as servers, virtual machines, containers, and cloud resources, to ensure their health and availability. Respond to alerts and take necessary actions to resolve issues promptly;

    Configuration management and automation: use configuration management systems or infrastructure-as-code to manage and automate the deployment and configuration of infrastructure resources. Maintain consistent configurations, track changes, and automate repetitive tasks;

    Capacity planning and resource optimization: analyse resource usage trends, forecast future demand, and plan accordingly. Optimize resource allocation, scale infrastructure, and recommend improvements to meet performance requirements;

    Disaster recovery: work on disaster recovery strategy and implement mechanisms to ensure data and service availability in case of disasters or failures. Conduct regular disaster recovery drills to validate recovery procedures and maintain readiness.

Database management:

    Database monitoring and performance optimization: monitor database performance metrics, identify bottlenecks or issues, and optimize configurations or queries to improve overall performance and response times;

    Backup and recovery management: you are responsible for implementing and maintaining database backup and recovery strategies to ensure data integrity and availability. You regularly test and verify backup procedures to mitigate the risk of data loss;

    Database capacity planning and scalability: analyse database usage trends, predict growth patterns, and plan for future capacity requirements. This involves scaling the database infrastructure, optimizing resource allocation, and implementing sharding or partitioning strategies;

    Automation and tooling development: develop scripts, tools, or automation workflows to streamline database management tasks. This includes tasks such as database provisioning, schema management, data migrations, and monitoring.

Leading the development of a long-term technical strategy for our systems and infrastructure, with a focus on security and monitoring.

Making sure incidents are resolved efficiently, performing root cause analysis and acting to prevent recurrence.

Your profile – Have you got what it takes to become our Site Reliability Engineer?

    Bachelor's or Master's degree in Computer Science, Engineering or a related subject;

    Proven work experience (5 years) in engineering or a similar role, with a focus on reliability and scalability;

    Comfortable working in a dynamic and fast-paced environment and capable of adapting to shifting and evolving business priorities;

    Quick learner with strong troubleshooting, debugging and analytical skills, who enjoys technical challenges;

    Autonomous and solution oriented. You are eager to innovate and try new things;

    Team worker and able to communicate effectively with peers and other departments;

    Highly organized and able to adjust priorities, with great attention to detail;

    Can compromise between a “perfect” and a “good-enough” delivery, while being flexible enough on different applications or process variations;

    Strong team player with good time-management skills and great interpersonal and communication skills;

    Demonstrate leadership, a sense of ownership and pride in your performance and its impact on the company's success;

    Hands-on experience deploying, managing, and debugging apps in Kubernetes (knowledge of OpenShift specifics is a plus);

    You speak PromQL regularly;

    You are comfortable with the concepts and operation of NoSQL databases (MongoDB, Elasticsearch);

    Experience working with distributed messaging platforms (JMS, Kafka), including how to properly allocate resources for good performance and data replication;

    Experience working in an Agile/Kanban development process;

    You have certified knowledge in Windows / Linux operating system administration;

    Fluency in English. Any additional language, in particular French, is a key asset.

Additional Assets

    Proficiency in scripting and programming languages such as PowerShell, Bash, Java and/or Python would be an asset.

    Experience in applying Lean principles is a plus

    Experience with scheduling tools (SMA OpCon) is a plus

    GitOps (Argo CD and related tooling)

    Network and application monitoring (Prometheus / PromQL, SNMP, Nagios)

    NoSQL databases (MongoDB, Elasticsearch)

    Kubernetes

    Server and Application Security

    Job scheduling (OpCon, Rundeck)

    Shell scripting (PowerShell, Bash)

    CI/CD

    Kafka / JMS backends

    Programming languages: Java, Python

Country of work: Luxembourg.

Number of positions: 1.

Education level: Bachelor's degree or equivalent.

Experience: 5 years.

Employer: LOMBARD INTERNATIONAL ASSURANCE S.A.

How to apply:

You are invited to submit your application via the website: https://lombardinternationaleu.bamboohr.com/careers/469

Job offer obtained from the EURES portal, dated 24 March 2024, with vacancy identifier: PES_LU_706084.
