Job Description
Job Title: Site Reliability Engineer (SRE)
Location: Abu Dhabi, UAE
Responsibilities:
- Lead the design and implementation of scalable, reliable, and efficient solutions.
- Manage and scale virtual machines (VMs) and Kubernetes clusters to ensure optimal performance and reliability.
- Provide deep technical expertise in databases, with a focus on Postgres, to optimize system performance and reliability.
- Work on the design and scalability of systems in both private and public cloud environments.
- Collaborate with cross-functional teams to enhance system observability and monitoring using tools like Prometheus, Grafana, OpenTelemetry, and other relevant technologies.
- Analyze system and platform metrics to proactively identify and address potential issues before they impact the user experience.
- Contribute to the continuous improvement of our systems, processes, and infrastructure.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Minimum of 5 years of experience as a Site Reliability Engineer (SRE).
- Solid experience with database management, particularly with Postgres.
- Proficiency in container orchestration platforms, especially Kubernetes.
- Strong background in scaling virtual machines and managing cloud environments (private and public).
- Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, and OpenTelemetry.
- Familiarity with system-level metrics and performance analysis.
- Excellent problem-solving skills and the ability to troubleshoot complex issues in a production environment.
Preferred Skills:
- Previous experience with public cloud platforms (e.g., AWS, Azure, GCP) and with private cloud environments.
- Knowledge of system-level metrics and instrumentation.
- Familiarity with containerization technologies (e.g., Docker).
- Experience with log management and analysis tools.
- Strong scripting and automation skills (e.g., Python, Bash).
- Excellent communication and collaboration skills.