cloudtechtiq
⭐ Featured

Site Reliability Engineer

📍 Jaipur
💼 Remote
📅 2
🏷️ SRE
🕐 Posted 20 hours ago
// Job Description
We are seeking a Site Reliability Engineer (SRE) to join our engineering team. The SRE will be responsible for ensuring the reliability, scalability, and performance of our systems and services. This role bridges the gap between software development and operations, combining software engineering expertise with systems administration skills to build resilient infrastructure.

Key Responsibilities
Design, build, and maintain scalable, reliable, and secure infrastructure.
Develop automation tools to improve deployment, monitoring, and incident response.
Collaborate with development teams to ensure services are designed with reliability and scalability in mind.
Monitor system performance, identify bottlenecks, and implement solutions.
Manage incident response, root cause analysis, and postmortems.
Implement observability practices (logging, metrics, tracing) to improve system visibility.
Drive continuous improvement in system availability and performance.
Participate in on-call rotations to support production systems.

Required Skills & Qualifications
Strong background in Linux/Unix system administration.
Proficiency in programming languages such as Python, Go, or Java.
Experience with cloud platforms (AWS, Azure, GCP).
Knowledge of containerization and orchestration (Docker, Kubernetes).
Familiarity with CI/CD pipelines and DevOps practices.
Expertise in monitoring tools (Prometheus, Grafana, ELK stack).
Strong problem-solving and debugging skills.
Excellent communication and collaboration abilities.
// Tech Stack Required
Kubernetes
Docker
AWS
Azure
Linux
Jenkins
Terraform