Senior Site Reliability Engineer (Kafka)
Position details
Responsibilities
• Automate the setup, configuration, and management of Kafka clusters, including topic creation, broker configuration, security configuration, and monitoring.
• Design and implement SLOs for the Kafka cluster to ensure it meets performance, availability, and scalability targets.
• Advise developers on writing efficient producer and consumer applications that connect to the Kafka cluster, and on applying the Kafka cluster to different product requirements.
• Diagnose and troubleshoot Kafka infrastructure problems, and provide the support required to operate the infrastructure.
• Continuously optimize and improve the Kafka infrastructure to ensure the performance and reliability of the system.
• Implement monitoring tools and alerts to proactively prevent or reduce system downtime.
• Stay up to date on the latest Kafka technologies and best practices, and apply them within the Kafka infrastructure.
Qualifications
• At least 3 years of solid hands-on experience with Kafka.
• Expertise in administering and operating Kafka clusters.
• Good knowledge of best practices across the Kafka ecosystem.
• A solid record of hands-on experience with Kafka clusters and their clients in high-volume, high-throughput environments.
• Experience helping developers onboard applications written in different programming languages.
• Ability to define observability metrics that help measure the reliability and performance of Kafka systems.
• Knowledge of agile development methodologies.
• Experience with CI/CD and DevOps tools such as K8s (Kubernetes), Rancher, and Private Cloud, and with development tools such as Git and Jira.