Sr. DevOps, Performance & Monitoring Engineer
About this position
Responsibilities
• Designing, implementing, and maintaining monitoring solutions.
• Developing and maintaining automation and monitoring scripts using languages like Python, Bash, or PowerShell.
• Implementing monitoring and logging solutions using Grafana, Prometheus, Splunk, and Dynatrace.
• Ensuring security best practices are integrated into the monitoring solution.
• Responding to and resolving incidents and outages in a timely manner.
• Writing specifications and documentation for monitoring solutions.
• Continuously assessing and improving monitoring practices, tools, and processes.
Requirements
• Fluent in English, both written and verbal.
• Minimum 3 years of experience as an APM Engineer with a DevOps background.
• 3 years of experience with monitoring and logging tools (Prometheus, Grafana, Splunk, Dynatrace).
• At least 3 years of experience with enterprise application monitoring solutions.
• Full understanding of, and experience in implementing, DevSecOps and GitOps.
• Ability to work both as a team player and as a team leader.
• APM Tools Proficiency: Expertise in using APM tools like New Relic, Dynatrace, AppDynamics, or Datadog.
• Programming and Scripting: Proficiency in programming languages such as Java, Python, C#, or JavaScript.
• Performance Testing: Experience with performance testing tools and methodologies.
• System Architecture: Strong understanding of system architecture, including how different components interact in complex distributed systems.
• Networking Fundamentals: Good grasp of networking concepts and protocols.
• Database Management: Familiarity with databases and SQL, as well as NoSQL databases.
• Operating Systems: Solid knowledge of various operating systems, particularly Linux and Windows.
• Monitoring and Logging: Experience with monitoring solutions and logging tools such as Elasticsearch, Logstash, Kibana (ELK), Grafana, or Splunk.
• Incident Management and Troubleshooting: Strong skills in diagnosing and resolving incidents.