Key Responsibilities:
  • Work closely with development, operations, and product teams to ensure monitoring solutions align with business goals.
  • Create and maintain scripts and automation tools to streamline monitoring and alerting processes.
  • Produce and maintain clear documentation on monitoring setups, best practices, and troubleshooting procedures.
  • Train team members and stakeholders on effective use and management of monitoring tools and features.
Incident Management:
  • Follow the incident management process, ensuring timely resolution and minimizing service disruptions.
  • Conduct root cause analysis and implement preventive measures to reduce recurring incidents.
  • Develop and maintain incident response procedures and communication protocols.
Change Management:
  • Manage the change management process, ensuring controlled and efficient implementation of changes.
  • Assess the impact of proposed changes and mitigate potential risks.
  • Ensure compliance with change management policies and procedures.
Metrics and Reporting:
  • Generate regular reports and dashboards to provide insights into service performance.
  • Use data-driven insights to identify trends and drive continuous improvement.
Transformation and Automation:
  • Identify opportunities for process automation and implement solutions to improve efficiency.
  • Evaluate and implement new monitoring tools.
Key Requirements:
  • 5+ years of proven experience as a DevOps Engineer.
  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Minimum of 5 years of experience in monitoring.
  • Proven experience in incident management, change management, and problem management.
  • Strong understanding of ITIL frameworks and best practices.
  • Implement and manage Datadog instrumentation for infrastructure, APM, synthetic monitoring, database monitoring, and RUM.
  • Create and manage dashboards, monitors, and log pipelines.
  • Collaborate with cross-functional teams to ensure comprehensive monitoring coverage.
  • Develop and maintain Terraform scripts for configuration.
  • Design and implement CI/CD pipelines for integrations.
  • Provide expertise in other monitoring tools and concepts.
  • Proficiency in creating dashboards, monitors, and log pipelines.
  • Familiarity with other monitoring tools and concepts.
  • Experience with automation tools and technologies.
  • Excellent analytical and problem-solving skills.
  • Strong communication and interpersonal skills.
  • Experience with cloud-based enterprise applications.
 
Must-Have Skills:
  • Excellent analytical and troubleshooting skills to diagnose and resolve complex issues.
  • Effective communication skills to collaborate with cross-functional teams and convey technical information clearly.
  • Ability to thrive in a fast-paced environment, managing multiple tasks and projects simultaneously.
  • Previous experience in a similar role or relevant industry experience is highly preferred.
  • Knowledge of cloud platforms like AWS, Azure, or Google Cloud.