Key Responsibilities:
- Work closely with development, operations, and product teams to ensure monitoring solutions align with business goals.
- Create and maintain scripts and automation tools to streamline monitoring and alerting processes.
- Produce and maintain clear documentation on monitoring setups, best practices, and troubleshooting procedures.
- Train team members and stakeholders on effective use and management of monitoring tools and features.
Incident Management:
- Follow the incident management process, ensuring timely resolution and minimizing service disruptions.
- Conduct root cause analysis and implement preventive measures to reduce recurring incidents.
- Develop and maintain incident response procedures and communication protocols.
Change Management:
- Manage the change management process, ensuring controlled and efficient implementation of changes.
- Assess the impact of proposed changes and mitigate potential risks.
- Ensure compliance with change management policies and procedures.
Metrics and Reporting:
- Generate regular reports and dashboards to provide insights into service performance.
- Use data-driven insights to identify trends and drive continuous improvement.
Transformation and Automation:
- Identify opportunities for process automation and implement solutions to improve efficiency.
- Evaluate and implement new monitoring tools.
Key Requirements:
- 5+ years of proven experience as a DevOps Engineer.
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Minimum of 5 years of experience in monitoring.
- Proven experience in incident management, change management, and problem management.
- Strong understanding of ITIL frameworks and best practices.
- Hands-on experience implementing and managing Datadog instrumentation for infrastructure, APM, synthetic monitoring, database monitoring, and RUM.
- Proficiency in creating and managing dashboards, monitors, and log pipelines.
- Ability to collaborate with cross-functional teams to ensure comprehensive monitoring coverage.
- Experience developing and maintaining Terraform scripts for configuration.
- Experience designing and implementing CI/CD pipelines for integrations.
- Familiarity with other monitoring tools and concepts, and the ability to provide expertise in them.
- Experience with automation tools and technologies.
- Excellent analytical and problem-solving skills.
- Strong communication and interpersonal skills.
- Experience with cloud-based enterprise applications.
Must-Have Skills:
- Excellent analytical and troubleshooting skills to diagnose and resolve complex issues.
- Effective communication skills to collaborate with cross-functional teams and convey technical information clearly.
- Ability to thrive in a fast-paced environment, managing multiple tasks and projects simultaneously.
- Previous experience in a similar role or relevant industry experience is highly preferred.
- Knowledge of cloud platforms such as AWS, Azure, or Google Cloud.