Senior Site Reliability Engineer (SRE) - Cloud & Distributed Systems Job at Dutech Systems, inc, Austin, TX

  • Dutech Systems, inc
  • Austin, TX

Job Description

Skills:

SRE, DevOps, AWS, GCP, Kubernetes, Docker, Python, Go, Linux, Distributed Systems, Monitoring, Logging, SLIs, SLOs, CI/CD, Observability

We are seeking an experienced Senior Site Reliability Engineer (SRE) to design, build, and operate highly scalable and reliable cloud-based systems. The ideal candidate will have a strong background in DevOps, distributed systems, and cloud infrastructure, with a focus on automation, observability, and system reliability.

This role involves working in a fast-paced environment to ensure system uptime, performance, and operational excellence.

Key Responsibilities:

  • Design, implement, and manage highly available, distributed systems
  • Maintain and optimize cloud infrastructure (AWS/GCP)
  • Develop automation scripts using Python, Go, Java, or Bash
  • Manage containerized environments using Docker and Kubernetes
  • Define and monitor SLIs, SLOs, and error budgets
  • Implement monitoring, logging, and alerting solutions
  • Lead incident management, root cause analysis (RCA), and postmortems
  • Ensure system security and compliance within operational workflows
  • Improve system reliability through performance tuning and optimization
  • Collaborate with engineering teams to enhance deployment and release processes
  • Create and maintain runbooks, dashboards, and operational documentation

Required Qualifications:

  • 8+ years of experience in SRE, DevOps, or Systems Engineering
  • Strong expertise in Linux/Unix systems and system internals
  • Proficiency in at least one programming/scripting language (Python, Go, Java, Bash)
  • Experience designing and operating distributed systems
  • Hands-on experience with cloud platforms (AWS or GCP)
  • Experience with Docker and Kubernetes
  • Strong understanding of monitoring, alerting, and logging concepts
  • Experience managing SLIs, SLOs, and error budgets
  • Experience with incident management and RCA processes

Preferred Qualifications:

  • Experience with observability tools (Prometheus, Grafana, Datadog, Splunk, Application Insights)
  • Experience supporting 24x7 production environments and on-call rotations
  • Knowledge of chaos engineering and resiliency testing
  • Experience with canary deployments, feature flags, and progressive delivery
  • Strong documentation and communication skills

Job Tags

Contract work
