
Cloud Platform/SRE Engineer

Contract Type: Contract to Hire

Posted Date: April 7, 2026

ECCO Select is a talent acquisition and consulting company specializing in people, process, and technology solutions. We provide the talent behind the technology, enabling our clients to achieve their goals. For more information about ECCO Select, visit us at www.eccoselect.com.

Position Title: Cloud Platform / Site Reliability Engineer

Location Information  

Hybrid – Onsite 2 days per week
Dallas, TX
6+ Month Contract to Hire

Position Responsibilities:


As a Cloud Platform / Site Reliability Engineer, you will play a critical role in building, operating, and continuously improving a modern AWS-centric cloud platform with an emphasis on reliability, security, automation, and seamless operations. This is a hands-on position at the intersection of infrastructure engineering, DevOps, and SRE, with a forward-looking focus on integrating AI-powered tooling to drive intelligent automation and optimization.

Key responsibilities include:

  • Championing site reliability engineering (SRE) practices across the organization: defining service level objectives (SLOs), establishing and tracking service level indicators (SLIs), managing error budgets, and using reliability metrics to guide engineering priorities.
  • Building and maintaining comprehensive observability for production systems—metrics, logs, traces, and dashboards—to enable proactive monitoring, issue detection, and root cause analysis.
  • Authoring and maintaining operational runbooks and participating in on-call rotations; leading blameless post-incident reviews to drive measurable improvements in Mean Time to Resolution (MTTR) and Mean Time to Detection (MTTD).
  • Driving reliability through capacity planning, resilience and chaos testing, dependency mapping, and progressive rollout strategies.
  • Developing and managing infrastructure using Terraform-first approaches: reusable modules, state management, policy-as-code (such as OPA/Sentinel), and resource governance across multi-account AWS environments (Organizations/Control Tower).
  • Designing, implementing, and operating CI/CD pipelines using GitHub Actions; creating standardized workflows, templates, and paved-road experiences for development teams to increase velocity and reduce deployment risk.
  • Architecting and managing AWS core services: VPC networking (NAT, Transit Gateway, PrivateLink, Managed Firewall), compute (EC2, ECS/Fargate, EKS, Lambda), storage (S3, EFS), and data services (RDS/Aurora, DynamoDB).
  • Standardizing container build/deploy patterns (blue/green, canary, rolling), autoscaling policies, and deployment models for serverless and container-based workloads.
  • Implementing security best practices: least-privilege IAM policies, KMS encryption, WAF rules, Secrets Manager usage, and integrating Security Hub, GuardDuty, and Inspector findings with automated remediation workflows.
  • Enforcing architectural guardrails and governance: Service Control Policies (SCPs), resource tagging standards, and compliance for audit/regulatory needs.
  • Evaluating and integrating AI-powered tooling—LLM-based assistants, coding agents, and retrieval-augmented workflows—to drive platform automation and enhance the developer experience.
  • Leveraging Amazon Bedrock and related AWS AI/ML and automation offerings to build AI agent architectures for DevOps use cases, including incident triage and knowledge management.
  • Monitoring and evaluating AI workloads in production: establishing guardrails, measuring cost, latency, drift, and accuracy, and maintaining robust frameworks for AI integrations.
  • Applying FinOps principles: enforcing cost allocation, resource right-sizing, planning for Reserved Instances/Savings Plans, and configuring budget and anomaly alerts.
  • Maintaining accurate asset and configuration data in ServiceNow CMDB; participating in formal change, incident, and problem management processes.
  • Designing resilient hybrid network connectivity and DNS architectures to support private service access, cross-account service discovery, and failover scenarios.
  • Staying current on industry trends, particularly in the AI/ML and cloud-native operations space, and translating new capabilities into actionable platform improvements.

 

Essential Skills and Experience


  • 4–7 years of combined experience in cloud infrastructure, platform engineering, DevOps, or SRE roles, with an emphasis on AWS environments.
  • Expertise with Terraform, including module authoring, state management, workspaces, and implementation of policy-as-code solutions.
  • Hands-on experience with AWS infrastructure: EC2, ECS/Fargate, EKS, Lambda, RDS/Aurora, DynamoDB, S3, EFS, ALB/NLB, IAM, KMS, WAF, Secrets Manager, Organizations/Control Tower, VPC, Transit Gateway, Direct Connect/VPN.
  • Deep understanding of container orchestration and serverless computing on AWS; ability to architect, deploy, and maintain production-grade workloads.
  • Practical knowledge of networking: VPCs, subnets, routing, hybrid connectivity, load balancing, and DNS architectures (Route 53).
  • CI/CD proficiency with GitHub Actions or similar platforms; ability to create and maintain standardized and secure deployment pipelines.
  • Observability mindset: experience architecting and maintaining observability pipelines (CloudWatch, X-Ray, or equivalents); understanding of SLOs/SLIs and on-call/incident response.
  • Solid Linux administration and scripting skills (Bash and/or Python); proficiency with Docker image lifecycle management and registry operations.
  • Knowledge of security fundamentals: implementing least-privilege IAM, secrets management, encryption standards, tagging, and compliance practices.
  • Familiarity with FinOps: cloud cost optimization, budget alerting, and savings planning.
  • Understanding of AI/ML in production: hands-on experience with LLMs, prompt engineering, or Amazon Bedrock preferred.
  • Experience integrating AI/ML operations, coding assistants, or retrieval-augmented generation (RAG) is a plus.
  • Familiarity with Azure cloud services is helpful during transitions but not a primary requirement.
  • Demonstrated ability to translate new technologies (especially AI/ML capabilities) into practical operational improvements.

Qualifications:

  • Bachelor’s degree preferred; High School Diploma or equivalent with strong relevant experience required.
  • AWS certifications (Solutions Architect Associate, SysOps Administrator, Developer Associate) preferred; AWS DevOps Professional or Security Specialty is a strong plus.
  • Kubernetes certifications (CKA/CKAD) or Docker certifications are a plus.
  • AWS AI Practitioner or Machine Learning Specialty certification is considered a differentiator.
  • Azure certifications are optional but an asset during transition periods.

ECCO Select is committed to hiring and retaining a diverse workforce. Our policy is to provide equal opportunity to all people without regard to race, color, religion, national origin, ancestry, marital status, veteran status, age, disability, pregnancy, genetic information, citizenship status, sex, sexual orientation, gender identity or any other legally protected category. Veterans of our United States Uniformed Services are specifically encouraged to apply for ECCO Select opportunities.

Equal Employment Opportunity is The Law
This Organization Participates in E-Verify