Overview
Job ID: 7797
Experience: 5 - 10 years
Positions: 2
Technical Skills Must Have
Python, Java, Kotlin, or Scala; and SQL
Job Summary
This role is for staff augmen…
Job Description
Skillset:
Application Reliability and Performance
Strong experience in application monitoring, troubleshooting, and performance tuning for distributed systems.
Familiarity with observability tools (e.g., Grafana, Prometheus, Splunk, Datadog, or equivalent).
Understanding of scalability principles and experience diagnosing latency, throughput, or memory issues in production.
Ability to perform root-cause analysis and implement durable fixes for recurring incidents.
Software Engineering and Debugging
Solid proficiency in at least one backend programming language used in the system (e.g., Python, Java, Kotlin, or Scala).
Experience reading and debugging code written by other teams to identify and resolve issues quickly.
Familiarity with API-based systems, microservices, and event-driven architectures.
Competence in using Git for version control and following structured change management processes.
Data and Pipeline Operations
Working knowledge of data pipelines and batch/stream processing tools (e.g., Apache Spark).
Understanding of data validation, logging, and error-handling practices in ML or analytics-driven applications.
Ability to support and monitor ETL/ELT jobs, ensuring data flows correctly into downstream forecasting or planning models.
Cloud and Deployment Infrastructure
Hands-on experience with cloud platforms (GCP, Azure, or AWS) for deploying and managing applications.
Familiarity with containerization and orchestration (Docker, Kubernetes) and CI/CD tools (e.g., Jenkins, GitLab CI).
Basic understanding of infrastructure-as-code (e.g., Terraform or Cloud Deployment Manager) is preferred.
Incident Response and Operational Support
Experience in production support or SRE-style operations, including ticket triage, escalation, and communication with end users.
Ability to work within Service Level Objectives (SLOs) and document issue resolution steps.
Comfortable collaborating across time zones and communicating clearly with both technical and non-technical users.
Communication and Collaboration
Strong written and verbal communication skills to coordinate effectively with Minneapolis engineering and data science teams.
Ability to translate user-reported issues into actionable engineering tasks.
Proactive and collaborative approach to working with cross-functional partners (merchandise planning, demand planning, and inventory teams).