The Production Engineer (PE) at ThoughtSpot is a critical, hybrid role focused on bridging the gap between AI Agent Development and Reliability Engineering. This position requires a strong background in software development (specifically Java or Python), deep knowledge of system design, and hands-on expertise in production operations. The PE is responsible for driving operational excellence, accelerating issue resolution, and proactively improving system supportability through robust coding and automation.
What You'll Do
:
Engineering and SRE Liaison: Act as the primary technical expert, providing product architecture and system design insights to SRE and ProdOps teams and relaying operational feedback (supportability, reliability) to Engineering teams.
System Design and Development: Apply deep system design knowledge to production environments, and use coding skills (Java/Python) to develop, implement, and maintain AI-powered tools that streamline ProdOps processes, reduce manual intervention, and improve system reliability.
Incident Resolution and Automation: Analyze recurring production issues in Spotter (AI Agent), identify patterns, and use Java or Python to build robust, scalable solutions (including AI/ML-based scripts) that proactively detect, triage, and resolve incidents.
Defect Triage and Ownership: Own the initial triage of customer-found defects: use code analysis to gather the necessary information before escalating, and often provide code-based workarounds or fixes directly.
Collaboration and Best Practices: Collaborate with Engineering and SRE teams to deliver production-ready solutions, maintain code quality standards, and drive adoption of operational best practices across the organization.
Documentation and Knowledge Transfer: Maintain and enhance knowledge bases, SOPs, and documentation for recurring issues, workflows, and code-based automation tools.
Operational Readiness Review: Participate in exit reviews for product epics to ensure feature quality, system design coherence, and comprehensive operational readiness coverage, with a focus on automation.
What You Bring
:
Strong background in production operations and large-scale distributed systems.
Mandatory proficiency in reading and writing code in Java and/or Python.
Demonstrated knowledge of, and experience applying, system design principles and practices.
Proven experience developing and deploying automation and AI/ML tools for ProdOps use cases (e.g., anomaly detection, automated triage, self-healing scripts).
Excellent communication, documentation, and cross-functional collaboration skills, particularly bridging development and operations.
Experience in support engineering, SRE, or a similar liaison role is preferred.
Familiarity with cloud architectures, analytics platforms, and modern AI/ML frameworks is a plus.
This position is ideal for software engineers with system design expertise who want to leverage their coding skills (Java/Python) to solve complex operational challenges, driving the next generation of production reliability and supportability at ThoughtSpot.
ATS Match is available
1) Upload your resume. 2) Open any job and click Check ATS Match to see your fit score.