Provide L2/L3 production support for big data platforms, ensuring high availability and reliability of clusters and data pipelines.
Monitor big data jobs, workflows, and resource utilization; proactively identify and resolve performance bottlenecks and failures.
Perform incident management, including impact analysis, root-cause investigation, and timely resolution with clear communication to stakeholders.
Manage and support batch and streaming data ingestion processes, validating data quality and integrity across environments.
Collaborate with data engineering teams to deploy, validate, and stabilize new big data solutions and enhancements.
Create and maintain runbooks, standard operating procedures, and knowledge base articles for recurring issues and operational tasks.
Participate in on-call rotations, change management, and release activities for big data components and jobs.
Automate routine support activities and monitoring tasks to improve efficiency and reduce manual intervention.
Track and report key support metrics, providing recommendations for continuous improvement of platform reliability and support processes.

Minimum Qualifications:
B.Tech degree in Computer Science, Information Technology, or a related engineering discipline.
3–5 years of hands-on experience in big data environments, focusing on platform or application support.
Strong understanding of big data concepts such as distributed processing, cluster management, and data pipelines.
Proven experience in monitoring, troubleshooting, and resolving issues in big data jobs and workflows in production environments.
Ability to analyze logs, identify root causes, and implement stable fixes or workarounds for recurring incidents.
Solid understanding of ITIL-aligned support practices, including incident, problem, and change management in a production setup.

Good to have skills: Linux administration, Shell scripting, SQL, Mon