Job Description
Key Responsibilities:
- Design, build, and maintain data pipelines across on-prem Hadoop and AWS
- Develop and maintain Java applications, utilities, and data processing libraries
- Manage and enhance internal Java libraries used for ingestion, validation, and transformation
- Migrate and sync data from on-prem HDFS to AWS S3
- Develop and maintain Airflow DAGs for orchestration and scheduling
- Work with Kafka-based streaming pipelines for real-time/near-real-time ingestion
- Build and optimize Spark / PySpark jobs for large-scale data processing
- Use Hive, Presto/Trino, and Athena for querying and validation
- Implement data quality checks, monitoring, and alerting
- Support Apache Iceberg tables and AWS external tables
- Troubleshoot production issues and ensure SLA compliance
- Collaborate with platform, analytics, and observability teams
Technical Skills Required:
- Java (development, maintenance, and build tools such as Gradle)
- AWS (S3, Glue, EMR, Athena, EKS basics)
- Hadoop/HDFS, Hive
- Apache Kafka (producers/consumers, topics, streaming ingestion)
- Apache Spark / PySpark (batch and streaming processing)
- Apache Airflow (DAG development and maintenance)
- Python
- Git and CI/CD workflows
- Observability tools (Prometheus/Grafana)
- SQL
Location: Austin, TX or Sunnyvale, CA
Salary Range: $70,000-$135,000 a year