Good to Have Skills: Exposure to Machine Learning Techniques
Job Description:
5+ years of experience developing, fine-tuning, and implementing programs/applications
using Python, PySpark, or Scala on a Big Data/Hadoop platform.
Roles and Responsibilities:
Work with a leading bank’s Risk Management team on specific projects/requirements pertaining to risk models in
consumer and wholesale banking
Enhance machine learning models using PySpark or Scala
Work with data scientists to build ML models based on business requirements, and follow the ML lifecycle to deploy them
all the way to the production environment
Participate in feature engineering, model training, scoring, and retraining
Architect data pipelines and automate data ingestion and model jobs
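The responsibilities above trace the ML lifecycle of feature engineering, scoring, and retraining. A minimal sketch of that loop, in plain Python with invented field names and coefficients (on the actual platform these steps would run as PySpark jobs):

```python
# Toy feature-engineering -> scoring step; all fields/weights are hypothetical.
import math

def engineer_features(record):
    """Derive model inputs from a raw customer record."""
    return {
        "debt_to_income": record["debt"] / record["income"],
        "utilization": record["balance"] / record["credit_limit"],
    }

def score(features, weights, bias):
    """Logistic risk score in (0, 1); higher means higher estimated risk."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

record = {"debt": 20_000, "income": 80_000, "balance": 3_000, "credit_limit": 10_000}
weights = {"debt_to_income": 2.0, "utilization": 1.5}  # stand-in for retrained weights
features = engineer_features(record)
risk = score(features, weights, bias=-2.0)
```

Retraining would periodically re-fit `weights` on fresh performance data and redeploy the scoring job.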
Skills and competencies:
Required:
Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance
data, and macroeconomic data to solve business problems.
Working experience in PySpark and Scala, developing code to validate and implement models in
credit risk/banking
Experience with distributed systems such as Hadoop/MapReduce and Spark, streaming data processing, and cloud architecture
Familiarity with machine learning frameworks and libraries (e.g., scikit-learn, SparkML, TensorFlow, PyTorch)
Experience in systems integration, web services, and batch processing
Experience migrating code to PySpark/Scala is a big plus
Ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business; this applies equally to business strategy and IT strategy, business processes, and workflow
Flexibility in approach and thought process
Willingness to learn and keep up with periodic changes in regulatory requirements, e.g., per the Fed
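The statistical-analysis skill above typically shows up as segment-level checks on model outputs. A hedged sketch of one such check, comparing default rates across score bands; the accounts and band cut-offs are invented for illustration (real analysis would use bureau/vendor extracts, usually via PySpark):

```python
# Toy segment-level validation: default rate by score band (hypothetical data).
from statistics import mean

accounts = [
    {"score": 580, "defaulted": 1},
    {"score": 600, "defaulted": 1},
    {"score": 610, "defaulted": 0},
    {"score": 640, "defaulted": 1},
    {"score": 660, "defaulted": 0},
    {"score": 700, "defaulted": 0},
    {"score": 720, "defaulted": 0},
    {"score": 760, "defaulted": 0},
]

def band(score):
    """Assign a score band; cut-offs are illustrative, not regulatory."""
    return "subprime" if score < 620 else "near-prime" if score < 680 else "prime"

def default_rate_by_band(rows):
    rates = {}
    for name in ("subprime", "near-prime", "prime"):
        segment = [r["defaulted"] for r in rows if band(r["score"]) == name]
        rates[name] = mean(segment) if segment else None
    return rates

rates = default_rate_by_band(accounts)
```

A well-behaved risk model should show default rates that decrease monotonically from subprime to prime; breaks in that ordering are what this kind of check is meant to surface.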
Education Qualification: Master’s degree with a specialization in Statistics, Mathematics, or Finance, or an Engineering degree
Must-Have
5+ years of experience in data engineering, with a strong focus on PySpark/Python for big data processing.
Expertise in building data pipelines and ingestion frameworks from relational, semi-structured (JSON, XML), and unstructured sources (logs, PDFs).
Proficiency in Python with strong knowledge of data processing libraries.
Strong SQL skills for querying and validating data in platforms like Amazon Redshift, PostgreSQL, or similar.
Experience with distributed computing frameworks (e.g., Spark on EMR, Databricks).
Familiarity with workflow orchestration tools (e.g., AWS Step Functions or similar).
Solid understanding of data lake / data warehouse architectures and data modeling basics.
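The must-have skills above combine ingestion of semi-structured data with SQL-based validation. A minimal end-to-end sketch, using Python's standard library with `sqlite3` standing in for Redshift/PostgreSQL; the payload and table schema are hypothetical:

```python
# Ingest a semi-structured JSON payload into a relational table,
# then validate it with SQL (sqlite3 as a stand-in warehouse).
import json
import sqlite3

raw = '[{"id": 1, "amount": 120.5}, {"id": 2, "amount": null}, {"id": 3, "amount": 40.0}]'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO transactions (id, amount) VALUES (?, ?)",
    [(row["id"], row["amount"]) for row in json.loads(raw)],
)

# Data-quality check: count rows that violate a NOT NULL expectation.
null_amounts = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE amount IS NULL"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```

In a production pipeline the same pattern scales out: Spark (e.g., on EMR or Databricks) reads the JSON/XML sources, lands them in the lake or warehouse, and an orchestrator runs SQL validation queries like the one above as a post-load step.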