Good to Have Skills: Exposure to Machine Learning Techniques
Job Description:
5+ years of experience developing, fine-tuning, and implementing programs/applications
using Python, PySpark, or Scala on a Big Data/Hadoop platform.
Roles and Responsibilities:
Work with a leading bank’s Risk Management team on specific projects/requirements pertaining to risk models in
consumer and wholesale banking
Enhance machine learning models using PySpark or Scala
Work with data scientists to build ML models based on business requirements, and follow the ML lifecycle to deploy them
all the way to the production environment
Participate in feature engineering, model training, scoring, and retraining
Architect data pipelines and automate data ingestion and model jobs
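The responsibilities above trace the ML lifecycle of feature engineering, scoring, and retraining. A minimal sketch of that loop, in plain Python with invented field names and coefficients (on the actual platform these steps would run as PySpark jobs):

```python
# Toy feature-engineering -> scoring step; all fields/weights are hypothetical.
import math

def engineer_features(record):
    """Derive model inputs from a raw customer record."""
    return {
        "debt_to_income": record["debt"] / record["income"],
        "utilization": record["balance"] / record["credit_limit"],
    }

def score(features, weights, bias):
    """Logistic risk score in (0, 1); higher means higher estimated risk."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

record = {"debt": 20_000, "income": 80_000, "balance": 3_000, "credit_limit": 10_000}
weights = {"debt_to_income": 2.0, "utilization": 1.5}  # stand-in for retrained weights
features = engineer_features(record)
risk = score(features, weights, bias=-2.0)
```

Retraining would periodically re-fit `weights` on fresh performance data and redeploy the scoring job.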
Skills and competencies:
Required:
Strong analytical skills in conducting sophisticated statistical analysis using bureau/vendor data, customer performance
data, and macroeconomic data to solve business problems.
Working experience in PySpark and Scala, developing code to validate and implement models in
credit risk/banking
Experience with distributed systems such as Hadoop/MapReduce and Spark, streaming data processing, and cloud architecture
Familiarity with machine learning frameworks and libraries (e.g., scikit-learn, SparkML, TensorFlow, PyTorch)
Experience in systems integration, web services, and batch processing
Experience migrating code to PySpark/Scala is a big plus
Ability to act as a liaison, conveying the information needs of the business to IT and data constraints to the business; this applies equally to business strategy and IT strategy, business processes, and workflow
Flexibility in approach and thought process
Willingness to learn and keep up with periodic changes in regulatory requirements, e.g., per the Fed
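The statistical-analysis skill above typically shows up as segment-level checks on model outputs. A hedged sketch of one such check, comparing default rates across score bands; the accounts and band cut-offs are invented for illustration (real analysis would use bureau/vendor extracts, usually via PySpark):

```python
# Toy segment-level validation: default rate by score band (hypothetical data).
from statistics import mean

accounts = [
    {"score": 580, "defaulted": 1},
    {"score": 600, "defaulted": 1},
    {"score": 610, "defaulted": 0},
    {"score": 640, "defaulted": 1},
    {"score": 660, "defaulted": 0},
    {"score": 700, "defaulted": 0},
    {"score": 720, "defaulted": 0},
    {"score": 760, "defaulted": 0},
]

def band(score):
    """Assign a score band; cut-offs are illustrative, not regulatory."""
    return "subprime" if score < 620 else "near-prime" if score < 680 else "prime"

def default_rate_by_band(rows):
    rates = {}
    for name in ("subprime", "near-prime", "prime"):
        segment = [r["defaulted"] for r in rows if band(r["score"]) == name]
        rates[name] = mean(segment) if segment else None
    return rates

rates = default_rate_by_band(accounts)
```

A well-behaved risk model should show default rates that decrease monotonically from subprime to prime; breaks in that ordering are what this kind of check is meant to surface.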
Education Qualification: Master’s degree with a specialization in Statistics, Mathematics, or Finance, or an Engineering degree
Must-Have
5+ years of experience in data engineering, with a strong focus on PySpark/Python for big data processing.
Expertise in building data pipelines and ingestion frameworks from relational, semi-structured (JSON, XML), and unstructured sources (logs, PDFs).
Proficiency in Python with strong knowledge of data processing libraries.
Strong SQL skills for querying and validating data in platforms like Amazon Redshift, PostgreSQL, or similar.
Experience with distributed computing frameworks (e.g., Spark on EMR, Databricks).
Familiarity with workflow orchestration tools (e.g., AWS Step Functions or similar).
Solid understanding of data lake / data warehouse architectures and data modeling basics.
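The must-have skills above combine ingestion of semi-structured data with SQL-based validation. A minimal end-to-end sketch, using Python's standard library with `sqlite3` standing in for Redshift/PostgreSQL; the payload and table schema are hypothetical:

```python
# Ingest a semi-structured JSON payload into a relational table,
# then validate it with SQL (sqlite3 as a stand-in warehouse).
import json
import sqlite3

raw = '[{"id": 1, "amount": 120.5}, {"id": 2, "amount": null}, {"id": 3, "amount": 40.0}]'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO transactions (id, amount) VALUES (?, ?)",
    [(row["id"], row["amount"]) for row in json.loads(raw)],
)

# Data-quality check: count rows that violate a NOT NULL expectation.
null_amounts = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE amount IS NULL"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```

In a production pipeline the same pattern scales out: Spark (e.g., on EMR or Databricks) reads the JSON/XML sources, lands them in the lake or warehouse, and an orchestrator runs SQL validation queries like the one above as a post-load step.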