Data Pipeline & Infrastructure Development: Build, maintain, and scale data pipelines (ETL or ELT) using tools like Apache Spark, Airflow, and Kafka to support AI and ML workloads.
AI-Ready Data Preparation: Transform messy, unstructured data (text, images, video) into structured datasets suitable for model training, including feature engineering and vector database ingestion.
ML Model Productionization: Partner with data scientists to deploy ML models, create APIs for models, and implement MLOps practices, including monitoring for data drift.
Analytics and Visualization: Create dashboards (Tableau, Power BI, Looker) and run SQL queries to provide actionable business insights, acting as an analytics engineer.
Data Governance & Quality: Ensure data quality, reliability, and security (including PII and PHI) within AI systems, and maintain compliance with regulations like GDPR or HIPAA.
Cloud and Data Management: Operate within cloud environments (AWS, Azure, Google Cloud) using services like S3, Redshift, Glue, or Databricks.
Key Skills And Qualifications
Programming Languages: Expert-level Python and advanced SQL are mandatory. Java or Scala is preferred for large-scale distributed systems.
ML Frameworks: Familiarity with libraries such as PyTorch, TensorFlow, or scikit-learn for data manipulation and model interaction.
Data Engineering Tools: Experience with Apache Spark, Kafka, Airflow, dbt, and vector databases (Pinecone, Milvus).
Cloud Platforms: Hands-on experience with AWS (Glue, SageMaker) or GCP.
Analytical Skills: Strong ability to perform exploratory data analysis (EDA) and interpret complex datasets.
Soft Skills: Strong communication skills to bridge technical data engineering and business stakeholders.
Salary Range: $70,000 - $125,000 a year