Job Description

  • Develop and maintain a metadata-driven, generic ETL framework for automating ETL code generation
  • Design, build, and optimize ETL/ELT pipelines using Databricks (PySpark/SQL) on AWS.
  • Ingest data from a variety of structured and unstructured sources (APIs, RDBMS, flat files, streaming).
  • Develop and maintain robust data pipelines for batch and streaming data using Delta Lake and Spark Structured Streaming.
  • Implement data quality checks, validations, and logging mechanisms.
  • Optimize pipeline performance, cost, and reliability.
  • Collaborate with data analysts, BI, and business teams to deliver fit-for-purpose datasets.
  • Support data modeling efforts (star and snowflake schemas, denormalized table approaches) and assist with data warehousing initiatives.
  • Work with orchestration tools such as Databricks Workflows to schedule and monitor pipelines.
  • Follow best practices for version control, CI/CD, and collaborative development.

Ready to Apply?

Take the next step in your AI career. Submit your application to Virtusa today.
