Job Description
- Develop and maintain a metadata-driven, generic ETL framework that automates ETL code generation.
- Design, build, and optimize ETL/ELT pipelines using Databricks (PySpark/SQL) on AWS.
- Ingest data from a variety of structured and unstructured sources (APIs, RDBMS, flat files, streaming).
- Develop and maintain robust data pipelines for batch and streaming data using Delta Lake and Spark Structured Streaming.
- Implement data quality checks, validations, and logging mechanisms.
- Optimize pipeline performance, cost, and reliability.
- Collaborate with data analysts, BI, and business teams to deliver fit-for-purpose datasets.
- Support data modeling efforts (star and snowflake schemas, denormalized tables) and assist with data warehousing initiatives.
- Work with orchestration tools such as Databricks Workflows to schedule and monitor pipelines.
- Follow best practices for version control, CI/CD, and collaborative development.
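The first responsibility above centers on a metadata-driven framework, where pipeline behavior is described by configuration rather than hand-written per-table code. A minimal sketch of that dispatch pattern in plain Python (table names, config keys, and reader functions are hypothetical illustrations; in Databricks the stubs would call `spark.read` and Delta Lake writers) might look like:

```python
# Minimal sketch of a metadata-driven ETL dispatch pattern.
# Plain Python, no Spark dependency; all names here are illustrative.
from typing import Callable, Dict, List

# Hypothetical pipeline metadata: each entry describes one table's load.
PIPELINE_CONFIG: List[Dict] = [
    {"source": "orders.csv", "format": "csv", "target": "bronze.orders"},
    {"source": "customers", "format": "jdbc", "target": "bronze.customers"},
]

def load_csv(cfg: Dict) -> str:
    # In Databricks this would be spark.read.format("csv").load(...)
    return f"loaded {cfg['source']} -> {cfg['target']}"

def load_jdbc(cfg: Dict) -> str:
    # In Databricks this would be spark.read.format("jdbc").options(...)
    return f"loaded {cfg['source']} -> {cfg['target']}"

# The framework maps the metadata's "format" field to a reader function,
# so onboarding a new table means adding config, not writing new code.
READERS: Dict[str, Callable[[Dict], str]] = {"csv": load_csv, "jdbc": load_jdbc}

def run_pipeline(config: List[Dict]) -> List[str]:
    return [READERS[entry["format"]](entry) for entry in config]

results = run_pipeline(PIPELINE_CONFIG)
```

The design choice this illustrates: new sources are added by extending `PIPELINE_CONFIG` and, only when a new format appears, registering one reader in `READERS`.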
Skills
Ready to Apply?
Take the next step in your AI career. Submit your application to Virtusa today.