Job Description
- Develop and implement efficient data pipelines using Apache Spark (PySpark preferred) to process and analyze large-scale data.
- Design, build, and optimize complex SQL queries to extract, transform, and load (ETL) data from multiple sources.
- Orchestrate data workflows using Apache Airflow, ensuring reliable execution and robust error handling in pipelines.
- Design, implement, and maintain scalable and cost-effective data storage and processing solutions on AWS using S3, Glue, EMR, and Athena.
- Leverage AWS Lambda and Step Functions for serverless compute and task orchestration in data pipelines.
- Work with AWS databases such as RDS and DynamoDB to ensure efficient data storage and retrieval.
- Monitor data processing and pipeline health using AWS CloudWatch and ensure smooth operation in ...
Ready to Apply?
Take the next step in your AI career. Submit your application to Virtusa today.