Job Description
Key Responsibilities:
· Design, build, and maintain scalable batch and (optionally) streaming data pipelines using Apache Spark 3.x, Scala, Python, and SQL.
· Implement core ETL/ELT logic in Scala and Python; author efficient Spark DataFrame/Dataset jobs.
· Write and optimize complex SQL for the ingestion, transformation, and consumption layers.
· Tune Spark jobs for performance and cost: partitioning, join strategies, broadcast joins, memory tuning, and shuffle reduction.
· Ensure code quality via unit tests, integration tests, CI/CD, and code reviews.
· Apply data modeling, schema evolution, and data-quality checks to ensure reliable outputs.
· Collaborate with platform/DevOps teams to deploy and monitor pipelines (retries, logging, alerting).
· Troubleshoot production issues and perform root-cause analysis.
· Mentor and guide junior engineers, share best practices, and drive improvements to the ...
Ready to Apply?
Take the next step in your AI career. Submit your application to Straive today.