Job Description

Job Title

Lead Data Engineer – Python, PySpark & SQL

Location

Canada

Job Type

Full-time contract

Responsibilities

  • Build scalable data ingestion and transformation pipelines using Python, PySpark, and SQL.
  • Process raw CSV/text files from AWS S3, including header validation, schema checks, and malformed-file detection.
  • Convert raw data into structured DataFrames and implement reusable data quality checks.
  • Develop advanced transformations using SQL/PySpark (window functions, LAG(), grouping logic, date-gap detection, etc.).
  • Deploy and tune PySpark applications on AWS EMR, optimizing executor memory, cores, shuffle behavior, and cluster performance.
  • Work with AWS services such as S3, EMR, Glue, Lambda, and IAM.
  • Debug performance issues (OOM errors, shuffle spill, GC problems) and improve pipeline reliability.
  • Lead design discussions, code reviews, and mentor junio...

Ready to Apply?

Take the next step in your AI career. Submit your application to Princeton IT Services, Inc today.