Job Description
Job Title
Lead Data Engineer – Python, PySpark & SQL
Location
Canada
Job Type
Full-time contract
Responsibilities
- Build scalable data ingestion and transformation pipelines using Python, PySpark, and SQL.
- Process raw CSV/text files from AWS S3, including header validation, schema checks, and malformed-file detection.
- Convert raw data into structured DataFrames and implement reusable data quality checks.
- Develop advanced transformations using SQL/PySpark (window functions, LAG(), grouping logic, date gap detection, etc.).
- Deploy and tune PySpark applications on AWS EMR, optimizing executor memory, cores, shuffle behavior, and cluster performance.
- Work with AWS services such as S3, EMR, Glue, Lambda, and IAM.
- Debug performance issues (OOM errors, shuffle spill, GC problems) and improve pipeline reliability.
- Lead design discussions, code reviews, and mentor junio...
Ready to Apply?
Take the next step in your AI career. Submit your application to Princeton IT Services, Inc today.