Job Description
Strong Python/Scala scripting skills (3+ years).
Must have hands-on experience implementing an AWS big data lake using EMR and Spark.
Working experience with Spark, Hive, message queues or pub/sub, and streaming technologies (3+ years).
6+ years of experience developing data pipelines using a mix of languages (Python, Scala, SQL, etc.) and open-source frameworks to implement data ingestion, processing, and analytics.
Experience leveraging open-source big data processing frameworks such as Apache Spark and Hadoop, and streaming technologies such as Kafka.
Hands-on experience with newer technologies relevant to the data space, such as Spark, Airflow, Apache Druid, and Snowflake (or other OLAP databases).
Experience developing and deploying data pipelines and real-time data streams within cloud-native infrastructure, preferably AWS.
Experience using CI/CD pipelines (GitLab).
Experience implementing code quality practices (Use...
Ready to Apply?
Take the next step in your AI career. Submit your application to Capgemini today.