Job Description

Position Summary

We are seeking a specialized Data Engineer or Data Scientist to manage the complete lifecycle of the training data that powers our AI models. This role is pivotal in curating, sanitizing, and structuring the high-quality speech and text datasets that serve as the foundation for training state-of-the-art Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Machine Translation (MT) systems.

Role and Responsibilities

Data Pipeline Architecture
Design, build, and maintain robust pipelines for the ingestion, processing, and management of heterogeneous data sources, ensuring efficient flow from raw collection to model-ready inputs.

Unstructured Data Extraction
Extract and process high-fidelity speech and text data from complex, unstructured sources, including video feeds, multi-channel audio recordings, and raw text archives.

Corpus Curation & Management
Organize, structure, and analyze complex linguistic data...

Ready to Apply?

Take the next step in your AI career. Submit your application to SAMSUNG today.