Job Description

Summary
We are looking to collect real-world data assets for the following languages:
pt_BR
fr_FR
it_IT
de_DE
es_ES
zh_CN
ja_JP
ko_KR
ar_SA
hi_IN
For hi_IN, we require:
- 50% Hindi written in Latin script (e.g., aap kaise hain?)
- 50% Hindi written in Devanagari script (e.g., आप कैसे हैं?)
The data must come from native speakers of each language.
Please note that synthetic data will not be accepted.
We are looking to collect the following categories:
1. Mail
2. Message
3. Notes
4. Files
5. Voicemail / Audio (transcript only)
6. Webpages
7. Screenshot
8. Photos (Camera capture)
Items in these categories should contain the following data types:
1. Contact Information
2. Event Information
3. People/Relationships
4. Topics of Interest
For Topics of Interest, there are 18 topic groups:
1. Sports (ex: basketball, baseball, cricket, soccer, badminton, motorsport),
2. Movies_tv_shows (ex: Big Bang Theory, Inception, 2001 ...
