Experience with big data processing frameworks (e.g., Apache Spark, Hadoop)

We are looking for a Data Engineer to help create large-scale datasets that power the next generation of generative models.

* Develop and maintain scalable infrastructure for large-scale image and video data acquisition
* Manage and coordinate data transfers from various licensing partners
* Implement and deploy state-of-the-art ML models for data cleaning, processing, and preparation
* Implement scalable and efficient tools to visualize, cluster, and deeply understand the data
* Optimize and parallelize data processing workflows to handle billion-scale datasets efficiently
* Ensure data quality, diversity, and proper annotation (including captioning) for training readiness
* Convert training data from alternative sources, such as user preferences, into a trainable format