* Implement robust, scalable processing pipelines for time-, resource-, and cost-based data, leveraging distributed systems and tools such as dbt and dlt
* Drive the integration of LLM-driven workflows to automate the transformation of unstructured data into structured knowledge
* Define standards for data integrity, quality, and versioning across systems and platforms
* Bring extensive practical experience with ELT/ETL processes, data integration platforms, and related tools such as Apache Spark, Apache Flink, and Python/Scala
* 5+ years of experience building production-grade, integration-heavy data products (e.g., with Spark, Flink, Dask, or Ray), preferably in a startup or fast-paced environment
* Proficiency in a programming language such as Python for building production-grade data systems, plus expertise in SQL and database modeling (e.g., PostgreSQL)