Data Software Engineering
EPAM Systems
Software Engineering
Portugal · Remote
Posted on Nov 19, 2025
Responsibilities
- Build and optimize data pipelines with automated testing, lineage tracking, and privacy-by-design principles
- Design, implement, and maintain data warehouses using SQL, Python, and ELT processes
- Set up monitoring, testing, backup, and recovery mechanisms to ensure data reliability and availability
- Enforce governance, security standards, role-based access control (RBAC), and data masking protocols
- Translate business needs into scalable and reusable data solutions
- Standardize ingestion and transformation processes for machine learning workflows, including feature preparation and monitoring of drift and data quality
- Develop and deploy scalable machine learning models and pipelines
- Create reusable components and maintain detailed documentation for data systems
- Promote reliability, maintainability, and scalability in data workflows while optimizing pipelines for AI training and evaluation
- Advocate for best practices and reusable patterns across teams
- Contribute to onboarding processes and build communities of practice within the organization
- Resolve complex challenges, perform root cause analyses, and enhance engineering standards
- Make informed decisions, escalate issues when necessary, and improve scalability and cost efficiency of data systems
- Align and influence cross-functional teams with minimal supervision
- Mentor peers and support the development of team capabilities
Requirements
- At least 3 years of proven experience in Data Engineering
- Expertise in SQL and ELT design, including core SQL, change data capture (CDC) patterns, and optimization techniques
- Proficiency in Snowflake performance tuning, secure data sharing, role-based access, and data masking
- Experience in developing Matillion jobs, shared components, and orchestration processes
- Knowledge of CI/CD workflows using Bitbucket and Jenkins for building, testing, deploying, and managing environments and versions
- Familiarity with data observability practices, including logging, alerts, and automated checks
- Hands-on experience with feature engineering for machine learning, dataset validation, and management of data contracts and roles
- Practical experience with Amazon EMR / Apache Spark for distributed data processing
- Proficiency in using AWS for scalable data ingestion, processing, and preparation for machine learning pipelines
- Ability to apply retrieval-augmented generation (RAG) principles for policy-aware grounding, and to design logical and physical data models
- Skilled in maintaining documentation, APIs, and reusable patterns while supporting onboarding and best practices
- Fluent English communication skills, both written and spoken, at a B2+ level
Nice to have
- Knowledge of real-time data streaming technologies like Apache Kafka or AWS Kinesis
- Exposure to Big Data ecosystems and tools such as Hadoop or Hive
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn