Role Overview
We are seeking a Senior/Lead Platform Engineer to take ownership of the design, implementation, and operation of our core data, analytics, and ML infrastructure. This role spans platform architecture, DevSecOps, DataOps, and ML infrastructure, and requires a combination of strategic thought leadership and hands-on execution. You will build, integrate, and operate platforms on AWS and Databricks, enabling scalable, secure, production-grade ML/AI solutions.
Key Responsibilities
- Architect and implement end-to-end data and ML platforms: data lakes, warehouses, streaming and batch pipelines, model training/deployment infrastructure, on AWS + Databricks.
- Lead DevSecOps and DataOps practices: infrastructure as code (IaC), CI/CD pipelines for data & ML workflows, secure multi-account/multi-region cloud operations.
- Integrate AWS services (e.g., S3, Redshift, Kinesis, Lambda, EKS/ECS) with the Databricks runtime, Delta Lake, Unity Catalog, etc. to build scalable, performant pipelines.
- Build and operate ML infrastructure: training clusters, model versioning, MLOps toolchain (e.g., MLflow), model monitoring and observability, automatic retraining workflows.
- Establish data governance, lineage, quality, and observability standards across data pipelines and ML workflows.
- Mentor engineering teams, define architectural best practices and guide implementation of high-scale data/ML systems.
- Optimize system performance, cost and scalability; diagnose and resolve large-scale production issues.
- Continuously evaluate new tools and technologies in cloud, data platforms, DevSecOps, and ML infrastructure, and apply them to drive innovation.
Requirements
- 7+ years of experience in data platform architecture, cloud/ML infrastructure engineering or related roles.
- Deep technical expertise in Databricks and AWS: demonstrated ability to design, integrate and operate solutions spanning both platforms.
- Strong hands-on implementation skills: you will not just design but build, deploy and operate the platform.
- Proven track record of building and operating scalable ML/AI platforms in production (model training & deployment).
- Expertise in Apache Spark, Delta Lake, and modern data pipeline frameworks (batch + streaming).
- Strong background in infrastructure as code (Terraform, CloudFormation), CI/CD for data/ML, and DevSecOps practices.
- Proficiency in Python and SQL; familiarity with Scala or equivalent is a plus.
- Experience with data governance, data lineage, observability and MLOps frameworks (e.g., MLflow, Airflow, dbt).
- Bonus: Experience in fintech, regulated industries or high-security environments.
Benefits
- Performance bonus of up to 2 months' salary.
- 13th-month salary, pro-rated.
- 15 days of annual leave + 3 days of sick leave + 1 day of birthday leave + 1 day of Christmas leave.
- Meal and parking allowances are covered by the company.
- Full benefits and full salary rank during probation.
- Insurance as required by Vietnamese labor law, plus premium health care for you and your family with no seniority requirement.
- SMART goals and clear career opportunities (technical seminars, conferences, and career talks): we focus on your development.
- Values-driven, international working environment, and agile culture.
- Overseas travel opportunities for training and work.
- Internal hackathons and company events (team building, coffee run, blue card...).
- Work-life balance: 40 hours per week, Monday to Friday.