GCP Infrastructure Engineer

Date: Oct 23, 2025

Location: Chennai, TN, IN

Company: NTT DATA Services

Req ID: 343726 

NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.

We are currently seeking a GCP Infrastructure Engineer to join our team in Chennai, Tamil Nādu (IN-TN), India (IN).

Job Summary:

We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will use Vertex AI on Google Cloud Platform (GCP), IBM Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions. You will own the end-to-end infrastructure lifecycle, from design and provisioning to automation, monitoring, and optimization, while enabling data scientists and ML engineers to deploy and operate GenAI workloads seamlessly.

Key Responsibilities:

Cloud Infrastructure & Platform Engineering

  • Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
  • Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
  • Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
  • Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
  • Ensure business continuity through backup, disaster recovery, and multi-region deployments.

Automation & Reliability

  • Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
  • Adopt GitOps practices (e.g., Flux) for infrastructure lifecycle management.
  • Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
  • Apply SRE principles (SLIs, SLOs, SLAs) to maintain platform reliability and uptime.

Security, Governance & Compliance

  • Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
  • Enforce identity and access management (IAM), network segmentation, and data encryption in line with regulatory and compliance frameworks such as HIPAA, SOX, GDPR, and FedRAMP.
  • Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.

Monitoring, Observability & Cost Optimization

  • Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infrastructure health and ML-specific metrics (model drift, data anomalies).
  • Define KPIs to monitor system health, performance, and adoption across AI workloads.
  • Optimize the cost of GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.

Collaboration & Enablement

  • Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
  • Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
  • Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.

Required Education

Bachelor’s or master’s degree in Computer Science, Software Engineering, or a related field.

Required Experience

  • 5+ years of experience in cloud infrastructure engineering, DevOps, or platform engineering.
  • Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
  • Strong hands-on expertise with Google Cloud Platform (GCP), especially Vertex AI.
  • Experience with IBM Watsonx for AI application deployment and management.
  • Proven skills in Docker, Kubernetes (GKE), and container orchestration at scale.
  • Proficiency in Python, Bash, or other relevant scripting languages.
  • Strong understanding of cloud networking, IAM, and security best practices.
  • Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
  • Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
  • Excellent problem-solving, debugging, and communication skills.

Preferred Experience

  • Experience in MLOps practices for model deployment, monitoring, and retraining.
  • Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
  • Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
  • Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
  • Contributions to open-source projects in infrastructure, MLOps, or GenAI.
  • Experience managing infrastructure in regulated industries.

Preferred Certifications:

  • Google Cloud Certified - Professional Cloud Architect
  • Google Cloud Certified - Professional Machine Learning Engineer
  • Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
  • IBM Certified Watsonx Generative AI Engineer – Associate
  • IBM Certified Solution Architect - Cloud Pak for Data
  • Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.

About NTT DATA

NTT DATA is a $30 billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is a part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. Visit us at us.nttdata.com.

Whenever possible, we hire locally to NTT DATA offices or client sites. This ensures we can provide timely and effective support tailored to each client’s needs. While many positions offer remote or hybrid work options, these arrangements are subject to change based on client requirements. For employees near an NTT DATA office or client site, in-office attendance may be required for meetings or events, depending on business needs. At NTT DATA, we are committed to staying flexible and meeting the evolving needs of both our clients and employees. NTT DATA recruiters will never ask for payment or banking information and will only use @nttdata.com and @talent.nttdataservices.com email addresses. If you are asked to provide payment or disclose banking information, please submit a Contact Us form at https://us.nttdata.com/en/contact-us.

NTT DATA endeavors to make https://us.nttdata.com accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at https://us.nttdata.com/en/contact-us. This contact information is for accommodation requests only and cannot be used to inquire about the status of applications. NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.