Data Engineer (Production Support) for AWS EMR
Date: Apr 3, 2026
Location: Shanghai, SH, CN
Company: NTT DATA Services
Job Description: Data Engineering Support (Production Support) for Spark, Scala, Glue, and AWS Data Lake environments; experience with Talend or any other ETL tool.
Position Overview
Expertise in data engineering (AWS, Spark, Scala, and Glue) is required; knowledge of Talend or any other ETL tool is an additional advantage.
The ideal candidate will ensure the smooth operation, performance, and stability of large-scale distributed data processing jobs and applications deployed in an AWS environment.
Some endpoints will be in Alibaba Cloud, so knowledge of Alibaba Cloud is desirable.
This role requires a mix of strong technical expertise, problem-solving skills, and operational excellence.
Key Responsibilities:
- Monitor data integration (data lake), troubleshoot, and resolve issues in real-time.
- Investigate and debug data processing failures and performance bottlenecks.
- Maintain and support ETL/ELT pipelines built on tools such as Spark, Scala, Hive, and Glue.
- Ensure data quality, consistency, and availability across pipelines and storage systems such as S3, Redshift, MySQL, or Snowflake.
- Perform root cause analysis; identify, analyze, and resolve any data discrepancies.
- Implement and monitor automated workflows using AWS tools.
- Analyze and optimize job performance by tuning Spark/Hive configurations and improving query efficiency.
- Identify and address inefficiencies in data storage and access patterns.
- Set up and manage monitoring tools (e.g., AWS CloudWatch, Datadog, or Prometheus) to track system health and performance.
- Develop alerting mechanisms and dashboards for proactive issue identification.
- Provide daily/weekly monitoring reports on job status and alert on any long-running or resource-intensive jobs.
- Collaborate with business users and development team(s).
- Maintain comprehensive documentation (troubleshooting guides, operational workflows, and best practices).
Required Skills and Qualifications
- Hands-on experience with Spark, Scala, Hive.
- Experience with Kafka, NiFi, and various Amazon Web Services (AWS) tools.
- Familiarity with data loading tools like Talend.
- Familiarity with cloud databases such as AWS Redshift, Aurora MySQL, and PostgreSQL.
- Knowledge of workflow/schedulers like Oozie.
- Strong knowledge of shell scripting, Python, or Java for scripting and automation.
- Familiarity with SQL and query optimization techniques.
- Experience in production support and operations management, including writing Standard Operating Procedures (SOPs) with flow diagrams, source-to-target mappings, system architecture diagrams, and use cases.
- Ability to analyze logs, diagnose issues, and implement fixes in high-pressure scenarios.
Desirable skills:
- Knowledge of data governance, security, and compliance in cloud environments.
- Certifications in AWS (e.g., AWS Certified Big Data Specialty or AWS Certified Solutions Architect).
Education and Experience
- 5 to 15 years total IT experience.
- Bachelor’s degree in Computer Science, Engineering, or a related field.