1,929 Reliability Engineer jobs in Vietnam

Reliability Engineer

Hanoi, Hanoi Microsoft Corporation

Posted 21 days ago

Job Description

Microsoft is a world leader in the design of hardware devices and entertainment devices. We are currently looking for a creative and talented individual with a passion for technology to drive reliability and qualification of hardware products to advance Hardware's leadership position in exceeding our consumers' durability expectations.
This is a key position in our Quality **Reliability Engineering** organization, based in Vietnam.
The ideal candidate will have a solid reliability and simulation background, a process/manufacturing background in the consumer electronics (electromechanical) industry, and a track record of effective supplier quality management, with in-depth knowledge of reliability testing methodology and reliability analysis.
To qualify for this exciting opportunity, the candidate must possess effective communication, organizational, technical, and documentation skills. You must function well in a fast-paced collaborative environment and be able to apply critical thinking and strong problem-solving skills to complex production environment scenarios to ensure high availability.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
**Responsibilities**
+ Monitor product performance in the field and work closely with manufacturing partners and component vendors to perform failure analysis and drive corrective actions.
+ Provide reliability guidance to contract manufacturers and suppliers for the release-to-manufacture phase and lab qualifications.
+ Develop suppliers to set up Ongoing Reliability Tests (ORT) to monitor mass production.
+ Work with the China and Redmond Reliability teams to develop and document reliability qualification plans for new products.
+ Manage multiple design qualification activities and development schedules to improve product quality.
+ Evaluate and drive the effectiveness of reliability stresses, and resolve reliability issues related to products.
+ Proactively drive root cause investigation of reliability failures and work with cross-functional teams to close issues.
+ Participate in component vendor selection and drive qualification of components that are critical and strategic to Microsoft product requirements.
+ Understand the technology, materials, and failure mechanisms associated with major electronic and electromechanical components and materials.
+ Use knowledge of process capability for electronic component production, as well as system-level performance requirements, to establish Critical-to-Quality performance metrics.
+ Overseas travel of 0-25%, as needed.
**Qualifications**
**Required Qualifications:**
+ Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 2+ years technical engineering experience OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 3+ years technical engineering experience OR 7+ years technical engineering experience.
+ Solid experience working with suppliers to set up reliability labs and run qualification plans during the development and sustaining phases.
+ Familiar with the various environmental and mechanical reliability test methodologies in ASTM/IEEE industry standards and understand basics of.
+ Solid experience in hardware verification, and in process and quality controls for PCBA and box-build assemblies.
+ Effective English communication skills, both verbal and written.
**Preferred Qualifications:**
+ Statistical analysis skills; familiarity with tools such as Minitab or Weibull.
+ Understanding of physics of failure (PoF), with good basic failure analysis knowledge.
+ DFMEA experience.
+ Effective communication and collaboration skills to work with people from a variety of technical backgrounds.
Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

Reliability Engineer, eero

Hanoi, Hanoi Amazon

Posted 7 days ago

Job Description

Description
The Role:
We are looking for a Reliability Engineer who is passionate about, and takes great pride in, launching high-quality, reliable products into the consumer market. The position will collaborate with cross-functional team members to establish product design and performance validation test methodologies and performance specifications that ensure the product is ready for production. The ideal candidate will be responsible for system reliability testing, packaging reliability testing, accessories reliability testing, reliability calculations, statistical analysis, performance tests, and field analysis of eero products from prototype to mass production. You will partner with Packaging Engineering, the Accessories team, Product Management, Development Engineering, Material Sourcing, Manufacturing Engineering, Strategic Product Development, manufacturing partners, and component suppliers to achieve key product quality, cost, and reliability goals. Specifically, this person will work with eero cross-functional engineers to create reliability tests, assessments, and acceptance criteria for new and sustaining products, identify critical field issues, and actively implement corrective and preventative actions with partner CMs, JDMs, and/or ODMs based in Asia.
What you'll do:
● Perform system reliability testing, packaging reliability testing, and accessories reliability testing; review test reports and highlight the reliability results to the cross-functional team (Product Design, Hardware team, Packaging team, Design & Development, Product, Operations).
● Develop system, packaging, and accessories reliability plans with goals and quantifiable results: ISTA, MTBF level, ALT, 85°C/85% RH, etc. tests.
● Perform DFR (Design for Reliability) and DFMEA (Design Failure Mode and Effects Analysis) reviews by partnering with Engineering and manufacturers to achieve key reliability goals (e.g., design margin analysis, preferred parts, suppliers, component/system, alternative components or technologies).
● Execute reliability qualification plans by driving external labs and leveraging internal resources.
● Support system-level product (router) reliability testing and DOEs, and study and develop new test cases.
● Verify suppliers' reliability calculations and tests at component level in partnership with Component, Supplier Quality and Supply Chain engineers.
● Write Engineering Verification Test plans, execute plans, and create test reports.
● Define a set of production reliability tests and methodologies (packaging, accessories, and products), such as ORT, ESS, FMEA, DFX, etc., to ensure field-reliable parts and products.
● Apply metrics for monitoring field reliability performance and act dynamically on the findings with corrective actions, using best-in-class methodologies such as 8D, fishbone, Six Sigma, DMAIC, FMEA, and SPC.
● Identify field trends and set up alerts using applications such as Weibull+ or JMP to perform sound analysis and predictions based on field data.
● Report on findings at core team meetings to reach consensus on the actions to be applied.
● Report critical issues and findings to executive leadership for directions and/or escalations.
● Analyze failures from the field, production, and qualification tests, and provide improvement suggestions, based on the failure mechanisms and root causes, to Development Engineering, manufacturers, and Customer Support.
● Create a culture of continuous improvement at eero and inspire best practices by writing guidelines, providing feedback, solutions, applying innovative metrics and measurements, planning DOEs, and benchmarking the state of the art in comparable industries, technologies and companies.
Basic Qualifications
● Technical Degree (BSEE, BSME, BSCS, Physics, Industrial Engineering, other)
● 8+ years of combined experience in Packaging, Accessories and Product Reliability Engineering and Testing for New Product Introductions and Sustaining.
● 5-10 years of combined experience in consumer electronics manufacturing; experience with sensors, RF, and/or Wi-Fi-based products is a plus.
● Experience with industry standards (ISTA, IEC, UL, ASTM, ANSI, TUV, ISO, IPC, MIL, etc.)
● Demonstrated excellent leadership, communication, and interpersonal skills.
● Results-driven team player with a proven ability to influence design teams and cross-company teams.
● Must have the ability to thrive in a fast-paced, team-oriented environment.
● Familiarity with documentation required for manufacturing assembly & test of RF systems, particularly BOMs, Schematics, Block Diagrams, Release Notes, System Requirement Documents, Assembly Instructions, MFG Test Instructions.
● Ability to work independently on testing and diagnosis of system failures down to the board and component level, including test, debug, and repair. Work with design engineers to resolve issues based on reliability test results.
● Strong analytical, technical, problem-solving skills.
● Strong verbal & written communication skills, excellent interpersonal skills, ability to work in a variety of locations (office, external labs, customer sites, contract manufacturers)
Preferred Qualifications
● Familiarity with various operating systems (Mac, Windows, Linux) both GUI and Command Line
● Familiarity with various programming languages and tools (LabVIEW, C/C++/C#, Excel VBA, Python, scripting, HTML/XML, batch files, Weibull+, JMP) and test equipment such as strain gauges, environmental chambers, impact and vibration testers, power supplies, salt spray and UV testers, etc.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.

Senior Site Reliability Engineer

100000 An Cu, An Giang WhatJobs

Posted 2 days ago

Job Description

full-time
Our client is looking for a highly experienced Senior Site Reliability Engineer (SRE) to join their growing team based in **Hanoi, Hanoi, VN**. This critical role focuses on ensuring the availability, performance, scalability, and security of our client's production systems and services. You will be responsible for designing, building, and operating large-scale distributed systems, automating infrastructure management, and implementing robust monitoring and alerting solutions. The ideal candidate will have a strong background in systems engineering and software development, and a deep understanding of cloud computing platforms and DevOps practices. You will work closely with development teams to foster a culture of reliability and ownership throughout the software lifecycle. Responsibilities include defining SLOs/SLIs, managing incident response, conducting post-mortems, and driving initiatives to reduce toil and improve system resilience. This position requires hands-on expertise with infrastructure as code (IaC) tools, containerization technologies, and CI/CD pipelines. Collaboration, communication, and a proactive approach to problem-solving are essential. You will be instrumental in maintaining the high availability and performance standards that our users expect. This role offers a significant opportunity to impact the core infrastructure of a dynamic technology company.
Responsibilities:
  • Design, implement, and maintain highly available and scalable production systems.
  • Develop and manage infrastructure automation using tools like Terraform, Ansible, or Chef.
  • Implement and manage container orchestration platforms (e.g., Kubernetes, Docker Swarm).
  • Set up and maintain robust monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK stack).
  • Lead incident response efforts, troubleshoot complex issues, and conduct thorough post-mortems.
  • Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
  • Automate operational tasks and reduce manual intervention (toil reduction).
  • Collaborate with development teams to ensure the reliability and performance of new features and services.
  • Participate in on-call rotation to provide 24/7 support for critical systems.
  • Contribute to capacity planning and performance tuning.
  • Ensure security best practices are implemented across the infrastructure.
  • Document system architecture, operational procedures, and incident reports.
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field; Master's degree is a plus.
  • 5+ years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
  • Proven experience with cloud platforms such as AWS, Azure, or GCP.
  • Expertise in scripting languages (e.g., Python, Go, Bash).
  • Strong understanding of networking concepts (TCP/IP, HTTP, DNS, load balancing).
  • Experience with CI/CD tools and practices (e.g., Jenkins, GitLab CI).
  • Familiarity with containerization technologies (Docker, Kubernetes).
  • Excellent troubleshooting, problem-solving, and analytical skills.
  • Strong communication and collaboration skills, with the ability to explain technical concepts clearly.
  • Experience with databases (SQL and NoSQL) and their administration.
  • On-call experience and ability to work under pressure.
This on-site role in Hanoi is crucial for maintaining our robust technological infrastructure.

Remote Lead Site Reliability Engineer

30000 Haiphong, Haiphong WhatJobs

Posted today

Job Description

full-time
Our client is seeking an experienced and highly skilled Lead Site Reliability Engineer to join their dynamic and innovative team. This is a fully remote position, allowing you to work from anywhere within our operational framework. You will play a critical role in ensuring the availability, performance, scalability, and security of our client's cutting-edge digital platforms and infrastructure. As a Lead SRE, you will be responsible for designing, implementing, and maintaining robust systems, automating operational tasks, and developing strategies to prevent downtime and resolve complex technical issues. This role involves deep collaboration with development, QA, and operations teams to foster a culture of shared responsibility for reliability. You will mentor junior engineers, contribute to architectural decisions, and champion SRE best practices. Your expertise in cloud technologies, containerization, and infrastructure-as-code will be essential. We are looking for a proactive problem-solver who thrives in a challenging, fast-paced environment and is passionate about building resilient systems. This remote-first role emphasizes asynchronous communication and effective collaboration across distributed teams.
Responsibilities:
  • Design, build, and maintain scalable and reliable production systems.
  • Develop and implement automation strategies for deployment, monitoring, and incident response.
  • Identify and address performance bottlenecks and proactively mitigate risks.
  • Lead troubleshooting efforts and conduct post-mortems for incidents.
  • Collaborate with software engineers to ensure reliability is designed into new features.
  • Develop and maintain system monitoring, alerting, and logging infrastructure.
  • Manage CI/CD pipelines and optimize deployment processes.
  • Mentor and guide junior SRE team members.
  • Contribute to architectural discussions and technology selection.
  • Ensure system security and compliance with industry standards.
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • 5+ years of experience in Site Reliability Engineering, DevOps, or System Administration.
  • Expertise in cloud platforms such as AWS, Azure, or GCP.
  • Proficiency in at least one scripting language (e.g., Python, Go, Bash).
  • Experience with containerization technologies like Docker and Kubernetes.
  • Strong understanding of networking concepts (TCP/IP, DNS, HTTP).
  • Experience with infrastructure-as-code tools (e.g., Terraform, Ansible).
  • Proven ability to diagnose and resolve complex system issues.
  • Excellent communication and collaboration skills for remote teamwork.
  • Experience with monitoring tools (e.g., Prometheus, Grafana, ELK stack).
This is an exceptional opportunity to shape the future of our client's infrastructure.

Senior Site Reliability Engineer (Remote)

500000 Hoa Sơn WhatJobs

Posted today

Job Viewed

Tap Again To Close

Job Description

full-time
Our client is seeking a highly experienced Senior Site Reliability Engineer (SRE) to join their innovative technology team on a fully remote basis. In this critical role, you will be instrumental in ensuring the reliability, scalability, and performance of our production systems and infrastructure. You will design, build, and maintain robust systems, automate operational tasks, and implement best practices in site reliability engineering. The ideal candidate will have a deep understanding of distributed systems, cloud computing platforms (e.g., AWS, GCP, Azure), and extensive experience with infrastructure as code (IaC) tools, CI/CD pipelines, and monitoring solutions. Your responsibilities will include developing and implementing strategies to improve system availability, latency, and efficiency; proactively identifying and resolving performance bottlenecks; and leading incident response efforts to minimize downtime. This remote-first position requires a proactive, analytical, and collaborative mindset. You will be expected to work autonomously, manage complex technical challenges, and mentor junior engineers. We are looking for an individual with a strong coding background (e.g., Python, Go, Java) and a passion for automation and operational excellence. You will contribute to the design of resilient architectures, participate in capacity planning, and drive improvements in our observability stack. Your expertise will be crucial in maintaining the stability and performance of our critical services. The ability to effectively communicate technical concepts and solutions to diverse audiences is essential. This is an exceptional opportunity to work with cutting-edge technologies, solve challenging problems, and shape the future of our platform in a flexible, remote work environment.

Responsibilities:
  • Design, implement, and manage highly available and scalable systems.
  • Develop and maintain infrastructure automation tools and scripts.
  • Build and manage CI/CD pipelines for efficient software deployment.
  • Implement and optimize monitoring, alerting, and logging systems.
  • Lead incident response and conduct post-mortems to prevent future issues.
  • Collaborate with development teams to ensure system reliability and performance.
  • Conduct capacity planning and performance tuning.
  • Automate operational tasks and reduce manual toil.
  • Contribute to the design and architecture of new systems and features.
  • Mentor junior SREs and share best practices.
Qualifications:
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
  • Strong experience with cloud platforms such as AWS, Azure, or GCP.
  • Proficiency in scripting and programming languages like Python, Go, or Java.
  • Experience with containerization technologies (Docker, Kubernetes).
  • Expertise in infrastructure as code (IaC) tools (Terraform, Ansible).
  • Knowledge of monitoring tools (Prometheus, Grafana, Datadog).
  • Strong understanding of networking, operating systems, and distributed systems.
  • Excellent problem-solving, analytical, and debugging skills.
  • Ability to work effectively in a remote team and manage complex projects.

Senior Site Reliability Engineer (SRE)

25000 Thai Binh, Thai Binh WhatJobs

Posted today

Job Description

full-time
Our client is seeking a highly experienced Senior Site Reliability Engineer (SRE) to ensure the performance, scalability, and reliability of their critical infrastructure and applications. This is a fully remote position, enabling you to contribute to our robust systems from any location. The ideal candidate will have a deep understanding of distributed systems, cloud computing, automation, and operational excellence. You will be responsible for designing, building, and maintaining highly available and fault-tolerant systems, as well as proactively identifying and resolving potential issues.

Key Responsibilities:
  • Design, implement, and manage scalable and reliable cloud-based infrastructure (e.g., AWS, Azure, GCP).
  • Develop and maintain automation tools and scripts for deployment, monitoring, and incident management.
  • Implement and enforce best practices for system monitoring, alerting, and logging.
  • Participate in on-call rotation to respond to and resolve production incidents.
  • Conduct root cause analysis for production issues and implement preventative measures.
  • Collaborate with development teams to improve application reliability and performance throughout the software development lifecycle.
  • Manage and optimize CI/CD pipelines for efficient and safe software deployments.
  • Develop and maintain infrastructure as code (IaC) using tools like Terraform or Ansible.
  • Contribute to capacity planning and performance tuning of systems.
  • Document system architecture, operational procedures, and incident post-mortems.
  • Stay current with emerging technologies and industry best practices in SRE and cloud computing.
  • Mentor junior engineers and promote a culture of reliability and operational excellence.
The successful candidate will possess strong troubleshooting and problem-solving skills, with a proactive approach to anticipating and preventing system failures. Excellent communication and collaboration abilities are essential for working effectively with distributed teams. A deep understanding of system architecture, networking, and security principles is required.

Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • Minimum of 6 years of experience in system administration, DevOps, or Site Reliability Engineering.
  • Proficiency with cloud platforms (AWS, Azure, or GCP) and containerization technologies (Docker, Kubernetes).
  • Strong scripting skills (e.g., Python, Bash, Go).
  • Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging systems (e.g., ELK stack).
  • Familiarity with CI/CD tools and practices (e.g., Jenkins, GitLab CI).
  • Solid understanding of networking concepts (TCP/IP, DNS, HTTP).
  • Experience with configuration management tools (e.g., Ansible, Chef, Puppet).
  • Ability to work independently and manage priorities in a remote, fast-paced environment.
Join our client's cutting-edge engineering team and contribute to building and maintaining world-class, reliable systems. This remote role offers a challenging and rewarding opportunity for passionate SRE professionals.

Remote Senior Site Reliability Engineer

200000 Phuong Son WhatJobs

Posted 2 days ago

Job Description

full-time
Our client is seeking an experienced and highly motivated Senior Site Reliability Engineer to join their distributed, fully remote team. This role is critical for ensuring the availability, performance, scalability, and security of our client's production systems and infrastructure. You will be responsible for designing, implementing, and automating solutions that enhance system reliability, operational efficiency, and disaster recovery capabilities. Working in a remote-first environment, you'll collaborate with development and operations teams to proactively identify and address potential issues before they impact users. The ideal candidate will have a deep understanding of system administration, networking, cloud computing (preferably AWS or GCP), and infrastructure-as-code principles. You will play a key role in defining and implementing SRE best practices, including monitoring, alerting, capacity planning, and incident response. This is a challenging opportunity to contribute to a high-growth technology company, work with modern tools and technologies, and make a significant impact on the stability and performance of our services. We encourage candidates who are passionate about automation, system resilience, and continuous improvement to apply. Your expertise in scripting languages (Python, Bash), containerization (Docker, Kubernetes), and CI/CD pipelines will be essential for success.

Key Responsibilities:
  • Design, build, and maintain reliable, scalable, and high-performance infrastructure.
  • Develop and implement automation for operational tasks, deployments, and incident response.
  • Monitor system health, performance, and availability, and establish effective alerting mechanisms.
  • Participate in on-call rotations and manage production incidents.
  • Conduct root cause analysis for production issues and implement preventative measures.
  • Manage cloud infrastructure resources and optimize for cost and performance.
  • Collaborate with software engineering teams to improve the reliability and deployability of applications.
  • Develop and maintain infrastructure-as-code using tools like Terraform or Ansible.
  • Perform capacity planning and performance tuning.
  • Contribute to disaster recovery planning and testing.
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • 5+ years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
  • Proven experience with cloud platforms such as AWS, GCP, or Azure.
  • Strong proficiency in at least one scripting language (e.g., Python, Go, Bash).
  • Hands-on experience with containerization technologies like Docker and orchestration tools like Kubernetes.
  • Solid understanding of networking concepts (TCP/IP, DNS, HTTP, load balancing).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Familiarity with infrastructure-as-code tools (e.g., Terraform, Ansible, Chef, Puppet).
  • Experience in building and managing CI/CD pipelines.
  • Excellent problem-solving skills and the ability to work under pressure.
  • Strong communication and collaboration skills, especially in a remote environment.
This position is primarily associated with **Thai Nguyen, Thai Nguyen, VN**, but is a fully remote role, allowing you to work from anywhere.

Senior Site Reliability Engineer (Kubernetes)

Ho Chi Minh City VNG

Posted today

Job Viewed

Tap Again To Close

Job Description

As SRE, you will:

- Build, plan, and support the ZaloPay infrastructure: support physical and cloud resources, operate ZaloPay services, maintain the asset inventory, and make sure the system runs smoothly with minimal downtime across environments;
- Roll out and maintain the disaster recovery system;
- Support developers in deploying changes to production. Manage, operate, and update system software and dependencies. Deep dive into the core technology stack to understand, tune, scale, monitor, and automate deployment;
- Detect and respond to system incidents.

**Requirements**:

- Proficiency in operating systems/software architecture, monitoring, and network protocols; familiarity with microservices architecture and container orchestration with Kubernetes;
- Ability to provide emergency response by being on call or by reacting to system alerts/monitoring, and escalating to the next level when needed;
- Ability to set up, monitor, operate, and troubleshoot software, along with writing documentation and providing runbooks;
- Execute solutions provided by higher-level SRE members to reach specific goals agreed within the team;
- Understand performance bottlenecks and where to tune, and have a method to evaluate the change;
- Know how to monitor key performance metrics to capture performance bottlenecks and understand what is happening in current systems; propose and execute solutions; and have a method to capture the change to quickly show how the solution fixed the issue and/or improved performance;
- Have general knowledge of all, and deep knowledge of one, of the following: Nginx, HAProxy, LVS, Envoy, Istio, in terms of usage, optimization, and administration in a microservices environment;
- Have general knowledge of Jenkins pipeline scripting to extend the CI/CD pipeline;
- Be fluent in Bash scripting, especially using sed and awk, and use PCRE to optimize regular expression usage.

Site Reliability Engineer Lead (Linux)

700000 Ho Chi Minh, Ho Chi Minh Aperia Solutions Vietnam Co Ltd

Posted 7 days ago

Job Viewed

Tap Again To Close

Job Description

full-time

Our Site Reliability Engineer Lead (Linux) responsibilities will include, but are not limited to:

  • Lead a team of experienced Site Reliability Engineers (Linux) in both Vietnam and the US.
  • Design, deploy/install, configure, automate, and maintain systems infrastructure and applications.
  • Investigate and troubleshoot system/application behavior and take the action needed to fix it.
  • Serve as the reference for UNIX-like OSes, Docker, and Kubernetes, and handle internal/external escalations and on-call duties.
  • Act as a technical expert for the team leader regarding activities and projects.
  • Understand customer demands and propose the best solutions.
  • Automate provisioning and configuration management in cloud or on-premises environments.
  • Manage CI/CD of applications.
  • Automate repetitive tasks and maintain the scripts for them.
  • Work closely with the Project Manager to collect information and understand customers’ needs, then help them adopt the right solution.
  • Drive root cause analysis and implement permanent fixes across the landscape.
