NVIDIA is an industry leader with groundbreaking developments in High-Performance Computing, Artificial Intelligence and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars.
NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong background in and understanding of datacenter infrastructure, as well as excellent communication and planning skills. We welcome out-of-the-box problem solvers who can provide new ideas and are also strong at executing tasks. Expect to be constantly challenged, improving and evolving for the better. You and other engineers on this team will help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications that affect core data science. If you're creative, passionate about what you do, autonomous and love having fun, then what are you waiting for? Apply today!
For two decades, we have pioneered visual computing, the art and science of computer graphics. With our invention of the GPU - the engine of modern visual computing - the field has expanded to encompass video games, movie production, product design, medical diagnosis and scientific research. Today, we stand at the beginning of the next era, the AI computing era, ignited by a new computing model, GPU deep learning.
What you'll be doing:
This is a phenomenal opportunity to join the core group working on AI infrastructure for Autonomous Vehicles/Robotics.
Building and improving the distributed computing infrastructure used for large-scale distributed model training.
Designing and architecting datacenters and modular infrastructure targeted towards HPC and Deep Learning applications, from rack and stack to application bring-up.
Planning and coordinating across multi-functional teams, partners and vendors for the execution of infrastructure build-outs.
Working with engineering teams across all of NVIDIA to ensure their requirements are correctly translated into infrastructure needs.
Automating complex tasks to improve efficiency.
Your work will be deployed both internally within NVIDIA and externally for our customers.
What we need to see:
Solid technical foundation in distributed computing and storage, including substantial experience with all of the following: server systems, storage, I/O, networking, and system software.
10+ years of system software engineering experience on large-scale production systems.
10+ years of architecting high performance computing infrastructure at scale.
Proven experience in high performance computing, Deep Learning, and/or GPU accelerated computing domains.
Expert-level knowledge of high-speed interconnects such as RoCE and InfiniBand.
Ability to clearly and concisely communicate complex designs and requirements to peers, customers, and vendors.
General web networking knowledge (DNS, TCP/IP, HTTP, load balancing, firewalls).
BS/MS in Computer Science, Engineering, Physics, Mathematics, or a comparable degree, or equivalent experience.
Demonstrated experience with Python/Bash for scripting and automation.
Experience with configuration management tools such as Ansible.
Understanding of performance, security and reliability in complex distributed infrastructure. Familiarity with system-level architecture, such as interconnects, memory hierarchy, interrupts, and memory-mapped I/O.
Excellent data analysis skills and demonstrated ability to solve complex issues involving multiple software or hardware components.
Highly motivated, with strong collaborative and interpersonal skills. You have the ability to work successfully with multi-functional teams, principals and architects, and to coordinate effectively across organizational boundaries and geographies.
Ways to stand out from the crowd:
Experience with large-scale distributed systems, HPC, and DL infrastructure.
Familiarity with NVIDIA's server and software ecosystem.
Deep knowledge of AWS, Azure or other CSPs.