Autonomous Vehicle Infrastructure Systems Lead, Manager - Managed AI
The Deloitte Connected and Autonomous Vehicle (CAV) team is catalyzing and shaping the Autonomous Vehicle (AV) market through a suite of turnkey, as-a-service solutions that deliver improved performance and lower total cost of ownership (TCO). These solutions will empower automotive customers to realize their autonomy ambitions as efficiently as possible.
High Level Role
We are looking for a seasoned, "hands-on" HPC/AI infrastructure systems leader who will drive the scope, detailed design, and deployment of AV infrastructure across on-prem, cloud, and hybrid environments. The key success measure of this prototype will be the delivery of Deloitte's offering in POD configurations as a service for our customers with guaranteed SLAs and TCO targets.
Establish the detailed specification of the DGX A100 that reflects a representative customer's planning, deployment, and ongoing operations optimization requirements for TCO, throughput, scalability, and flexibility across their varied workloads
Set up the DGX POD/SuperPOD reference environment, including DGX A100 compute nodes, fabrics (storage/compute), management networks and software (DeepOps), key system software for optimizing GPU communications, I/O, and application performance, and user run-time tools for SLURM and Kubernetes containers
Design and document the most efficient setup to meet success metrics (TCO, performance, scale). Specific areas of focus:
Network switch & fabric considerations for non-blocking, scalable bandwidth needs for best performance with varying dataset sizes & locations
Storage and caching hierarchy implementations for training versus inference workloads. Establish storage management guidelines for allocating RAM/NVMe (internal storage) and external high-speed storage (DDN, NetApp, etc.) to optimize the performance and cost of running varying datasets and workloads. Establish rules for when to enable the GPUDirect Storage (GDS) feature for workloads that need lower latency and faster I/O.
Management servers: infrastructure design and setup to enable user logins, provisioning (OS images and other internal infrastructure services for the pod), workload management (resource management and scheduling/orchestration), container management, and system monitoring/logs
Operations/run-time optimization of A100 compute resources (MIG partitions) for varying workloads to maximize the utilization and throughput of jobs being scheduled in a given node cluster
Validate the commercial model with the MVP operational run/playbook
Bachelor's degree or equivalent experience in Computer Architecture, Computer Science, Electrical Engineering, or a related field. Advanced degree preferred
6+ years of proven experience in the design, deployment, and operation of production-grade HPC environments leveraging both SLURM and Kubernetes clusters
Deep understanding of scale-out compute, networking, and external storage architectures for optimizing performance and acceleration of AI/HPC workloads
Proven experience deploying, upgrading, migrating, and driving user adoption of sophisticated enterprise-scale systems
Prior software and solutions development background, and proven ability to demonstrate complex new technologies
Programming skills to build distributed storage and compute systems, backend services, microservices, and web technologies
Well versed in agile methodology
Comfortable in a customer-focused, fast-paced environment
Ability to travel up to 50% on average, based on the work you do and the clients and industries/sectors you serve
Limited immigration sponsorship may be available
From developing a standout resume to putting your best foot forward in the interview, we want you to feel prepared and confident as you explore opportunities at Deloitte. Check out recruiting tips from Deloitte recruiters.
At Deloitte, we know that great people make a great organization. We value our people and offer employees a broad range of benefits. Learn more about what working at Deloitte can mean for you.
Our people and culture
Our diverse, equitable, and inclusive culture empowers our people to be who they are, contribute their unique perspectives, and make a difference individually and collectively. It enables us to leverage different ideas and perspectives, and bring more creativity and innovation to help solve our clients' most complex challenges. This makes Deloitte one of the most rewarding places to work. Learn more about our inclusive culture.
From entry-level employees to senior leaders, we believe there's always room to learn. We offer opportunities to build new skills, take on leadership roles, and connect and grow through mentorship. From on-the-job learning experiences to formal development programs, our professionals have a variety of opportunities to continue to grow throughout their careers.
As used in this posting, "Deloitte" means Deloitte Consulting LLP, a subsidiary of Deloitte LLP. Please see for a detailed description of the legal structure of Deloitte LLP and its subsidiaries.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or protected veteran status, or any other legally protected basis, in accordance with applicable law.
Deloitte will consider for employment all qualified applicants, including those with criminal histories, in a manner consistent with the requirements of applicable state and local laws. See notices of various ban-the-box laws where available.
Requisition code: 107190