BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20250822T115806Z
LOCATION:Room 5.0A52
DTSTART;TZID=Europe/Stockholm:20250618T103000
DTEND;TZID=Europe/Stockholm:20250618T110000
UID:submissions.pasc-conference.org_PASC25_sess104_msa160@linklings.com
SUMMARY:Resource-Efficient AI System Design
DESCRIPTION:Ana Klimovic (ETH Zurich)\n\nToday’s large-scale AI model trai
 ning and serving jobs require many hardware accelerators to run, making th
 ese jobs extremely costly and power-hungry. Yet despite requiring many GPU
 s to run, AI jobs often underutilize individual GPUs for a variety of reas
 ons, including data preprocessing stalls, communication stalls, low batchi
 ng opportunities, and imbalanced memory and compute usage of individual op
 erators within a job. This inefficient use of hardware accelerators furthe
 r increases costs. In this talk, we will discuss why optimizing hardware a
 ccelerator (e.g., GPU) utilization is key to improving the cost and energy
  efficiency of AI workloads and how we can achieve this. I will present se
 veral computer systems that we are building as part of the Swiss AI initia
 tive to optimize GPU cluster configurations and job parallelization strate
 gies for distributed AI training jobs and efficiently share GPUs while max
 imizing performance.\n\nDomain: Climate, Weather, and Earth Sciences, Phys
 ics, Computational Methods and Applied Mathematics\n\nSession Chairs: Flor
 ina Ciorba (University of Basel) and Marie-Christine Sawley (ICES Foundati
 on)\n\n
END:VEVENT
END:VCALENDAR
