BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000711Z
LOCATION:501-502
DTSTART;TZID=America/Denver:20231113T121000
DTEND;TZID=America/Denver:20231113T123000
UID:submissions.supercomputing.org_SC23_sess440_ws_ai4s120@linklings.com
SUMMARY:Protein Generation via Genome-Scale Language Models with Bio-Physi
 cal Scoring
DESCRIPTION:Gautham Dharuman\, Logan Ward\, Heng Ma\, and Priyanka V. Set
 ty (Argonne National Laboratory (ANL))\; Ozan Gokdemir (University of Chi
 cago)\; Sam Foreman\, Murali Emani\, Kyle Hippe\, and Alexander Brace (Ar
 gonne National Laboratory (ANL))\; Kristopher Keipert and Thomas Gibbs (N
 VIDIA Corporation)\; Ian Foster (Argonne National Laboratory (ANL))\; Ani
 ma Anandkumar (California Institute of Technology)\; and Venkatram Vishwa
 nath and Arvind Ramanathan (Argonne National Laboratory (ANL))\n\nLarge l
 anguage models (LLMs) trained on vast biological datasets can learn biolo
 gical motifs and correlations across the evolutionary landscape of natura
 l proteins. LLMs can then be used for de novo design of novel proteins wi
 th specific structures\, functions\, and physicochemical properties. We e
 mploy a pre-trained genome-scale language model that uses codons as token
 s and integrate it into a workflow for targeted generation of sequen
 ces. Our framework suggests new gene sequences that are ranked for downst
 ream evaluation by metrics that collectively capture extensive sequence-s
 pecific\, biophysical\, and biochemical properties. We demonstrate our in
 tegrated workflow to design novel variants of the enzyme malate dehydroge
 nase (MDH) that exhibit more favorable activation energies than their nat
 ural counterparts (reduction of 4.01 kJ/mol) with sustained sequence gene
 ration rates of 10^4/hr and simulation rates of 10^2/hr on 64 nodes of Po
 laris with about 99.7% system utilization during the run.\n\nTag: Artific
 ial Intelligence/Machine Learning\n\nRegistration Category: Workshop Re
 g Pass\n\nSession Chairs: Murali Emani (Argonne National Laboratory (ANL)
 )\; Gokcen Kestor (Barcelona Supercomputing Center (BSC)\; University o
 f California\, Merced)\; and Dong Li (University of California\, Merced)
END:VEVENT
END:VCALENDAR
