BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260422T000605Z
LOCATION:DEF Concourse
DTSTART;TZID=America/Denver:20231116T100000
DTEND;TZID=America/Denver:20231116T170000
UID:submissions.supercomputing.org_SC23_sess300_spostg107@linklings.com
SUMMARY:Better Data Splits for Machine Learning with Astartes
DESCRIPTION:Jackson Burns (Massachusetts Institute of Technology (MIT))\n\
 nMachine Learning (ML) has become an increasingly popular tool to accelera
 te traditional workflows. Critical to the use of ML is the process of spli
 tting datasets into training, validation, and testing subsets to develop a
 nd evaluate models. Common practice is to assign these subsets randomly. A
 lthough this approach is fast, it only measures a model's capacity to inte
 rpolate. These testing errors may be overly optimistic on out-of-scope dat
 a; thus, there is a growing need to easily measure performance for extrapo
 lation tasks. To address this issue, we report astartes, an open-source Py
 thon package that implements many similarity- and distance-based algorithm
 s to partition data into more challenging splits. This poster focuses on u
 se cases within cheminformatics. However, astartes operates on arbitrary v
 ectors, so its principles and workflow are generalizable to other ML domai
 ns as well. astartes is available via the Python package managers pip and 
 conda and is publicly hosted on GitHub (github.com/JacksonBurns/astartes).
 \n\nRegistration Category: Tech Program Reg Pass, Exhibits Reg Pass\n\n
END:VEVENT
END:VCALENDAR
