Online Workshop 5: Making More Sense With Machines: AI/ML Methods for Interrogating and Understanding Our Textual Heritage in the Humanities, Natural Sciences, and Social Sciences.

This workshop will be held on Zoom and hosted by the University of Illinois and the HathiTrust Research Center, 29th–30th November 2022, from 10am to 2pm Central Time on both days.

Schedule and Time Conversions:
Tuesday 29 November 2022: 10am-2pm Central Time || 4pm-8pm UTC || 4pm-8pm UK
Wednesday 30 November 2022: 10am-2pm Central Time || 4pm-8pm UTC || 4pm-8pm UK

Registration:
The workshop is free to attend, but registration is required to receive the joining information.

Title: “Making More Sense With Machines: AI/ML Methods for Interrogating and Understanding Our Textual Heritage in the Humanities, Natural Sciences, and Social Sciences.”

Our cultural heritage includes texts in the widest imaginable variety of subjects, not only in the humanities and arts but also in the natural and social sciences; likewise, our largest digital libraries – including that of the HathiTrust (and its Research Center, which hosts this workshop) – consist of legacy documents in practically all areas of human thought and creativity.

These digitized heritage libraries present some special challenges both to computational study in general and to emerging AI/ML approaches in particular: digital library documents are much longer, often by orders of magnitude, and much more diverse than the material in most of the training sets that have been at the foundation of modern machine learning, and than most of its algorithms were designed to handle.

This workshop, the fifth in the series, will focus on the work of interrogating documents of many types and scopes, with the aim of unlocking their data and making it more accessible and more computable. Our shared goal is to make our heritage digital collections in all subject areas richer and more usable through the application and enhancement of computational methods both old and new.

Programme

Listed times are CST (Central Standard Time, UTC-6) – please convert accordingly

Day 1 (Tuesday 29 November), 10:00 am to 2:00 pm CST

10:00 Glen Layne-Worthey, J. Stephen Downie, UK AEOLIAN Team: Welcoming remarks and general workshop introduction

10:30 Jill Naiman (University of Illinois Urbana-Champaign) “Document Layout Analysis for Scientific Article Figure & Caption Extraction”

11:15 Hema Natarajan (Benetech Corporation) “Making Math Accessible, One Image at a Time”

12:00 Break

12:30 Undergraduate Research Showcase (lightning talks):

  • Morgan Cosillo on OCR post-correction with an NLP machine learning model
  • Rushdan Jimoh on natural and artificial “page aging” processes for machine learning

12:45 Nikolaus Parulian and Glen Layne-Worthey (University of Illinois) “Machine Learning to Identify Creative Content and Paratext at the Page Level”

1:30 Peter Organisciak (University of Denver) “Neural Nets to Identify Work Relationships in HathiTrust”

Day 2 (Wednesday 30 November), 10:00 am to 2:00 pm CST

10:00 Glen Layne-Worthey: Summary of Day 1, and introduction to Day 2 topics

10:15 Janet Swatscheno (HathiTrust Research Center, University of Michigan) Tutorial: “HathiTrust Extracted Features for Machine Learning”

10:45 Julian Schröter (Universität Würzburg & University of Illinois) “Modeling prototypicality of genre concepts with machine learning and the c@1-score”

11:15 Ben Schmidt (Nomic AI) “How Small Can Big Data Get?  HathiTrust Extracted Features in Bits and Browsers”

11:45 Break

12:15 Undergraduate Research Showcase (lightning talks):

  • Kiara Balleza on crowd-sourcing the extraction of non-textual page elements
  • David Zhu on a Tesseract “parameter sweep” for OCR optimization

12:30 Ming Jiang (Indiana University Indianapolis) and Yuerong Hu (University of Illinois) “The impact of OCR quality on BERT embeddings”

1:00 Ryan Dubnicek and Ted Underwood (University of Illinois) “Piloting a machine-learning approach to identify English-language fiction in the HathiTrust Digital Library”

1:30 Roundtable (all speakers) “Challenges and opportunities with AI/ML methods for understanding our textual heritage across the disciplines”

Address

ONLINE

Booking

29th to 30th November 2022
