Webinar series on crowdsourcing and automatic transcription

At the moment, many archives and museums around the world are in the process of transferring handwritten records to a digital (and searchable) format. If they are successful, the positive effects are obvious: apart from opening up the collections to the public, this sort of digitization also makes the material much more accessible (and interesting) for researchers.

We are planning to organize a series of webinars in which invited archivists and researchers present their ongoing projects involving crowdsourcing and/or automatic transcription. What is the main objective of the project? How is it organized? What are the results? What could have been carried out more efficiently, and what are the plans for the future? Every webinar will consist of a presentation of the project (approx. 30 minutes), followed by a general discussion.

The host of this series of webinars is Språkbanken Sam, which is part of the national e-infrastructure Nationella språkbanken, ‘The National Language Bank of Sweden’. The webinars will be held on Zoom. Please use the registration form if you want to attend.

13 October, 10-12

Sanita Reinsone, Riga
Crowdsourcing in Practice: Digital Archives of Latvian Folklore (in English)

Crowdsourced transcription of manuscripts of the Archives of Latvian Folklore (ALF) has been carried out since 2014, when its digital archive http://garamantas.lv was made open to the public. Two years later, a specialised digital platform, lv100.garamantas.lv, was launched to further promote crowdsourcing of folklore manuscripts. Since 2016, volunteers have spent more than 24,700 hours deciphering ALF's manuscripts in eleven languages, providing invaluable help in making the folklore collections digitally accessible. The presentation will give insight into the different user involvement strategies practised by ALF, reveal the main challenges and problems, and discuss the motivation for participating.

20 October, 10-12

Críostóir Mac Cárthaigh, Dublin
Meitheal Dúchas.ie: Sharing the work of digitizing the National Folklore Collection
(in English)

Meitheal Dúchas.ie is a crowdsourcing project established in 2015 to promote the digitization of folklore texts from the National Folklore Collection, University College Dublin, hosted on its digital platform www.duchas.ie. In the intervening years, almost 6,000 people have taken part in the project. They include academic researchers, students, educators, local historians, artists and writers. To date, more than 250,000 pages of archive material have been digitized, a process that has quickened noticeably in recent months as a consequence of the Covid-19 pandemic. It is hoped to extend this crowdsourcing model to other elements of the National Folklore Collection, including photographic and audio material.

17 November, 13-15

Karl-Magnus Johansson, Göteborg
Machine Learning and Local Knowledge – A Presentation of an Ongoing Handwritten Text Recognition and Citizen Science Project at the National Archives in Gothenburg (in Swedish)

In early 2020, a Handwritten Text Recognition (HTR) and Citizen Science project was initiated at the Swedish National Archives in collaboration with GPS400 - Centre for Collaborative Visual Research at the University of Gothenburg. The project’s archival material consists of police reports from Gothenburg from 1868-1902, comprising more than 22,000 pages of handwritten text. To produce high-quality training data for the HTR model, as well as to raise the quality of the automatically transcribed data, people from civil society were invited to participate in the project. In this presentation, archivist Karl-Magnus Johansson talks about his experiences of the ongoing project in connection with recent studies of the relationship between data and local knowledge.

24 November, 13-15

Erik Magnusson Petzell, Göteborg
Automatic transcription of dialect texts (in Swedish)

In this seminar, I will describe my ongoing work with automatic transcription of 19th-century dialect texts, handwritten in a traditional phonetic alphabet that is only marginally used today. Such texts exist in archives all over Scandinavia, and through them, we are granted access to the linguistic subtleties of an era that is too distant to have been caught on audio tape. So far, I have only scratched the surface of this great pile of detailed dialect data.

For practical reasons, I have started with texts from the dialect archive in Gothenburg, where I work. In the presentation, I will describe all the steps involved in converting the image of handwritten text to a digital and fully searchable counterpart, highlighting various difficulties I have encountered along the way. These include transliteration issues (How does one transcribe non-Unicode fonts?), problems with machine learning (How can an HTR model trained on one hand/dialect be extended to more hands/dialects?), and not least challenges relating to output: in order to make the old dialect texts useful for different sorts of linguistic research, the precise phonetic transcription cannot constitute the only resource. In addition, there is a need for several conversions of the original text into different, more or less simplified formats, which, in turn, can also be useful for non-linguists (both other researchers and members of the general public).

How best to accomplish such a multi-layered resource is one of the questions that I look forward to discussing with the webinar attendees. Another one concerns crowdsourcing. What would be suitable tasks for members of the public in this project? Layout analysis and metadata extraction only? Or more advanced tasks, such as corrections of machine transcriptions or even manual transcriptions of new hands?

Updated 8 October 2020

When

13 October, 10-12
Sanita Reinsone, Riga
Crowdsourcing in Practice: Digital Archives of Latvian Folklore

20 October, 10-12
Críostóir Mac Cárthaigh, Dublin
Meitheal Dúchas.ie: Sharing the work of digitizing the National Folklore Collection

17 November, 13-15
Karl-Magnus Johansson, Göteborg
Machine Learning and Local Knowledge – A Presentation of an Ongoing Handwritten Text Recognition and Citizen Science Project at the National Archives in Gothenburg

24 November, 13-15
Erik Magnusson Petzell, Göteborg
Automatic transcription of dialect texts

Where

The webinars will be held on Zoom.

Sign up

Registration form

Questions?

Fredrik Skott
Erik Magnusson Petzell