At the end of the nineteenth century, periodicals such as La Comète Belge were an important professional and social communication medium, connecting itinerant showpeople across Europe. They circulated news, advice, complaints, meeting minutes, marriages, rumours, and much more. Tucked away between these longer stretches of text is a remarkable set of practical lists, called “maisons recommandées”. These offered showpeople trusted places to eat, sleep, repair equipment, or simply meet colleagues while on the road. I was curious about the location of these establishments, the services they offered, and the changes they went through over time, aspects about which we currently know next to nothing.

One of these establishments was the Café Universel, owned by Jules Claeys, located at Grand Place 13 in Bruges. He systematically placed advertisements for his café in La Comète Belge for several years. On the coast, in Blankenberge, the owner of L’Agneau, B. Van Loo-Geerst, rented out furnished rooms and offered fish and steak at any hour.

Showpeople as well as establishment owners seem to have benefitted from the publication of these lists. For historians, such lists are a goldmine: they reveal everyday infrastructures of mobility, social networks, and the micro-geographies of funfair life. Yet these lists are frustratingly inaccessible to digital analysis because of the journals’ dense and multi-column layout, various fonts and sizes, and the combination of text and images.

When I set out to analyse these lists in La Comète Belge (1905–1919), conventional OCR tools could not cope with the structure of columns, lists, and tables. On a corpus of 216 journal issues, containing 4,198 pages in total, manual transcription would have taken months and was therefore not feasible. The recent rise of AI models with computer vision and inference capabilities opened up an intriguing possibility: could large language models (LLMs) help unlock these historical sources?

This blogpost recounts some preliminary experiments, with their surprises, limitations, and promising results, showing how AI can support, though never replace, critical historical work on complex periodical sources.

The challenge: navigating jumbled data

To analyse how showpeople navigated urban space, I first needed to extract the text from these lists, something that is easier said than done.

The bumpy journey began with the PDFs of La Comète Belge, provided by the Royal Library of Belgium (KBR). While I was glad these already had an OCR layer, the text of these lists had not been properly recognized, and redoing the OCR did not help either. Layout recognition (LR) and optical character recognition (OCR) software have improved tremendously over the past two decades. Commercial software such as ABBYY FineReader, the freely available Google Tesseract engine, and the Transkribus project have proven useful and have greatly helped historians in their research endeavours. However, none of them produced sufficient results on the lists I wanted to extract.

There were OCR mistakes, cities and establishments did not match, and there was no order or structure to be found, making the data unusable for further analysis. The reason was the lists’ complex layout. These establishment lists are not simply lists but also tables: they have a two-column structure, with the city on the left and information on the establishment in question on the right. In addition, each city name is not repeated but applies to several establishments, as you can see in the image below. This was an unfortunate setback, though one I hoped to remedy with AI.
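Concretely, recovering usable rows from this layout means carrying each city name down to the establishments listed beneath it. A minimal Python sketch of that forward-fill step; the sample rows, and the invented third establishment, are for illustration only:

```python
def forward_fill_cities(rows):
    """Carry the last seen city down to rows where the city cell is blank.

    In the source lists a city appears once, in the left column, and
    applies to every establishment printed below it.
    """
    filled, current_city = [], None
    for city, establishment in rows:
        if city:  # a new city header in the left column
            current_city = city
        filled.append((current_city, establishment))
    return filled

# Sample mimicking the two-column layout; the third entry is invented
raw = [
    ("Bruges", "Café Universel, Jules Claeys, Grand Place 13"),
    ("Blankenberge", "L'Agneau, B. Van Loo-Geerst, furnished rooms"),
    ("", "Hôtel de la Plage (hypothetical second Blankenberge entry)"),
]
filled = forward_fill_cities(raw)  # third row now carries "Blankenberge"
```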

The experiment: taking AI by the hand

In July 2025 I ran a small experiment to see how ChatGPT’s o4-mini model performed on these “maisons recommandées” lists. To do so, I constructed a prompt for ChatGPT consisting of several elements to help it perform well. For example, I gave the AI the role of an expert in extracting information from lists in historical sources and told it that the provided images came from nineteenth-century French journals for itinerant showpeople. I also gave it detailed extraction instructions, warnings about where it needed to be careful, an example, the output format, and a recap of the task, after which I asked the model whether it had understood everything.
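Laid out as a template, those prompt elements might look as follows; the wording is a paraphrase of the components described above, not the exact prompt used in the experiment:

```python
# Paraphrased reconstruction of the prompt's building blocks, not the original text
ROLE = "You are an expert in extracting information from lists in historical sources."
CONTEXT = ("The provided images come from nineteenth-century French journals "
           "for itinerant showpeople and contain 'maisons recommandées' lists.")
INSTRUCTIONS = """\
1. The lists have two columns: city on the left, establishment details on the right.
2. A city name is printed once and applies to every establishment below it.
3. Transcribe only what you can actually read; never invent or simulate entries."""
OUTPUT_FORMAT = "Return CSV with columns: city, establishment, owner, address, other."
EXAMPLE = "Example row: Bruges,Café Universel,Jules Claeys,Grand Place 13,"
RECAP = ("Recap: extract every establishment from the attached image as CSV rows. "
         "Confirm that you have understood the task before starting.")

PROMPT = "\n\n".join([ROLE, CONTEXT, INSTRUCTIONS, OUTPUT_FORMAT, EXAMPLE, RECAP])
```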

At first, when providing the AI with single images of lists, it performed relatively well, but another hurdle popped up when feeding it batches of images. The results became less accurate as the model ignored part of the task, hallucinated, simulated information, and generated empty CSV files. While this produced strange, even funny, reasoning from the model, such hallucinations and simulations are extremely problematic for historical research.

How could I get the AI back on track? I started to pay attention to ChatGPT’s “thoughts” and the code it generated. Within these internal outputs I could see where it went haywire. By asking follow-up questions or asking it to double-check information, I tried to minimize its hallucinations and simulations and steer the AI back in the right direction. The generated output was also checked manually for possible inaccuracies.
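Part of that manual checking can be mechanised. A hedged sketch of simple sanity checks on extracted rows; the field names, the reference list of cities, and the sample misreading are assumptions for illustration:

```python
def check_rows(rows, known_cities):
    """Flag rows that look suspicious: required fields that are empty,
    or city names absent from a reference gazetteer (possible
    hallucinations or misreadings)."""
    problems = []
    for i, row in enumerate(rows):
        if not row.get("city") or not row.get("establishment"):
            problems.append((i, "missing city or establishment"))
        elif row["city"] not in known_cities:
            problems.append((i, f"unknown city: {row['city']}"))
    return problems

known = {"Bruges", "Blankenberge", "Brussels", "Ghent", "Antwerp"}
rows = [
    {"city": "Bruges", "establishment": "Café Universel"},
    {"city": "Brusssels", "establishment": "Hôtel du Midi"},  # invented misreading
]
issues = check_rows(rows, known)  # flags the second row's unknown city
```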

Results: maps, train stations and ex-showpeople

From this overview of cities, I zoomed in on individual ones. A majority of establishments were located in the proximity of train stations, as illustrated on the map below; the railway had become an important means of transportation for showpeople. In Brussels, in particular around the Gare du Midi, several hotels, brasseries and cafés offered their services to showpeople.
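The station-proximity claim can be quantified once establishments are geocoded, for instance with a haversine distance to the nearest station. A minimal sketch; the coordinates below are approximate illustrations, not geocoded data from the lists:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * asin(sqrt(a))

# Roughly the Gare du Midi versus a nearby address in Brussels
distance = haversine_m(50.8354, 4.3365, 50.8370, 4.3400)  # a few hundred metres
```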

Geographical data is not the only information that can give us insights. The temporal and quantitative data extracted can also offer new information on showpeople’s relationship to urban space. Many owners kept advertising their establishment for several years. The earlier-mentioned Café Universel did so for nine consecutive years between 1906 and 1914, and again in 1919. Between 1914 and 1918, no issues of La Comète Belge were published, or only a few (in 1914), so little to no data is available for those years. However, when the journal resumed publication in 1919, it is interesting to see that many of the establishments that had advertised before the war continued to do so afterwards.
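Such recurrence is a simple aggregation over the extracted rows. A sketch, where the sample records merely echo the Café Universel pattern described above:

```python
from collections import defaultdict

def advertising_years(records):
    """Map each establishment to the sorted years in which it advertised."""
    years = defaultdict(set)
    for name, year in records:
        years[name].add(year)
    return {name: sorted(ys) for name, ys in years.items()}

# 1906-1914 consecutively, then again in 1919 (as described in the text)
records = [("Café Universel", y) for y in range(1906, 1915)]
records.append(("Café Universel", 1919))
spans = advertising_years(records)
```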

Among these recurring establishments we find, for example, Hôtel Habis in Antwerp and the café Chez Raymond in Ghent, whose proprietors were (ex-)itinerant showpeople. The hotel was owned by Edouard Opitz-Van Haverbeke; the Opitz family and their relatives were an important showpeople family that had migrated to Belgium. The café was owned by Raymond de la Ruelle, who worked for many years as a fairground weightlifter before opening his café in 1899, and who was thus also excellently placed to help out fellow showpeople. These establishments were places where showpeople could read La Comète Belge and inquire about funfair-related information, as well as enjoy a good drink or meal, or make use of the entertainment on offer, such as billiard rooms, a hairdresser, concerts, or a cinema.

Fast-changing AI

As AI models change continuously, I tested four different models (ChatGPT 5, Gemini 2.5 Pro, Claude Opus 4.1 and Qwen3-Max) in October and November 2025. All models were given the same image and prompt: “Context: 19th century French journal with list of establishments. Extract in table format: city, name establishment, owner, address, other information if present”.
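Such a comparison is easy to organise as a loop over models sharing one prompt. In the sketch below, `call_model` is a hypothetical stand-in: each provider (OpenAI, Google, Anthropic, Alibaba) has its own client and message format, which would be plugged in there:

```python
PROMPT = ("Context: 19th century French journal with list of establishments. "
          "Extract in table format: city, name establishment, owner, address, "
          "other information if present")

def call_model(model_name, prompt, image_path):
    """Hypothetical stand-in for a provider-specific API call."""
    raise NotImplementedError("plug in the provider's own client here")

def compare_models(models, prompt, image_path, call=call_model):
    """Collect each model's raw table output under its name."""
    return {m: call(m, prompt, image_path) for m in models}

# Injecting a fake caller lets the harness be exercised offline
fake = lambda model, prompt, image: f"table from {model}"
outputs = compare_models(["gpt-5", "gemini-2.5-pro"], PROMPT, "list.png", call=fake)
```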

In the span of just a few months, mistakes had become considerably fewer than in the first test, with ChatGPT 5 and Gemini 2.5 Pro performing very well. They only made small mistakes, such as misspelling an owner’s name or adding the unnecessary abbreviation M[onsieur] or Prop[riétaire]. With an additional prompt, these would probably have been corrected, giving a perfect result.

Conclusion: a cautious future for AI and itinerant showpeople periodicals

This experiment has shown that AI holds much potential for extracting historical information from sources with complex layouts or suboptimal data. The analysis of these lists, for example, would not have been feasible without it, as manually entering hundreds of lists spanning a fifteen-year period would have taken too much time. Aside from speed and feasibility, the use of AI means gaining access to spatial, social, and economic patterns that long lay hidden within these periodicals. The extracted data offers the opportunity to map the infrastructure used by showpeople, compare cities and regions, follow individual proprietors or showpeople over time, and much more.

At the same time, the use of AI remains a risk due to hallucinations and subtle inconsistencies in the extracted text, though these are reduced with every new model version that becomes publicly available. The challenge is to use AI transparently and cautiously, to keep a historian firmly in the loop, and to develop best practices for AI in historical research.