Curating shared knowledge about artifacts on Wikidata

Making Arabic periodicals findable and accessible

Till Grallert

Humboldt-Universität zu Berlin

NFDI4Memory

Methods Innovation Lab

Global Digital Humanities Symposium

2026-04-15

http://tillgrallert.eu/slides/dh/2026-gdhs/

Background

Arabic periodicals

The first mass medium of the Eastern Mediterranean and a global Arabic ideosphere

Figure 1: New Arabic periodicals between 1855 and 1929 by place of publication
Figure 2: Front page of the newspaper Kawkab Amīrkā #1, 15 April 1892, New York

My research interests

… or what I would want to do

Who are the most important authors?

Figure 3: Undirected network of authors in al-Ḥaqāʾiq, al-Ḥasnāʾ, Lughat al-ʿArab, and al-Muqtabas. Colour of nodes: betweenness centrality; size of nodes: number of periodicals; width of edges: number of articles.

What are the most important periodicals?

Figure 4: Directed network of periodicals referenced in al-Ḥaqāʾiq, al-Ḥasnāʾ, Lughat al-ʿArab, al-Muqtabas, and al-Zuhūr

neo-colonial absences

what are we up against

Figure 5: Search in ZDB for “مرآة الشرق”
  1. limited knowledge of the “original” artefact
  2. survival and collection bias leads to digitisation bias
  3. networked digital infrastructures and global capitalism
  4. digital divides between the global south and the global north
  5. linguistic imperialism and the digital artefact
Figure 6: al-Muqtabas 6 on HathiTrust (original in Princeton), as accessed from outside the USA

Survival bias: violent destruction

Syria

Figure 7: The National Archives of Syria in Damascus the day after their conflagration on 16 July 2023. Source: “«أصبح رمادًا وركاما»... حريق كبير يدمر سوقًا في قلب دمشق” (2023)

Palestine

Figure 8: The Great Omari Mosque in Gaza after its complete destruction by the Israeli military on 8 Dec. 2023. Source: Estrin and Bashir (Requiem 2024)

Lebanon

Figure 9: Meeting room at the Orient-Institut Beirut damaged by the Beirut Port explosion on 4 August 2020. Source: OIB

Iraq

Figure 10: The Iraq National Library and Archive in Baghdad after its destruction by arson in April 2003 during the American invasion. Source: Wikipedia

Collection (and cataloguing) bias

Figure 11: Periodicals and their holding institutions (Wikidata)
Table 1: Arabic periodicals until 1929: holdings and digitisation
| | periodicals | % of total | % of digitised |
|---|---|---|---|
| published | 3550 | | |
| known holdings | 775 | 21.83 | |
| digitised | 233 | 6.56 | |
| multiple digitisations | 66 | 1.86 | 28.33 |
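The two percentage columns in Table 1 use different denominators: the shares of the total are relative to all 3550 published titles, while the share of the digitised is relative to the 233 digitised titles only. Worked out for two of the cells:

```latex
\begin{aligned}
\text{share of total} &= \frac{n}{3550} \times 100, & \frac{775}{3550} \times 100 &\approx 21.83\\
\text{share of digitised} &= \frac{n}{233} \times 100, & \frac{66}{233} \times 100 &\approx 28.33
\end{aligned}
```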

Digitisation bias

mind the <gap/>!

Table 2: Comparison of digitised periodicals between the Global South and the Global North
| | Arabic periodicals (1798–1918) | WWI as mirrored by Hessian regional papers |
|---|---|---|
| community | c. 420 million Arabic speakers | c. 6.2 million inhabitants |
| periodicals | 2054 newspapers and journals | 125 newspapers |
| digitised | 156 periodicals | 125 newspapers with more than 1.5 million pages |
| type | mostly facsimiles | facsimiles and full text |
| access | paywalls, geo-fencing | open access |
| interface | mostly in foreign languages only | in local and foreign languages |
Figure 12: Map of Arabic dialects. Source: reddit
Figure 13: Map of Hesse in Europe. Source: https://www.iz.sk/sk/projekty/regiony-eu/DE7

Digital divides: Access to power and the internet

Figure 14: Protester holding a sign reading “Darling, you are as beautiful as an additional hour of electricity” in Baghdad, July 2015. Source: Twitter

Arabic

Script

  • Second most widely used script after Latin
    • currently used by 14 languages: Arabic, Persian, Urdu, Pashto, Uzbek, Uighur …
  • Writing direction: right to left (RTL)
  • Letters: mostly joined in the direction of writing; a letter's form depends on its position within the string (isolated, initial, medial, final)

Language

  • Fifth most widely spoken language
    • One of six official languages of the United Nations
    • Official language in 26 countries
    • >420 million speakers
  • Liturgical language of 1.6 billion Muslims
Figure 15: Approximate distribution of Arabic script use along current national boundaries. Solid colours: contemporary primary script; vertical stripes: contemporary use as a secondary national script; horizontal stripes: historical use (Nemeth Arabic Type-Making in the Machine Age 2017, fig. 1.1)

Linguistic imperialism

epistemic violence of the means of knowledge production

Figure 16: Arabic Linotype, 1910s. Source: (Nemeth Arabic Type-Making in the Machine Age 2017, fig. 2.7)

Since 1991, Unicode has made it possible to share Arabic “reliably” … if protocols, software, fonts, etc. support it across the entire technology stack.

Figure 17: 32 variants of encoding “Meccan” (مكية) (Milo “Visually Misleading Characters in the Arabic URL” 2014, 4)
Figure 18: In-browser search for “مك” in the Wikidata entry for “Mecca” (Q5806)
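The variants behind Figures 17 and 18 exist because visually identical Arabic strings can be encoded with different code points: contextual presentation forms, vowel marks, the tatweel (kashida), and look-alike letters from Persian or Urdu orthography. A minimal Python sketch of a search-side folding step, assuming that query and text are folded the same way before matching. It covers only a few of these mechanisms, and mapping the Persian keheh and Farsi yeh to the Arabic kāf and yāʾ is our own assumption, not a Unicode rule.

```python
import unicodedata

# Look-alike letters from Persian/Urdu orthography. NFKC does not unify them
# with their Arabic counterparts, so this mapping is our own assumption.
LOOKALIKES = {
    "\u06A9": "\u0643",  # ARABIC LETTER KEHEH -> ARABIC LETTER KAF
    "\u06CC": "\u064A",  # ARABIC LETTER FARSI YEH -> ARABIC LETTER YEH
}
TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida), a purely visual elongation


def fold_arabic(text: str) -> str:
    """Fold visually equivalent Arabic encodings into one search key."""
    # 1. NFKC turns presentation forms (e.g. U+FEE3, initial meem) into base letters
    text = unicodedata.normalize("NFKC", text)
    # 2. drop tatweel and combining marks such as short vowels and shadda
    text = "".join(c for c in text if c != TATWEEL and not unicodedata.combining(c))
    # 3. unify look-alike letters
    return "".join(LOOKALIKES.get(c, c) for c in text)


# "مكية" written with contextual presentation forms instead of base letters
presentation = "\uFEE3\uFEDC\uFEF4\uFE94"
print(fold_arabic("مك") in fold_arabic(presentation))  # True
print("مك" in presentation)                            # False
```

A plain substring search, like the in-browser search in Figure 18, only succeeds if both sides happen to use the same code points; folding both sides first removes that dependency at the cost of losing distinctions (vowels, regional letterforms) that some users may want to search for.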

Transliteration, the undead solution of yore

Transliteration into Latin script served the needs of colonial administrations and academics, given the technological affordances of the time.

مرآة الشرق

The Arabic title of a newspaper published by بولس شحادة (Būlus Shiḥāda) in Jerusalem, 1919–38

Meraat al-Sherk

The official transcription provided by the paper’s masthead

Figure 19: Front page of Mirʾāt al-Sharq #192, 22 Nov. 1922, Jerusalem. Source: EAP.

Mirʾāt al-Sharq

Following the system of the International Journal of Middle East Studies (IJMES)

Mirʾāt aš-Šarq

Following the system of the Deutsche Morgenländische Gesellschaft (DMG)

mrMp Alcrq

Buckwalter transliteration
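Buckwalter transliteration maps each Arabic letter to exactly one ASCII character, which makes it reversible, but it is unreadable without the table. A minimal Python sketch covering consonants and hamza carriers only (no vowel marks). The slide's form “mrMp Alcrq” appears to follow the XML-safe variant, which replaces punctuation characters such as | and $ with letters; the override table below reflects our understanding of that variant and should be checked against a reference implementation (e.g. CAMeL Tools) before use.

```python
import unicodedata

# Standard Buckwalter: consonants and hamza carriers (vowel marks omitted)
BUCKWALTER = {
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&", "\u0625": "<",
    "\u0626": "}", "\u0627": "A", "\u0628": "b", "\u0629": "p", "\u062A": "t",
    "\u062B": "v", "\u062C": "j", "\u062D": "H", "\u062E": "x", "\u062F": "d",
    "\u0630": "*", "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z", "\u0639": "E",
    "\u063A": "g", "\u0641": "f", "\u0642": "q", "\u0643": "k", "\u0644": "l",
    "\u0645": "m", "\u0646": "n", "\u0647": "h", "\u0648": "w", "\u0649": "Y",
    "\u064A": "y",
}
# Assumed overrides for the XML-safe variant (verify before relying on them)
SAFE_OVERRIDES = {"'": "C", "|": "M", ">": "O", "&": "W", "<": "I",
                  "}": "Q", "*": "V", "$": "c"}


def to_buckwalter(text: str, safe: bool = False) -> str:
    """Transliterate Arabic letters; other characters (spaces, Latin) pass through."""
    text = unicodedata.normalize("NFC", text)  # compose alef + madda into U+0622
    out = []
    for ch in text:
        if ch in BUCKWALTER:
            bw = BUCKWALTER[ch]
            out.append(SAFE_OVERRIDES.get(bw, bw) if safe else bw)
        else:
            out.append(ch)
    return "".join(out)


print(to_buckwalter("مرآة الشرق"))             # mr|p Al$rq
print(to_buckwalter("مرآة الشرق", safe=True))  # mrMp Alcrq
```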

The long tail of ASCII in discovery systems

How do we search for مرآة الشرق?

  • original Arabic: مرآة الشرق
  • original Latin: Meraat al-Sherk
  • IJMES: Mirʾāt al-Sharq
  • DMG: Mirʾāt aš-Šarq
  • Buckwalter: mrMp Alcrq
Figure 20: Front page of Mirʾāt al-Sharq #192, 22 Nov. 1922, Jerusalem. Source: EAP.

failure

  • Arabic script (data or interface)
  • IJMES
  • removing or substituting hamza and ʿayn: mir'at sarq
Figure 21: Search in ZDB for “mir’at sarq”

success

  • original Latin title
  • DMG
  • removing all diacritics and articles: mirʾat sarq
Figure 22: Search in ZDB for “Mirʾāt aš-Šarq”
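The pattern of failures and successes above is consistent with a catalogue that strips diacritics and articles but treats ʾ and ʿ as ordinary letters that must match exactly. A toy model of such a normalisation, assuming exactly that behaviour; it is inferred from the search results on these slides and is not ZDB's documented algorithm. The hypothetical index holds only the masthead title and the DMG form.

```python
import re
import unicodedata


def catalogue_key(title: str) -> str:
    """Toy search key: strip diacritics and a leading definite article."""
    # decompose, then drop combining marks (macron, caron, dot below, madda)
    decomposed = unicodedata.normalize("NFD", title)
    key = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    # drop transliterated articles such as "al-" or "as-" (from DMG "aš-")
    key = re.sub(r"\ba[a-z]{1,2}-", "", key)
    # ʾ (U+02BE) and ʿ (U+02BF) are spacing letters, not combining marks,
    # so they survive and an ASCII apostrophe will not match them
    return key


# hypothetical index: the title forms a catalogue record might carry
INDEX = {catalogue_key(t) for t in ("Meraat al-Sherk", "Mirʾāt aš-Šarq")}


def found(query: str) -> bool:
    return catalogue_key(query) in INDEX


for query in ("مرآة الشرق", "Meraat al-Sherk", "Mirʾāt al-Sharq",
              "Mirʾāt aš-Šarq", "mirʾat sarq", "mir'at sarq"):
    print(f"{query!r}: {found(query)}")
```

Under these assumptions, the Arabic title, the IJMES form (“sh” versus the folded “s” of “š”) and the ASCII apostrophe all miss, while the masthead title, the DMG form and “mirʾat sarq” hit, mirroring the two lists above.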

Proposed solution

Minimal computing

minimal computing connotes digital humanities work undertaken in the context of some set of constraints. This could include lack of access to hardware or software, network capacity, technical education, or even a reliable power grid

(Risam and Gil “Introduction” 2022, sec. 3)

this implies learning how to produce, disseminate, and preserve digital scholarship ourselves, without the help we can’t get, even as we fight to build the infrastructures we need at the intersection of, with, and beyond institutional libraries and schools.

(Gil and Ortega “Global Outlooks in Digital Humanities” 2016, 29)

Figure 23: Frankfurt kitchen. Source: Wikimedia Commons, CC0

Minimal Computing

What do we need?

  • knowledge shall be FAIR: findable, accessible, interoperable, and reusable
  • support for multilingual data and interfaces
  • long-term sustainability of the tech stack
  • user management, documentation, support for crowdsourcing

What do we have at hand?

  • community of volunteers
  • high level of domain knowledge

(no funds, no institutional stakeholders)

Wikidata

  • largest public knowledge graph
    • 5-star linked open data
    • FAIR, CC0 (public domain)
  • community driven
    • integrated into larger knowledge environments
    • robust user management
  • multilingual by design
  • Open software stack
Figure 24: Wikidata user interface in Korean for Mirʾāt al-Sharq

Workflow for publishing metadata on Wikidata

Data collection: Project Jarāʾid (2012–)

  • Bibliographic record of all Arabic periodical titles published between 1798 and 1929
    • websites and open datasets (TEI/XML) for more than 3500 periodicals
    • additional authority files for c. 2,700 persons, 220 places, 180 libraries
  • Crowdsourcing among scholars
  • Networking and reconciling existing information:
    • Integration of holding information from library catalogues such as ZDB, AUB, BnF, HathiTrust
    • Conversions from MARC, MODS, and HTML to TEI/XML
<biblStruct source="https://projectjaraid.github.io wd:Q186844 wd:Q124855340 wd:Q107011742" subtype="newspaper" type="periodical">
   <monogr>
      <title level="j" xml:lang="ar">مرآة الشرق</title>
      <title level="j" source="wd:Q186844" type="sub" xml:lang="ar">جريدة اسبوعية حرة</title>
      <title level="j" source="https://jrayed.org/en/newspapers/meraatalsherk/1919/09/17/01/ wd:Q124855340" type="sub" xml:lang="ar">جريدة عربية سياسية حرة</title>
      <title level="j" source="wd:Q124855340" xml:lang="ar-Latn-EN">Meraat al-Sherk</title>
      <title level="j" xml:lang="ar-Latn-x-ijmes">Mirʾāt al-Sharq</title>
      <title level="j" source="wd:Q186844" xml:lang="ar-Latn-x-dmg">Mirʾāt aš-Šarq</title>
      <title level="j" source="wd:Q186844" type="alt" xml:lang="ar-Latn-FR">Meraat alsherq</title>
      <title level="j" source="wd:Q186844" type="sub" xml:lang="ar-Latn-x-dmg">ǧarīda usbūʿīya ḥurra</title>
      <idno type="OCLC">33001662</idno>
      <idno type="OCLC">50276604</idno>
      <!-- ... -->
      <idno type="wiki">Q25212027</idno>
      <idno type="wiki">Q124971778</idno>
      <idno source="wd:Q186844" type="zdb">1019615-8</idno>
      <textLang mainLang="ar"/>
      <editor type="owner">
         <persName ref="wd:Q125160760" xml:lang="ar">
            <forename>بولس</forename>
            <surname>شحادة</surname>
         </persName>
      </editor>
      <editor source="wd:Q124855340" type="editor">
         <persName ref="wd:Q125159749" xml:lang="ar">
            <forename>نقولا</forename>
            <surname>شحادة</surname>
         </persName>
      </editor>
      <imprint>
         <pubPlace>
            <placeName ref="geon:281184 wd:Q1218" xml:lang="ar">القدس</placeName>
         </pubPlace>
         <publisher source="wd:Q124855340">
            <orgName xml:lang="ar">مطبعة مرآة الشرق</orgName>
         </publisher>
         <date type="onset" when="1919-09-17">1919</date>
         <date resp="#hEAP119" type="terminus" when="1938">1938</date>
      </imprint>
   </monogr>
</biblStruct>
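Before mapping, titles and identifiers have to be pulled out of records like the one above. A minimal sketch with Python's standard-library ElementTree, assuming a trimmed-down record without the TEI namespace (real TEI files declare xmlns="http://www.tei-c.org/ns/1.0", which would have to be added to the search paths). Treating titles without @type as main titles is our reading of the example, not a documented rule of Project Jarāʾid.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the record above (no TEI namespace, most elements omitted)
RECORD = """<biblStruct type="periodical" subtype="newspaper">
   <monogr>
      <title level="j" xml:lang="ar">مرآة الشرق</title>
      <title level="j" type="sub" xml:lang="ar">جريدة اسبوعية حرة</title>
      <title level="j" xml:lang="ar-Latn-EN">Meraat al-Sherk</title>
      <title level="j" xml:lang="ar-Latn-x-ijmes">Mirʾāt al-Sharq</title>
      <title level="j" xml:lang="ar-Latn-x-dmg">Mirʾāt aš-Šarq</title>
      <idno type="OCLC">33001662</idno>
      <idno type="zdb">1019615-8</idno>
   </monogr>
</biblStruct>"""


def main_titles(bibl: ET.Element) -> dict:
    """Map language tags to main titles (title elements without @type)."""
    return {
        t.get(XML_LANG): t.text
        for t in bibl.findall("./monogr/title")
        if t.get("type") is None
    }


def identifiers(bibl: ET.Element) -> dict:
    """Group idno values by @type; a record may carry several per type."""
    ids = {}
    for idno in bibl.findall("./monogr/idno"):
        ids.setdefault(idno.get("type"), []).append(idno.text)
    return ids


bibl = ET.fromstring(RECORD)
print(main_titles(bibl)["ar-Latn-x-dmg"])  # Mirʾāt aš-Šarq
print(identifiers(bibl)["zdb"])            # ['1019615-8']
```

Keeping the language tag as the dictionary key preserves the distinction between transliteration systems, which is exactly what the later mapping to Wikidata labels and aliases needs.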

Import data to Wikidata

model

Create a data model of Wikidata entities and property statements

map

Map the original data model (TEI/XML) to the new data model

  • to custom XML for import into OpenRefine
  • to QuickStatements for direct import to Wikidata

reconcile

Reconcile entities with existing Wikidata items using OpenRefine

upload

Create new Wikidata items and statements with QuickStatements

Figure 25: Schematic data model for Arabic periodicals on Wikidata, using Mirʾāt al-Sharq as an example
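For the upload step, QuickStatements accepts batches in its tab-separated V1 syntax: CREATE starts a new item, LAST refers to the item just created, Lar sets an Arabic label, and dates carry a precision suffix (/11 for day, /9 for year). A minimal sketch, assuming a flat record already extracted from the TEI; the field names are hypothetical, and the property and class choices (P31 newspaper Q11032, P407 Arabic Q13955, P1476 title, P291 place of publication, P571 inception, P576 end date, P1042 ZDB ID) are our illustrative guesses, not necessarily the data model in Figure 25.

```python
# Flat record as it might come out of the TEI mapping step (hypothetical field names)
RECORD = {
    "title_ar": "مرآة الشرق",
    "title_dmg": "Mirʾāt aš-Šarq",
    "place_qid": "Q1218",      # Jerusalem, as referenced in the TEI record
    "onset": "1919-09-17",
    "terminus_year": "1938",
    "zdb": "1019615-8",
}


def quickstatements(rec: dict) -> str:
    """Render one record as QuickStatements V1 commands (tab-separated)."""
    rows = [
        ["CREATE"],
        ["LAST", "Lar", f'"{rec["title_ar"]}"'],        # Arabic label
        ["LAST", "Len", f'"{rec["title_dmg"]}"'],       # English label: transliteration
        ["LAST", "P31", "Q11032"],                      # instance of: newspaper
        ["LAST", "P407", "Q13955"],                     # language of work: Arabic
        ["LAST", "P1476", f'ar:"{rec["title_ar"]}"'],   # title (monolingual text)
        ["LAST", "P291", rec["place_qid"]],             # place of publication
        ["LAST", "P571", f'+{rec["onset"]}T00:00:00Z/11'],              # inception, day precision
        ["LAST", "P576", f'+{rec["terminus_year"]}-00-00T00:00:00Z/9'],  # end, year precision
        ["LAST", "P1042", f'"{rec["zdb"]}"'],           # ZDB ID (external identifier)
    ]
    return "\n".join("\t".join(row) for row in rows)


print(quickstatements(RECORD))
```

In practice, only records that OpenRefine could not reconcile with an existing item should receive a CREATE; for matched records, the same statements would be written against the reconciled QID instead of LAST.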

Archive data

all dependencies will break eventually

Why

  • Everyone can edit Wikidata
  • Wikimedia will fold
  • One might need to cite a specific version

How

  • SPARQL and RESTful APIs to the rescue
    • save a local copy of the graph
  • bash script as a wrapper
  • schedule periodic runs with GitHub Actions or GitLab CI/CD
  • push periodic releases to a publicly funded, open repository (Zenodo) for long-term preservation (Grallert “Project Jarāʾid” 2024)
    • make sure to add ORCIDs for all contributors
    • provides versioned DOIs
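The core of such an archiving job is a single request to the Wikidata Query Service whose JSON result is written to a dated file; the shell wrapper, the CI schedule and the Zenodo upload described above would sit around it. A minimal Python sketch of that core step only. The query, the class Q1002697 (periodical) used to select items, the file-name pattern and the User-Agent contact are hypothetical placeholders, not the queries behind the published snapshots.

```python
import datetime
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical query: Arabic-language periodicals with an inception before 1930
QUERY = """
SELECT ?item ?itemLabel ?inception WHERE {
  ?item wdt:P31/wdt:P279* wd:Q1002697 ;
        wdt:P407 wd:Q13955 ;
        wdt:P571 ?inception .
  FILTER(YEAR(?inception) < 1930)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ar,en". }
}
"""


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request asking for SPARQL JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    headers = {
        "Accept": "application/sparql-results+json",
        # Wikimedia asks clients to identify themselves; placeholder contact
        "User-Agent": "jaraid-snapshot-sketch/0.1 (mailto:you@example.org)",
    }
    return urllib.request.Request(url, headers=headers)


def snapshot_path(day: datetime.date) -> str:
    """Dated file name, so that every run yields a distinct, citable file."""
    return f"wikidata-snapshot-{day.isoformat()}.json"


def save_snapshot(query: str = QUERY) -> str:
    """Run the query once and write the raw JSON result to a dated file."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        data = json.load(resp)
    path = snapshot_path(datetime.date.today())
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=2)
    return path
```

Calling save_snapshot() once per scheduled run would produce one dated file per release; ensure_ascii=False keeps Arabic labels human-readable in the archived JSON.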

Results

Visibility

before

Figure 26: Map of all Arabic periodicals published before 1930 (items created before 18 March 2024, SPARQL)

after

Figure 27: Map of all Arabic periodicals published before 1930 (SPARQL)

Visibility

before

Figure 28: Bubble chart of publication languages (items created before 18 March 2024). Note the surprising prominence of Swedish (SPARQL)

after

Figure 29: Bubble chart of publication languages (as of today, SPARQL)

Complex queries

SPARQL is powerful BUT has a steep learning curve

Examples
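Maps such as Figures 26 and 27 need a multi-hop query: from a periodical to its place of publication, and from there to the place's coordinates. A minimal sketch, assuming the standard SPARQL JSON result format and the Wikidata Query Service convention of returning coordinates as WKT literals in longitude–latitude order; the query and the GeoJSON conversion are illustrative and not the queries behind the figures.

```python
import re

# Hypothetical multi-hop query: periodical -> place of publication -> coordinates
MAP_QUERY = """
SELECT ?item ?place ?coords WHERE {
  ?item wdt:P407 wd:Q13955 ;          # language of work: Arabic
        wdt:P291 ?place .             # place of publication
  ?place wdt:P625 ?coords .           # coordinate location of that place
}
"""

# WKT point literal, e.g. "Point(35.21 31.77)"; longitude comes first
POINT = re.compile(r"Point\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)")


def to_geojson(results: dict) -> dict:
    """Convert SPARQL JSON results into a GeoJSON FeatureCollection."""
    features = []
    for row in results["results"]["bindings"]:
        match = POINT.match(row["coords"]["value"])
        if match is None:
            continue  # skip values that are not plain WKT points
        lon, lat = float(match.group(1)), float(match.group(2))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},  # GeoJSON: lon, lat
            "properties": {"item": row["item"]["value"], "place": row["place"]["value"]},
        })
    return {"type": "FeatureCollection", "features": features}
```

Doing the conversion outside SPARQL keeps the query itself short; the GeoJSON can then be styled or filtered (e.g. by decade) without re-running the query.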

Community

Daily edits from bots and human contributors

Figure 30: Wikidata item for al-Majalla, Buenos Aires, 1915–
Figure 31: Edit history for al-Majalla

Thank you

Thank you!

  • Contributors to OpenArabicPE: Jasper Bernhofer, Dimitar Dragnev, Patrick Funk, Talha Güzel, Hans Magne Jaatun, Daniel Kolland, Jakob Koppermann, Xaver Kretzschmar, Daniel Lloyd, Klara Mayer, Tobias Sick, Manzi Tanna-Händel, and Layla Youssef
  • Contributors to Project Jarāʾid: Hala Auji, Philippe Chevrant, Marina Demetriadou, Lamia Eid, Stacy Fahrenthold, Ulrike Freitag, Till Grallert, Rana Issa, Nicole Khayat, Peter Magierski, Leyla von Mende, Adam Mestyan, Christian Meier, Daniel Newman, Geoffrey Roper, Sinai Rusinek, Philip Sadgrove, Ola Seif, and Rogier Visser
  • Slides: tillgrallert.eu/slides/dh/2026-gdhs/
  • Paper: 10.46298/transformations.14749
  • Mastodon: @tillgrallert@digitalcourage.social
  • Email: ,
  • ADHO SIG multilingual DH: multilingualdh.org
  • DHd AG multilingual DH: ag.multilingualdh.de

References

Berners-Lee, Tim. 2009. “Linked Data - Design Issues.” June 18, 2009. https://www.w3.org/DesignIssues/LinkedData#fivestar.
Estrin, Daniel, and Abu Bakr Bashir. 2024. A Requiem for Gaza’s Iconic Sites, Destroyed in the War. Weekend Edition Sunday. NPR. https://www.npr.org/2024/02/04/1226295081/gaza-iconic-sites-destroyed-in-war.
Fiormonte, Domenico. 2021. “Taxation Against Overrepresentation? The Consequences of Monolingualism for Digital Humanities.” In Alternative Historiographies of the Digital Humanities, edited by Dorothy Kim and Adeline Koh, 333–76. Earth: punctum books. https://doi.org/10.53288/0274.1.00.
Gil, Alex, and Élika Ortega. 2016. “Global Outlooks in Digital Humanities: Multilingual Practices and Minimal Computing.” In Doing Digital Humanities: Practice, Training, Research, edited by Constance Crompton, Richard J. Lane, and Ray Siemens, 22–34. Abingdon: Routledge.
Grallert, Till. 2024. “Project Jarāʾid: Snapshot of Bibliographic Metadata from Wikidata for All Arabic Periodicals Published Worldwide Before 1930.” Zenodo. https://doi.org/10.5281/zenodo.14875615.
Milo, Thomas. 2014. “Visually Misleading Characters in the Arabic URL.” DecoType.
Nemeth, Titus. 2017. Arabic Type-Making in the Machine Age: The Influence of Technology on the Form of Arabic Type, 1908–1993. Leiden: Brill. https://doi.org/10.1163/9789004349308.
Phillipson, Robert. 1997. “Realities and Myths of Linguistic Imperialism.” Journal of Multilingual and Multicultural Development 18 (3): 238–48. https://doi.org/10/db3cnb.
Risam, Roopika, and Alex Gil. 2022. “Introduction: The Questions of Minimal Computing.” Edited by Alex Gil and Roopika Risam. Digital Humanities Quarterly 16 (June). http://digitalhumanities.org/dhq/vol/16/2/000646/000646.html.
“«أصبح رمادًا وركاما»... حريق كبير يدمر سوقًا في قلب دمشق” [“‘It Became Ashes and Rubble’… A Large Fire Destroys a Market in the Heart of Damascus”]. 2023. الشرق الاوسط (Asharq Al-Awsat), July 16, 2023. https://aawsat.com/node/4436121.