
Trinity Hall, Trinity Lane, Cambridge CB2 1TJ
Marieke Meelen’s research interests include information structure, comparative syntax and historical linguistics. She is currently part of two projects: the Emergence of Egophoricity (with Prof Hill at SOAS, University of London) and ‘PaganTibet’ (with Prof Ramble, EPHE-PSL, Paris). She also worked on the recently finished AHRC-funded ‘The History of Subject Pronouns’ project (with Prof Willis at Oxford University and Prof Meier in Berlin). As the PI of an ELDP-funded research project documenting endangered languages in Nepal, she is interested in NLP and corpus creation for low-resource languages, and has developed both ASR and HTR models for various Tibeto-Burman varieties.
As part of her British Academy postdoctoral fellowship, she worked on the history of V2 word orders across Indo-European languages and developed a historical treebank of Welsh. Her doctoral thesis combined methods from computational and historical linguistics to reconstruct verb-initial and verb-second word order patterns and information structure in Welsh in their Celtic historical context. She is also a computational linguistic consultant for a project on the annotation of Middle Welsh texts at the Philipps-Universität in Marburg.
Marieke was awarded her PhD at Leiden University in 2016 supervised by Prof Lisa Cheng and Prof Alexander Lubotsky.
Historical Linguistics, NLP for low-resource languages (from a linguistics perspective).
I’m particularly interested in mentoring postdocs and supervising PhD students with a strong linguistics background hoping to work on Celtic or Tibeto-Burman languages in areas of my research interests:
- Historical Linguistics (Syntax, Reconstruction and Information Structure)
- Grammaticalisation & Pragmaticalisation
- NLP for low-resource languages
- Celtic & Tibeto-Burman languages
- ERC AG ‘PaganTibet’ (2023-2028)
- AHRC ‘Emergence of Egophoricity’ (2022-2026)
- AHRC-DFG ‘History of Subject Pronouns in Northern Europe’ (2021-2024)
- ELDP SG ‘An Audio-Visual Archive of South Mustang Tibetan’ (2022-2023)
- Meelen, M. (in press) Syntactic reconstruction in Celtic. In Carnie et al. (eds.) Formal Approaches to Celtic Linguistics, Language Science Press.
- Meelen, M. (in press) Middle Cornish syntax. in Nurmio et al (eds.) Palgrave Handbook of Celtic Languages & Linguistics.
- Meelen, M. and Willis, D. (2024). The diachrony of Welsh subject pronouns in Elliott Lash (ed.) Studia Celtica Posnaniensia Vol 9. Special Issue: Noun phrase and pronominal syntax in medieval and early modern Celtic languages. 85-112.
- O’Neill, A. and Meelen, M. (2024). Diachronic Annotated Corpus of Newar (DACON): from Manuscript to Morphosyntax in Cahiers Linguistique Asie Orientale, 1-30. DOI: 10.1163/19606028-bja10047
- Meelen, M, Faggionato, C. and Hill, N. eds (2024). Tibetan digital humanities and natural language processing. Proceedings of the IATS 2022 panel as a Special Issue of the Revue d’Etudes Tibétaines 72.
- Meelen, M, Nehrdich, S. and Keutzer, K. (2024). Breakthroughs in Tibetan NLP & Digital Humanities. In Meelen, M, Faggionato, C. and Hill, N. (eds) Tibetan digital humanities and natural language processing. Proceedings of the IATS 2022 panel as a Special Issue of the Revue d’Etudes Tibétaines. pp. 5-25.
- Meelen, M, O’Neill, A and Coto-Solano, R. (2024). End-to-End Speech Recognition for Endangered Languages of Nepal in Moeller et al (eds.) Proceedings of the Comput-EL workshop at the EACL, pp. 83-93.
- Meelen, M, Hill, N. and Fellner, H. (2022) What are cognates? in Papers in Historical Phonology vol 7. DOI:
- Meelen, M. and Willis, D. eds. (2022). Creating annotated corpora for historical languages Special Issue for Journal of Historical Syntax, Vol. 6, pp. 1-6.
- Felbur, R, Meelen, M and Vierthaler, P. (2022). Crosslinguistic Semantic Textual Similarity for Classical Tibetan & Chinese in Journal of Open Humanities Data 8, 23. DOI:
- Faggionato, C., Hill, N., & Meelen, M. (2022). NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties. LREC-EURALI Proceedings, pp. 1-6.
- Darling, M., Meelen, M., & Willis, D. (2022). Towards coreference resolution for Early Irish. LREC-CLTW Proceedings, pp. 85-93.
- Meelen, M and Willis, D. (2022). Towards a historical treebank of Middle and Modern Welsh: Syntactic parsing in Meelen & Willis (eds). Annotating Historical Corpora: Special Issue for Journal of Historical Syntax, 5:1-32.
- Meelen, M and Pujol I Campeny, A. (2021). Old Catalan Morphosyntax: Developing an Annotated Corpus in Journal of Open Humanities Data, 7:30, pp. 1–12. DOI:
- Meelen, M., Roux, É., & Hill, N. (2021). Optimisation of the Largest Annotated Tibetan Corpus Combining Rule-based, Memory-based, and Deep-learning Methods. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 20(1), 1-11.
- Meelen, M. (2020). The Emergence of V2 in Welsh. In Woods & Wolfe (eds) Rethinking V2, Oxford University Press, pp. 426-454.
- Meelen, M and Roux, E. (2020). Meta-dating the PArsed Corpus of Tibetan (PACTib) in Kilian Evang, Laura Kallmeyer, Rafael Ehren, Simon Petitjean, Esther Seyffarth, Djamé Seddah (Editors) Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories, pp. 31–42.
- Meelen, Marieke & Nurmio, Silva (2020) 'Adjectival agreement in Middle Welsh translated prose' in Journal of Celtic Linguistics. pp. 1-28.
- Meelen, Marieke, Mourigh, Khalid & Cheng, Lisa (2020) 'V3 word order in Dutch urban varieties', in András Bárány, Theresa Biberauer, Jamie Douglas and Sten Vikner (eds.) Clausal Architecture and Its Consequences: Synchronic and Diachronic Perspectives, pp. 55-84.
- Faggionato, C., & Meelen, M. (2019). Developing the old Tibetan treebank. In Proceedings of the RANLP. pp. 304-312.
- Hill, N. W., & Meelen, M. (2017). Segmenting and POS tagging Classical Tibetan using a memory-based tagger. Himalayan Linguistics, 16(2), 64-86.
- Meelen, Marieke & Hill, Nathan (2017) Segmenting and POS tagging Classical Tibetan, Himalayan Linguistics 16 (2), pp. 64-89.
- Meelen, Marieke, Hill, Nathan, & Handy, Christopher. (2017a) The Annotated Corpus of Classical Tibetan (ACTib), Part I - Segmented version, based on the BDRC digitised text collection, tagged with the Memory-Based Tagger from TiMBL [Data set]. Zenodo.
- Meelen, Marieke, Hill, Nathan, & Handy, Christopher. (2017b) The Annotated Corpus of Classical Tibetan (ACTib), Part II - POS-tagged version, based on the BDRC digitised text collection, tagged with the Memory-Based Tagger from TiMBL [Data set]. Zenodo.
- Meelen, Marieke (2017) 'Object-initial word order in Middle Welsh narrative prose' in Widmer & Poppe (eds.) Referential Properties and Their Impact on the Syntax of Insular Celtic Languages. pp. 145-178.
- Meelen, Marieke (2016) Why Jesus and Job spoke bad Welsh: the origin and distribution of V2 orders in Middle Welsh, Utrecht: LOT publications.
- Van Baren, Eva, Meelen, Marieke & Meijs, Lucas (2015) 'Promoting Youth Development Worldwide: The Duke of Edinburgh’s International Award' in Journal of Youth Development 10 (1).