Publications

Scientific publications

A.D. McCarthy, C. Kirov, M. Grella, A. Nidhi, P. Xia, K. Gorman, E. Vylomova, S.J. Mielke, G. Nicolai, M. Silfverberg, T. Arkhangelskij, N. Krizhanovsky, A. Krizhanovsky, E. Klyachko, A. Sorokin, J. Mansfield, V. Ernštreits, Y. Pinter, et al.
UniMorph 3.0: Universal Morphology
// Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, 11–16 May, 2020. P. 3915–3924
Keywords: morphology, lexical database, multilinguality
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological paradigms for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. We have implemented several improvements to the extraction pipeline which creates most of our data, so that it is both more complete and more correct. We have added 66 new languages, as well as new parts of speech for 12 languages. We have also amended the schema in several ways. Finally, we present three new community tools: two to validate data for resource creators, and one to make morphological data available from the command line. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland. This paper details advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018.
Indexed at Scopus, Google Scholar
Last modified: July 17, 2020