The third and final day of the Social Media Access Days at the German National Library starts with a paper by Alexander König, whose focus is on metadata for social media data. Creating quality metadata is critical for the appropriate storage and reuse of datasets, and especially so for social media datasets, as the conditions of the data gathering and the state of the platform at the time of gathering also need to be captured in the metadata.
The CLARIN Knowledge Centre for Computer-Mediated Communication and Social Media Corpora (CKCMC) is currently developing a metadata standard for such data, incorporating the FAIR principles for data storage and use; this also builds on related standards like LC-meta and UNIC. The approach here is intended to be modular, so it can describe the different data types (textual, audiovisual) as they are made available by the various social media platforms.
The first module in this framework consists of the Social Media Metadata Profiles (SMP), which are designed to capture information on the user interface, interaction types, and general platform features. The user interface shapes how users are able to act on the platform, of course: this covers viewing, engagement, connection, and posting options; language options; constraints on content types and lengths; and social status features like likes or badges. Interaction types include the connection graph model (reciprocal or non-reciprocal), key communication types (one-to-one, one-to-many, many-to-many), conversation threading structures, modality transitions (from audiovisual to text and back), and the availability of automated and AI communication facilities. General platform features cover the intended and actual audiences of a platform, distinctions between ordinary and elevated user states (premium users, moderators), content visibility (public, login requirement, restricted), and moderation approaches.
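To make the three groups of features more concrete, here is a minimal sketch of how such a profile might be represented as a data structure. This is not the CKCMC schema: every class name, field name, and value below (including the platform "ExamplePlatform") is a hypothetical illustration chosen for this post, and fields default to "unknown" because not every feature will be known for every platform.

```python
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import List, Optional


class GraphModel(Enum):
    RECIPROCAL = "reciprocal"          # e.g. mutual friendship
    NON_RECIPROCAL = "non-reciprocal"  # e.g. one-way following


class Visibility(Enum):
    PUBLIC = "public"
    LOGIN_REQUIRED = "login-required"
    RESTRICTED = "restricted"


@dataclass
class UserInterface:
    # Hypothetical fields; None or an empty list means "not documented yet".
    posting_options: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    max_post_length: Optional[int] = None
    status_features: List[str] = field(default_factory=list)


@dataclass
class InteractionTypes:
    graph_model: Optional[GraphModel] = None
    communication_types: List[str] = field(default_factory=list)
    threaded_conversations: Optional[bool] = None
    ai_communication: Optional[bool] = None


@dataclass
class PlatformFeatures:
    intended_audience: Optional[str] = None
    elevated_user_states: List[str] = field(default_factory=list)
    visibility: Optional[Visibility] = None
    moderation: Optional[str] = None


@dataclass
class SocialMediaProfile:
    platform: str
    observed_on: str  # ISO 8601 date on which the features were documented
    ui: UserInterface = field(default_factory=UserInterface)
    interaction: InteractionTypes = field(default_factory=InteractionTypes)
    features: PlatformFeatures = field(default_factory=PlatformFeatures)


# A partially filled profile for a made-up platform.
example = SocialMediaProfile(
    platform="ExamplePlatform",
    observed_on="2024-01-01",
    interaction=InteractionTypes(
        graph_model=GraphModel.NON_RECIPROCAL,
        communication_types=["one-to-many"],
    ),
    features=PlatformFeatures(
        elevated_user_states=["premium", "moderator"],
        visibility=Visibility.PUBLIC,
    ),
)

# asdict() turns the nested profile into plain dictionaries for storage.
profile_dict = asdict(example)
```

Leaving unknown features as `None` rather than guessing keeps the profile honest about what has actually been documented.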
Not all of these features are necessarily immediately clear for a given platform, but capturing as many of them as possible in the metadata profile is still useful, both for the context of a distinct dataset and because such features will change over the life of a platform and must therefore be tracked in order to understand their impact on the datasets and the activities they cover. However, these features may also differ between users and between countries, because platforms engage in A/B tests of new functions or because of applicable national regulations, so capturing all of this may be complicated.
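One way to think about tracking such changes is as a dated history of feature snapshots, kept separately per platform and per region. The following is only a sketch under that assumption; the class `ProfileHistory`, its methods, and the example values are hypothetical and not part of any existing standard or library.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Tuple


class ProfileHistory:
    """Dated feature snapshots, keyed by (platform, region)."""

    def __init__(self) -> None:
        self._snapshots: Dict[Tuple[str, str], List[Tuple[date, dict]]] = defaultdict(list)

    def record(self, platform: str, region: str, observed_on: date, features: dict) -> None:
        """Store what was observed on a given date, keeping entries in date order."""
        entries = self._snapshots[(platform, region)]
        entries.append((observed_on, features))
        entries.sort(key=lambda entry: entry[0])

    def at(self, platform: str, region: str, when: date) -> Optional[dict]:
        """Return the most recent snapshot observed on or before `when`, if any."""
        entries = self._snapshots.get((platform, region), [])
        dates = [observed for observed, _ in entries]
        index = bisect_right(dates, when)
        return entries[index - 1][1] if index else None


# Made-up observations for a made-up platform.
history = ProfileHistory()
history.record("ExamplePlatform", "DE", date(2023, 1, 1), {"max_post_length": 280})
history.record("ExamplePlatform", "DE", date(2024, 1, 1), {"max_post_length": 500})

# Which features applied to data gathered in mid-2023?
mid_2023 = history.at("ExamplePlatform", "DE", date(2023, 6, 1))
```

A lookup for a region or date without any recorded observation returns `None`, which signals a gap in the documentation rather than an assumed default. Per-user variation from A/B tests would need a finer key than the region and is not covered by this sketch.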
To further develop this approach, a CMC Metadata Working Group is now being established and is open to new members; once the SMP module has been developed, work will move on to other modules of the overall metadata framework. Key platforms will be prioritised at first, with these platforms and their features documented before work moves on to others. Student projects, volunteer hackathons, and other initiatives would also be welcome as ways to push this work along.
The Social Media Metadata Profiles, in particular, could also be reused for the metadata descriptions of datasets from the same platforms, as long as those platforms have not changed substantially in their functionality between data gathering activities; hopefully, this will simplify the dataset description process and produce better metadata.
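The reuse condition could be checked by comparing the platform's features at two gathering dates. The sketch below assumes that profiles are available as flat dictionaries; the function names, the feature keys, and the idea of a `tolerated` set of insubstantial changes are all hypothetical, since what counts as a substantial change remains a judgement call.

```python
from typing import AbstractSet, List


def changed_features(old: dict, new: dict) -> List[str]:
    """List the feature names whose values differ between two profile snapshots."""
    keys = sorted(set(old) | set(new))
    return [key for key in keys if old.get(key) != new.get(key)]


def can_reuse(old: dict, new: dict, tolerated: AbstractSet[str] = frozenset()) -> bool:
    """A profile is reusable if every change is in the set deemed insubstantial."""
    return all(key in tolerated for key in changed_features(old, new))


# Made-up snapshots of the same platform at two gathering dates.
first_gathering = {
    "max_post_length": 280,
    "graph_model": "non-reciprocal",
    "status_features": ["likes"],
}
second_gathering = {
    "max_post_length": 280,
    "graph_model": "non-reciprocal",
    "status_features": ["likes", "badges"],
}

differences = changed_features(first_gathering, second_gathering)
strict = can_reuse(first_gathering, second_gathering)
lenient = can_reuse(first_gathering, second_gathering, tolerated={"status_features"})
```

Under the strict check, the new badge feature forces a fresh profile; under the lenient check, the earlier profile is reused for the second dataset.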