Emerging best practices in dataset collection development

Secondary datasets are increasingly important to researchers as they attempt to answer questions, make predictions and test hypotheses in new and powerful ways. For libraries that strive to provide information to support research needs, these datasets can be considered a ‘new currency’ in collection development.

There are many unique considerations in the collection and acquisition of datasets. Existing dataset collection development policies, guidelines and programs were gathered from web searches of academic library websites, calls to listservs and personal communications. A total of 18 policies, guidelines, or programs were identified and considered in this work. A literature review was also conducted, with a focus on the collection of commercially available datasets. The purpose of this overview was to get a sense of current approaches to dataset collection development at other research institutions, to determine key considerations in dataset purchasing, and to highlight particular challenges in implementing a dataset collection development program.

Key findings:

  • Liaison librarians and subject selectors can and should be involved in working with researchers and faculty across disciplines, particularly in the beginning stages of the dataset evaluation process. They can help determine if free datasets, or datasets already held in library collections, meet researcher needs and can get the word out to departments.
  • Requests can be handled on an ad-hoc basis or via formal application procedures. Two institutions examined in this study (the University of Cincinnati and the University of Illinois) provided an online application process through which researchers could apply for library support for dataset purchasing.
  • License negotiation can be lengthy and tedious; commercial vendors selling datasets are often used to working with individual researchers, not libraries or institutional licensing arrangements.
  • Decide whether datasets will be treated like other electronic acquisitions. Licenses may be negotiated by e-resource acquisitions departments with expertise in negotiating terms of use.
  • Datasets should be integrated into the normal cataloguing workflow, and should be considered a part of the digital preservation program.
  • The amount the library is willing and able to contribute to a given dataset should be considered, with joint purchases between the library and the researchers when possible.
  • Data should be provided in a format that can be supported by the library and used by the researcher. Consider readability in commonly used statistical software. Datasets that come with adequate documentation and relevant metadata are preferred.
  • Consider the language and ease of cataloguing. Datasets should comply with the library’s existing storage capabilities.
  • Confidential data requires special storage and access considerations. The commercial supplier of the data and the data itself should be vetted for quality and reliability, and long-term access ensured.
  • Datasets purchased should be institutionally accessible to all faculty, students and staff. Terms should be in accordance with those for other electronic resource purchases made by the library.
  • Consider fair use and the rights of scholars to data derivatives. Datasets with a broad subject appeal to the research community, supporting the mission of the institution, should be prioritized.
  • Consider currency, the value of historical data, and geographic scope. Will the value of a dataset increase or decrease over time?

This information was copied and pasted from a May 1, 2014 slide presentation by Sarah Young, Health Science and Policy Librarian, Cornell University.
