The Life Cycle of Structural Biology Data


Research data are acquired, interpreted, published, reused, and sometimes eventually discarded. A better understanding of this life cycle will support the development of infrastructure services that make it easier for researchers to preserve, share, and find data.

Structural biology is a discipline within the life sciences that investigates the molecular basis of life by discovering and interpreting the shapes and motions of macromolecules. It has a strong tradition of data sharing, exemplified by the founding of the Protein Data Bank (PDB) in 1971. The culture of structural biology is therefore already in line with the perspective that data from publicly funded research projects are public data.

This review is based on the data life cycle as defined by the UK Data Archive, which identifies six stages: creating data, processing data, analysing data, preserving data, giving access to data, and re-using data. For clarity, ‘preserving data’ and ‘giving access to data’ are discussed together. A final stage of the life cycle, ‘discarding data’, is also discussed.

The review concludes with recommendations for future improvements to the IT infrastructure for structural biology.

