Top tips for best practice
Get it right from the start. Gill Baxter explains why in this short video produced by the Centre for Digital Scholarship.
Further videos discussing the importance of research data management are available from the Centre for Digital Scholarship at https://vimeo.com/user15519350
Take a look at our top tips below; over time they will become second nature.
A data management plan (DMP) is a document that describes:
- What data will be created
- What policies will apply to the data
- Who will own and have access to the data
- What data management practices will be used
- What facilities and equipment will be required
- Who will be responsible for each of these activities.
It will enable you to think about and plan what you need to have in place to manage your data safely and efficiently.
All new PhD students must start a DMP alongside their academic Needs Analysis. The DMP must be updated and reviewed before every progression review. Even if you are not doing a PhD you might still find writing a DMP useful.
Take a look at our Research Data Management: Data Plan for your PhD, which offers advice on creating a basic DMP with links to further guidance (suitable for all).
Documentation is important in good data management.
Documentation introduces your data, provides a detailed description of their key attributes, and contextualises them. Your documentation should describe what you did and why you made the choices you made.
Documentation can be written at many “levels” and comes in many forms (as described below). Together, all of the documentation associated with a research project should answer a series of important questions (although any particular piece of documentation may only answer a subset of those questions).
- What was/is the context of data collection (empirical, theoretical, and/or normative)?
- How did you generate/collect the data?
- In what form are the data (e.g., “interview transcript”)?
- What are the data about (i.e., what is your research about)?
- How are the data formatted, structured, and organised?
- How did you transform or manipulate the data (e.g., modify format)?
- How did you validate/assure the quality of the data?
- What ethical or legal limits (e.g., confidentiality, copyright) are there on access to/use/re-use of the data?
- (Perhaps) How did you analyse the data?
Depending on the type of research you are doing you may want to consider the use of:
- Research or laboratory notebook
- Experimental protocols
- Readme files
This content has been reused from https://managing-qualitative-data.org/modules/2/a/
Take a look at the Data Documentation video produced by the University of Edinburgh at https://media.ed.ac.uk/media/1_16xz4vid
Files are commonly saved within a folder structure. You should consider whether one big, flat folder for all your files or a hierarchical tree structure would be the most appropriate for the piece of work or project you are doing. A complex structure can encourage the use of shorter, less meaningful file names that depend on that structure. This may mean that when the folder structure is removed, for example when you provide your data to a collaborator, the file names have little or no meaning. To avoid this, try to use names that match your environment and contain:
- Something meaningful to you (such as what you are doing with the file)
- Something meaningful to someone else (such as an experiment number or project name)
Develop a system for file naming that works for your project or work, use it consistently, and make sure it is part of the assigned metadata.
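As a minimal sketch of such a naming system, the Python snippet below joins a project name, a short description and a date into a single file name. The function name `build_filename`, the underscore separator, the ISO date format and the example values are all assumptions made for illustration; adapt them to whatever convention suits your project or discipline.

```python
import re
from datetime import date


def build_filename(project: str, description: str, when: date, extension: str) -> str:
    """Join the parts into a name that still makes sense outside its folder."""
    parts = [project, description, when.isoformat()]
    # Replace spaces and anything other than letters, digits and hyphens with a hyphen
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", part).strip("-") for part in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".")


print(build_filename("HealthTest", "survey", date(2008, 4, 6), "txt"))
# prints HealthTest_survey_2008-04-06.txt
```

Because the project name and date are part of the file name itself, the name keeps its meaning when the file is copied out of its folder.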
Read the UK Data Service guide: Organising data.
- How will you name your files and describe your data so you and others can easily locate/re-use the data?
- Is there a discipline/community standard?
| File naming rule | Example |
| --- | --- |
| Limit file names to 32 characters | 32CharactersLooksExactlyLikeThis.csv |
| Don’t use special characters or spaces | NO email@example.com; NO name.date VI .2.txt |
| Use versioning | NO ProjID_latest.txt |
| Use leading zeros in sequential numbering to allow for multi-digit versions | For a sequence of 1-10: 01-10. For a sequence of 1-100: 001-010-100 |
| Don’t use generic data file names that may conflict when moved from one location to another | NO MyData.csv |
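The rules above can be checked automatically. The following Python sketch is one possible way to do so, not an official tool: the names `check_filename` and `numbered`, the set of allowed characters and the list of generic names are assumptions. The 32-character limit is applied to the name without its extension, as the example above suggests.

```python
import re
from pathlib import PurePath

MAX_LENGTH = 32  # limit taken from the guidance above
GENERIC_STEMS = {"mydata", "data", "untitled"}  # assumed examples of generic names
ALLOWED = re.compile(r"[A-Za-z0-9._-]+")  # assumed set of safe characters


def check_filename(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    stem = PurePath(name).stem  # the name without its final extension
    problems = []
    if len(stem) > MAX_LENGTH:
        problems.append("longer than 32 characters")
    if not ALLOWED.fullmatch(name):
        problems.append("contains spaces or special characters")
    if stem.lower() in GENERIC_STEMS:
        problems.append("generic name")
    if "latest" in name.lower():
        problems.append("uses 'latest' instead of a version")
    return problems


def numbered(prefix: str, index: int, total: int) -> str:
    """Zero-pad index so every number up to total has the same width."""
    width = len(str(total))
    return f"{prefix}_{index:0{width}d}"


print(check_filename("ProjID_latest.txt"))
print(numbered("sample", 1, 10))    # sample_01
print(numbered("sample", 10, 100))  # sample_010
```

Zero-padded numbers sort correctly in a file browser, whereas unpadded ones place 10 before 2.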
It can be difficult to know whether you are using the latest version of a document. You can develop a strategy to help you manage this, such as recording the version in the file name or within the file. This can be done in any of the following ways:
- the date recorded in the file name or within the file, for example HealthTest-2008-04-06
- version numbering in the file name, for example HealthTest-00-02 or HealthTest_v2
- a file history, version control table or notes included within a file, where versions, dates, authors and details of changes to the file are recorded
| Filename | Description |
| --- | --- |
| LiteratureReview_1.0 | Original document |
| LiteratureReview_1.1 | Minor revisions made |
| LiteratureReview_1.2 | Further minor revisions |
| LiteratureReview_2.0 | Substantive changes |
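The numbering scheme in the table can also be handled in code. The Python sketch below assumes file names of the form `Name_major.minor`, as in the LiteratureReview examples; the function names `parse_version`, `latest` and `bump` are hypothetical helpers written for this illustration.

```python
import re

# Matches names such as LiteratureReview_1.2 (no file extension, as in the table)
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_(?P<major>\d+)\.(?P<minor>\d+)$")


def parse_version(name: str) -> tuple[str, int, int]:
    """Split a name into its stem, major version and minor version."""
    match = VERSION_PATTERN.match(name)
    if match is None:
        raise ValueError(f"no version found in {name!r}")
    return match["stem"], int(match["major"]), int(match["minor"])


def latest(names: list[str]) -> str:
    """Return the name with the highest (major, minor) version."""
    return max(names, key=lambda name: parse_version(name)[1:])


def bump(name: str, substantive: bool = False) -> str:
    """Return the next name: a new major version for substantive changes."""
    stem, major, minor = parse_version(name)
    if substantive:
        return f"{stem}_{major + 1}.0"
    return f"{stem}_{major}.{minor + 1}"


files = ["LiteratureReview_1.0", "LiteratureReview_1.1", "LiteratureReview_1.2"]
print(latest(files))                  # LiteratureReview_1.2
print(bump(latest(files)))            # LiteratureReview_1.3
print(bump(latest(files), True))      # LiteratureReview_2.0
```

Versions are compared as numbers rather than as text, so 1.10 is treated as later than 1.2.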
Take a look at the UK Data Service version control strategy.
Back up regularly using a robust storage location (not a USB storage stick or external hard drive).
- Also known as My Documents
- Use for all research data containing personal or sensitive data
- Backed up every 2 hours (kept for 30 days)
- Four hourly snapshots (kept for 90 days)
- To access network storage from Linux machines see: http://linuxdesktops.soton.ac.uk/mount.html
OneDrive through Office 365
- Cloud-based storage
- 5 TB of storage, maximum file size 10 GB
- Restrictions on filenames
- All data held in secure centres which are within the EEA
- Do NOT use personal OneDrive
This completes the managing data section. To complete the course, please take the end-of-module quiz to check your understanding.