
Research Data Management (RDM)

Spreadsheet Best Practices

  • The top row should contain headers that label each column.
  • Include a README or a data dictionary to explain those labels (see the Documentation tab).
  • Each row below the header row is a single record.
  • Each column is a single variable.
  • Keep every column internally consistent.
    • All numbers in a column should have the same number of decimal places, all dates should use the same format, all text fields should use a controlled vocabulary, all coded values should follow the same coding scheme, and so on.
  • Don't use color or comments to convey meaning; this formatting does not migrate well to other file formats and is easily misunderstood.
    • Instead, add another column that records the information you want to note.
  • Don't leave cells empty.
    • Use a consistent, documented code for "No Answer", "Null", or "Missing" values so these cells are not mistaken for zeros or otherwise misinterpreted.
  • Put notes in a separate file.
    • The point of a spreadsheet is to keep your data organized so you can easily run calculations on it and reorder or filter it. Notes placed among the data limit your ability to do this.
  • Double-check your dates, numeric fields, and gene names.
    • Excel can silently alter dates and numbers: it automatically reformats any value it perceives to be a date. Excel also has two date systems, one counting from 1900 (the Windows default) and one counting from 1904 (historically the Mac default), so dates copied or linked between workbooks that use different systems can shift by four years and one day. A study found that a large number of published papers contain gene name errors because Excel converted certain gene symbols (for example, SEPT2 and MARCH1) into dates. Finally, Excel does not support dates earlier than January 1, 1900 (or January 1, 1904, in workbooks that use the 1904 date system).
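A column-consistency check like the one recommended above can be scripted before analysis. The Python sketch below is a minimal example that assumes dates are meant to be written as YYYY-MM-DD (ISO 8601); it lists the values in a column that either do not follow that format or are not real calendar dates. The function name `nonconforming_dates` and the sample values are hypothetical.

```python
import re
from datetime import datetime

# Assumption: the expected format is ISO 8601 (YYYY-MM-DD); change this to
# whatever format your data dictionary specifies.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def nonconforming_dates(values):
    """Return (row index, value) pairs that break the expected date format."""
    problems = []
    for i, value in enumerate(values):
        # Reject anything not shaped exactly like YYYY-MM-DD (e.g. 03/16/2023, 2023-3-17).
        if not ISO_DATE.fullmatch(value):
            problems.append((i, value))
            continue
        # Reject well-shaped strings that are not real dates (e.g. 2023-02-30).
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            problems.append((i, value))
    return problems

column = ["2023-03-15", "03/16/2023", "2023-3-17", "2023-02-30"]
print(nonconforming_dates(column))
# [(1, '03/16/2023'), (2, '2023-3-17'), (3, '2023-02-30')]
```

The regular expression enforces the layout and `strptime` enforces calendar validity; using only one of them would let either mixed formats or impossible dates slip through.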
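To keep missing values explicit, it helps to count blank cells separately from your documented missing-value codes, since a blank could mean zero, "not collected", or simply forgotten. The Python sketch below uses only the standard library; the codes in `MISSING_CODES`, the column name `weight_kg`, and the sample data are hypothetical placeholders for whatever your own data dictionary defines.

```python
import csv
import io

# Hypothetical missing-value codes; record your real ones in the README or data dictionary.
MISSING_CODES = {"NA", "No Answer", "Missing"}

def audit_column(rows, column):
    """Count documented missing codes, blank cells, numbers, and other text in one column."""
    counts = {"missing_code": 0, "blank": 0, "numeric": 0, "other": 0}
    for row in rows:
        value = row[column].strip()
        if value in MISSING_CODES:
            counts["missing_code"] += 1
        elif value == "":
            counts["blank"] += 1  # ambiguous: should be replaced with a documented code
        else:
            try:
                float(value)
                counts["numeric"] += 1
            except ValueError:
                counts["other"] += 1  # free text in a numeric column
    return counts

data = io.StringIO("id,weight_kg\n1,70.5\n2,NA\n3,\n4,0\n5,heavy\n")
rows = list(csv.DictReader(data))
print(audit_column(rows, "weight_kg"))
# {'missing_code': 1, 'blank': 1, 'numeric': 2, 'other': 1}
```

Note that the explicit zero in row 4 is counted as a number while the empty cell in row 3 is flagged, which is exactly the distinction an empty cell would otherwise hide.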
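The four-year date shift comes from Excel storing dates as serial day numbers counted from one of two starting points. As a rough sketch, assuming you already know which date system a workbook uses, the Python below converts a serial number under each system and shows that the two starting points are 1,462 days (four years and one day) apart. The function `excel_serial_to_date` is a hypothetical helper, not part of Excel or any Excel library.

```python
from datetime import date, timedelta

# Day zero for each Excel date system. With 1899-12-30 as day zero, the 1900
# system gives correct dates for serial 61 (1900-03-01) onward; earlier serials
# are off by one because Excel treats 1900 as a leap year.
EPOCH_1900 = date(1899, 12, 30)
EPOCH_1904 = date(1904, 1, 1)

def excel_serial_to_date(serial, date1904=False):
    """Convert an Excel serial day number to a calendar date (hypothetical helper)."""
    epoch = EPOCH_1904 if date1904 else EPOCH_1900
    return epoch + timedelta(days=serial)

serial = 45000
print(excel_serial_to_date(serial))                 # 2023-03-15
print(excel_serial_to_date(serial, date1904=True))  # 2027-03-16
print((EPOCH_1904 - EPOCH_1900).days)               # 1462
```

Reading the same stored number under the wrong system is how a date ends up four years and one day off, which is why it is worth checking dates after moving data between workbooks.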
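Before sharing a gene list, a quick screen can catch both symbols at risk of date conversion and symbols that have already been converted. The sketch below is a heuristic under stated assumptions: it treats a month abbreviation (or MARCH/SEPT) followed by digits as at risk, and a day-month string such as 2-Sep as already converted. The pattern list and the function name `flag_gene_symbols` are illustrative, not an exhaustive catalog of affected genes or of Excel's conversion rules, which can vary by version and locale.

```python
import re

# Assumed month tokens that Excel may read as dates; MARCH and SEPT cover
# symbols such as MARCH1 and SEPT2.
MONTHS = "JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC"
AT_RISK = re.compile(rf"(?:{MONTHS})\d+", re.IGNORECASE)            # e.g. SEPT2
ALREADY_CONVERTED = re.compile(rf"\d{{1,2}}-(?:{MONTHS})", re.IGNORECASE)  # e.g. 2-Sep

def flag_gene_symbols(symbols):
    """Split symbols into (at risk of conversion, apparently already converted)."""
    at_risk = [s for s in symbols if AT_RISK.fullmatch(s)]
    converted = [s for s in symbols if ALREADY_CONVERTED.fullmatch(s)]
    return at_risk, converted

genes = ["TP53", "SEPT2", "MARCH1", "BRCA1", "2-Sep", "1-Mar"]
print(flag_gene_symbols(genes))
# (['SEPT2', 'MARCH1'], ['2-Sep', '1-Mar'])
```

Symbols flagged as at risk can be protected by importing that column as text; anything flagged as already converted needs to be traced back to the original source.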

Credit

We gratefully acknowledge the University of Pennsylvania Libraries (Penn Libraries) for permission to use and modify their template, Data Management Resources.

MCW Libraries
8701 Watertown Plank Road
Milwaukee, WI 53226
(414) 955-8300
