Whenever people ask me for advice about graduate school, my first response is always the same: have an organization system that works for you. Let me tell you why.
It is the fall of 2021. I am in my last med school rotation, anxiously awaiting residency interview invites. I get an email from my PhD advisor – “Hey we are working on a grant. Did you ever treat cells with [this reagent] and measure [this protein expression]?” I had completed my PhD more than 2 years prior, but I was able to open a single Excel document, search for a few key terms, and see if anything came up. I did not have to dig endlessly through folders on my computer or say “I know I did it, I just don’t have the data anymore”.
While this was after my PhD was done and I was no longer obligated to help with my lab’s grants (#academiaproblems – don’t work for free!), I had many similar experiences during my PhD years, when more recent data had me rethinking previous data and I was glad to be able to easily find the data from years before. I was also able to go back and find other useful information, like how I had troubleshot certain problems. My organization system also came in handy for finding reagents in the lab and limiting the time I spent searching through -80 C freezers and liquid nitrogen stores.
This is not to say that my organization system was perfect – I certainly learned from many mistakes – but it does show that some extra time spent documenting will save you time (and your lab money) in the long run.
While electronic lab notebooks were becoming more widely used during my time in graduate school, I began with a physical notebook and stuck with it throughout. Overall, though, I organized my data in a hybrid way, since most data these days are acquired in an electronic format.
Some people choose to organize their data by project, but what worked best for me was to simply keep everything in chronological order. I saved the first few pages of each notebook for a table of contents. Initially I filled it out by hand, but by the end I would wait until the notebook was full, type up the table of contents, print it out, and tape it in (way easier to do and to read than a handwritten one). The only exception to the chronological order was that I ended up putting all of my mouse genotyping data in a separate notebook that I could bring with me to the mouse house to make sure I was correctly marking cages.
In addition to the physical notebook and table of contents, I ended up making an Excel document as a searchable table of contents. Each sheet corresponded to a different physical notebook, with columns for page number, date, and a shorthand description of what was on the page: the technique (western blot, qPCR, etc.), the samples used, and the readouts. For example, “Western Blot – WT and KO fed/fasted mice – P-ERK, ERK, GAPDH”. This way, I could search for all pages with western blot data, all pages with data for my fed/fasted study, or specific readouts such as all pages with P-ERK expression. Having a consistent naming system allowed me to rapidly identify the data I needed, as in the example from the intro.
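If you would rather keep this kind of index in a script than in a spreadsheet, a minimal Python sketch might look like the following. This is only an illustration of the idea, not the Excel document itself: the `Entry` and `search` names are hypothetical, and every row other than the Western blot description quoted above (including all page numbers and dates) is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    notebook: int      # which physical notebook (one Excel sheet each)
    page: int          # page number in that notebook
    date: str          # year-month-day
    description: str   # technique – samples – readouts


def search(entries: list[Entry], *terms: str) -> list[Entry]:
    """Return entries whose description contains every term (case-insensitive)."""
    lowered = [term.lower() for term in terms]
    return [
        entry for entry in entries
        if all(term in entry.description.lower() for term in lowered)
    ]


# Made-up rows for illustration only.
INDEX = [
    Entry(1, 12, "2016-03-04",
          "Western Blot – WT and KO fed/fasted mice – P-ERK, ERK, GAPDH"),
    Entry(1, 15, "2016-03-09",
          "qPCR – WT and KO fed/fasted mice – example gene panel"),
    Entry(2, 3, "2017-01-20",
          "Western Blot – HepG2 treated cells – P-ERK, ERK"),
]

# All pages with P-ERK readouts, printed as notebook/page/date.
for hit in search(INDEX, "P-ERK"):
    print(hit.notebook, hit.page, hit.date)
```

Passing several terms narrows the search, so `search(INDEX, "western blot", "fed/fasted")` would return only the first row. The search only works as well as the descriptions are consistent, which is the whole point of sticking to one naming pattern.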
As I approached the end of my PhD and dissertation writing was nigh (so I was spending less time in the lab itself), I also ended up taking photos of each page of my physical notebook so that I could reference it from home. This was helpful because I could see my notes about each set of data in addition to the raw data itself on my computer. It was especially useful for finding western data for the same samples where the raw data had been collected on different days (see how next to each western image below I include a date in the numbering format described below – this helped me find the raw data on my computer). As shown below, each gel or set of gels that I ran had its own page within my notebook where I would put all of the info together. Admittedly, I could have included more info about my westerns, such as the concentration of antibody used, whether it was in BSA or milk, etc., but I honestly got a little lazy.
Just as I filled out my lab notebook chronologically, so too did I keep my raw data in a chronological format on my computer. I used a naming system for my folders and documents that made them show up in chronological order whether I browsed through the folders or searched my computer for a certain document. This format involved a year-month-day numbering – the absolute best numbering format (shout out to my undergrad lab for teaching me this!). Unfortunately, I don’t have access to the server that holds my raw data anymore, but below is an example of how the folders were structured. Within my data folder, there was a folder for each year. Within this, each month had a folder labeled with the year, then the month, then “XX” to indicate that any day was included. Within each month folder, there were folders for each day that I had data for, which as you can see below sorted themselves in order. Then within each of the day folders, I had all of the raw data collected on that day.
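To make the year/month/day nesting concrete, here is a small Python sketch that builds the folder path for a given date. The exact separators are an assumption (hyphens, as in `2016-03-XX`), the `day_folder` helper and the `data` root name are hypothetical, and the dates are made up.

```python
from datetime import date
from pathlib import PurePosixPath


def day_folder(root: str, day: date) -> PurePosixPath:
    """Build root/YYYY/YYYY-MM-XX/YYYY-MM-DD for the given day."""
    year = f"{day:%Y}"
    month = f"{day:%Y-%m}-XX"  # "XX" stands in for "any day" (assumed format)
    return PurePosixPath(root) / year / month / f"{day:%Y-%m-%d}"


# Zero-padded year-month-day names sort chronologically as plain text,
# which is why folders and search results line up in date order.
names = ["2016-11-02", "2016-03-04", "2015-12-30"]
assert sorted(names) == ["2015-12-30", "2016-03-04", "2016-11-02"]

print(day_folder("data", date(2016, 3, 4)))
# prints data/2016/2016-03-XX/2016-03-04
```

The ordering only works because the largest unit (the year) comes first and months and days are zero-padded. A month-day-year name would sort all the Januaries together regardless of year.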
When it came to analyzed/synthesized data for papers, I sorted the data separately from my chronological raw data folders. Each paper had its own top-level folder. Within that folder, there were folders for figures, drafts, etc. as well as a folder for data. Since I did mostly mouse experiments and these mouse experiments ended up generating quite a bit of data, I mostly organized this data folder by experiment type (if mouse, then by sex, age, type of treatment, length of treatment, etc., or if cell culture, then by cell type and treatment). Within each experiment folder, I organized data by type such as protein expression (western blot), mRNA expression (q-RT-PCR), histology, etc.
In addition to organizing my experimental data, I also had a system to organize supplies and know exactly where certain things could be found. This was particularly important for items kept in very cold places (-80 freezer and liquid nitrogen especially) because it is absolutely no fun to spend endless time searching in these places for something that you need. 🥶
Our cell line stocks were stored in liquid nitrogen in 9×9 or 10×10 boxes. We had enough stocks for some lines that I had entire boxes dedicated to tubes of HepG2 or HEK293T cells, for example. However, even within these boxes there was variation in the passage number of the cell stocks. With the help of my lab mates, I created an Excel document, kept on our shared server, where we recorded the EXACT location of each cell line stock. Each sheet in the document was a different box – numbered rack 1/box 1, rack 1/box 2, etc. Within each sheet, there was a row for each coordinate in the box: row A/column 1, row A/column 2, row A/column 3, etc. It was a lot of work up front to go through and put the information into the Excel document (it helped to have one person look at each tube, keeping the whole box on dry ice, while another person entered the information, to minimize the time the tubes were out of the liquid nitrogen). As a result, though, I knew that the tube I wanted was in a particular box on a particular rack, and once I got that box out, I knew exactly which coordinate within the box held the tube! The challenge was keeping this updated as tubes were removed or added (folks usually updated it), but at least it gave us a starting point to make the whole process easier. I only had to recheck all the tubes and update the Excel sheet once during my 5 years in the lab.
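The rack/box/coordinate layout can also be sketched in a few lines of Python. This is a hypothetical illustration rather than the spreadsheet itself: the `box_coordinates` and `find_tubes` helpers, the short coordinate labels like `A1`, and the passage numbers are all made up.

```python
import string


def box_coordinates(size: int) -> list[str]:
    """List every position in a size×size box: A1, A2, ... in row order."""
    rows = string.ascii_uppercase[:size]
    return [f"{row}{col}" for row in rows for col in range(1, size + 1)]


def find_tubes(inventory: dict, cell_line: str) -> list[tuple]:
    """Return (rack, box, coordinate) for every tube of the given cell line."""
    return [
        location for location, tube in inventory.items()
        if tube["cell_line"] == cell_line
    ]


# Keys are (rack, box, coordinate); the contents are invented examples.
inventory = {
    (1, 1, "A1"): {"cell_line": "HepG2", "passage": 5},
    (1, 1, "A2"): {"cell_line": "HepG2", "passage": 8},
    (1, 2, "C4"): {"cell_line": "HEK293T", "passage": 12},
}

print(len(box_coordinates(9)))         # positions in a 9×9 box
print(find_tubes(inventory, "HepG2"))  # where the HepG2 tubes live
```

Generating the full coordinate list up front is handy for spotting empty slots: any coordinate from `box_coordinates` that is missing from the inventory for that box is a free position.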
Working with mice, I accumulated a lot of tissue samples that had to be stored at -80 C. To keep track of all of these (and to combine some data for my projects), I also had an Excel document. I had one sheet dedicated to a table of contents: which box # held which experiment’s samples, and which -80 C freezer that box was located in (we had both a chest and an upright freezer). Then each box had its own sheet within the Excel document with more info about its contents.
One particularly useful thing I did with this is that I included all of the data about each mouse in this Excel document. This included ear tag number, genotype, sex, date of birth, sac date, and experimental condition (control vs treatment), as well as data collected at the time of sacrifice such as body weight, liver weight, adipose tissue weight, and any other info that was collected. Finally, I also marked which tissues were present in the box – including but not limited to liver, adipose tissue, and gallbladder (as I was in a bile acid lab, we always collected this but I never actually did anything with it!). I did this for a couple of reasons – 1) I needed to organize the data somewhere, so it might as well be in this document, and 2) once I left the lab, others could easily find out what I had and all of the data for it, instead of having to search through multiple lab notebooks to hopefully find all of the data organized by the date that the tissues were collected (as I unfortunately had to do for a few projects I inherited from previous lab members).
My lab did a lot of qPCR to measure gene expression, so we had a lot of primer stocks. I personally had 4-5 boxes of primer stocks that I had created during my PhD. Since primers need to be kept in pairs (and things in pairs often find a way to become separated), I assigned each primer a specific location in these boxes. Just as I did with the cell culture stocks, I made an Excel sheet that noted the exact location of each tube. In addition, I put the primer sequence in the Excel sheet so that, again, all the info was accessible in the same place.
One extra thing I did here that proved helpful: in addition to this electronic documentation, I also wrote the box # and coordinate on the top of each tube – for example, “Box 2 A8”. I bought a bunch of those little dot stickers to make it easier to label. This labeling was important because I was not the only person using these primers. Having the location labeled on the top made it easier for everyone else (and myself) to put the tubes back where they belonged.
Something that I always worked on (but never found a good, consistent format for) was organizing the papers that I had downloaded and [sometimes] read. In the beginning, I tried using reference managers like Zotero (because it was free), but I wasn’t a fan of the format. I tried downloading the PDFs and organizing them into folders based on general topics, but then I’d forget to check these folders and would end up downloading the same paper multiple times. Some of the main papers I referenced throughout my PhD were ones that I ended up printing out, reading, and highlighting multiple times without realizing I had already done so until I found another copy already printed out at home. In the end, I was ok with this redundancy because it was often nice to go back to a paper I had previously read with a new outlook given my more recent data or other papers I had read. It wasn’t exactly efficient, but there was still some positive aspect to it.
When it came to writing papers and my dissertation, what ended up working for me was to create a Word doc where I would organize my references alphabetically by last name. As I was working on a draft, I would include the reference as (last name, year) in the text. Then, when I got ready to actually submit the manuscript, I would use a reference manager (EndNote or Zotero) to put in the references and format them appropriately for each journal. I did this for a couple of reasons: 1) my lab typically wanted to do this using EndNote and I didn’t want to buy it, so my only other option was to use it on the computers at lab, and 2) I had heard horror stories of references linked by reference managers getting messed up when sharing a document with someone. However, linking all the references at the end could be a very time-consuming process (for the 100+ page dissertation draft that I wrote for my prelim exam, it took 6+ hours! I watched Grey’s Anatomy the whole time…). So when it came to my final dissertation, I actually just left my references in the format (last name, year) in the text and copied the text from my separate references document into the references section of my final dissertation document.
I hope this was a helpful way to think about organization as a grad student, whether you are just starting out or are near the end. Again, this is by no means a perfect way to organize and some of it was certainly time consuming, but I think it paid off in the end. I would love to hear how others kept organized during graduate school or if you have other advice for students starting graduate school!