Hello, my name is Simon Bungers, and I’m the founder and CEO of labfolder, an electronic lab notebook built by scientists, for scientists. Although I usually spend my days being the chief promoter of the electronic laboratory notebook our team has created, this article is not about labfolder. Rather, it is an attempt at a comprehensive guide to the electronic lab notebook (ELN).
I was interested to read C. Titus Brown's recent post, "Is version control an electronic lab notebook?"
I think version control is really important, and I think all computational scientists should have something equivalent to a lab notebook. But I think of version control as serving needs orthogonal to those served by a lab notebook.
As Titus points out, a traditional lab notebook serves two purposes: provenance and protocol. Version control could be useful for provenance, but I don’t really care about provenance. And for protocol, version control doesn’t really matter.
Version control
I really like git with github. (See my tutorial.) But for me, the basic need served by version control is that embodied in the question, “This shit worked before; why isn’t it working now?”
You don’t want to edit working code in place and so possibly break a working system. Version control lets you try things out, and to try something out in any version of your software, from any point in time.
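As a sketch of that workflow with git (all file and branch names are invented for the example):

```shell
# Sketch: version control lets you try an idea on a branch while the
# working version stays safe. File and branch names are invented.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
echo 'working code' > analysis.R
git add analysis.R
git -c user.name=me -c user.email=me@example.com commit -qm "working version"

# branch off to try something out
git checkout -q -b experiment
echo 'risky change' > analysis.R
git -c user.name=me -c user.email=me@example.com commit -qam "try an idea"

# jump back to the original branch: the working version is untouched
git checkout -q master 2>/dev/null || git checkout -q main
cat analysis.R
```

The experiment lives on its own branch, and any past commit can be checked out later if something that "worked before" stops working.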
The other basic use of version control is for managing projects with multiple contributors. If there are multiple programmers working on a software project, or multiple authors working on a manuscript, version control is the best way to manage things, particularly for merging everyone’s efforts.
These are really useful things, but version control is more about merging and history and not so much about reproducible research.
Make is the thing
To me, the basic tool to make research reproducible is GNU make (see my minimal tutorial). You create a Makefile that documents all analysis steps in a project. (For example, “Use this script to turn these raw data files into that combined file, and use this script to create figure 1 and that script to create figure 2, then combine them with this LaTeX file to make the manuscript PDF.”)
With GNU make (see also rake), you both document and automate these processes. With well-documented/commented scripts and an all-encompassing Makefile, the research is reproducible.
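For instance, the kind of Makefile described above might be sketched like this (all script and file names are hypothetical):

```make
# Sketch only: file and script names are invented.
# Recipe lines must be indented with a tab.

manuscript.pdf: manuscript.tex fig1.pdf fig2.pdf
	pdflatex manuscript
	pdflatex manuscript

fig1.pdf: fig1.R combined.csv
	Rscript fig1.R

fig2.pdf: fig2.R combined.csv
	Rscript fig2.R

combined.csv: combine_data.R raw1.csv raw2.csv
	Rscript combine_data.R
```

Running `make` then rebuilds only what is out of date, which is exactly the document-and-automate point above.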
Add knitr, and you’ve got a notebook
The other ingredient to create the computational scientist’s equivalent of a lab notebook is knitr, which allows one to combine text (e.g., in Markdown or asciidoc) and code (e.g., in R) to make documents (e.g., in html or PDF) that both do the work and explain the work. Write such documents to describe what you did and what you learned, and you’ve got an electronic lab notebook.
You could even get rid of your Makefile by having an over-arching knitr-based document that does it all. But I still like make.
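To make this concrete, a minimal knitr document of the sort described might look like the following (Markdown plus an R chunk; the file and variable names are invented for the sketch):

````markdown
# Analysis notes

Combine the two raw files, then look at the relationship
between x and y.

```{r combine_and_plot}
# Hypothetical data files, for the sketch only
combined <- rbind(read.csv("raw1.csv"), read.csv("raw2.csv"))
plot(combined$x, combined$y)
```

The association looks roughly linear, so a linear model
seems like a reasonable next step.
````

Knitting this file runs the chunk and embeds the plot next to the prose, so the document both does the work and explains it.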
But it’s so much work!
Going into a file and deleting a data point is a lot easier than writing a script that does it (and also documents why). But I don’t think you should be going in and changing the data like that, even if it is being tracked by version control. (And that is the main complaint potential users have about version control: “Too time consuming!”)
I think you have to expect that writing well-documented scripts and knitr-based reports that capture the totality of a data analysis project will take a lot of work: perhaps double (or more!) the effort. But it will save a ton of time later (if others care about what you did).
I don’t really want to take this time in the midst of a bout of exploratory data analysis. I find it too inhibiting. So I tend to do a bunch of analyses, capturing the main ideas in a draft R script (or reconstructed later from the .Rhistory file), and then go back later to make a clean knitr-based document that explains what I was doing and why.
It can be hard to force myself to do the clean-up. I wish there were an easier way. But I expect that well-organized lab scientists devote a lot of time to constructing their lab notebooks, too.
A simple experiment manager for deep learning experiments
labnotebook allows you to:
- flexibly save all your experimental data in a postgres database through a very simple interface, including configuration, models, results, and training curves.
- monitor any indicators from your running experiments by streaming them through a web application.
- access all this data forever through the web app, through sqlalchemy, or through traditional sql text queries.
All you need to do is modify your code to include `labnotebook.start_experiment()` and `labnotebook.stop_experiment()`, passing the info you would like to save to the database as arguments. Optionally, you can save information for each training step with `labnotebook.step_experiment()`.
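Since the argument lists aren't shown here, the following is only a toy, in-memory illustration of that start/step/stop pattern, not labnotebook's real API (everything except the three function names is invented for the sketch):

```python
# Toy illustration of the start/step/stop logging pattern.
# labnotebook itself writes to Postgres; this sketch just uses a list.

results = []

def start_experiment(model_desc=None, config=None):
    """Open an experiment record; labnotebook would insert a DB row here."""
    experiment = {"model_desc": model_desc, "config": config, "steps": []}
    results.append(experiment)
    return experiment

def step_experiment(experiment, trainloss=None, valloss=None):
    """Log one training step; labnotebook would stream this to the web app."""
    experiment["steps"].append({"trainloss": trainloss, "valloss": valloss})

def stop_experiment(experiment, final_trainloss=None):
    """Close the experiment record with final metrics."""
    experiment["final_trainloss"] = final_trainloss

# Hypothetical training loop:
exp = start_experiment(model_desc="convnet", config={"lr": 0.01})
for epoch in range(3):
    step_experiment(exp, trainloss=1.0 / (epoch + 1))
stop_experiment(exp, final_trainloss=1.0 / 3)
```

The point is the shape of the instrumentation: one call to open the record, one per training step, one to close it, with whatever you want saved passed as arguments.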
You can see a very simple example notebook here.
Another example of how to log while training a ConvNet in PyTorch is here.
Why labnotebook?
In the life sciences, scientists write everything in their lab notebooks. I wanted a similar permanent store for my PyTorch experiments that allowed me to:
- asynchronously look at what was going on. TensorBoard obviously provides excellent functionality, albeit with an interface and storage system that I didn't especially like: it's very hard to keep track of all the indicators of old experiments and to compare them to newer experiments.
- store everything forever in a queryable database. Sacred provides some of this functionality, but the interface is complex and inflexible. In addition, I think experimental data is relational data intermixed with NoSQL data, and postgres is better adapted to the type of queries for this kind of experimental data.
For a quick read on the tech stack choices I made, check out my blog post.
Set up a postgres database:
Follow the detailed installation guides, create your database, and make a note of your database's URL. It's usually of the form postgres://<username>:<password>@localhost/<databasename>. Note that you need PostgreSQL 9.4 or later.
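To sanity-check the URL you've written down, you can split it with Python's standard library (the credentials and database name below are placeholders):

```python
from urllib.parse import urlsplit

# Placeholder URL; substitute your own username, password, and database.
db_url = "postgres://john:secret@localhost/labnotebookdb"

parts = urlsplit(db_url)
print(parts.username)          # john
print(parts.hostname)          # localhost
print(parts.path.lstrip("/"))  # labnotebookdb
```

If any of the printed pieces come out empty or wrong, the URL is malformed and the API won't be able to connect.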
Install labnotebook:
Clone the repository:
Enter the directory and install labnotebook locally:
Start the API:
Once you've installed the package, you can run the following command on your database url to start the API:
Start the webapp:
Navigate to the `frontend` directory and serve it, for example with Python 3's built-in server: `python3 -m http.server 8000`.
Then open the serving address and port, typically http://localhost:8000 if you're using the python server.
A simple example notebook is available here.
A more realistic example, training a convolutional neural network on MNIST in PyTorch and logging with labnotebook, is available here.
This is a very early alpha version of a tool I thought some people might enjoy. I haven't tested it on older browsers or frameworks. For now I've tested this only on Ubuntu, with PyTorch-style experiments, using Chromium. I'm happy to get any feedback on how this runs on other platforms!
Contributing
I'm working on this by myself and there are bugs galore. Any contributions are welcome; let me know if you'd like to contribute but aren't sure how to go about it, and I'll walk you through it!
The front-end of this project uses VueJS, Vuetify and Highcharts.
If you like this and want to be updated on what I'm doing, follow me on Twitter.