Neurophysiological analytics for all! Free open-source software tools for documenting, analyzing, visualizing, and sharing using electronic notebooks

David M. Rosenberg and Charles C. Horn
Biobehavioral Oncology Program, University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania; Department of Neuroscience, University of Pittsburgh, Pittsburgh, Pennsylvania; Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania; Department of Anesthesiology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania; and Center for Neuroscience, University of Pittsburgh, Pittsburgh, Pennsylvania


NEW & NOTEWORTHY
To boost reproducible research, neurophysiology analysis is performed with free, open-source software within the Jupyter electronic notebook. These notebooks use the Python and R programming languages, with real experimental data, in the analysis of neural signals and generation of publication-quality graphics. This approach documents all steps of analysis, which can be shared in easily readable formats including PDF, HTML, and a web-based computer environment (an example provided) capable of regenerating the analysis directly from raw data.
NEUROPHYSIOLOGICAL RESEARCH is often complex, with a successful scientific workflow encompassing many different stages. Data acquisition is an important first step, but analysis, and its documentation, can often be more challenging. Many laboratories have unique workflows that rely on physical notebooks, data storage on computer hard drives and in the cloud, and a vast array of software. For software usage, analyses typically include a host of commercial platforms, such as spreadsheets (e.g., Microsoft Excel) and more complex numerical and plotting programs, including MATLAB (The MathWorks), Maple (Waterloo Maple), Mathematica (Wolfram), Origin (OriginLab), IGOR Pro (WaveMetrics), SAS (SAS Institute), SPSS (IBM), SigmaPlot (Systat Software), GraphPad Prism (GraphPad Software), and others. Researchers are faced with several key problems: 1) keeping track of metadata and notes that are critical for understanding analyses; 2) financial costs of software; 3) barriers to sharing analyses between laboratories using different software; and 4) the lack of a clear way to document the analytic process. This last issue is critical for generating reproducible research, a target of current initiatives (Collins and Tabak 2014; Landis et al. 2012).
Recent developments permit documentation of complex analytics using electronic notebooks (Shen 2014). One actively developed approach is the Jupyter notebook (Table 1), which is free and open source (Perez 2015; Ragan-Kelley et al. 2014). This approach was originally called "IPython" notebook (interactive Python; Perez and Granger 2007) but is now named "Jupyter" for the computer language-agnostic features, including the capability to use over 40 programming languages (Table 1); the name is a combination of three languages: Julia, Python, and R. The present report demonstrates how to use Jupyter notebooks to perform many analytic and plotting functions using Python and R libraries. This approach produces a sharable visual representation of the sequential analytic steps for each neurophysiology project, including metadata and notes, which provides a significant impetus toward achieving the guidelines for reproducible research (Sandve et al. 2013).

APPROACH TO DEMONSTRATING THE METHODOLOGY
We begin by discussing key aspects of notebook usage, with reference to example notebooks provided from a GitHub/Binder installation or by downloading Data Supplements 1-3. Binder (http://mybinder.org) can run the code and access the raw data located on GitHub as a temporary executable environment, with no software installation requirements; to initiate, access the GitHub repository (https://github.com/cchorn/Neurophysiological-Analytics-for-All) and launch the Binder link located in the README document. Because Binder is interactive, the reader can also test new data and code; the environment is temporary, and any changes made will be lost when notebook webpages are closed. The secondary method for running notebooks is to download Data Supplements 1-3. After Jupyter and its dependencies are installed (see below), the supplied Jupyter notebook files (plain text written in JavaScript Object Notation, JSON, with the ".ipynb" extension) can be rendered in an internet browser (see SETTING UP THE SOFTWARE ENVIRONMENT). After running a notebook, a folder entitled "OUTPUT_FILES" will be created containing analytic results, e.g., SVG (scalable vector graphics) image files. Finally, each Data Supplement contains an HTML notebook copy, viewable in a browser (HTML copies are also automatically produced from notebook code, which demonstrates how to create a sharable report). The README.md file in each Data Supplement is a text file written in markdown (see below), containing a list of critical software dependencies, files, and folders.
The data used to show notebook methods originate from projects using the musk shrew, a small-animal model for testing physiological mechanisms of the vomiting reflex (note that laboratory rats and mice lack this reflex; Horn et al. 2013); data include unpublished recordings of gastric vagal afferent activity and a published report on the effects of gastric electrical stimulation on emesis and other physiological parameters (i.e., EKG data; Horn et al. 2016). We use emetic stimuli to activate vagal afferents, including gastric distension (e.g., spike train analysis notebook, see below). The goal of these experiments is to define the role of specific vagal signals in emetic activation, which could lead to the development of novel therapeutic strategies to control chronic nausea and vomiting in humans, for example, those patients treated with cytotoxic cancer chemotherapy.

SUBJECTS AND DATA COLLECTION
Electrophysiological data were derived from adult male musk shrews (>45 days of age, ~80 g in body wt), offspring from breeding stock obtained from the Chinese University of Hong Kong (a strain originating from Taiwan; Wang 1994). Individually housed shrews were fed a mixture of 75% Purina Cat Chow Complete Formula and 25% Complete Gro-Fur mink food pellets (Temple 2004). All animals were maintained on a 12:12-h light-dark cycle (0700-1900 light period) in a facility accredited by the Association for Assessment and Accreditation of Laboratory Animal Care International; experiments were approved by the University of Pittsburgh's Institutional Animal Care and Use Committee.
Prior to surgery, food was removed for 2 h to empty the stomach. Shrews were injected with 1 g/kg urethane (ip; Sigma-Aldrich) to produce a surgical level of anesthesia, with additional injections if animals responded to toe pinch. Animals were placed on a regulated heating pad with a rectal probe to maintain body temperature at 37°C (CWE, Ardmore, PA). Heart rate was recorded with electrodes attached to the flanks. After an incision in the ventral neck, a tracheal tube was inserted to monitor intratracheal airway pressure (CWE; air pressure transducer), respiratory rate, and the occurrence of emetic episodes (Andrews et al. 1996; Uchino et al. 2002); the right carotid artery was cannulated to measure blood pressure (Kent Scientific, Torrington, CT). After an abdominal incision, the stomach was exposed, incised skin edges were retracted and elevated, and the body cavity was filled with mineral oil to prevent drying and to insulate stimulation electrodes (Pt-Ir, 50.8-μm diameter; A-M Systems, Sequim, WA) for producing compound action potentials; two electrodes were placed on the abdominal vagus, 0.5 cm apart. For stomach distension, a balloon made of latex condom material, fixed to PE tubing (4-mm diameter), was positioned in the gastric lumen through a 5-mm incision in the lateral fundus and secured in place by a purse-string suture. Balloons were filled at a rate of 8 ml/min with 0.15 M NaCl (37°C) with a syringe pump (Kent Scientific) operated by a computer (Power 1401 and Spike2, version 7 software; CED, Cambridge, UK).
To record vagal responses, a custom-designed platform was positioned beneath the left cervical vagus and insulated with mineral oil (Horn 2014). To isolate neural units, bundles of nerve fibers were teased from the trunk with fine forceps (WPI, no. 555227F) and placed on electrodes (Pt-Ir, 50.8-μm diameter; A-M Systems). Nerve signals were amplified by a factor of 20,000 and band-pass filtered (100 Hz to 1 kHz; P511 differential pre-amp and high-impedance headstage; Grass Technologies, Warwick, RI). Electrical stimulation of the vagus was produced by trains of 10 pulses (biphasic, 0.5 ms, 2 mA, 500-ms interpulse interval; isolated pulse stimulator, model 2100, A-M Systems). All raw signals (vagal, EKG, and blood pressure) were acquired to a computer (Power 1401 and Spike2; CED).

SETTING UP THE SOFTWARE ENVIRONMENT
Jupyter can be installed on Windows, Mac OS X, and Linux operating systems (Table 1). An easy method for installing Jupyter and relevant Python and R packages is through the Anaconda software download from Continuum Analytics (currently a free installation; Table 1). Anaconda includes >300 commonly used Python packages. R and its essential packages, >80, can also be installed by using Anaconda from the command-line interface (Table 1). The command line is commonly used for software updates and administration; readers not familiar with this approach, and other general scientific computing topics, are referred to the excellent resources provided by Software Carpentry, a volunteer nonprofit organization (http://software-carpentry.org and http://swcarpentry.github.io/shell-novice) (Wilson 2014). All of the critical free open-source software for producing the analyses included in the present report is listed in Table 1.
Users can also install R and Python packages from the command line, an important method for obtaining packages beyond Anaconda's repository. The "pip" Python package manager is used to install individual packages (Table 1). Similarly, R can be obtained from the R project website (Table 1) and its packages installed from the R command line; e.g., the command "install.packages("ggplot2")" installs the "ggplot2" package. The Windows operating system also needs an installation of Python (Table 1; Mac OS X and Linux systems have Python installed by default); the reader is directed to tutorials for installing Python on Windows (http://docs.python-guide.org/en/latest/starting/install/win; https://docs.python.org/3.5/using/windows.html).
One critical issue in using packages is version control because functionality may differ between versions, complicating the execution of code. At the top of our notebooks we include code that lists the package versions used, which can be compared to the "README.md" text files containing the essential packages needed for running these notebooks. Furthermore, it is possible to establish a specific software environment, essentially duplicating the environment we used for running these notebooks (http://conda.pydata.org/docs/using/envs.html). Similarly, one can also install specific versions of Python in these environments (http://conda.pydata.org/docs/py2or3.html); Python 2.7 was used in creating our notebooks. The required environment for running our supplied notebooks can be quickly accessed, with no software requirements, by launching the Binder (https://github.com/cchorn/Neurophysiological-Analytics-for-All).
The analyses in the present report were conducted on a computer using the Debian distribution of the Linux operating system (http://www.debian.org). The Debian distribution has some advantages for analytic work because of the large number of packages available through its manager (>43,000). Debian also has a unique repository of neuroscience-related software called NeuroDebian, including suites for electrophysiology, modeling, and image analysis (http://neuro.debian.net) (Halchenko and Hanke 2012). Running Linux could be a barrier for some users, especially for those limited to one computer who need another operating system for specific programs (e.g., Microsoft Office suite). Although beyond the scope of the present report, an easy solution is to run a second operating system in a virtual environment using VirtualBox (https://www.virtualbox.org).
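As an illustration of the version listing described above, the following is a minimal sketch and not the code from our notebooks. It is written for Python 3 (the supplied notebooks used Python 2.7); the helper name "package_versions" and the package list are hypothetical and should be adapted to the packages a given notebook imports.

```python
import importlib
import platform
import sys


def package_versions(names):
    """Return {package name: version string}; 'not installed' if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            # Most scientific packages expose __version__; report 'unknown' if absent.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Hypothetical package list; edit to match the imports of the notebook at hand.
packages = ["numpy", "pandas", "matplotlib", "neo"]

print("Python", platform.python_version(), "on", sys.platform)
for name, version in sorted(package_versions(packages).items()):
    print("{}: {}".format(name, version))
```

Placing a cell like this at the top of a notebook records the environment alongside the analysis, so the listed versions can be compared with those in a README file.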

STARTING THE NOTEBOOK AND ENTERING CONTENT
Jupyter can be started from the command line by using the "jupyter notebook" command (or through the Anaconda graphical user interface) to initiate a session in the internet browser showing a file directory (Fig. 1A). From here, notebooks can be created with different kernels, such as Python and R (Fig. 1A; Table 1). In Jupyter, a kernel is the interactive process that runs code in a specific language and returns the output for the user to see (http://jupyter.org). Several types of data can be entered into notebook cells (using a menu selection), including markdown, raw text, headings, and computer code (Fig. 1, B-D; Data Supplement 1: Notebook content). Markdown is a simplified text notation system, which can be converted to formatted HTML containing headings, bold, italics, bullets, etc. (http://daringfireball.net/projects/markdown). Links to websites and other notebooks can be included in a notebook cell; for example, a link written in markdown using the syntax "[R Statistics](R_Statistics.ipynb)" will be rendered as "R Statistics" by running the cell (Fig. 1B). Image files can be displayed by using code to import the image file (Fig. 1D). Lengthy output of notebook cells can also be collapsed into a reduced window by double clicking the left margin of the cell; there are notebook shortcut commands for menu functions (https://ipython.org/ipython-doc/1/interactive/notebook.html). A Jupyter notebook example is available for testing from the Project Jupyter website (https://try.jupyter.org), for temporary creation of notebooks and other files.

IMPORTING AND EXPORTING DATA
We review here the import of text and electrophysiological signal data files (Data Supplement 1: Notebook content). Text files are most easily manipulated as comma-separated value (CSV) files, which can be accessed and created with spreadsheet programs (e.g., Microsoft Excel and "Calc" from Libre Office, a free open-source office suite, https://www.libreoffice.org) and text editors (Windows Notepad, Atom editor, https://atom.io, etc.). Several packages exist to import data directly into Python or R. CSV files can be imported into the notebook with the "pandas" Python package for data structures, e.g., with the command "pd.read_csv('filename.csv')" (McKinney 2010); another method is the CSV module in the standard Python library (https://docs.python.org/2/library/csv.html). The "pandas" package can also format data matrices by transposition, cutting columns and rows, and combinations (Table 1). Data can be exported to several file types, including spreadsheets (e.g., Microsoft Excel, Data Supplement 1: Notebook content). Similarly, the "read.csv("filename.csv", header = TRUE)" command is used to import CSV files into R.
Numerous readers for electrophysiology files are available from the Python "Neo" package; these include import functions for Alpha Omega, Ascii, Axon Instruments, BrainVision, Brainware, KlustaKwik, HDF5, MATLAB, Neuroexplorer, Plexon, Spike2, TDT, and other file formats (Table 1; Garcia et al. 2014). We show this process using a Spike2 CED file (Fig. 2; Data Supplement 1: Notebook content). Although the code in Fig. 2 may appear overly complex for a notebook, it offers the ability to precisely control the import of data at each step. If users encounter problems with a block of code, they can comment out parts by using the "Ctrl" + "/" keys, which places a "#" symbol at the beginning of each line. Once code has been thoroughly vetted it can be externalized as a Python script (a ".py" file) and called into the notebook with a short command (e.g., Data Supplement 3: Compound action potentials). In addition to "Neo," the Python "NumPy" package is required for importing electrophysiology files (van der Walt et al. 2011). Our import of the file "nerve_signal.smr" shows how to check the number of data points, time in seconds, minutes, and hours, the sampling rate, and a plot of part of this signal file (Fig. 2, A and B). We then save these data to the HDF5 format (Fig. 2C; https://www.hdfgroup.org). HDF5 and similar formats are growing standards in the scientific community for data sharing (Eglen et al. 2014; Jayapandian et al. 2015; Ray et al. 2016; Teeters et al. 2015). HDF5 is a hierarchical data format, with space for metadata, which makes it possible to work with large files because data segments are loaded into RAM on demand; a free open-source viewer and editor for these files is available (Table 1; HDFView).
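The CSV operations described above (import, transposition, selecting rows, and export) can be sketched with the CSV module from the standard Python library. This is an illustrative example written for Python 3, not code from our notebooks; the column names and values are hypothetical, and the same steps are typically shorter with "pandas" once a file has been read with "pd.read_csv".

```python
import csv
import io

# Hypothetical data standing in for the contents of 'filename.csv'.
csv_text = """animal,condition,heart_rate
s1,baseline,620
s1,stimulation,655
s2,baseline,600
s2,stimulation,640
"""

# io.StringIO lets the example run without a file on disk; with a real file,
# use open('filename.csv', newline='') instead.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values are read as text, so numeric columns must be converted explicitly.
for row in rows:
    row["heart_rate"] = float(row["heart_rate"])

# Transposition: turn the list of rows into a dictionary of columns.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Cutting rows: keep only the stimulation condition.
stim_rows = [row for row in rows if row["condition"] == "stimulation"]

# Export the selected rows back to CSV text (or to a file object).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["animal", "condition", "heart_rate"])
writer.writeheader()
writer.writerows(stim_rows)
print(out.getvalue())
```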
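The checks performed after import (number of data points, duration, and sampling rate) involve only simple arithmetic, sketched below in plain Python 3. This is not the "Neo" code shown in Fig. 2; the function names, the sample count, and the 10-kHz sampling rate are hypothetical values chosen for illustration.

```python
def recording_summary(n_samples, sampling_rate_hz):
    """Summarize a recording from its sample count and sampling rate."""
    seconds = n_samples / float(sampling_rate_hz)
    return {
        "n_samples": n_samples,
        "sampling_rate_hz": sampling_rate_hz,
        "seconds": seconds,
        "minutes": seconds / 60.0,
        "hours": seconds / 3600.0,
    }


def time_window_indices(start_s, stop_s, sampling_rate_hz):
    """Convert a time range in seconds to (first, last) sample indices."""
    first = int(round(start_s * sampling_rate_hz))
    last = int(round(stop_s * sampling_rate_hz))
    return first, last


# Hypothetical recording: 3,000,000 samples acquired at 10 kHz.
summary = recording_summary(n_samples=3000000, sampling_rate_hz=10000)
for key in ("n_samples", "sampling_rate_hz", "seconds", "minutes", "hours"):
    print(key, summary[key])

# Indices for plotting only the segment from 60 to 61 s.
first, last = time_window_indices(60, 61, 10000)
print("plot samples", first, "to", last)
```

Selecting a short window by index before plotting keeps the displayed trace readable and avoids drawing millions of points.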

DATA ANALYSIS
Spike sorting and train analysis. Rapid advances in recording devices and computational approaches have allowed for the recording of many dozens of neurons simultaneously. These recordings typically require sophisticated spike sorting processes to delineate the activities of putative single units. Therefore, a laboratory may experiment with different sorting methodologies over a short period of time, many of which are available as open-source resources (Carlson et al. 2014; Hill et al. 2011; Pouzat and Detorakis 2014; Quiroga et al. 2004). The compatibility of the Jupyter notebook with numerous programming languages makes it ideal for implementing disparate spike sorting algorithms, enabling side-by-side comparisons. Our spike sorting examples use "OpenElectrophy," an open-source Python suite (Table 1 and Data Supplement 2: Spike sorting and train analysis) (Garcia and Fourcaud-Trocme 2009). "OpenElectrophy" contains processes for signal filtering, spike detection, feature extraction, and clustering.
After the formation of single units, we implement a series of algorithms to evaluate the quality of individual clusters. These processes are based upon UltraMegaSort 2000 (http://physics.ucsd.edu/neurophysics/software.php), where the spike timestamps and waveforms of each cluster are evaluated for physical constraints (e.g., interspike intervals that are less than a standard refractory period) and waveform similarities to other clusters (Hill et al. 2011). While the original UltraMegaSort 2000 algorithms were written in MATLAB, we have implemented these functions in Python. The notebook allows for the validation of clusters immediately after spike sorting, while also recording the resultant cluster quality output. At the end of our spike sorting notebook (Data Supplement 2, "Spike-Sorting.ipynb" or the HTML copy), we display the tables of the quality metrics associated with each cluster. This setup provides a consistent set of standards for a spike sorting workflow, with the ability to quickly scrutinize the results of various sorting parameters in a given data set. In Fig. 3 we show graphics from our spike sorting (generated by the notebook) as well as the quality metrics (Fig. 3D). These graphics were developed with custom code, naturally extending the capabilities of "OpenElectrophy" in the notebook.
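To make the refractory-period check concrete, the following simplified Python 3 sketch counts interspike intervals that fall below a refractory period. It is not the UltraMegaSort 2000 metric or our Python implementation of it; the function names, the 2-ms refractory period, and the spike times are assumptions for illustration only.

```python
def interspike_intervals(spike_times_s):
    """Return intervals (s) between consecutive spikes, after sorting by time."""
    ordered = sorted(spike_times_s)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def refractory_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of interspike intervals shorter than the refractory period.

    The 2-ms default is an assumed value, not one taken from the report.
    """
    isis = interspike_intervals(spike_times_s)
    if not isis:
        return 0.0
    violations = sum(1 for isi in isis if isi < refractory_s)
    return violations / float(len(isis))


# Hypothetical spike timestamps (s) for one cluster.
cluster_spikes = [0.010, 0.0105, 0.050, 0.120, 0.121, 0.300]

fraction = refractory_violation_fraction(cluster_spikes)
print("fraction of intervals violating the refractory period:", fraction)
```

A high fraction suggests that the cluster contains spikes from more than one unit, since a single neuron cannot fire twice within its refractory period.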
Fig. 2. Python code for importing electrophysiology data using the "Neo" package (Table 1; Garcia et al. 2014); screenshots selected from the notebook in Data Supplement 1. A: an example data file from a data acquisition system is imported (Spike2, CED); many other data formats can be imported with "Neo" (Table 1). After import, the data are checked for the number of data points, time, and acquisition frequency. B: an additional check is performed by displaying the data and selecting a time range, using the "matplotlib" package in Python (Table 1). C: data are then saved to an HDF5 file format for sharing (e.g., Teeters et al. 2015); a time range can also be selected. Note that the HDF5 file can be imported back into the notebook (Data Supplement 2, spike sorting) and the original imported data can be used for additional analyses (Data Supplement 3, compound action potentials).
We then analyzed the activation of single units during stimulation. In our studies, we are interested in how single units encode gastric mechanical distension and the subsequent onset of emesis. In our notebooks, we seamlessly integrate the R package "Spike Train Analysis with R" ("STAR") immediately after spike sorting in Python (Pouzat and Chaffiol 2009). One can either create an entirely new notebook that runs an R kernel or use the "R Magic" command supported by the Python package "rpy2" to run R code in a notebook of a different kernel (Data Supplement 1; Table 1). Magic commands are special commands that control the notebook kernel (https://ipython.org/ipython-doc/3/interactive/magics.html). We use STAR to analyze the spike train associated with cluster 2 from our spike sorting example in a separate notebook using the R kernel (Data Supplement 2, "Spiketrain-Analysis.ipynb" or the HTML copy). Figure 4 shows two graphics generated with "STAR," showing the spiking activity of cluster 2. Stimulus presentation (inflation of the intragastric balloon) started at 60 s. Regardless of the user's choice of Python and R packages, the analytic sequence can be combined (and documented) in the final output file.
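The binning that underlies a time histogram of spiking can be sketched in a few lines of Python 3. This is not the "STAR" implementation (which is written in R) and the spike times are hypothetical; only the 60-s stimulus onset follows the description above, and the function name and 10-s bin width are assumptions for illustration.

```python
def spike_count_histogram(spike_times_s, bin_width_s, duration_s):
    """Count spikes in consecutive bins of equal width over the recording."""
    n_bins = int(round(duration_s / float(bin_width_s)))
    counts = [0] * n_bins
    for t in spike_times_s:
        index = int(t // bin_width_s)
        if 0 <= index < n_bins:  # ignore spikes outside the recording window
            counts[index] += 1
    return counts


# Hypothetical spike times (s) for one unit in a 120-s recording.
spikes = [5, 12, 30, 61, 62, 63.5, 70, 71, 75, 90, 95, 110]
bin_width_s = 10.0
onset_s = 60.0  # stimulus onset, as in the balloon inflation described above

counts = spike_count_histogram(spikes, bin_width_s, duration_s=120.0)
rates_hz = [count / bin_width_s for count in counts]

# Compare mean firing rate before and after stimulus onset.
onset_bin = int(onset_s // bin_width_s)
rate_before_hz = sum(counts[:onset_bin]) / onset_s
rate_after_hz = sum(counts[onset_bin:]) / (120.0 - onset_s)
print("counts per bin:", counts)
print("mean rate before/after onset (Hz):", rate_before_hz, rate_after_hz)
```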
Compound action potentials. A common task in electrophysiology is the analysis of evoked potentials. In this section we demonstrate this in a notebook using Python packages and custom scripts to create a quantitative analysis of area under the curve. Our example data are derived from electrical stimulation of the abdominal vagus in a musk shrew in vivo preparation and recording the compound action potential from the cervical vagus. A stimulation trial occurred as a sequence of 10 biphasic stimulation pulses (see SUBJECTS AND DATA COLLECTION). Data Supplement 3 (Compound action potentials) contains a notebook for two trials of 10 pulses each ("Compound-Action-Potentials.ipynb" or the HTML copy).
We also demonstrate how to externalize a series of Python commands that might otherwise make the notebook difficult to read. These commands are written in a text file using Python code, "pulses_10_electrical.py," which can be called from within the notebook by using a magic command "%run -i pulses_10_electrical.py" (Data Supplement 3). This notebook file takes the raw data from the nerve signal import (similar to Fig. 2), parses the data into segments based on the occurrence of the timestamp of each electrical stimulation, averages the compound action potential for 10 traces, creates time bins based on nerve conduction velocity, and exports these data to a new text file that contains area under the curve measures for each bin (Data Supplement 3). Figure 5 shows plots produced by this notebook for the first trial of 10 pulses of electrical stimulation.
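The sequence of steps described above (segmenting by stimulus timestamp, averaging, rectifying, and measuring area under the curve) can be sketched in plain Python 3 as follows. This is not the contents of "pulses_10_electrical.py"; all function names, the toy signal, the sample interval, and the electrode distance and latency are hypothetical, and the trapezoidal rule is written out by hand rather than calling "NumPy".

```python
def cut_segments(signal, stim_indices, segment_length):
    """Cut one fixed-length segment starting at each stimulus sample index."""
    return [signal[i:i + segment_length]
            for i in stim_indices
            if i + segment_length <= len(signal)]


def average_traces(traces):
    """Average equal-length traces sample by sample."""
    n = float(len(traces))
    return [sum(samples) / n for samples in zip(*traces)]


def rectify(trace):
    """Full-wave rectification: absolute value of each sample."""
    return [abs(value) for value in trace]


def trapezoid_auc(values, dt):
    """Area under the curve by the composite trapezoidal rule."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


def conduction_velocity(distance_m, latency_s):
    """Conduction velocity (m/s) for a response arriving at latency_s."""
    return distance_m / latency_s


# Toy signal with two stimuli (at samples 1 and 6), each followed by a response.
signal = [0, 0, 1, -1, 0, 0, 0, 3, -3, 0]
segments = cut_segments(signal, stim_indices=[1, 6], segment_length=4)
mean_trace = average_traces(segments)
rectified = rectify(mean_trace)
auc = trapezoid_auc(rectified, dt=0.0005)  # assumed 0.5-ms sample interval

# Assumed 3-cm distance between electrodes and 50-ms response latency.
velocity = conduction_velocity(distance_m=0.03, latency_s=0.05)
print("mean trace:", mean_trace)
print("AUC:", auc, "conduction velocity (m/s):", velocity)
```

Rectifying before integration prevents the positive and negative phases of the averaged potential from canceling in the area measure.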

STATISTICAL ANALYSIS
Fig. 4. Spike train analysis performed with the R "STAR" package (Spike Train Analysis with R; Pouzat and Chaffiol 2009) (Data Supplement 2). A: peristimulus time histogram (PSTH) for the spike timings of cluster 2 from our spike sorting example (Fig. 3). Gray region indicates the inflation of the intragastric balloon (4 ml), with distension lasting for the duration of the recording. B: graphical representation of accumulated spikes from the same spike train in A. x-Axis shows the timing of the spikes, whereas black line indicates the accumulation of the spikes throughout the recording. The straight diagonal line is a reference.
Fig. 5. Analysis of the compound action potentials recorded from the cervical vagus after electrical stimulation of abdominal vagus in the musk shrew; these plots were produced by the Jupyter notebook in Data Supplement 3 with the Python "matplotlib" package (Table 1). A: 10 compound action potentials produced by 10 electrical pulses (0.5-ms pulse duration, biphasic) each separated by 500 ms; evoked activity can be seen between 40 and 80 ms compared with spontaneous activity. B: average of the 10 traces from A. C: rectification of the averaged signal in B. D: after conversion of each time segment to a conduction velocity (m/s), the area under the curve (AUC) was computed using the composite trapezoidal rule in the Python "NumPy" package (Table 1). A Python script for this analysis is included in Data Supplement 3.
The extensive capabilities of R for statistical analysis are available in Jupyter notebooks by running the R kernel (Table 1) or by using the "rpy2" Python package that produces a bridge to R (Table 1). Use of the R kernel approach is shown in Data Supplement 2, "Spiketrain-Analysis.ipynb." Here we show the bridge to R by using the second notebook from Data Supplement 1, in which we also demonstrate descriptive statistics, an analysis of variance (ANOVA), and plotting in the "ggplot2" R package.
In Data Supplement 1, the file "R_Statistics.ipynb" (or HTML copy) demonstrates how to import a text data file using Python; these data are then transferred to R after loading the "rpy2" package (using the command "%load_ext rpy2.ipython"). Data transfer from Python to R is accomplished by running the Jupyter magic command "%%R -i dataname", where "dataname" is the name of the data object in Python. Alternatively, to load a file directly into R, one can use the "read.csv("filename.csv")" command (shown in the notebook). Descriptive statistics for group means, standard deviations, and standard error of the mean are shown in the notebook using the "plyr" R package for grouping data. Next, data are analyzed with an ANOVA from the "ez" R package. The design was a 3 × 5 factorial experiment (time × condition), but the analysis indicated no statistically significant main or interaction effects (at a criterion of P < 0.05). Finally, we plot these data using the popular "ggplot2" R package, based on the book The Grammar of Graphics (Wilkinson and Wills 2005).
This section has only briefly explored the topic of statistical analysis in the Jupyter notebook; R packages for specific functions can be searched from a database (http://rseek.org). There is also a growing list of statistical packages for Python, including the "statsmodels" module, and some of the functionality of "ggplot2" has been replicated in Python (Table 1). Currently the "ggplot2" (R) and "matplotlib" (Python; Hunter 2007) packages offer the most extensive plotting features, including detailed control of nearly all graphical elements.
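For readers who prefer to remain in Python, the grouped descriptive statistics described above (mean, standard deviation, and standard error of the mean) can be sketched with the standard library. This Python 3 example is not the "plyr" code from our notebook; the function name, group labels, and values are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def describe_by_group(records):
    """Return n, mean, sample SD, and SEM for each group.

    records is an iterable of (group label, value) pairs; each group needs
    at least two values for the sample standard deviation to be defined.
    """
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)

    summary = {}
    for group, values in groups.items():
        sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
        summary[group] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": sd,
            "sem": sd / math.sqrt(len(values)),
        }
    return summary


# Hypothetical measurements for two conditions.
records = [
    ("control", 2.0), ("control", 4.0), ("control", 6.0),
    ("stim", 5.0), ("stim", 7.0), ("stim", 9.0),
]

stats_by_group = describe_by_group(records)
for group in sorted(stats_by_group):
    print(group, stats_by_group[group])
```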

PUBLICATION-QUALITY GRAPHICS
With appropriate packages, the Jupyter notebook is capable of displaying and exporting graphics in multiple formats, generating publication-quality graphics. Both "matplotlib" and "ggplot2" (Table 1) provide a high degree of control of graphical output, including conventional features, such as titles, labels, and axes, as well as legends and multilayered plots. All of the graphics in Fig. 3 (except Fig. 3D, which was generated with the "pandas" Python package) and Fig. 5 were generated with "matplotlib." We also export graphics to SVG format, which permits additional editing in image editors. SVG is a vector image format that stores a collection of objects rather than a fixed set of pixels, allowing for the scaling and manipulation of image elements without the loss of resolution typically experienced with bitmap formats (e.g., JPEG, PNG, and GIF). In image editors, such as Inkscape (http://inkscape.org; free and open source), one can modify individual components of an SVG image, such as fonts or line thicknesses. Inkscape is a powerful vector image editor that provides functionalities similar to proprietary software, e.g., CorelDRAW (Corel) and Adobe Illustrator (Adobe Systems). A free, open-source editor for bitmap files is GIMP (GNU Image Manipulation Program; https://www.gimp.org), which, similar to Adobe Photoshop (Adobe Systems), includes photo editing but also implements programming languages such as Python for scripted editing.
Another useful free package is ImageMagick (http://www.imagemagick.org), with tools for image editing and file

SHARING
One of the primary strengths of Jupyter and the other free open-source tools is the substantial opportunities for sharing and collaboration (Fig. 6). A critical requirement for sharing is that files must be readable by others with few or no barriers in cost or time commitment. Our proposed pipeline (Fig. 6) produces many plain text files and other formats that are readable by software that exists on virtually all computers, e.g., an SVG graphics file can be displayed in an internet browser. HTML and markdown files are plain text files viewable in a browser (markdown will need a viewer installed). Jupyter notebook files are also plain text (written in JSON), which can be served to an internet browser using the free open-source Jupyter software (Table 1). Using the Python library "nbconvert" (included in Jupyter), a user can convert a dynamic ".ipynb" notebook into static file formats such as HTML, LaTeX/PDF, and markdown (https://nbconvert.readthedocs.org). At the bottom of our notebooks, we include code to convert notebooks into HTML; the same operation can be performed from the command line. In addition, notebooks can be presented as slide presentations using the Jupyter menu option; a notebook can be exported as an HTML slide show and presented in an internet browser using the "Reveal.js" HTML presentation framework (http://lab.hakim.se/reveal-js).
The accessibility of file formats directly impacts the functioning of scientific teams and community sharing. For laboratory teams, Jupyter notebooks and their associated files can be established on a local server where members can edit and access the same notebooks. The JupyterHub application from Project Jupyter (https://github.com/jupyter/jupyterhub) provides this functionality. Project files can also be placed on the GitHub platform as a repository; GitHub (https://github.com) is a commercial company that offers web-based hosting for "git" repositories ("git" is a version control system), including free accounts and more extensive paid subscriptions, with hosting for private and public repositories. The analytics from the present report are placed on GitHub (https://github.com/cchorn/Neurophysiological-Analytics-for-All).
Although GitHub has a social coding focus, it also contains projects for hosting information through wikis and websites. GitHub can render Jupyter notebook files, allowing users to see a notebook without having to download the file and run a local Jupyter server; GitHub hosts many of the free open-source projects that are included in the present report, including Project Jupyter, "pandas," "NumPy," "matplotlib," "OpenElectrophy," "Neo," "ggplot2," and "ez," where users can download the latest versions and post "issues" when they discover a problem with the software. Feedback on posted issues to a GitHub project can come directly from the development team; for more general help, users can post questions to sites such as Stack Overflow (http://stackoverflow.com), a community of 4.7 million developers. The Jupyter project also has the nbviewer service that provides hosting for public notebooks (http://nbviewer.jupyter.org).
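Because notebook files are plain-text JSON, their contents can be inspected with the standard Python library alone. The sketch below, written for Python 3, builds a minimal notebook structure following our understanding of the version 4 notebook format and lists its cells; the cell contents and the function name are hypothetical, and for real files the "nbformat" and "nbconvert" tools are the more robust choice.

```python
import json

# A minimal notebook written as a Python dictionary (assumed version 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Notes\n", "Recording from unit 2."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# Serialize to the plain text that would be stored in an .ipynb file ...
text = json.dumps(notebook, indent=1)

# ... and read it back, as any JSON-aware tool could.
loaded = json.loads(text)


def summarize_cells(nb):
    """Return (cell type, joined source text) for every cell in a notebook."""
    return [(cell["cell_type"], "".join(cell["source"])) for cell in nb["cells"]]


for cell_type, source in summarize_cells(loaded):
    print(cell_type, "->", repr(source))
```

For conversion to a static report from the command line, a command such as "jupyter nbconvert --to html notebook.ipynb" (with the file name replaced as needed) produces an HTML copy.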

DISCUSSION
In this article we have shown examples of how the Jupyter notebook may be used in conjunction with Python and R to combine multiple analytic workflows into a single, sharable platform.Complex operations such as spike sorting, spike train analysis, and the processing of compound action potentials can be easily integrated into a concise sequence of operations.Importantly, Jupyter is a free, open-source project that incorporates the strength of Ͼ40 programming languages.Many of these languages contain libraries for statistical operations, signal processing, and even the analysis of neurophysiological data (e.g., "OpenElectrophy" and "STAR").Pooling these powerful resources allows for sophisticated manipulation of experimental data in a single environment, effectively streamlining workflows that normally require separate, often costly, programs.The user is able to optimize these packages with his or her own custom code to formulate very specific outputs, such as validation of single units immediately after spike sorting.Furthermore, packages like "matplotlib" and "ggplot2" can be used in combination with free software for image editing to produce publication-quality graphics.Our proposed workflow is displayed in Fig. 6, showing how all of these resources can be combined to create efficient analytics and graphics generation, which can ultimately be shared with team members and the scientific community.
Furthermore, using the free electronic notebook with multiple programming languages avoids the barriers of proprietary software, which include cost and limits on sharing and customization of analysis. Purchased software is often restrictive in its functionality and saves analyzed data in exclusive file formats, making the sharing of methods and files between laboratories difficult. Project Jupyter overcomes these challenges by including features to convert notebooks to HTML, PDF, and other widely readable file formats. These electronic notebooks can be rendered on the increasingly popular GitHub website, allowing users to see the contents of the notebook without file download or software installation. By using these widely available platforms and formats, individuals can open their analysis to the feedback of the larger neurophysiology community.
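One reason these notebooks are easy to share is that a notebook file (.ipynb) is plain JSON, so its contents can be read without Jupyter itself. The sketch below is only an illustration under stated assumptions and is not how Jupyter's own converters work: the small in-memory notebook and the helper name `cell_text` are hypothetical, and actual conversion to HTML or PDF is handled by Jupyter's conversion tools.

```python
import json

# A minimal, hypothetical notebook in the version 4 JSON layout.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Spike sorting notes\n", "Recording from vagus nerve."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "rate_hz = 42 / 60.0"},
    ],
})


def cell_text(cell):
    """Return a cell's source as one string (stored as a string or a list of lines)."""
    source = cell["source"]
    return source if isinstance(source, str) else "".join(source)


# Parse the JSON and list each cell's type together with its text.
notebook = json.loads(notebook_json)
summary = [(cell["cell_type"], cell_text(cell)) for cell in notebook["cells"]]
for cell_type, text in summary:
    print(f"[{cell_type}]")
    print(text)
```

Because the format is open and text based, the same file can also be tracked in a version control system such as git.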
However, it is important to recognize that the Jupyter notebook requires proficiency in general coding principles. Nevertheless, the many benefits of adopting an electronic notebook demonstrated in this report far outweigh this challenge. Some users may be reluctant to use coding in the analysis of neurophysiological data, preferring to operate solely from a graphical user interface. However, coding is a ubiquitous element in neurophysiology, even in proprietary software such as MATLAB, NeuroExplorer (Nex Technologies), and Spike2 (CED). These proprietary software suites include unique languages for building custom sequences of operations (scripts). Investigators who use this type of software will therefore, in many cases, also end up coding. Alternatively, users can spend time becoming familiar with a more general-purpose language such as Python, extending their capabilities beyond specific software. Furthermore, investigators can create a series of analytic workflows customized to meet unique needs of their studies. The code required for such workflows is often less complex than that contained in our supplied notebooks. For example, figures can be generated with much simpler statements. The function "plot" included in R is capable of generating a graphic with only a matrix of data (e.g., a data frame) as the necessary input. One can control specific details of a figure by providing additional arguments, as seen in most of our notebooks. The advantage of passing these additional arguments into the code is that exported figures require less manual editing in image-editor software. Investigators can choose the proper balance between detailed code and manual tuning of figures in order to meet their specific needs.
With proficiency in coding comes the issue of reliability of custom-designed algorithms in data analysis. It is crucial that investigators validate the output of their code with libraries that already exist. As an example, our metrics following spike sorting to assess cluster quality were modeled after those in UltraMegaSort 2000 (Fee et al. 1996; Hill et al. 2011). We challenged the accuracy of our code by running the same cluster data sets across both the Python code and the MATLAB code developed by Hill et al. (2011). Scientists will have to validate their code in a similar way, with more complex algorithms perhaps requiring validation at multiple points along the sequence of operations.
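The kind of cross-check described above can be sketched in a few lines. This is neither our quality-metric code nor the UltraMegaSort 2000 algorithm; it is a minimal illustration that assumes a 3-ms refractory threshold (the value given in Fig. 3), hypothetical spike times, and two hypothetical function names. A second, deliberately simple implementation stands in for the established reference.

```python
import math

REFRACTORY_S = 0.003  # 3-ms threshold, the value stated for Fig. 3B


def isi_violation_fraction(spike_times, refractory_s=REFRACTORY_S):
    """Code under test: fraction of interspike intervals below the threshold."""
    times = sorted(spike_times)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0
    return sum(isi < refractory_s for isi in isis) / len(isis)


def reference_violation_fraction(spike_times, refractory_s=REFRACTORY_S):
    """Independent stand-in for a trusted reference (hypothetical), written as an index loop."""
    times = sorted(spike_times)
    if len(times) < 2:
        return 0.0
    violations = 0
    for i in range(1, len(times)):
        if times[i] - times[i - 1] < refractory_s:
            violations += 1
    return violations / (len(times) - 1)


# Hypothetical spike times in seconds; two of the five intervals are shorter than 3 ms.
spikes = [0.010, 0.012, 0.050, 0.100, 0.1025, 0.200]
custom = isi_violation_fraction(spikes)
reference = reference_violation_fraction(spikes)

# The check fails loudly if the two implementations disagree.
assert math.isclose(custom, reference), "custom code disagrees with the reference"
print(f"violation fraction: {custom:.2f}")
```

For a longer pipeline, the same comparison could be repeated after each intermediate step, so that a disagreement is caught where it first appears.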
Reproducible research requires sharing the data along with the analytic methods. Passing the original data through a shared analysis can both validate the analytics and assist in testing the approach on new data. There currently exist a large number of internet platforms to share data (http://www.re3data.org); for example, Zenodo (https://zenodo.org) allows data storage of up to 2 GB per file. A similar project is figshare (https://figshare.com), allotting 20 GB of free private space, with up to 5 GB files, per account. Figshare users can control access to private data, or they can choose to make data publicly accessible. A third project, DRYAD (http://www.datadryad.org; using payment plans), integrates publications with data; this system accepts file collections up to 10 GB and allows users to share data for peer review if the manuscript has been submitted to a participating journal. These data repository projects permit the creation of a citable Digital Object Identifier (DOI). Finally, the "ActivePapers" project takes this a step further by deploying the manuscript, data, and analytic steps in one framework (Hinsen 2014).
Although the present approach focuses primarily on data analytics in neurophysiology, careful documentation of data collection is equally important to reproducible research. Often this involves note-taking in a physical notebook; however, there has been substantial growth in the use of electronic laboratory notebooks (ELNs), with numerous commercial systems (Rubacha et al. 2011) and several free open-source alternatives (Barillari et al. 2016; Khan et al. 2006; Voegele et al. 2013). Depending on the ELN system, there can be one or more disadvantages, including 1) the amount of "lock-in," with no easy way to exit a system that is no longer maintained or has been supplanted by more useful technology; 2) difficulty in setting up and maintaining institutional servers; and 3) requirements for internet connectivity, which is often less reliable on mobile devices. Individual laboratory needs for storage of specific information in an ELN can vary greatly. For our neurophysiology experiments, we document procedures and data collection using plain text files written in markdown (transcribed from physical notebooks) and stored in private git repositories hosted on GitHub. This has several advantages: 1) a version control system, like git, provides a detailed history of file changes; 2) git can be accessed from the command line or through graphical user interface clients on Windows, OS X, and Linux operating systems, as well as on mobile devices running iOS or Android (e.g., Working Copy on iOS, http://workingcopyapp.com, and Pocket Git on Android, http://pocketgit.com); 3) because git synchronizes the local repository with the GitHub repository, there is no need for a continuous internet connection; and 4) by using plain text there is no requirement for specialized software to read files, but additional structure can be included by using markdown syntax, e.g., headings and lists. Our physical notebook pages are also imaged with a mobile device camera and included in the git repository; images can be rendered along with text using markdown with a text-to-HTML conversion tool, e.g., markdown viewer for Firefox (https://addons.mozilla.org/en-US/firefox/addon/markdown-viewer). Once text and images are rendered in a browser it is easy to print to PDF. More complex "weaving" of file types into a single report can be achieved with the Python "Pweave" package (http://mpastell.com/pweave). With these approaches, there is no need for dedicated ELN software in our workflow.
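As an illustration of the plain-text approach described above, the sketch below assembles a markdown note containing headings, a numbered list, and an image link. The helper `experiment_note`, the field names, the date, and the file path are all hypothetical and do not reproduce our actual note template.

```python
from datetime import date


def experiment_note(title, day, steps, image_path):
    """Build a markdown note: a heading, a date line, a numbered list, and an image link."""
    lines = [f"# {title}", "", f"Date: {day.isoformat()}", "", "## Procedure", ""]
    lines += [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    lines += ["", "## Notebook page", "", f"![scanned notebook page]({image_path})", ""]
    return "\n".join(lines)


# Hypothetical example values, for illustration only.
note = experiment_note(
    title="Vagus nerve recording",
    day=date(2016, 1, 15),
    steps=["Place electrode", "Record baseline", "Distend stomach at 60 s"],
    image_path="images/page_012.jpg",
)
print(note)
```

The resulting text can be saved to a file in the repository and committed with git like any other plain text file.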
Although Jupyter is a powerful electronic notebook, there are alternatives. SageMathCloud (https://cloud.sagemath.com; SageMath) hosted on Google Cloud requires no installation and runs from a web browser. Users can code in SageMath, R, Python, Julia, and other languages. Another powerful feature is the ability for multiple users to simultaneously edit notebooks in real time, including Jupyter notebooks and the SageMath Worksheets. SageMathCloud includes editors for LaTeX, Markdown, and HTML, as well as the ability to work within the Linux terminal. Users with a free account have access to 3 GB of memory and 1 GB of storage per project. Additionally, free projects across all users are operated on designated "free machines," meaning that performance may be further limited; feature upgrades can be purchased. Wakari (https://wakari.io) is a similar product from Continuum Analytics, also with a free starter account. Both SageMathCloud and Wakari can be considered web-accessible approaches for creating Jupyter notebooks. Another alternative is Apache Zeppelin (https://zeppelin.incubator.apache.org), developed by the Apache Software Foundation. Zeppelin is open source and comes with support for interpreters such as Python, markdown, Shell, and Scala; multiple users can access a single notebook simultaneously, with changes reflected in real time; however, Apache Zeppelin requires server installation. The R package "knitr" is designed to generate rapid reports of code, graphics, and data analysis in R (http://yihui.name/knitr), which can be used in RStudio, a free open-source integrated development environment (a server version is available for a fee; https://www.rstudio.com). Similar to the Jupyter notebook, "knitr" can generate reports in multiple formats such as HTML, PDF, and markdown. It is also important to mention that Jupyter notebooks can be run in other software, for example, in the editor Emacs (https://www.gnu.org/software; https://github.com/millejoh/emacs-ipython-notebook). Emacs also has "org-mode," which can function as a notebook for code and prose (Delescluse et al. 2012; Schulte et al. 2012). The free open-source Atom text editor (https://atom.io) has a plug-in for running Jupyter notebooks (https://atom.io/packages/jupyternotebook).
Regardless of specific software, electronic notebooks and other open-source approaches are exceptionally powerful tools for analyzing neurophysiological data. A large number of software projects are developed on GitHub, with significant interactions between developers and users, potentially producing substantial growth in the use of free open-source software. There are currently >77,000 packages in the Python index (https://pypi.python.org/pypi) and >8,000 in R (https://cran.r-project.org/web/packages), and these ecosystems are beginning to have large impacts on the analytics used in scientific publications (Perkel 2015; Tippmann 2015). The use and reporting of these approaches can generate reproducible research and positively impact the quality of neurophysiology projects.

We express our gratitude to the reviewers of this manuscript for their meaningful guidance, which contributed to producing a much higher-quality report.

GRANTS
This work was supported by funding from the National Institutes of Health (NIH), including support from the NIH SPARC program (U18 EB-021772) and a grant to the University of Pittsburgh Cancer Institute (UPCI), P30 CA-047904 (Cancer Center Support Grant). This project used the UPCI Animal Facility, which was also supported in part by the P30 CA-047904 award.

DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the author(s).

Fig. 1. Example of a Jupyter electronic notebook, a free open-source software developed by Project Jupyter (Table 1; Perez 2015; Perez and Granger 2007; Ragan-Kelley et al. 2014); screenshots are selected from the notebook in Data Supplement 1. A: after starting a notebook server, a directory is displayed in the internet browser where notebooks can be created with different kernels, e.g., Python and R (red circle). B: notebook cells can contain markdown (a plain text syntax), which is converted to formatted text by running the notebook cell (black box), and raw text. C: different levels of headings can also be used. D: computer language code can be included; an example of using a Python module for displaying images is shown [the neurophysiologist Charles Sherrington; image from Wellcome Library, London (Licensed under Creative Commons Attribution CC-BY 4.0)].

Fig. 3. Spike sorting of extracellular signals, recorded from a cervical vagus nerve bundle from the musk shrew. Each image (and a screen capture of the table in D) was created with custom code in Python (Data Supplement 2: Spike sorting and train analysis). A: the waveforms of each cluster are visualized, along with the spiking frequency across the duration of the recording; distension of the stomach occurred at 60 s, with an intragastric balloon. B: histogram counts of interspike intervals (ISIs) for each of the 3 clusters; bins are 1 ms in width. Any ISI under 3 ms is considered a violation of the refractory period. C: visualizing the 3 clusters in a 3-dimensional feature space; features are the coefficients of the principal component analysis (PCA). D: the results of the 3 clusters passed through quality metric testing. Each test challenges the clusters against physical constraints such as refractory violations. We have created a Python script (Data Supplement 2, spike sorting) based on the original MATLAB algorithms of Hill et al. (2011). Only cluster 2 passes the quality metrics tests (total error rate < 1.0%).

Fig. 6. Pipeline of free open-source software tools available for neurophysiological research. Data, metadata, and notes can be entered into Jupyter notebooks for extensive analysis and plotting. This produces sharable file formats, including the notebooks, plain text files, graphics files, and large data files (electrophysiology and images) in HDF5-associated formats. Ultimately, these outputs can be shared with team members and the scientific community.
D.M.R. and C.C.H. conception and design of research; D.M.R. and C.C.H. performed experiments; D.M.R. and C.C.H. analyzed data; D.M.R. and C.C.H. interpreted results of experiments; D.M.R. and C.C.H. prepared figures; D.M.R. and C.C.H. drafted manuscript; D.M.R. and C.C.H. edited and revised manuscript; D.M.R. and C.C.H. approved final version of manuscript.
