PyPedia: A crowdsourced Python online IDE for open and reproducible science
Alexandros Kanterakis, Morris Swertz
Genomics Coordination Center, University Medical Center Groningen
Although the quality and variety of open-source scientific software available today is astounding, we observe that barriers still prevent scientists from adopting open and reproducible practices in their analyses. From the point of view of a researcher who seeks an open-source approach to her analysis, the usual problems are the difficulty of configuring, installing and understanding the inner logic of a piece of software, even when its source code is available. Moreover, software developed in research institutes does not follow strict professional guidelines, lacks documentation and support and, most importantly, does not provide unit tests or proofs of the integrity of the methods it contains. From the point of view of a researcher who has already developed a computational method, the fear of being accused of scientific misconduct, or a lack of confidence that the code is good enough, prevents her from publishing it openly.
On the other hand, we have witnessed the success of the Wikipedia project, where millions of editors with different backgrounds and expertise have collaborated voluntarily to create high-quality scientific content. This model of online community coordination toward a common goal, with minimal interference and administration, is called crowdsourcing. In this work we present PyPedia, an effort to apply the crowdsourcing paradigm to the development of scientific methods in Python. In essence, PyPedia is a wiki in which each article is a Python function or class. Apart from the source code, each article contains the following sections: the documentation, an HTML form for online execution, the under-development code, the unit tests and the edit permissions for each of these sections. In the source code, an editor can call a function or instantiate a class defined in another article without importing anything. Every attempted edit to the source code is validated by the unit tests. Through the editable HTML form, anyone can execute the method and see the results in the browser. Online execution and unit-test validation take place on Google App Engine, a sandboxed Python environment suitable for running potentially unsafe code. A user can also request a file that contains a standalone, constraint-free version of an article's source code. The same file can be sent to a remote computer that the user controls, e.g. a machine in the cloud. There is also a Python library through which a function or class from PyPedia can be imported into a local Python namespace with a simple “import foo” statement. Articles are divided into two categories. The User category contains articles maintained by individual users, who control their content and edit permissions. The Main category contains articles that have been selected from the pool of User articles and that offer high-quality, “pythonic” solutions to known scientific computational problems.
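The rule that every edit must pass the article's unit tests can be sketched as a simple gate. This is a minimal illustration under stated assumptions, not PyPedia's implementation: `Article`, `passes_tests` and `submit_edit` are hypothetical names, the unit-test section is assumed to be plain Python that raises `AssertionError` on failure, and the bare `exec` used here provides no sandboxing (PyPedia delegates that to Google App Engine).

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical article model: only the sections this sketch needs."""
    name: str
    code: str        # currently accepted source code
    unit_tests: str  # Python snippet that raises AssertionError on failure


def passes_tests(code: str, unit_tests: str) -> bool:
    """Run candidate code, then its tests, in one fresh namespace.

    NOTE: plain exec is NOT a sandbox; this only illustrates the gating logic.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)        # defines the function or class
        exec(unit_tests, namespace)  # tests see the freshly defined names
    except Exception:
        return False
    return True


def submit_edit(article: Article, new_code: str) -> bool:
    """Accept new_code only if the article's unit tests still pass."""
    if passes_tests(new_code, article.unit_tests):
        article.code = new_code
        return True
    return False  # rejected: the previously accepted code is kept


# Usage: a faulty edit is rejected and the accepted code stays in place.
fib = Article(
    name="fibonacci",
    code="def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)\n",
    unit_tests="assert fibonacci(10) == 55\n",
)
print(submit_edit(fib, "def fibonacci(n):\n    return n\n"))  # False
```

Keeping the tests in the same namespace as the candidate code mirrors the idea that the unit-test section is written against the article's own function.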
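A client-side “import foo” can be built on Python's standard import hooks (`importlib.abc.MetaPathFinder` and `Loader`). The sketch below is not the API of PyPedia's library: `WikiFinder` is an illustrative name, and the hypothetical in-memory dict `ARTICLES` stands in for the remote wiki, where a real client would fetch the source over the network. Each article becomes a module of the same name.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for the remote article store: name -> source code.
ARTICLES = {
    "greet": "def greet(name):\n    return 'Hello, ' + name\n",
}


class WikiFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules whose names match an article in the store."""

    def __init__(self, store):
        self.store = store

    def find_spec(self, fullname, path, target=None):
        if fullname in self.store:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not an article: let the normal import machinery decide

    def create_module(self, spec):
        return None  # use Python's default module object

    def exec_module(self, module):
        # Execute the article's source inside the new module's namespace.
        exec(self.store[module.__name__], module.__dict__)


# Appended, so real installed modules still take precedence over articles.
sys.meta_path.append(WikiFinder(ARTICLES))

import greet  # noqa: E402  (resolved through WikiFinder)

print(greet.greet("PyPedia"))  # Hello, PyPedia
```

Appending the finder to the end of `sys.meta_path` is a deliberate choice in this sketch: an article can never shadow a locally installed module of the same name.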
All this functionality is bundled with a REST interface that acts as a code provider. Through this interface, any external tool can request all the code required to execute a Python statement. Additionally, a timestamp can be provided to obtain the versions of the code that were current at that time. In practice, this means that a researcher who obtains a result through code hosted in PyPedia can provide a single URL through which other tools or researchers can reproduce the complete analysis, even if the articles where the code resides have changed since then.
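The code-provider behaviour can be approximated in a few lines: collect the names a statement uses, recursively pull in every article those names refer to, and select each article's revision by timestamp. This is a sketch under hypothetical assumptions: `HISTORY` is an in-memory revision store of `(timestamp, source)` pairs sorted by time, an article's name equals the name it defines, and the actual REST endpoint and URL format are not reproduced.

```python
import ast
import bisect

# Hypothetical revision store: article name -> [(unix_time, source), ...],
# sorted by time.
HISTORY = {
    "square": [(100, "def square(x):\n    return x * x\n")],
    "sum_of_squares": [
        (100, "def sum_of_squares(xs):\n    return sum(square(x) for x in xs)\n"),
        (200, "def sum_of_squares(xs):\n    return sum(map(square, xs))\n"),
    ],
}


def source_at(name, timestamp):
    """Return the latest revision of `name` saved at or before `timestamp`."""
    revisions = HISTORY[name]
    times = [t for t, _ in revisions]
    i = bisect.bisect_right(times, timestamp)
    if i == 0:
        raise LookupError(f"{name!r} did not exist at time {timestamp}")
    return revisions[i - 1][1]


def referenced_names(source):
    """All bare names read in `source` (candidate article references)."""
    return {node.id for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)}


def bundle(statement, timestamp):
    """Standalone code: every needed article first, then the statement."""
    ordered, seen = [], set()

    def visit(name):
        if name in seen or name not in HISTORY:
            return  # builtins and local variables are not articles
        seen.add(name)
        src = source_at(name, timestamp)
        for dep in sorted(referenced_names(src)):
            visit(dep)
        ordered.append(src)  # post-order: dependencies precede their users

    for name in sorted(referenced_names(statement)):
        visit(name)
    return "".join(ordered) + statement + "\n"
```

For example, `bundle("result = sum_of_squares([1, 2, 3])", 150)` returns the `square` definition, the first revision of `sum_of_squares` and the statement, ready to run with `exec`; passing 250 instead selects the later revision. Names that are not articles, such as the builtin `sum`, are left to the runtime.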
PyPedia is an open-source online IDE hosted on the same content management system as Wikipedia (MediaWiki) and guided by the same paradigm of crowdsourced editing. Its main goal is to offer a convenient environment in which open and reproducible science can flourish through the formation of a lively and creative community.