
I also see reproducibility and citation graphs as distinct concepts.


If it's reproducibility you're after, bibliographic citations are very unlikely to enable someone else to assemble an identical build environment from which the same conclusion should be repeatably derivable.

A ScholarlyArticle can be reproducible with no citations whatsoever.
A ScholarlyArticle may very likely have many citations and still be woefully unreproducible.

This citation doesn't contain a URL, but it isn't quite useless, because there's at least a DOI string (and the paper itself is excellent):

Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
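A bare DOI string like the one in that citation can be made dereferenceable by prefixing the doi.org resolver. A minimal sketch; the helper name `doi_to_url` is hypothetical and only for illustration:

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI string such as 'doi:10.1371/...' into a resolver URL."""
    doi = doi.strip()
    # Drop an optional 'doi:' prefix, as used in the citation above.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Keep '/' unescaped; percent-encode anything unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1371/journal.pcbi.1003285"))
# prints https://doi.org/10.1371/journal.pcbi.1003285
```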

> Rule 3: Archive the Exact Versions of All External Programs Used

mybinder.org uses repo2docker to build Jupyter containers from git repositories that contain config files.

https://repo2docker.readthedocs.io/en/latest/config_files.html#configuration-files
"""
Dockerfile
environment.yml
requirements.txt
REQUIRE
install.R
apt.txt
setup.py
postBuild
runtime.txt
"""

Specifying the exact version of Python (and what package it was installed from and/or what URL the source was obtained and built from) is no substitute for hashes of the 'pinned' versions of said artifacts.
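To illustrate what "hashes of the pinned versions" could look like in practice, here is a rough sketch that computes the SHA-256 of a downloaded artifact and formats a requirements.txt line in the `--hash=sha256:` form that pip's hash-checking mode reads. The function names and the example file name in the usage note are hypothetical:

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large sdists/wheels need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pinned_requirement(name, version, artifact_path):
    """Format one requirements.txt line that pins both version and hash."""
    return f"{name}=={version} --hash=sha256:{sha256_of(artifact_path)}"
```

Something like `pinned_requirement("example-pkg", "1.0.0", "example_pkg-1.0.0.tar.gz")` would then yield a line that `pip install --require-hashes -r requirements.txt` can check against the downloaded file.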

# includes the python version
$ conda env export -f environment.yml

# these do not include the python version
$ pip freeze -r requirements.txt --all
$ pipenv lock # > Pipfile.lock
$ pipenv sync # < Pipfile.lock
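One way to record the interpreter alongside a package list is to ask the running Python for its own MAJ.MIN.PATCH version and build details. A small sketch using only the standard library; the function name `interpreter_pin` and the dictionary keys are made up for illustration:

```python
import platform
import sys


def interpreter_pin():
    """Describe the running interpreter as MAJ.MIN.PATCH plus build details."""
    v = sys.version_info
    return {
        "implementation": platform.python_implementation(),
        "version": f"{v.major}.{v.minor}.{v.micro}",
        # python_build() returns a (build number, build date) tuple of strings.
        "build": " ".join(platform.python_build()),
        "compiler": platform.python_compiler(),
    }


for key, value in interpreter_pin().items():
    print(f"{key}: {value}")
```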

Uploading a built container or VM image to e.g. Docker Hub / GitLab Container Registry / Vagrant Cloud is another way to ensure that research findings are reproducible.
- Dockerfile, docker-compose.yml
- Vagrantfile

> Rule 4: Version Control All Custom Scripts

https://mozillascience.github.io/code-research-object/ (FigShare + GitHub => DOI citation URI)

https://guides.github.com/activities/citable-code/ (Zenodo + GitHub => DOI citation URI)

...

Is it necessary to cite Python (or all packages) if you're not building a derivative of Python or said packages?

It's definitely a good idea to "Archive the Exact Versions of All External Programs Used"; but IDK that those are best represented with bibliographic citations. Really, links to the Homepage, Source, Docs, and Wikipedia page are probably more helpful to a reviewer who's not familiar with the software, and they help support it by linking dereferenceable URLs (see https://5stardata.info).

While out of scope and OT, it's worth mentioning that search engines index https://schema.org/Dataset metadata; which is helpful for data reuse and autodiscovering requisite premises for the argument presented in a https://schema.org/ScholarlyArticle .
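As a rough idea of what such metadata looks like, this sketch builds a minimal schema.org/Dataset description as JSON-LD. The helper name and all of the values passed in the usage note are placeholders, not a real dataset:

```python
import json


def dataset_jsonld(name, description, url, citation_doi=None):
    """Serialize a minimal schema.org/Dataset record as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if citation_doi:
        # Point at the related article via a dereferenceable DOI URL.
        doc["citation"] = "https://doi.org/" + citation_doi
    return json.dumps(doc, indent=2)
```

For example, `dataset_jsonld("Example dataset", "Placeholder description", "https://example.org/dataset")` returns a string that could be embedded in a page inside a `<script type="application/ld+json">` element.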

A citation for each MAJ.MIN.PATCH revision of CPython (and/or other excellent packages) might be a bit much.

On Monday, September 10, 2018, Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Sep 10, 2018 at 09:25:29PM +0200, Chris Barker via Python-Dev wrote:

> I'd like to know what these citations are expected to be used for?
>
> i.e. -- usually, academic papers have a collection of citations to
> acknowledge where you got an idea, or fact, or .... It serves both to
> justify something and make it clear that it is not your own idea (i.e. not
> plagiarism).

[...]

> That is about reproducible results, which is really a different thing than
> the usual citations.



I don't think it is. I think you are seeing a distinction that is not
there. If citations were just about acknowledgement, we could say "I got
this idea from Bob" and be done with it. Citations are about identifying
the *exact* source so that anyone can reproduce the given ideas by
checking not just "Bob" but the specific page number of a specific
edition of a specific work.

So the requirement for precision is no different between papers and
software, and the academic standards for citing software already take
that into account. There are challenges with software, to be sure --
code is much more ephemeral, there may be literally hundreds of
authors, etc. But in principle, the kinds of information needed to
cite a software package are known. The major citation styles already
include this. When you are using a specific style, this page:

https://openresearchsoftware.metajnl.com/about/

suggests a few formats, depending on how you got access to the software.

The bottom line is, we don't have to guess what information to provide.
People like Jacqueline can tell us what they need, and we'll just fill
in the values.

The people citing Python know what information they need; we just have
to help them get it. I think that the best way to do that is to provide
the correct information in a single place, in a single, standard format,
and let them choose the appropriate citation style for their
publication.

Jackie, do I have that right?

--
Steve
