I think the "why" in this case should be a bit deeper than that, because until recently, it's been somewhat unusual to cite the tools you use to create a paper.

I see three major reasons why people cite software packages, and the form of the citation would have different requirements for each one:

1\. Academic credit / Academic use metrics

Because of the odd way academia has evolved, academics are largely judged by their publications and how influential those publications are. A lot of the people who work on statistical and scientific Python libraries are doing excellent and incredibly influential work, but that's largely invisible to the metrics used by funding and tenure committees, so there's been an effort to do things like getting DOIs for libraries or publishing articles in journals like the Journal of Open Source Software: https://joss.theoj.org

Then you cite the libraries if you use them, and the people who contribute to the work can say, "Look, I'm a regular contributor to this core library that is cited in 90% of papers". This seems less important to CPython, where the majority of core contributors (as far as I can tell) are not academics and have little use for high h-index papers. That said, even if no one involved cares about the academic credit, if every paper that used Python cited the language, it would probably provide useful usage metrics to the PSF and others interested in them.

If all you want is a formal way to say "I used Python for this" as a citation so that it can be tracked, then a single DOI for the entire language should be sufficient.

2\. As a primary source or example for some claims

If you are writing an article about language design and you are referencing how Python handles async or scoping or unicode or something, you want to make it easy for your readers to see the context of your statement, to verify that it's true and to get more details than you might want to include as part of what may be a tangential mention in your paper. I have a sense that this is closer to the original reason people cited things in papers and books before citations became a metric for measuring influence - and subsequently a way to give credit for the source of ideas.

If this is why you are citing Python, you should probably be citing a specific sub-section of the language reference and/or documentation, and that citation should probably be versioned, since new features are added in every minor version, and the way some of these things are handled may change over time. In this case, a separate DOI for each minor version that points to the documentation as built by a specific commit or git tag or whatever would probably be ideal.

3\. To aid reproducibility

It won't go all the way towards reproducing your research, but given that Python is a living language that is always changing - both in implementation and the spec itself - to the extent that you have a "methods" section, it should probably include things like operating system version, CPython version and the versions of all libraries you used so that if someone is failing to replicate your results, they know how to build an environment where it should work.

If you want to include this information in the form of a citation, then I would think that you would want to be both more granular, citing the specific interpreter you used (CPython, Jython, PyPy), the full version (3.6.6 rather than 3.6) and possibly even other factors like operating system, and less granular, in that you don't need to cite a specific subset of the interpreter (e.g. async), just the interpreter as a whole.

--

My view is that the CPython core dev team probably cares a lot less about #1 than, say, the R dev team, which is one reason why there's no clear way to cite "CPython" as a whole.

I think that #3 is a very laudable goal, but probably should be in some sort of "methods" section of the document being prepared rather than overloading citations for it, though having a standardized way to describe your Python setup (similar to, say, the pandas debugging feature \`pandas.show\_versions()\`) that is optimized for publication would probably be super helpful.
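To make the idea for #3 a bit more concrete, here is a rough sketch of what such a helper could look like. It is not an existing tool: the function names `describe_environment` and `format_for_methods` are made up for illustration, and it assumes Python 3.8+ so that `importlib.metadata` is available in the standard library.

```python
import platform
from importlib import metadata


def describe_environment(packages=()):
    """Collect interpreter, OS and package versions (hypothetical helper)."""
    info = {
        # Which interpreter was used, e.g. CPython, PyPy or Jython
        "implementation": platform.python_implementation(),
        # Full version string, e.g. 3.6.6 rather than just 3.6
        "python_version": platform.python_version(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed
    info["packages"] = versions
    return info


def format_for_methods(info):
    """Render the collected details as one sentence for a methods section."""
    parts = []
    for name, version in sorted(info["packages"].items()):
        parts.append(f"{name} {version}" if version else f"{name} (not installed)")
    libraries = ", ".join(parts) if parts else "no third-party libraries listed"
    return (
        f"{info['implementation']} {info['python_version']} on "
        f"{info['os']} {info['os_release']} ({info['machine']}); {libraries}."
    )


if __name__ == "__main__":
    # List whichever libraries your analysis actually imports.
    print(format_for_methods(describe_environment(["numpy", "pandas"])))
```

The output is meant to be pasted into a methods section rather than cited, which keeps the reproducibility details out of the reference list.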

While #2 is probably only a small fraction of all the times where people would want to "cite CPython", I think it's probably the most important one, since it's performing a very specific function useful to the reader of the paper. It also seems not terribly difficult to come up with some guidance for unambiguously referencing sections of the documentation and/or language reference, and having "get a DOI for the documentation" be part of the release cycle.

Best,
Paul

P.S. I will also be at the NumFOCUS summit. It's been some time since I've been an academic, but hopefully there will be an interesting discussion about this there!

On 9/16/18 6:22 PM, Jacqueline Kazil wrote:

RE: Why cite Python….

I would say that in this paper — http://conference.scipy.org/proceedings/scipy2015/pdfs/jacqueline\_kazil.pdf, where we introduced a new library, we should have cited Python, because the library was based in Python. We were riding on the coattails of Python and if Python did not exist, then this library would not exist.

(taking this a level higher)
Likewise, someone doing research with Mesa (a specific application) should cite the Mesa library: without the good and the bad that is Mesa, their research would have taken a different form.

Since my Ph.D is on Mesa, I will be citing Python there.

I think for more insight we can look at who has cited some of Guido’s stuff…
For example: https://scholar.google.com/scholar?cites=900267235435084077&as\_sdt=20005&sciodt=0,9&hl=en

Does that help?