I think the "why" in this case should be a bit deeper than that, because until recently, it's been somewhat unusual to cite the tools you use to create a paper.
I see three major reasons why people cite software packages, and the form of the citation would have different requirements for each one:
1. Academic credit / Academic use metrics
Because of the weird way that academia has evolved, academics are largely judged by their publications and how influential those publications are. A lot of the people who work on statistical and scientific Python libraries are doing excellent and incredibly influential work, but that's largely invisible to the metrics used by funding and tenure committees, so there's been an effort to do things like getting DOIs for libraries or publishing articles in journals like the Journal of Open Source Software: https://joss.theoj.org
Then you cite the libraries if you use them, and the people who
contribute to the work can say, "Look I'm a regular contributor to
this core library that is cited in 90% of papers". This seems less
important to CPython, where the majority of core contributors (as
far as I can tell) are not academics and have little use for high
h-index papers. That said, even if no one involved cares about the
academic credit, if every paper that used Python cited the
language, it probably would provide useful metrics to the
PSF and others interested in this.
If all you want is a formal way to say "I used Python for this" as
a citation so that it can be tracked, then a single DOI for the
entire language should be sufficient.
2. As a primary source or example for some claims
If you are writing an article about language design and you are
referencing how Python handles async or scoping or unicode or
something, you want to make it easy for your readers to see the
context of your statement, to verify that it's true and to get
more details than you might want to include as part of what may be
a tangential mention in your paper. I have a sense that this is
closer to the original reason people cited things in papers and
books before citations became a metric for measuring influence -
and subsequently a way to give credit for the source of ideas.
If this is why you are citing Python, you should probably be
citing a specific sub-section of the language reference and/or
documentation, and that citation should probably be versioned,
since new features are added in every minor version, and the way
some of these things are handled may change over time. In this
case, a separate DOI for each minor version that points to the
documentation as built by a specific commit or git tag or whatever
would probably be ideal.
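
To make the "versioned reference" idea concrete, here is a minimal sketch of building a link to a specific section of the documentation for a specific minor version. It assumes the docs.python.org layout of one documentation tree per minor version; the helper name `versioned_doc_url` and the example page and anchor are illustrative choices, not an agreed citation format, and a real citation would presumably use a DOI rather than a bare URL.

```python
import sys

# Assumed layout: docs.python.org serves one documentation tree per minor version.
DOCS_ROOT = "https://docs.python.org"


def versioned_doc_url(page, anchor="", version=None):
    """Return a URL for `page` in the docs of a given (major, minor) version.

    `versioned_doc_url` is a hypothetical helper for illustration only.
    If `version` is omitted, the running interpreter's version is used.
    """
    if version is None:
        version = sys.version_info[:2]
    major, minor = version
    url = f"{DOCS_ROOT}/{major}.{minor}/{page}"
    if anchor:
        url += f"#{anchor}"
    return url


# Example: the scoping rules as documented for Python 3.6.
print(versioned_doc_url("reference/executionmodel.html",
                        "naming-and-binding", version=(3, 6)))
```

Pinning the minor version in the reference is the point: a reader following the link sees the text as it stood for that release rather than whatever the current documentation says.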
3. To aid reproducibility
It won't go all the way towards reproducing your research, but
given that Python is a living language that is always changing -
both in implementation and the spec itself - to the extent that
you have a "methods" section, it should probably include things
like operating system version, CPython version and the versions of
all libraries you used so that if someone is failing to replicate
your results, they know how to build an environment where it should
work.
If you want to include this information in the form of a citation,
then I would think that you would want to be both more
granular - citing the specific interpreter you used (CPython,
Jython, PyPy), the full version (3.6.6 rather than 3.6) and
possibly even other factors like operating system, etc. - and less
granular, in that you don't need to cite a specific subset of the
interpreter (e.g. async), but just the interpreter as a whole.
--
My thoughts on the matter are that I think the CPython core dev
team probably cares a lot less about #1 than, say, the R dev team,
which is one reason why there's no clear way to cite "CPython" as
a whole.
I think that #3 is a very laudable goal, but probably should be in
some sort of "methods" section of the document being prepared
rather than overloading citations for it, though having a
standardized way to describe your Python setup (similar to, say,
the pandas debugging feature `pandas.show_versions()`) that is
optimized for publication would probably be super helpful.
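
As a rough illustration of what such a helper might look like, here is a minimal sketch using only the standard library (`platform` and `importlib.metadata`, the latter available from Python 3.8). The function name `describe_environment` and the output layout are hypothetical; this is not an existing or proposed API, just one way the interpreter, full version, operating system and library versions could be gathered for a methods section.

```python
import platform
from importlib import metadata


def describe_environment(packages=()):
    """Return a short plain-text description of the running Python setup.

    `describe_environment` is a hypothetical helper for illustration only.
    `packages` is an iterable of distribution names to report versions for.
    """
    lines = [
        # Implementation (CPython, PyPy, ...) and the full version, e.g. 3.6.6.
        f"Interpreter: {platform.python_implementation()} "
        f"{platform.python_version()}",
        # Operating system name, release and hardware architecture.
        f"Operating system: {platform.system()} {platform.release()} "
        f"({platform.machine()})",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


# Example: report the interpreter, OS and two commonly used libraries.
print(describe_environment(["numpy", "pandas"]))
```

The output is deliberately plain text so it can be pasted into a paper or supplementary material as-is.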
While #2 is probably only a small fraction of all the times where
people would want to "cite CPython", I think it's probably the
most important one, since it's performing a very specific function
useful to the reader of the paper. It also seems not terribly
difficult to come up with some guidance for unambiguously
referencing sections of the documentation and/or language
reference, and having "get a DOI for the documentation" be part of
the release cycle.
Best,
Paul
P.S. I will also be at the NumFocus summit. It's been some time
since I've been an academic, but hopefully there will be an
interesting discussion about this there!
On 9/16/18 6:22 PM, Jacqueline Kazil wrote:
RE: Why cite Python….
I would say that in this paper, http://conference.scipy.org/proceedings/scipy2015/pdfs/jacqueline_kazil.pdf, where we introduced a new library, we should have cited Python, because the library was based on Python. We were riding on the coattails of Python and if Python did not exist, then this library would not exist.
(taking this a level higher)
Just as someone doing research (a specific application) should cite the Mesa library. Without the good and bad that is Mesa, their research would have taken a different form. Since my Ph.D. is on Mesa, I will be citing Python there.
I think for more insight we can look at who has cited some of Guido’s stuff…
For example: https://scholar.google.com/scholar?cites=900267235435084077&as_sdt=20005&sciodt=0,9&hl=en
Does that help?