gh-60691: allow certificates to be specified from memory by jgehrcke · Pull Request #2449 · python/cpython (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation26 Commits7 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
""" |
---|
with open(filepath, 'rb') as f: |
buf = BytesIO(f.read()) |
buf.read = buf.getvalue |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need that? It should work without.
>>> b = io.BytesIO(b"abc")
>>> b.read()
b'abc'
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I wanted to have the object be re-read()
able, because IIRC that simplified writing the test code quite a bit (it probably allowed me to prepare the various file objects just once, instead of for every load_cert_chain()
method invocation):
>>> import io
>>> b = io.BytesIO(b"abc")
>>> b.read()
b'abc'
>>> b.read()
b''
vs.
>>> b = io.BytesIO(b"abc")
>>> b.read = b.getvalue
>>> b.read()
b'abc'
>>> b.read()
b'abc'
} |
---|
PySSL_BEGIN_ALLOW_THREADS_S(pw_info.thread_state); |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a password be asked for the certificates (as opposed to the private key)? This sounds a bit weird.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @pitrou. I got around to looking into this again. My argument above was not the correct argument to answer your question. I went into the code and docs again and understood the reasoning again. There are two points of relevance here:
1) cert files might indeed be encrypted
From https://www.openssl.org/docs/man1.0.2/ssl/SSL_CTX_use_certificate_chain_file.html:
The private keys loaded from file can be encrypted. In order to successfully load encrypted keys, a function returning the passphrase must have been supplied, see SSL_CTX_set_default_passwd_cb. (Certificate files might be encrypted as well from the technical point of view, it however does not make sense as the data in the certificate is considered public anyway.)
(emphasis mine)
2) Consistency
That is the code block in question from my patch:
PySSL_BEGIN_ALLOW_THREADS_S(pw_info.thread_state);
r = _openssl_use_certificate_chain_from_bio(self->ctx, certbio->bio);
PySSL_END_ALLOW_THREADS_S(pw_info.thread_state);
whereas _openssl_use_certificate_chain_from_bio()
is by definition meant to closely resemble OpenSSL's SSL_CTX_use_certificate_chain_file()
. Both can handle the password callback. As of reason (1), I suppose. From the source of SSL_CTX_use_certificate_chain_file()
in OpenSSL 1.1.0:
x = PEM_read_bio_X509_AUX(in, NULL, passwd_callback,
passwd_callback_userdata);
If _openssl_use_certificate_chain_from_bio()
or SSL_CTX_use_certificate_chain_file()
triggers the password callback I believe we better have the wrapper in place.
That's probably what one of the earlier authors thought, too, and so the same ALLOW_THREADS wrapper around SSL_CTX_use_certificate_chain_file()
has always been present in CPython so far:
PySSL_BEGIN_ALLOW_THREADS_S(pw_info.thread_state);
r = SSL_CTX_use_certificate_chain_file(self->ctx,
PyBytes_AS_STRING(certfile_bytes));
PySSL_END_ALLOW_THREADS_S(pw_info.thread_state);
Notably, I do not think that we cover the case of loading an encrypted certificate (where there is no key also present in the cert file) within test_ssl
.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a commit with a hopefully clarifying code comment.
keyfile_path = keyfile |
---|
# Pre CPython 3.7 behavior: let OpenSSL consume the files via |
# SSL_CTX_use_certificate_chain_file(). |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not read the files in Python and fall through below? This would allow deleting a bunch of C code.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's intriguing. You are asking this now and I feel like "mhm, I think there was a pretty strong reason to not do it, because I tried hard, but what was the reason again?".
I am glad that I wrote a rather complete summary one year ago. It revived my memories... this is an excerpt from it:
[...]
An important note on backwards compatibility:
>> Antoine, are you suggesting that we remove the current c-level
>> capability to use file system files (using open()) and just go with
>> raw bytes data at the C api level, and then do the 'filename or
>> filelike object' in Python land?
>
> Yes, I think that's reasonable. Hopefully it won't break any existing uses.
I absolutely agree that Python code should be responsible for doing the
distinction between file paths and file objects. However, I do not agree with
> remove the current c-level capability to use file system files
After following the code I am rather sure that we should leave the current C
implementation (`_ssl__SSLContext_load_cert_chain_impl()` in Modules/_ssl.c)
as untouched as possible, and not try to integrate the new logic into that
very same function.
[...]
The central argument for having two separate C code paths is that old code
must proceed using OpenSSL's SSL_CTX_use_certificate_chain_file().
[...]
A bit of background on this can be found in the summary I linked to. I tried hard to understand the OpenSSL docs and the corresponding C code in the CPython repository, and tried hard to come up with what you were suggesting but I failed achieving the level of confidence that's required here. I was unable to achieve the desired level of confidence as of inherently bad OpenSSL API docs, and (this is what I vaguely remember) a few obvious mismatches between code and docs. In my summary I also wrote
When inspecting the OpenSSL code for SSL_CTX_use_certificate_chain_file(), some
of the above statements are not entirely correct.
(referring to statements from API docs)
So I could not rely on doc. Then I probably thought that the only way to gain confidence was via a proper test suite. In this particular case here, however, we probably agree that a test suite that actually matters does not really exist. It's comprised of all the applications in the world that use this code path. As of certificate diversity and pure combinatorics the input space into this function is super large.
From here it was rather easy to conclude for me that the old code path is better left intact, and that having a switch/shunt at a higher level is much safer in terms of backwards compatibility.
Manually resolved conflicts in
Lib/test/test_ssl.py
Modules/clinic/_ssl.c.h
@pitrou @orsenthil @tiran I would very much appreciate to get another round of feedback. I am available for pushing this further, so that maybe we can still land this in 3.7. With all the information present across the issue tracker and this GH conv I want to stress again what's probably easy to miss: this is meant to be a conceptually backwards-compatible change, because the old code path remains intact.
Updated:
- Responded more in-depth to review.
- Merged current master into this branch; manually resolved conflicts (that was required especially after bpo-31346: Use PROTOCOL_TLS_CLIENT/SERVER #3058 and bpo-29464: Rename METH_FASTCALL to METH_FASTCALL|METH_KEYWORDS and make #1955 landed recently).
- Added code comments based on review feedback.
- Built successfully on Fedora 26's
gcc version 7.2.1
(newdereferencing pointer to incomplete type
errors newly appeared compared to my work from a year ago). - Confirmed that
./python -m test -v test_ssl
passes (one error message emitted by OpenSSL seems to have changed from OpenSSL 1.0.2 to 1.1.0, and I changed oneassertRaisesRegex(SSLError, ...)
back to simplyassertRaises(SSLError)
) - Added NEWS entry.
I'm still surprised that this functionality needs so much support code, but I'll let @tiran review it.
I'm still surprised that this functionality needs so much support code, but I'll let @tiran review it.
Gotcha. At the core that's because there is unfortunately no such thing like SSL_CTX_use_certificate_chain_bio()
provided by the OpenSSL API: a bio
pendant corresponding to SSL_CTX_use_certificate_chain_file()
would be super convenient.
That's what this patch adds and what's called _openssl_use_certificate_chain_from_bio()
.
Thanks for giving this another quick thought @pitrou, much appreciated.
Edit: I've tried to write this before, but I want to retry with more clarity: _openssl_use_certificate_chain_from_bio()
is effectively static int use_certificate_chain_file(SSL_CTX *ctx, SSL *ssl, const char *file)
straight from the OpenSSL source, just without BIO creation.
OpenSSL's use_certificate_chain_file(SSL_CTX *ctx, SSL *ssl, const char *file)
expects a file path as input and then immediately performs BIO creation from a file (via BIO_read_filename(in, file)
), and then processes this BIO with the "OpenSSL cert chain loading logic" (a term I will use again below).
In this patch, we create the BIO using the nice existing Python API for Memory BIO management (which was added in Python 3.5). That results in PySSLMemoryBIO
objects that hold the OpenSSL-native bio structs internally.
The input to the _openssl_use_certificate_chain_from_bio()
added by this patch is precisely the native OpenSSL Bio struct. The function then proceeds with the "OpenSSL cert chain loading logic" copied from OpenSSL' use_certificate_chain_file(SSL_CTX *ctx, SSL *ssl, const char *file)
.
This is why _openssl_use_certificate_chain_from_bio()
is OpenSSL's use_certificate_chain_file(SSL_CTX *ctx, SSL *ssl, const char *file)
modulo BIO creation.
That
- leaves the old
use_certificate_chain_file()
-based code path intact (for conceptually ensuring backwards compat). - should conceptually ensure that the new code path has the same effects on a given
SSLContext
when using the same input data, just from memory instead of from file.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still -1 on this change. @reaperhulk and I are going to have another stab in PEP 543, which will provide a better way to load certs and keys. As Antoine pointed out, it's a lot of additional code. I don't want to maintain additional code and multiple APIs to load key and cert material.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.
Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again
. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.
@tiran any idea if you'll remove the block on this? More and more people are running into this issue.
It appears the perfect is the enemy of the good here -- this bug is outstanding for nearly a decade and waiting for a full rewrite is a classic engineering tarpit. PEP 543 also appears to be moribund, but that's a separate issue.
+1 for @nimish What need to happen to push the thread forward?
Exposing certificate and private key on host's file system in production increases the risk to expose these credentials.
Putting social pressure on me is not how you are going to land this PR. As I tried to explain earlier I'm not against the feature per se. But I want to a clean and well designed API with a maintainable implementation.
Ultimately I will be the person who has to maintain the code for the next decade or two. We have very few core devs that are familiar with X.509, TLS/SSL, and OpenSSL APIs. I'm doing majority of work and have limited personal (and unpaid) time to contribute to CPython.
If you don't agree with my decision you can ask another core dev to overrule me, take over ownership of the ssl module, or appeal to the Python steering council and ask for mediation.
In the mean time please watch @brettcannon's keynote from PyCon 2018, https://youtu.be/tzFWz5fiVKU?t=2964 .
Exposing certificate and private key on host's file system in production increases the risk to expose these credentials.
I don't agree with your reasoning. If you cannot trust a secure temp directory, then you have more serious problems in your server and application design. Besides private keys should always be encrypted with PKCS#5 v2.0 and a secure passphrase so access to the encrypted key file does not pose a security risk.
@tiran, perhaps a recipe that shows how to use mkstemp()
to load certificate that are currently in memory could be useful while work on PEP 543 is ongoing?
@tiran Sure, can you then close this as a WONTFIX? It's in limbo right now.
Your previous comment just refer to developing a PEP 543 implementation. If this is not going to be merged at all in favor of PEP 543 library, then I'd like to ask that this and the BPO bug be closed with that reason vs being left tantalizingly open.
@tiran, perhaps a recipe that shows how to use
mkstemp()
to load certificate that are currently in memory could be useful while work on PEP 543 is ongoing?
#!/usr/bin/env python3
import tempfile
import ssl
def load_cert_memory(context, certdata, keydata, password):
if not password:
raise ValueError("Insecure key")
with tempfile.NamedTemporaryFile(mode="wt") as f:
if keydata is not None:
f.write(keydata)
if not keydata.endswith("\n"):
f.write("\n")
f.write(certdata)
f.flush()
context.load_cert_chain(f.name, password=password)
def test():
passphrase = "somepass"
with open("Lib/test/ssl_cert.pem") as f:
testcert = f.read()
with open("Lib/test/ssl_key.passwd.pem") as f:
testkey = f.read()
with open("Lib/test/keycert.passwd.pem") as f:
combined = f.read()
ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
load_cert_memory(ctx, testcert, testkey, passphrase)
ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
load_cert_memory(ctx, combined, None, passphrase)
if __name__ == "__main__":
test()
With PEP 543 marked as Withdrawn, what's the roadmap for getting this sort of functionality?
tiran removed their assignment
This PR is stale because it has been open for 30 days with no activity.
vToMy mentioned this pull request
So ...ssl_context.load_cert_chain(io.BytesIO(crtChain.encode("utf8")))
is still not possible? Can only use filenames? (its a sad situation if so ... people want in-memory keys exactly because it's more secure than storing them on the filesystem).
This PR is stale because it has been open for 30 days with no activity.
arhadthedev changed the title
bpo-16487: allow certificates to be specified from memory gh-60691: allow certificates to be specified from memory
Unfortunately, closing because of four-year stalemate. The feature is useful by itself but found no support of the SSH part maintainer both from architectural standpoint:
If you cannot trust a secure temp directory, then you have more serious problems in your server and application design. Besides private keys should always be encrypted with PKCS#5 v2.0 and a secure passphrase so access to the encrypted key file does not pose a security risk.
and verification area creep:
We have very few core devs that are familiar with X.509, TLS/SSL, and OpenSSL APIs. I'm doing majority of work and have limited personal (and unpaid) time to contribute to CPython.
Unfortunately, closing because of four-year stalemate
Decade-long stalemate! :) This issue was originally opened in 2012. No hard feelings. Thanks again for all the valuable contributions here everyone.
Please add make this to the standard library of ssl.py for extended functionality especially for context.load_cert_chain(certfile=cert_path, keyfile=key_path)
so it can be used with embedded strings or bytes and not only with filepaths!