Better URL encoding/decoding
Flurl.Url has always contained a few undocumented public static methods for URL-encoding and decoding. In an effort to fix a couple reported bugs and, more generally, fix some known quirks in the .NET world, these methods have gotten an overhaul, a couple renames, and a cohesive new story, so I'm now happy to "advertise" them. :)
What the RFC says
When dealing with URL-encoding, characters fit into one of 3 categories:
- unreserved (legal in URLs): alphanumeric and -._~
- reserved (legal, but may have special meaning in URLs): :/?#[]@!$&'()*+,;=
- everything else (illegal in URLs, must be encoded)
One notable special case is the % character. When used as part of a %-encoding sequence (e.g. %20 to represent a space), it is legal in the URL. Otherwise, it must be encoded.
Another thing to note is that although the RFC says nothing about encoding space characters as +, the HTML spec does specify this for URL-encoded form data, and it is also a common practice in query strings.
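To make the three categories concrete, here is a minimal sketch of a classifier. The names UrlChars, UrlCharKind, Classify, and IsPercentEncodedAt are hypothetical and chosen only for illustration; they are not part of Flurl or .NET.

```csharp
using System;

public enum UrlCharKind { Unreserved, Reserved, Illegal }

// Hypothetical helper, for illustration only.
public static class UrlChars
{
    // The reserved set exactly as listed above.
    private const string Reserved = ":/?#[]@!$&'()*+,;=";
    // The non-alphanumeric unreserved characters.
    private const string UnreservedMarks = "-._~";

    public static UrlCharKind Classify(char c)
    {
        bool asciiAlnum = (c >= 'A' && c <= 'Z')
                       || (c >= 'a' && c <= 'z')
                       || (c >= '0' && c <= '9');
        if (asciiAlnum || UnreservedMarks.IndexOf(c) >= 0) return UrlCharKind.Unreserved;
        if (Reserved.IndexOf(c) >= 0) return UrlCharKind.Reserved;
        // Everything else, including space and a bare '%', must be encoded.
        return UrlCharKind.Illegal;
    }

    // The '%' special case: true when s[i] is '%' followed by two hex digits.
    public static bool IsPercentEncodedAt(string s, int i) =>
        i + 2 < s.Length
        && s[i] == '%'
        && Uri.IsHexDigit(s[i + 1])
        && Uri.IsHexDigit(s[i + 2]);
}
```

With this sketch, Classify('/') is Reserved and Classify(' ') is Illegal, while IsPercentEncodedAt("a%20b", 1) is true and IsPercentEncodedAt("100%", 3) is false.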
What .NET gives us
- Uri.EscapeDataString is our best option for encoding both illegal and reserved characters, but it has the following shortcomings:
  - It chokes with a UriFormatException at 65,520 characters, which is a realistic problem when using it to URL-encode form data.
  - It has no option to encode space characters as +.
- Uri.EscapeUriString is our best option for encoding illegal characters only. For example, with a string like "1 2/3 4", it'll encode the spaces for you but assumes you want to keep the / as a path separator. But it has one major quirk:
  - It always encodes the % character, even if it's followed by 2 hex characters, which is a %-encoded sequence and perfectly legal in a URL.
- Uri.UnescapeDataString is our best option for URL decoding, but it too has a shortcoming:
  - It has no option to interpret + characters as spaces.
- WebUtility.UrlEncode is our best option for...pretty much nothing.
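A small demo of the behaviors described above, using only the built-in System.Uri methods. The class name DotNetEncodingDemo is made up for this example. The 65,520-character limit is left out on purpose, since whether it triggers can depend on the runtime version.

```csharp
using System;

// Hypothetical demo class; only the Uri.* calls are real .NET APIs.
public static class DotNetEncodingDemo
{
    public static void Show()
    {
        // Encodes reserved and illegal characters, so the '/' is escaped too.
        Console.WriteLine(Uri.EscapeDataString("1 2/3 4"));   // 1%202%2F3%204

#pragma warning disable SYSLIB0013 // EscapeUriString is flagged obsolete on newer .NET versions
        // Encodes illegal characters only, so the '/' survives as a path separator.
        Console.WriteLine(Uri.EscapeUriString("1 2/3 4"));    // 1%202/3%204

        // The quirk: the '%' of an already-encoded space is encoded again.
        Console.WriteLine(Uri.EscapeUriString("1%202"));      // 1%25202
#pragma warning restore SYSLIB0013

        // '+' is left as-is rather than being interpreted as a space.
        Console.WriteLine(Uri.UnescapeDataString("1+2%203")); // 1+2 3
    }
}
```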
How Flurl improves on these
Flurl sets out to replace the methods above and correct their quirks with the following static methods:
- Url.Encode(string s, bool encodeSpaceAsPlus) encodes both illegal and reserved characters. It has no string size limit and gives you the option to encode space characters as +.
- Url.EncodeIllegalCharacters(string s, bool encodeSpaceAsPlus) encodes illegal characters only, and will not encode % if it is part of a %-hex-hex sequence, so there is no worry of already-encoded strings getting double-encoded.
- Url.Decode(string s, bool interpretPlusAsSpace) decodes any size string and gives you the option to decode + characters to spaces.
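For a feel of how these three behaviors can be achieved, here is a rough, self-contained sketch. It is not Flurl's actual implementation. The class UrlEncodingSketch, its method names, and the chunk size are all assumptions made for illustration. The idea is to encode long strings in chunks, skip over existing %-hex-hex sequences, and swap + for spaces before decoding.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Illustrative only: NOT Flurl's implementation.
public static class UrlEncodingSketch
{
    private const int ChunkSize = 32000; // assumed value, well under 65,520
    private static readonly Regex PercentEncoded = new Regex("%[0-9A-Fa-f]{2}");

    // Encodes illegal and reserved characters, chunk by chunk, so no single
    // Uri.EscapeDataString call sees an oversized string.
    public static string Encode(string s, bool encodeSpaceAsPlus)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; )
        {
            int len = Math.Min(ChunkSize, s.Length - i);
            // Don't split a surrogate pair across two chunks.
            if (i + len < s.Length && char.IsHighSurrogate(s[i + len - 1])) len--;
            sb.Append(Uri.EscapeDataString(s.Substring(i, len)));
            i += len;
        }
        // After escaping, "%20" can only have come from a space.
        return encodeSpaceAsPlus ? sb.Replace("%20", "+").ToString() : sb.ToString();
    }

    // Encodes illegal characters only, leaving %-hex-hex sequences untouched.
    public static string EncodeIllegal(string s, bool encodeSpaceAsPlus)
    {
        if (encodeSpaceAsPlus) s = s.Replace(" ", "+");
        var sb = new StringBuilder();
        int pos = 0;
        foreach (Match m in PercentEncoded.Matches(s))
        {
            sb.Append(EscapeIllegal(s.Substring(pos, m.Index - pos)));
            sb.Append(m.Value); // already encoded, copy as-is
            pos = m.Index + m.Length;
        }
        sb.Append(EscapeIllegal(s.Substring(pos)));
        return sb.ToString();
    }

    // Decodes, optionally treating '+' as a space. The swap happens before
    // unescaping so that an encoded plus (%2B) still decodes to '+'.
    public static string Decode(string s, bool interpretPlusAsSpace) =>
        Uri.UnescapeDataString(interpretPlusAsSpace ? s.Replace("+", " ") : s);

    // Percent-encodes every UTF-8 byte that is neither unreserved nor reserved.
    private static string EscapeIllegal(string s)
    {
        const string legalMarks = "-._~:/?#[]@!$&'()*+,;=";
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            char c = (char)b;
            bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            if (b < 128 && (alnum || legalMarks.IndexOf(c) >= 0)) sb.Append(c);
            else sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

In this sketch, EncodeIllegal("1 2/3%204", false) yields "1%202/3%204", so the existing %20 is not double-encoded, and Decode("1+2%2B3", true) yields "1 2+3". With Flurl itself you would call Url.Encode, Url.EncodeIllegalCharacters, and Url.Decode with the signatures listed above.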
What's breaking?
As mentioned, Url has always had encoding/decoding methods, but with their new purpose in life, 2 have been renamed and, effectively, superseded:
- Url.EncodeQueryParamValue is superseded by Url.Encode.
- Url.DecodeQueryParamValue is superseded by Url.Decode.
Since these were mainly for internal use, I'm hopeful this won't cause problems for most, but they were public methods, so I want to fully disclose this breaking change.