net/url: URL allows malformed query round trip · Issue #22907 · golang/go
What did you do?
package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	u, err := url.Parse("http://example.com/bad path/?bad query#bad fragment")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(u.String())
}
https://play.golang.org/p/hdX1zpv3BN
What did you expect to see?
I expected either url.Parse to return a non-nil error, or the URL.String method to return a fully escaped URL representation, with the query escaped the same way as the path and fragment:
http://example.com/bad%20path/?bad%20query#bad%20fragment
What did you see instead?
http://example.com/bad%20path/?bad query#bad%20fragment
For reference, net/http.Server rejects such a URL: https://play.golang.org/p/2gujmbXZlu
Does this issue reproduce with the latest release (go1.9.2)?
Yes
System details
go version devel +9a13f8e11c Tue Nov 28 06:47:50 2017 +0000 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/artyom/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/tmp/go:/Users/artyom/go"
GORACE=""
GOROOT="/Users/artyom/Repositories/go"
GOTMPDIR=""
GOTOOLDIR="/Users/artyom/Repositories/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lb/3rk8rqs53czgb4v35w_342xc0000gn/T/go-build624293827=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version devel +9a13f8e11c Tue Nov 28 06:47:50 2017 +0000 darwin/amd64
GOROOT/bin/go tool compile -V: compile version devel +9a13f8e11c Tue Nov 28 06:47:50 2017 +0000
uname -v: Darwin Kernel Version 17.2.0: Fri Sep 29 18:27:05 PDT 2017; root:xnu-4570.20.62~3/RELEASE_X86_64
ProductName: Mac OS X
ProductVersion: 10.13.1
BuildVersion: 17B48
lldb --version: lldb-900.0.57
Swift-4.0
RFC 3986, section 3.4 (https://tools.ietf.org/html/rfc3986#section-3.4) defines the query component as follows (grammar from Appendix A):
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
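To make the grammar above concrete, here is a minimal sketch of a checker for the query production. The names validQuery and isHex are hypothetical and nothing like them is exported by net/url; the sketch only illustrates that a literal space falls outside every alternative in the grammar.

```go
package main

import (
	"fmt"
	"strings"
)

// isHex reports whether c is an ASCII hexadecimal digit (HEXDIG).
func isHex(c byte) bool {
	return '0' <= c && c <= '9' || 'a' <= c && c <= 'f' || 'A' <= c && c <= 'F'
}

// validQuery is a hypothetical helper that reports whether q matches
// the RFC 3986 production: query = *( pchar / "/" / "?" ).
func validQuery(q string) bool {
	for i := 0; i < len(q); i++ {
		c := q[i]
		switch {
		case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z', '0' <= c && c <= '9':
			// ALPHA / DIGIT (part of unreserved)
		case strings.IndexByte("-._~!$&'()*+,;=:@/?", c) >= 0:
			// remaining unreserved, sub-delims, ":", "@", "/", "?"
		case c == '%' && i+2 < len(q) && isHex(q[i+1]) && isHex(q[i+2]):
			i += 2 // pct-encoded: skip the two hex digits
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, q := range []string{"bad query", "bad%20query", "a=1&b=2", "100%"} {
		fmt.Printf("%q valid: %v\n", q, validQuery(q))
	}
}
```

Under this check "bad query" is rejected while "bad%20query" is accepted.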
This grammar contains no whitespace character. The WHATWG URL Standard agrees:
A URL-query string must be zero or more URL units.
[...]
The URL units are URL code points and percent-encoded bytes.
[...]
The URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters.
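The WHATWG list can be sketched the same way. The function name isURLCodePoint is hypothetical, and the noncharacter ranges (U+FDD0 to U+FDEF, plus every code point ending in FFFE or FFFF) are taken from the Unicode definition rather than from the text quoted above. Percent-encoded bytes are a separate kind of URL unit and are not handled here.

```go
package main

import (
	"fmt"
	"strings"
)

// isURLCodePoint is a hypothetical helper that reports whether r is a
// URL code point according to the WHATWG list quoted above.
func isURLCodePoint(r rune) bool {
	switch {
	case 'a' <= r && r <= 'z', 'A' <= r && r <= 'Z', '0' <= r && r <= '9':
		return true // ASCII alphanumeric
	case r < 0x80:
		// The explicitly listed ASCII punctuation.
		return strings.ContainsRune("!$&'()*+,-./:;=?@_~", r)
	case r < 0xA0 || r > 0x10FFFD:
		return false // outside U+00A0..U+10FFFD
	case 0xD800 <= r && r <= 0xDFFF:
		return false // surrogates
	case 0xFDD0 <= r && r <= 0xFDEF, r&0xFFFE == 0xFFFE:
		return false // noncharacters (assumed Unicode definition)
	}
	return true
}

func main() {
	for _, r := range []rune{' ', '~', '%', 'é', 0xFFFE} {
		fmt.Printf("%U is a URL code point: %v\n", r, isURLCodePoint(r))
	}
}
```

U+0020 (space) is not in the list, so a query containing it is not a valid URL-query string under this definition either.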