Discussion:
How to escape a relative URL string with arguments for usage in a HTTP GET request ?
Charlie Gibbs
2021-11-24 16:57:28 UTC
Hello all,
As in the subject line : I've got a relative URL path with arguments in it
which I need to have its special chars escaped so it can be used in a HTTP
GET request (example:
"/some+part/another+part?arg1=foo+42%23brown&arg2=bar+fly"). The thing is
that I can't seem to find an API function for it.
The ShlwApi UrlEscape function doesn't even seem to want to touch the
arguments on a full URL, and the WinInet InternetCreateUrl function does not
want to function with only the path and arguments parts being provided.
Looking at what docs.microsoft.com says about them does not show any leads
either.
In other words, does someone know which function I'm supposed to use to
create an escaped relative URL with arguments ?
Or am I supposed to (again) just roll my own ...
Maybe I'm different in that I've been writing string parsing code
for decades, but for me it's much faster to roll my own than go
through all the stuff you've described above (and you still haven't
found a solution yet).

Don't be afraid to do your own parsing. It's often simpler than
figuring out how to use some proprietary API. And it's actually
kind of fun once you get into it.

Besides, you might want your program to run on a Linux box someday...
--
/~\ Charlie Gibbs | Microsoft is a dictatorship.
\ / <***@kltpzyxm.invalid> | Apple is a cult.
X I'm really at ac.dekanfrus | Linux is anarchy.
/ \ if you read it the right way. | Pick your poison.
R.Wieser
2021-11-24 19:09:50 UTC
Charlie,
Post by Charlie Gibbs
Maybe I'm different in that I've been writing string parsing
code for decades, but for me it's much faster to roll my own
than go through all the stuff you've described above (and you
still haven't found a solution yet).
:-) Yes, rolling my own encoding will most likely work ... I think. But
then I will need to spend a /lot/ of time figuring out the requirements[1],
and after implementing it testing the result.

[1] Currently I can't even seem to find if a space char must be
percent-escaped or should become a "+" character. The mentioned functions
do the former, FF here does the latter ...
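
To illustrate the two conventions side by side, here is a minimal sketch in Python (its standard urllib.parse module, not one of the Win32 functions discussed in this thread). Generic percent-encoding turns a space into "%20", while the HTML form encoding (application/x-www-form-urlencoded) turns it into "+". The sample string is taken from the example URL above.

```python
from urllib.parse import quote, quote_plus

text = "foo 42#brown"

# Generic percent-encoding: the space becomes %20, the hash becomes %23.
percent_style = quote(text, safe="")
print(percent_style)   # foo%2042%23brown

# HTML form encoding: the space becomes "+", the hash still becomes %23.
form_style = quote_plus(text)
print(form_style)      # foo+42%23brown
```

Both results are valid. Which one applies depends on where the string ends up (path versus form-style query) and on what the receiving side expects.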

Also, I still consider myself to be a hobbyist and assume that those MS guys
are /way/ better at their jobs. Even though I've been disappointed a few
times in that regard before and now again I still can't shake that feeling.
Go figure ...
Post by Charlie Gibbs
Don't be afraid to do your own parsing. It's often simpler than
figuring out how to use some proprietary API.
In this case the usage of the mentioned functions is not the problem, as
that can be found at docs.microsoft.com . What is, is that it simply doesn't
do what I think it should be doing. It often feels as if they write their
functions with a certain goal in mind, not at all to be general purpose ...
Post by Charlie Gibbs
And it's actually kind of fun once you get into it.
Yes, it is. But at moments I would like to be "lazy" and just use an
available, standard function for it.
Post by Charlie Gibbs
Besides, you might want your program to run on a Linux box someday...
Alas, I don't think that that will ever happen. I'm writing Assembly using
Win32 API functions, and neither translates well.

But yes, I might one day write something like it on a Linux box. I hope a
Raspberry Pi counts as one ? :-)

Regards,
Rudy Wieser
JJ
2021-11-25 03:01:48 UTC
Post by R.Wieser
[1] Currently I can't even seem to find if a space char must be
percent-escaped or should become a "+" character. The mentioned functions
do the former, FF here does the latter ...
The "+" space escaped character is dependent on the server script, i.e. not
every server (script) supports the "+" space escaped character.
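
A small sketch of what that dependence looks like on the receiving side, again using Python's urllib.parse purely as an illustration (the input string is made up). A decoder that only does percent-decoding leaves "+" alone, while a form-style decoder turns it into a space:

```python
from urllib.parse import unquote, unquote_plus, quote_plus

raw = "bar+fly%20away"   # hypothetical value as received by a server

# Plain percent-decoding: "+" stays a literal plus sign.
plain = unquote(raw)
print(plain)             # bar+fly away

# Form-style decoding: "+" is treated as an escaped space.
form = unquote_plus(raw)
print(form)              # bar fly away

# A literal plus that must survive form-style decoding has to be sent as %2B.
literal_plus = quote_plus("1+1")
print(literal_plus)      # 1%2B1
```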
R.Wieser
2021-11-25 08:17:27 UTC
JJ,
Post by JJ
The "+" space escaped character is dependent on the server script.
i.e. not every server (script) supports the "+" space escaped character.
What I did was fully client-side, using FireFox 52.

Though what I posted came from a HTML form element, where spaces in the
path were indeed percent encoded, but spaces in the argument part were
encoded as plus signs.

When I entered the same in the URL bar all spaces got percent encoded - but
none of any plus signs I introduced got encoded, regardless of in the path
or argument parts.

Not really consistent behaviour, :-(

I've also got absolutely no idea how a webserver / future me is supposed to
deal with that ...

Regards,
Rudy Wieser
Tavis Ormandy
2021-11-24 20:14:59 UTC
In other words, does someone know which function I'm supposed to use to
create an escaped relative URL with arguments ?
I think it would be helpful if you gave an example of the output you
wanted and the input you have.

It sounds like you're looking for InternetCombineUrlA (with
an empty lpszBaseUrl), but it seems unlikely you knew about
InternetCreateUrl and not that one... :)

Tavis.
--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger ***@sdf.org
_\_V _( ) _( ) @taviso
R.Wieser
2021-11-24 22:52:35 UTC
Tavis,
Post by Tavis Ormandy
I think it would be helpful if you gave an example of the output
you wanted
Blimey, I thought I did : [quote]example:
"/some+part/another+part?arg1=foo+42%23brown&arg2=bar+fly"[/quote]
Post by Tavis Ormandy
and the input you have.
I think you misunderstood : I do not have some specific input, I need random
inputs to be correctly (combined and) escaped into valid, relative URLs
with arguments.
Post by Tavis Ormandy
It sounds like you're looking for InternetCombineUrlA
(with an empty lpszBaseUrl)
I saw it but ignored it as I assumed it would just combine. But no, it
escapes too. Alas, it has the same flaw UrlEscape has : it ignores
everything after the (first) question mark.

Ohhh... Interesting - Using InternetCombineUrlA :

Input: "/some#part/other stuff?arg=my cat# a fragment"
Output: "/some?arg=my cat# a fragment#part/other stuff"

Notice that the part from the first hash up to the question mark (*not* the
following slash) is moved to the end of the string. Of course, now that that
moved part is right of the question mark it's not escaped either.
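
For comparison only, this is how a generic URL splitter treats the same inputs. The sketch uses Python's urllib.parse.urlsplit, which is not WinInet and says nothing about what InternetCombineUrlA does internally. It merely shows the generic rule that the first unescaped "#" starts the fragment and the first unescaped "?" starts the query, so delimiter characters that are meant as data have to be escaped before the string is assembled.

```python
from urllib.parse import urlsplit

# Everything after the first "#" is taken as the fragment, "?" included.
first = urlsplit("/some#part/other stuff?arg=my cat# a fragment")
print(repr(first.path))       # '/some'
print(repr(first.query))      # ''
print(repr(first.fragment))   # 'part/other stuff?arg=my cat# a fragment'

# Everything after the first "?" is taken as the query, "/" included.
second = urlsplit("/some?part/other part")
print(repr(second.path))      # '/some'
print(repr(second.query))     # 'part/other part'
```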

Also unexpected output when I provide the relative path as the first, and
the argument string as the second one :

Part1: "/some?part/other part"
Part2: "?arg1=my dog"
Output: "/some?arg1=my dog"

Notice that from "part1" everything after the question mark has just
disappeared (without an error), and that the space in "part2" is still
unescaped.


Also consider the below, all of which produce garbage output when
InternetCreateUrl is used (which accepts the path part and arguments as
separate strings, which /should/ make lots of stuff easier) :

Path: "some?folder"
Args: "?arg1=my dog"
Fail: first question mark is not escaped.

Path: "some folder"
Args: "?arg=my?dog"
Fail: Last question mark is not escaped.

Path: "some folder"
Args: "#fragment"
Fail: hashmark *is* escaped when at that position it shouldn't

Path: "some folder"
Args: "arg=dog"
Fail: The "Args" part is directly concatenated to the "path" part (without
the insertion of a question mark)

I hope that's enough info. :-)

Regards,
Rudy Wieser
Tavis Ormandy
2021-11-25 03:15:10 UTC
Post by R.Wieser
Tavis,
Post by Tavis Ormandy
I think it would be helpful if you gave an example of the output
you wanted
"/some+part/another+part?arg1=foo+42%23brown&arg2=bar+fly"[/quote]
Sure, but the point was it wasn't clear if that is what you *want* or
what you *have*. If it is what you *want*, then what do you have?

Your reply makes it clear that this is what you *want* - good - but
you didn't give an example of what you have.
Post by R.Wieser
Post by Tavis Ormandy
and the input you have.
I think you misunderstood : I do not have some specific input, I need random
inputs to be correctly (combined and) escaped into valid, relative URLs
with arguments.
Umm.. I think that's obvious :)

What I don't understand is why you can't show what specific input you
have that should produce the specific output above? e.g. "I have a
cracked URL in a URL_COMPONENTS, lpszUrlPath is /foobar/".
Post by R.Wieser
Post by Tavis Ormandy
It sounds like you're looking for InternetCombineUrlA
(with an empty lpszBaseUrl)
I saw it but ignored it as I assumed it would just combine. But no, it
escapes too. Alas, it has the same flaw UrlEscape has : it ignores
everything after the (first) question mark.
Input: "/some#part/other stuff?arg=my cat# a fragment"
Output: "/some?arg=my cat# a fragment#part/other stuff"
Notice that the part from the first hash up to the question mark (*not* the
following slash) is moved to the end of the string. Of course, now that that
moved part is right of the question mark it's not escaped either.
It's helpful that you showed the input you have, but again, it
would be helpful if you show the output you *wanted* too :)

Is the URL already cracked, i.e. you know which part is a fragment and a
query?

Tavis.
R.Wieser
2021-11-25 10:15:56 UTC
Tavis,
Post by Tavis Ormandy
Sure, but the point was it wasn't clear if that is what you *want* or
what you *have*.
Neither. What I *want* is to send a GET followed by a string that can
again be split into its basic parts by a webserver. What I *have* is two
strings, one containing the URL path, and the other the arguments and/or
fragment.
Post by Tavis Ormandy
Umm.. I think that's obvious :)
As you asked for a "before encoding" example - which doesn't really exist -
I wasn't so sure about that anymore.
Post by Tavis Ormandy
What I don't understand is why you can't show what specific input
you have that should produce the specific output above?
Perhaps that is because I have not yet found a function which will actually,
you know, encode the whole URL ? :-p

What I showed is what I gathered, from the information I googled, the
result of the encoding could/would look like. I did that encoding by hand,
starting with
"/some part/another part?arg1=foo 42#brown&arg2=bar fly".
Post by Tavis Ormandy
It's helpful that you showed the input you have, but again, it
would be helpful if you show the output you *wanted* too :)
That's the wrong question.

It's not about what *I* want, but what the rules (whatever they are) decree
it should look like - so it can be broken up by the webserver into the exact
same parts as provided by the client program.
Post by Tavis Ormandy
Input: "/some#part/other stuff?arg=my cat# a fragment"
Output: "/some?arg=my cat# a fragment#part/other stuff"
Expected output ? Either an error because a fragment cannot be part of the
first part, or something like this :
"/some%23part/other+stuff?arg=my+cat#+a+fragment"

Another example :
Input: "/part1#part2/part3#part4?arg1=data1#part5&arg2=data2#part6"

This one is problematic : how the <beep> do I know if that last "#part6" is
a fragment, or just a part of "arg2" ? I've not found anything even
wanting to touch that question ...

But, other than just throwing an error because of ambiguity, there are at
least three outputs I can think of :

Naive:
"/part1%23part2/part3%23part4?arg1=data1%23part5&arg2=data2%23part6"

Making a guess:
"/part1%23part2/part3%23part4?arg1=data1%23part5&arg2=data2#part6"

Fragment combining (a-la InternetCombineUrlA):
"/part1/part3?arg1=data1&arg2=data2#part5#part4#part2#part6"

The first one will cause problems if the last part was actually meant as a
fragment. In the same way, the second one causes a problem when it was meant
as part of "arg2". The same goes for the third one (the "#part2" could be
part of the folder name). Also, I've not seen any indication that multiple
fragments are allowed or used anywhere.

The only solution I see is to provide the path, argument and fragment as
separate strings, so that all three can be encoded /before/ gluing them
together (using their respective delimiters).
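
A minimal sketch of that approach, written in Python rather than against the Win32 API. The function name build_relative_url and its parameters are hypothetical, made up for this illustration. It assumes the path arrives as a list of unescaped segments, the arguments as (name, value) pairs, and the fragment as an optional string. It also assumes the form-style convention ("+" for a space) in the query and plain percent-encoding ("%20") in the path and fragment. A server that does not accept "+" would need a different choice for the query.

```python
from urllib.parse import quote, urlencode

def build_relative_url(segments, args=None, fragment=None):
    """Encode each component on its own, then glue them with delimiters."""
    # Path: escape every segment completely, so a "/", "?" or "#" inside a
    # segment cannot be mistaken for a delimiter. Spaces become %20.
    url = "/" + "/".join(quote(seg, safe="") for seg in segments)
    if args:
        # Query: form-style encoding. Spaces become "+", "#" becomes %23.
        url += "?" + urlencode(args)
    if fragment is not None:
        # Fragment: percent-encoded, appended after the only unescaped "#".
        url += "#" + quote(fragment, safe="")
    return url

print(build_relative_url(
    ["some part", "another part"],
    [("arg1", "foo 42#brown"), ("arg2", "bar fly")],
))
# /some%20part/another%20part?arg1=foo+42%23brown&arg2=bar+fly

print(build_relative_url(["some#part", "other stuff"],
                         [("arg", "my cat")], " a fragment"))
# /some%23part/other%20stuff?arg=my+cat#%20a%20fragment
```

Because every delimiter in the result was added by the function itself, the receiving side can split on the first "?" and "#" and decode each part without ambiguity.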
Post by Tavis Ormandy
Is the URL already cracked, i.e. you know which part is a fragment
and a query?
I've been ninja-ed :-)

No, I do not *know*. But I do know that determining which is what is
already a problem ...

"InternetCrackUrl" doesn't separate them either. Whatever is starting with
a "?" or "#" is returned in the "extra" part.

By the way, FireFox doesn't really know either. When entering hash symbols
into the "input" boxes of an HTML "form" element the hash symbols get
percent encoded.

But put them into the "action" part, and everything starting from them will,
for that part, just disappear.

The same happens when hash symbols are used in the URL bar - even though the
URL bar does not reflect that throwing-away change. <whut?>

Damn .. It looked to be so easy, just finding the right function. :-)

Regards,
Rudy Wieser
Tavis Ormandy
2021-11-25 15:39:08 UTC
Post by R.Wieser
Post by Tavis Ormandy
It's helpful that you showed the input you have, but again, it
would be helpful if you show the output you *wanted* too :)
That's the wrong question.
It's not about what *I* want, but what the rules (whatever they are) decree
it should look like - so it can be broken up by the webserver into the exact
same parts as provided by the client program.
I know how URL encoding works ;-)

I was asking those questions for a reason. Imagine that you had the
string "/foo/bar?a=x#z", you could encode this a bunch of different
ways.

"/foo/bar?a=x%23z" -- ?a=x#z is the query, there is no fragment
"/foo/bar?a=x" -- ?a=x is the query, #z was a fragment (removed, not sent to server)
"/foo/bar%3fa=x" -- bar%3fa was part of the path, #z was a fragment
"/foo/bar%3fa=x%23z" -- everything was a (weird) path component
etc.
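
The same point as a small Python sketch (urllib.parse.quote, used here only for illustration). Each line encodes the string "/foo/bar?a=x#z" under a different assumption about which characters are delimiters and which are data. Note that Python emits uppercase hex digits (%3F rather than %3f), which is equivalent.

```python
from urllib.parse import quote

# 1. Path "/foo/bar", query "a=x#z", no fragment.
as_query_only = quote("/foo/bar") + "?" + quote("a=x#z", safe="=")
print(as_query_only)      # /foo/bar?a=x%23z

# 2. Path "/foo/bar", query "a=x", fragment "z" (dropped, not sent).
with_fragment = quote("/foo/bar") + "?" + quote("a=x", safe="=")
print(with_fragment)      # /foo/bar?a=x

# 3. Path "/foo/bar?a=x", fragment "z" (dropped, not sent).
path_with_qmark = quote("/foo/bar?a=x", safe="/=")
print(path_with_qmark)    # /foo/bar%3Fa=x

# 4. The whole string is the path.
all_path = quote("/foo/bar?a=x#z", safe="/=")
print(all_path)           # /foo/bar%3Fa=x%23z
```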

For each of these cases, you need to know what the components are, so I
asked you what you have and what you want. For example, if you got this
URL from a user, then it needs to be cracked first - if you generated
it, then the encoding needs to happen *before* you create it, etc, etc.

I think I've run out of patience for this, sorry!

Tavis.
R.Wieser
2021-11-25 17:23:18 UTC
Tavis,
Post by Tavis Ormandy
I was asking those questions for a reason. Imagine that you had
the string "/foo/bar?a=x#z", you could encode this a bunch of
different ways.
I know. That is what I tried to make clear in my previous message with
those examples.
Post by Tavis Ormandy
For example, if you got this URL from a user, then it needs to
be cracked first
/How/ I get the involved parts is fully outside the scope. My only interest
was-and-is how to combine them.
Post by Tavis Ormandy
Post by R.Wieser
The only solution I see is to provide the path, argument and fragment
as separate strings, so that all three can be encoded /before/ gluing
them together (using their respective delimiters).
I think I've run out of patience for this, sorry!
My apologies for not allowing you to muddy the waters. I've got bad
experiences (multiple) with that (responders running off pursuing
their own ideas, never actually addressing the stated problem).

Thanks for trying anyway.

I think I can conclude that encoding a URL is a mess. Not because it's
difficult, but because, as both of us have shown, it can be done in too many ways.

Regards,
Rudy Wieser
R.Wieser
2021-11-29 19:59:56 UTC
In other words, does someone know which function I'm supposed to use to
create an escaped relative URL with arguments ?
Or am I supposed to (again) just roll my own ...
Hmmm... After having tried to come up with something, I have to admit my
defeat : there is simply no single solution possible that will cover all
use-cases.
The ShlwApi UrlEscape function doesn't even seem to want to touch the
arguments on a full URL
But at least I figured out why it doesn't want to do that.

Oh well. Another lesson of "looks to be easy - but it isn't".

Regards,
Rudy Wieser
