Reading a HTTP stream - big "Content-Length" mismatch
R.Wieser
2019-09-05 07:42:01 UTC
Hello all,

I'm trying to read raw HTTP data, and just ran into a situation where the
"Content-Length" entry exceeded the actual content length by a /large/ margin
(~350 KB vs ~50 KB). A bit of googling tells me that this should never
happen, and should be regarded as an error.

One "minor" detail though: the header also contains a "Connection: close"
entry, but some more googling doesn't tell me which one /should/ take
priority, if at all(!).

The URL involved is https://www.questionablecontent.net/QCRSS.xml (it's a
webcomic).

Does anybody know how cases like these should be handled?

Regards,
Rudy Wieser

P.s.
I tried to find a more closely related newsgroup for the above question, but
could not find one. If one exists, then please advise.
JJ
2019-09-06 07:43:30 UTC
"Content-Length" header defines the original content length. It's different
than the length of the transferred raw data. e.g. when "Transfer-Encoding"
header is "gzip", "Content-Length" header would usually be smaller than the
transferred raw data because the raw data is the compressed data of the
original content.
R.Wieser
2019-09-06 09:10:48 UTC
JJ,
Post by JJ
"Content-Length" header defines the original content length.
I'm sorry, but the above and what follows don't make any sense to me - it's
as if you threw two definitions of "original" together and stirred. :-|

The "Content-Length" header, if there, /should/ signify the size of the
transferred body - it's pretty much the only way to determine the end of one
result's body and the start of another result's header (in the case of
"Connection: keep-alive").

Also, in this particular case I'm reading /raw/ data - I've opened a socket
and I'm sucking on it. This means that any kind of decoding, un-chunking
and/or -packing will need to be done afterwards.
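
For reference, Content-Length is only one of several ways the end of a response body can be marked. The sketch below follows one reading of RFC 7230 section 3.3.3: a final "chunked" transfer coding wins over Content-Length, a valid Content-Length comes next, and reading until the server closes the connection is only the fallback. Under that reading a "Connection: close" header does not override a Content-Length that is present. The function name body_framing and its return strings are made up for illustration.

```python
def body_framing(status, headers, method="GET"):
    """Return how the response body ends.

    Possible results: 'none', 'chunked', 'length:N' or 'until-close'.
    `headers` is a plain dict of header name -> value (hypothetical input shape).
    """
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value.strip() for name, value in headers.items()}

    # Responses to HEAD, and 1xx/204/304 responses, never carry a body.
    if method == "HEAD" or 100 <= status < 200 or status in (204, 304):
        return "none"

    # Transfer-Encoding takes precedence over Content-Length.
    transfer_encoding = h.get("transfer-encoding", "")
    if transfer_encoding:
        codings = [c.strip().lower() for c in transfer_encoding.split(",")]
        if codings[-1] == "chunked":
            return "chunked"
        # Some other final coding: the body runs until the connection closes.
        return "until-close"

    # Next in line: a valid Content-Length gives the exact body size in bytes.
    if "content-length" in h:
        value = h["content-length"]
        if not (value.isascii() and value.isdigit()):
            raise ValueError("invalid Content-Length: %r" % value)
        return "length:%d" % int(value)

    # Nothing else to go on: read until the server closes the connection.
    return "until-close"
```

With the headers described in the first post, body_framing(200, {"Content-Length": "350000", "Connection": "close"}) reports "length:350000", so a connection that closes before that many bytes have arrived would count as an incomplete response.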

And, after having posted, I've also encountered the opposite: the
Content-Length header indicated a /smaller/ size than the size of the
returned body (again an HTML page).

Remark: both are Cloudflare (the caching/ddos protection service) results.

Regards,
Rudy Wieser
JJ
2019-09-07 05:56:30 UTC
Post by R.Wieser
JJ,
Post by JJ
"Content-Length" header defines the original content length.
I'm sorry, but the above and what follows don't make any sense to me - it's
as if you threw two definitions of "original" together and stirred. :-|
The "Content-Length" header, if there, /should/ signify the size of the
transferred body - it's pretty much the only way to determine the end of one
result's body and the start of another result's header (in the case of
"Connection: keep-alive").
Also, in this particular case I'm reading /raw/ data - I've opened a socket
and I'm sucking on it. This means that any kind of decoding, un-chunking
and/or -packing will need to be done afterwards.
By "original", it means unencoded/uncompressed.

The length of the raw data depends on the existence and value of the
"Transfer-Encoding" header. i.e. whether the raw data is encoded/compressed
or not. The "Content-Length" header defines the length of the
unencoded/uncompressed data.

If the raw data is encoded/compressed, you'll have to decode/uncompress the
raw data and monitor the resulting unencoded/uncompressed length, in order
to know whether the raw data has all been transferred or not yet.
Post by R.Wieser
And, after having posted, I've also encountered the opposite: the
Content-Length header indicated a /smaller/ size than the size of the
returned body (again a HTML page).
The length of compressed data may actually be larger than the
unencoded/uncompressed data if the unencoded/uncompressed data cannot be
compressed any further, e.g. a 7-Zip archive or an encrypted file that is
transferred using gzip encoding.
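
The size effect described above is easy to reproduce. This is a small sketch, assuming Python's standard gzip module and using hash output as a stand-in for an already-compressed or encrypted file; the helper names pseudo_random_bytes and gzip_sizes are invented for the example.

```python
import gzip
import hashlib


def pseudo_random_bytes(n):
    """Build n bytes of repeatable, incompressible-looking data from SHA-256."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def gzip_sizes(data):
    """Return (uncompressed length, gzip-compressed length) for data."""
    return len(data), len(gzip.compress(data))


# Repetitive HTML-like text shrinks a lot ...
text = b"<p>Hello, world!</p>\n" * 200
# ... while data without any redundancy only gains gzip's framing overhead.
noise = pseudo_random_bytes(4096)

print("text :", gzip_sizes(text))
print("noise:", gzip_sizes(noise))
```

The second pair should show a compressed size slightly above 4096, because the gzip header and trailer are added on top of data that cannot be shrunk.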
R.Wieser
2019-09-07 08:27:53 UTC
JJ,
Post by JJ
By "original", it means unencoded/uncompressed.
The "Content-Length" header defines the length of the
unencoded/uncompressed data.
In that case I think you are wrong. Sorry.

As I mentioned, the received text(!) content was just ~50 KByte, with a
"Content-Length" header indicating ~350 KByte. I cannot connect those
numbers in any way. Can you?
Post by JJ
The length of compressed data may actually be larger than the
unencoded/uncompressed data if the unencoded/uncompressed
data can not be compressed any more.
In both cases the content/body was plain HTML text, /directly from the
socket/. No un-encoding or decompression performed or required.

And in case you think the un-encoding and/or decompression has
already been (silently) done: in that case the content size and the
"Content-Length" header should have had the same value ....

Regards,
Rudy Wieser
R.Wieser
2019-09-07 15:58:41 UTC
JJ,

I found the problem. It's Windows that is playing games.

Currently the "infobar" (at the bottom of the Explorer window) shows ~37
KByte for the selected file, the hover-over popup shows ~350 KByte, and
right-click -> Properties shows ~12 KByte. The latter matches the
"Content-Length" value in the HTTP header.

This most likely happens because I overwrite the same file each time, and
Windows is simply being lazy about updating some of the values.

In short, the "Content-length" value does match (what I regard as) the
raw (directly from the socket) data (there might still be a definition
difference between us though).
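
A way to make this comparison without relying on what Explorer displays is to count the body bytes directly. The following is a minimal sketch, assuming the whole response has already been read from the socket into one bytes object and that no chunked transfer coding is in use; the helper names split_response and length_mismatch are hypothetical.

```python
def split_response(raw):
    """Split a raw HTTP response into (status line, headers dict, body bytes)."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line found: header block is incomplete")
    lines = head.split(b"\r\n")
    status_line = lines[0].decode("iso-8859-1")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        # Header names are case-insensitive; store them lower-cased.
        headers[name.strip().lower().decode("iso-8859-1")] = (
            value.strip().decode("iso-8859-1")
        )
    return status_line, headers, body


def length_mismatch(raw):
    """Return declared Content-Length minus received body bytes.

    0 means they match, a positive value means bytes are missing, and
    None means the response carried no Content-Length header at all.
    """
    _, headers, body = split_response(raw)
    declared = headers.get("content-length")
    if declared is None:
        return None
    return int(declared) - len(body)
```

Calling length_mismatch on the bytes captured from the socket gives 0 when the header and the body agree, independent of any file size shown by the shell.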

My apologies for the confusion.

Regards,
Rudy Wieser
