
Implementing HTTP downloads in C under Unix/Linux

Date: 2002/05/28 10:45

Q:
I am trying to write a small command line program that fetches a file from
a webserver, like a normal webbrowser. However, I have been unsuccessful
at making it work and I would like to ask you a little about implementing
HTTP commands in programs.
First: should writing "GET http://www.someserver.com/index.html HTTP/1.1"
to the socket do the trick? I tried it and it didn't work. I pulled
down the HTTP/1.1 RFC but didn't get any clues. Do I have to add anything
else?
Second: How do I take care of the file coming down the wire? How big
should the buffer be and how can I write it to the disk only when it is,
say, 85% full? All this taking into account that I cannot know the
filesize beforehand...
Third: how can I split the file download among several processes
(multithreaded download) and reassemble the parts properly afterwards?

Please note that I am trying to do this under Linux (RedHat 7.2).

Thank you very much for taking the time to examine my questions. Even the
smallest bit of help will be much appreciated.

Sincerely,
*NAME-DELETED* *NAME-DELETED*
A:
You can learn from a Perl program that does something similar:

http://www.peter.verhas.com/progs/perl/webmirror/

Socket handling in C is similar to that of Perl.
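As a starting point in C, here is a minimal sketch, not a finished client:
the host name resolution, port 80 and the request format are my assumptions
and the error handling is reduced to the bare minimum. The request line
carries only the path; HTTP/1.1 wants the host name in a separate Host:
header and the request finished with an empty line, and most servers will
not answer without these, which is most likely why your attempt failed.

/* Minimal sketch: connect to a web server on port 80 and send a GET
 * request for 'path'.  Returns the socket descriptor or -1 on error. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

int open_http_socket(const char *host, const char *path)
{
    struct hostent *he = gethostbyname(host);     /* resolve the host name */
    if (he == NULL)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);                    /* plain HTTP port */
    memcpy(&addr.sin_addr, he->h_addr_list[0], he->h_length);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    /* the path goes in the request line, the host name in a Host: header,
     * and an empty line closes the request */
    char request[1024];
    snprintf(request, sizeof(request),
             "GET %s HTTP/1.1\r\n"
             "Host: %s\r\n"
             "Connection: close\r\n"
             "\r\n", path, host);
    write(fd, request, strlen(request));
    return fd;                                    /* caller reads the reply */
}

The function name open_http_socket is my own; use whatever fits your
program.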

>How big should the buffer be

It depends on how much memory you have in the machine, but it does not
have to be large.

>and how can I write it to the disk

I assume you will use the system call 'write'.
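A minimal sketch of the read/write loop (the 8K buffer size, the file
flags and the function name save_to_file are my own choices): you read
the socket and write the file until read() returns 0, which happens when
the server closes the connection, so you do not need to know the file
size in advance.

/* Sketch: copy everything arriving on the socket 'fd' into the file
 * 'fname'.  Any reasonable buffer size works, because the loop writes
 * exactly as much as it has read. */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int save_to_file(int fd, const char *fname)
{
    char buffer[8192];
    ssize_t n;

    int out = open(fname, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0)
        return -1;

    /* read() returns 0 when the server has closed the connection */
    while ((n = read(fd, buffer, sizeof(buffer))) > 0)
        write(out, buffer, n);

    close(out);
    return 0;
}

Note that the HTTP response starts with header lines followed by an empty
line; in a real program you would skip those before writing the body of
the page to the file.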

>>>
how can I split the file download among several processes (multithreaded
download) and reassemble the parts properly afterwards?
<<<

No way. You are mixing something up. A single HTTP session delivers its
data over a single TCP channel, which is a single stream of data, so
there is no reason to handle it in more than one process or thread.

Multithreaded download means starting several HTTP sessions, each for a
DIFFERENT web page.
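If you want to fetch several different pages at the same time, the
simplest approach under Linux is to fork a child process for each URL
and let every child do a complete download on its own. A sketch, where
fetch_one() is an assumed function (not shown here) that downloads a
single URL into a single file, for example by combining the two pieces
above:

/* Sketch: one child process per URL given on the command line. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

extern int fetch_one(const char *url, const char *fname);  /* assumed */

int main(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++) {
        pid_t pid = fork();
        if (pid == 0) {                 /* child: download one URL */
            char fname[64];
            snprintf(fname, sizeof(fname), "download.%d", i);
            return fetch_one(argv[i], fname);
        }
    }
    /* parent: wait for all the children to finish */
    while (wait(NULL) > 0)
        ;
    return 0;
}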

Regards,
Peter
