[ back to toc ]

Downloading web pages

Date: 2002/01/23 16:06

Q:
I am looking for a way to download the source of an HTML file using a C
program, so that I can then parse that file for data. I'm using RedHat
Linux 7.1 on a PIII 700 Compaq. I've tried using the regular read system
call with the entire web address as the filename, but with no results.
(The program works fine if I give it a local file.) When I give it the
web address, the program just sits doing nothing until I press control-c
to kill it.

Thanks for any help you can provide.

*NAME-DELETED* *NAME-DELETED*
A:
Of course it does not! The read system call reads from a file descriptor,
which can refer to an open file or a connected socket; it does not know
what a web address is. Some higher-level languages, such as PHP, have
functions that accept a URL as if it were a filename, but if you look at
the code behind them you will see that they do much more than call read
on the HTTP address.

To do that yourself, you first have to learn socket programming (network
programming in C) and then how HTTP works.

Alternatively, you can just download the libwww library and use it
without caring too much about what is inside.

Regards,
Peter
