MachineOfTheMonth: Slurping down websites
Jul 21, 2001, 19:00
[ Thanks to Glenn Mullikin for this link. ]
It's often handy to have your favorite web pages retrieved
automatically. This article explains how to do that with sirobot
and walks through hands-on examples for a number of popular
sites:
"Monolithic applications are great. I use them and
enjoy such programs as KDE Konqueror, Mozilla, the Pan newsreader
and others. However, when it comes to doing custom things,
sometimes it's useful to use more basic tools that allow you to
hook them up with other programs such that they cooperate to get a
job done.
In this article I propose to take a look at ways to minimize
your online time but still get your favorite website. I'll be using
the sirobot web mirroring tool and hooking it up with perl to do
the dirty work. Here is what the man page says about sirobot:
...
The problem isn't pulling down a certain page, the problem is
figuring out the right syntax to use for the url. Doing this
requires an analysis on a case by case basis. Before we get
started, I'll admit that some sites are not amenable to pulling
down. For example, on Kuro5hin.org if you wanted to pull down a
specific story in flat mode, how would you do that? How would you
get the URL for that? From my examination, it isn't something that
appears in the web browser's URL window when you're in flat mode, so
without a URL we can't pull down a story in flat mode. But let's
look at some sites that seemed to work."
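
As a rough illustration of the case-by-case URL analysis the excerpt describes, here is a minimal Python sketch (not taken from the article, which uses sirobot and perl). It assembles a fetchable URL from the pieces you would identify by inspecting a site. The host, path, and parameter names are hypothetical placeholders; each real site needs its own pattern.

```python
from urllib.parse import urlencode, urlunsplit


def build_story_url(host, path, params):
    """Assemble a URL from pieces found by inspecting a site by hand."""
    query = urlencode(params)  # e.g. {"id": 1234} becomes "id=1234"
    return urlunsplit(("http", host, path, query, ""))


# Hypothetical site and parameter names, for illustration only.
url = build_story_url("news.example.org", "/story", {"id": 1234, "mode": "flat"})
print(url)
```

The resulting string is what you would hand to whichever mirroring tool you use. Sites that never expose such a URL, as in the flat-mode case above, cannot be handled this way.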