About pfetch

For a long time now, I’ve had a number of cron jobs fetching web resources, which I use to build out parts of my own site or to supply myself with custom RSS feeds after a pass through xsltproc.

This mostly worked OK, but there were a few things wrong with it:

I had to be careful to avoid putting stuff in place upon fetch failure.
Fetch failures would send me email unless I put effort into avoiding that.
Network timeouts would cause cron jobs to start piling up.
I’ve actually had cron get sick of running my jobs and just stop altogether.
Jobs that ran at different frequencies lived in different scripts and were hard to keep track of.
Running through cron means all jobs start at the exact same moment, and are thus more likely to strain web servers (if everybody does it).
Conditional GETs require state to be stored across invocations (though I wrote a tool for this).
Sequential processing meant the whole thing took longer.

After a while, the problems added up to enough of an annoyance that I decided to do something about it, so a couple of months ago I started pfetch.

pfetch is a simple Twisted app that makes scheduled, parallel HTTP requests and optionally runs scripts after successful fetches.

Given a list of URLs, each with a destination, a frequency, and an optional command (with arguments) to run after each successful (200) response, pfetch starts a fetch cycle for each URL at a random offset from the start time and then repeats it on the defined interval.
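To make that cycle concrete, here is a rough sketch of the idea. It is not pfetch's actual code: it uses Python's asyncio instead of Twisted, and the names `Job`, `first_delay`, `run_job`, `fetch`, and `on_success` are all made up for illustration. The `cycles` argument exists only so the example terminates; the real thing loops indefinitely.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple


@dataclass
class Job:
    """One scheduled URL (hypothetical structure, not pfetch's config format)."""
    url: str
    dest: str                            # where the fetched body should end up
    interval: float                      # seconds between fetches
    command: Optional[List[str]] = None  # optional argv to run after a 200


def first_delay(interval: float, rng=random) -> float:
    """Pick a random offset in [0, interval) so jobs do not all fire at once."""
    return rng.random() * interval


async def run_job(
    job: Job,
    fetch: Callable[[str], Awaitable[Tuple[int, bytes]]],
    on_success: Callable[[Job, bytes], None],
    cycles: int,
    rng=random,
) -> None:
    """Wait a random offset, then fetch the URL every `interval` seconds."""
    await asyncio.sleep(first_delay(job.interval, rng))
    for _ in range(cycles):
        status, body = await fetch(job.url)
        if status == 200:
            # Only a successful response touches the destination or
            # triggers the follow-up command; failures leave things alone.
            on_success(job, body)
        await asyncio.sleep(job.interval)
```

Running several jobs at once is then a matter of handing one `run_job` coroutine per URL to `asyncio.gather`, with `fetch` backed by whatever HTTP client you prefer and `on_success` writing `body` to `job.dest` and running `job.command` if one is set.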
