Anyone know a good web spider/crawler? I want something which can start with a given URL and retrieve pages to a specified depth. Search capability with support for special characters is a bonus.
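A depth-limited crawl like this is essentially a breadth-first walk over links. wget's recursive mode (`wget -r -l DEPTH URL`) does this from the command line. The Python sketch below illustrates the idea using only the standard library. The function names `crawl` and `fetch_url`, the UTF-8 decoding, and the choice to stay on the starting host are all assumptions made for this example. They are not features of any tool mentioned in this thread.

```python
# Minimal depth-limited crawler sketch (standard library only).
# crawl/fetch_url are hypothetical names chosen for illustration.
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url: str) -> str:
    """Download a page; assumes UTF-8, replacing undecodable bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url: str, max_depth: int,
          fetch: Callable[[str], str] = fetch_url,
          same_host: bool = True) -> Dict[str, str]:
    """Breadth-first crawl from start_url, following links up to max_depth hops.

    Returns a dict mapping each successfully fetched URL to its HTML.
    """
    start_host = urlparse(start_url).netloc
    pages: Dict[str, str] = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download (URLError is an OSError)
        pages[url] = html
        if depth >= max_depth:
            continue  # do not follow links past the requested depth
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript:, etc.
            if same_host and parts.netloc != start_host:
                continue  # stay on the starting site
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

For example, `crawl("http://example.org/", 2)` would return the start page, the pages it links to, and the pages those link to, as a dict of URL to HTML. It does not check robots.txt or rate-limit requests, so it is only suitable for small sites you have permission to crawl.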
It is. (You're overestimating your dash use, I think. My unadjusted prose style is far more punctuation-happy than yours, at least judging by LiveJournal.)
Hrm; I may be -- I've been getting into semicolons lately, and I've been trying to moderate my em dash use and write in sentences of less than paragraph length. While paragraph-length sentences work well enough for academic papers, they aren't as great in the real world.
(I should note that the application I have in mind is a quick-and-dirty search process for the MNA, one that could be implemented by people who aren't particularly computer-geek-oriented. So although wget does what I asked for, the way it does it may not be ideal.)
This (http://www.xav.com/scripts/search/) is what the An Tir Heralds site uses. I don't know how it deals with special characters, but it's worth checking.
I'm rusty on the details because it's been a while since I set it up, but I'm pretty sure it can do what you want -- for example, IIRC, the site search on An Tir Heralds searches both antirheralds.org and the Internal Letter archive on Badger's server. (There is also a search that covers only the ILs, but I am pretty sure I set up the main search to cover both servers.)
I would go check on it to refresh my memory on how it's set up, but I have to leave the house right now.
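One common way to give a search the "special characters" support asked for in the original post is to fold case and strip accents with Unicode normalization before comparing, so that a search for `etienne` also finds `Étienne`. This is not a description of how the xav.com script behaves. Below is a minimal Python sketch under that assumption, with hypothetical function names `fold` and `search`, operating on a dict of URL to page text (such as the pages a crawler collects).

```python
# Accent- and case-insensitive search sketch.
# fold/search are hypothetical names; this is one common technique, not any script's API.
import unicodedata
from typing import Dict, List


def fold(text: str) -> str:
    """Decompose accented letters (NFKD), drop the combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def search(pages: Dict[str, str], query: str) -> List[str]:
    """Return the sorted URLs whose text contains every query word (substring match)."""
    terms = fold(query).split()
    hits = []
    for url, text in pages.items():
        folded = fold(text)  # fold each page once, not once per term
        if terms and all(term in folded for term in terms):
            hits.append(url)
    return sorted(hits)
```

For example, `search({"u1": "Étienne de Montréal"}, "etienne montreal")` returns `["u1"]`. Letters that have no Unicode decomposition, such as æ or þ, pass through unchanged, so matching `Aethelstan` against `Æthelstan` would need an extra mapping table.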
(no subject)
Date: 2006-04-28 11:49 pm (UTC)
wget should do what you need.