Does anyone have suggestions for tracking updates to sites that don’t have RSS feeds? Ideally, whatever script I use would output RSS that my feed reader can then subscribe to.

I’m hoping to watch a bunch of different classifieds sites that probably have relatively consistent HTML, and not have to load them manually every day.

Wiring up automation for a new hobby 😁

@mathiasx What tools do you have to hand? What I've done in the past is write a script that curls the page and does a byte-wise sync comparison against a local copy. If the change is large (the threshold is configurable), it emails me.
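The fetch, compare, alert loop described above can be sketched roughly like this with only Python's standard library. It is a sketch under assumptions, not the original script: it compares line by line with `difflib` instead of byte-wise, and the 5% threshold, the snapshot path and the `notify` function (which only prints, where a real version might send mail via `smtplib`) are hypothetical placeholders.

```python
import difflib
import pathlib
import urllib.request

THRESHOLD = 0.05  # hypothetical default: alert when more than 5% of lines changed


def change_ratio(old: bytes, new: bytes) -> float:
    """Fraction of lines that differ between two snapshots, from 0.0 to 1.0."""
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines())
    return 1.0 - matcher.ratio()


def notify(url: str, ratio: float) -> None:
    # Hypothetical stand-in for the email step: just print a message.
    print(f"{url} changed by about {ratio:.0%}")


def check(url: str, snapshot: pathlib.Path, threshold: float = THRESHOLD) -> bool:
    """Fetch url, compare it with the saved snapshot, then overwrite the snapshot."""
    with urllib.request.urlopen(url, timeout=30) as response:
        new = response.read()
    if not snapshot.exists():
        # First run: nothing to compare against yet, so only store a baseline.
        snapshot.write_bytes(new)
        return False
    ratio = change_ratio(snapshot.read_bytes(), new)
    snapshot.write_bytes(new)
    if ratio > threshold:
        notify(url, ratio)
        return True
    return False
```

Run it once a day from cron, for example `check("https://example.com/classifieds", pathlib.Path("snapshot.html"))` (the URL and path are placeholders). A whole-page diff also fires on rotating ads or timestamps, which is why the threshold matters.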

I don't run it any more, but since it was just for me it was fairly short and simple.

I don't know of any service for this.

@humanetech @ColinTheMathmo thanks! This or Scrapy (suggested by @logoninternet), plus some fiddling to parse the HTML correctly, should work!
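For sites with consistent markup, the "parse the HTML, then output RSS" idea can be sketched with Python's standard library alone. This is a lightweight stand-in, not Scrapy's or RSS-Bridge's API. It assumes, hypothetically, that each listing is an `<a class="listing" href="...">` element whose text is the title; a real site needs its own matching rule.

```python
from __future__ import annotations

import html.parser
import urllib.parse
import xml.etree.ElementTree as ET


class ListingParser(html.parser.HTMLParser):
    """Collect (title, link) pairs from <a class="listing"> tags (hypothetical markup)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.items: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "listing" in classes and attrs.get("href"):
            # Resolve relative links against the page URL.
            self._href = urllib.parse.urljoin(self.base_url, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace from nested tags and line breaks.
            title = " ".join("".join(self._text).split())
            self.items.append((title, self._href))
            self._href = None


def to_rss(title: str, site_url: str, items: list[tuple[str, str]]) -> str:
    """Build a minimal RSS 2.0 document; each item's guid is its listing URL."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Listings scraped from {site_url}"
    for item_title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "guid").text = link
    return ET.tostring(rss, encoding="unicode")
```

Feed the downloaded page to `ListingParser(url).feed(html_text)`, write the `to_rss(...)` output to a file served somewhere your feed reader can reach, and subscribe to that URL. Using the listing URL as the `guid` lets most readers show only listings they haven't seen before.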

I’ve got a little YunoHost server that can run whatever language and DB is needed, but I haven’t hand-rolled a service definition on it yet. Might be time to write one for RSS-Bridge or whatever similar tool I build.

@humanetech @ColinTheMathmo @logoninternet turns out RSS-Bridge was already available on YunoHost. Giving it a try; I’ll have to write my own bits to parse the HTML, but it doesn’t look so bad.

@mathiasx there was a good free change detection service I used to use, maybe I can dig it up.

@mathiasx it was, which has been rebranded but might still work?


@mathiasx Follow That Page sends you an email when the content of any of up to 20 pages changes.

h/t @trashheap for the find

@hypolite @trashheap at least one of these is a Joomla site that returns a 503 but loads in a browser. Either it knows it is being scraped and needs a browser-like user agent, or it doesn’t like the server being in the Netherlands. I guess I’ll have to write some glue code.
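One way to test the user-agent theory is to request the page with a browser-like `User-Agent` header and compare the status code with what the default client gets. A minimal sketch using Python's `urllib.request`; the header values are arbitrary examples (assumptions), and this won't help if the site is actually blocking by server location or IP.

```python
import urllib.error
import urllib.request

# Arbitrary example of a desktop-browser User-Agent string (an assumption).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Return a GET request that identifies itself like a browser, not Python-urllib."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "en-US,en;q=0.8",
        },
    )


def fetch_status(url: str, user_agent: str = BROWSER_UA) -> int:
    """Return the HTTP status code so 503 vs 200 can be compared across user agents."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 503 are raised as HTTPError; report their code.
        return err.code
```

If `fetch_status(url)` returns 200 while `fetch_status(url, "Python-urllib/3")` returns 503, the user agent is the likely trigger; if both return 503 from the server but the page loads in a local browser, the server's location is the more likely cause.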

@mathiasx I haven’t used it myself, but I’ve heard Feedly has something like that, and the feeds it creates are usable outside of Feedly too.

