Good questions! The system is mostly PHP - I just used what I am most comfortable with. The library ecosystem for PHP is great, but I’m not doing anything that Ruby/Python/Node couldn’t do.
This is the biggest challenge, but I think it is surmountable. I'm using a lightweight headless browser and a scraping engine that lets me build scrapers through a web interface. It supports CSS and XPath expressions, plus finding links, clicking them, simple loop variables, etc. The setup is fairly stable across ~50 small employers - people don't change their sites often, and as for the ones with hand-written, frequently broken links, there's a point where you just have to drop them as scrape targets.
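To illustrate the core idea (the actual system is PHP, and this is just a hedged sketch, not its real code): a scraper for a stable page often boils down to one XPath or CSS expression per field. Here's a minimal Python example using the standard library's limited XPath support in ElementTree - the page markup, element IDs, and paths are all made up for the example, and a real scraper would use a forgiving HTML parser rather than requiring well-formed markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed job-listings page for demonstration only.
PAGE = """<html><body>
  <ul id="jobs">
    <li><a href="/jobs/1">Backend engineer</a></li>
    <li><a href="/jobs/2">Data analyst</a></li>
  </ul>
</body></html>"""

def extract_job_links(html: str) -> list[tuple[str, str]]:
    """Pull (href, title) pairs using a single XPath-style expression."""
    root = ET.fromstring(html)
    # ElementTree supports only a small XPath subset, but for pages
    # that rarely change, one stable path expression is often enough.
    anchors = root.findall(".//ul[@id='jobs']/li/a")
    return [(a.get("href"), a.text.strip()) for a in anchors]

print(extract_job_links(PAGE))
```

The point of keeping each scraper down to a handful of expressions like this is that when an employer does redesign their page, the fix is a one-line edit rather than a rewrite.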
The best tip I have so far is to make your crawlers easy and quick to edit, so you can keep them up to date.
I have some time off coming up, so maybe I will get a chance to do some more work on it. Good luck with your bugs!