Earth Notes: On Website Technicals (2021-12)
Updated 2024-04-26 18:31 GMT. By Damon Hart-Davis.
2021-12-31: IndexNow
What else is there to do while waiting for New Year other than set up IndexNow support?
I have this set up to push changed, reasonably new main pages incrementally. Each ping goes to one IndexNow participant picked at random, since participants should share all URLs that they receive.
This was provoked by Bing suddenly (unannounced, in the last few days) ceasing to accept sitemap pings, rejecting them with a 410 Gone.
This is pretty much the entire support in the makefile:
    # Recreate/expose the IndexNow key as necessary.
    # It is not built in to the makefile since it is meant to be 'secret'.
    # https://www.indexnow.org/documentation
    IndexNowKeySrc=.work/IndexNow.key.txt
    IndexNowKey := $(shell cat $(IndexNowKeySrc))
    IndexNowKeyFile := $(IndexNowKey).txt
    all:: $(IndexNowKeyFile)
    $(IndexNowKeyFile): $(IndexNowKeySrc)
    	@echo "Rebuilding $@"
    	ln $(IndexNowKeySrc) $(IndexNowKeyFile)
    	chmod a+r $(IndexNowKeyFile)

    # Ping main-page updates to IndexNow and remembers which have been done.
    # https://www.indexnow.org/
    # Errs on the side of under-reporting.
    # Submits updates incrementally.
    # Only considers pages up to a few days old.
    IndexNowMaxDaysOld=7
    # Eliminates explicit 'NOINDEX' pages.
    # Does not attempt to ping any one page more than once between updates.
    # All pings could be sent to the primary (URL1) or can be shared at random.
    IndexNowSEURL1=https://yandex.com/indexnow
    IndexNowSEURL2=https://www.bing.com/indexnow
    IndexNowFlags=.work/IndexNow.flags
    .PHONY: IndexNow.ping
    IndexNow.ping: $(WORKTMP)/IndexNow.ping
    all:: $(WORKTMP)/IndexNow.ping
    $(WORKTMP)/IndexNow.ping: makefile $(IndexNowKeyFile) $(SCWPAGES)
    	@echo "Rebuilding $@"
    	@$(LOCKFILENRSLOW) $@.lock
    	@for f in `find $(PAGES) -mtime -${IndexNowMaxDaysOld} | sort -R`; do \
    	    if egrep -q '<!-- *NOINDEX *-->' .$$f; then continue; fi; \
    	    count=0; \
    	    n=$$f; \
    	    flag=${IndexNowFlags}/$$f.log; \
    	    if [ ! -f $$flag -o $$f -nt $$flag ]; then \
    	        echo IndexNow: $$n; \
    	        URL=`( echo ${IndexNowSEURL1} ; echo ${IndexNowSEURL2} ) | sort -R | head -1`; \
    	        wget -O $$flag "$$URL"'?url=$(URLLISTPREFIX)'"$$n"'&key=${IndexNowKey}'; \
    	        count=1; break; \
    	    fi; \
    	    done; \
    	if [ 0 = "$$count" ]; then echo "All done..."; touch $@; fi
    	@/bin/rm -f $@.lock
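The two core decisions in the recipe above (whether a page still needs a ping, and what the single-URL GET request looks like) can be pulled out into a small standalone shell sketch. The function names `indexnow_needs_ping` and `indexnow_url` are hypothetical and do not appear in the makefile, and the example host and key in the usage note are placeholders. Like the makefile, the sketch passes the page URL through unescaped, which assumes page URLs contain no characters that need escaping.

```shell
#!/bin/sh
# Sketch only: these function names are illustrative, not from the makefile.

# Succeeds (exit status 0) if the page should be pinged: either no flag
# file exists yet, or the page has been modified since the flag was written.
# $1 = page file, $2 = flag file recording the last ping.
indexnow_needs_ping() {
    page=$1
    flag=$2
    [ ! -f "$flag" ] || [ "$page" -nt "$flag" ]
}

# Prints the GET URL for a single-page IndexNow submission.
# $1 = participant endpoint, $2 = full page URL, $3 = IndexNow key.
# Assumption: the page URL needs no percent-escaping.
indexnow_url() {
    printf '%s?url=%s&key=%s\n' "$1" "$2" "$3"
}
```

For example, `indexnow_url https://www.bing.com/indexnow https://example.org/a.html abc123` prints `https://www.bing.com/indexnow?url=https://example.org/a.html&key=abc123`. Because the makefile writes the `wget` response to the flag file, the flag exists after any attempt, which matches the stated policy of erring on the side of under-reporting.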
2022-01-05:
I note that if I submit a URL to Bing or Yandex, Yandex spiders it immediately. But I don't think I've seen Bing respond at all to a URL submission.
But I can see the Bing-IndexNow submitted URLs in the appropriate section of the Bing Webmaster Tools, with a date and time. They appear immediately, given a (BWT) page refresh.
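Besides watching for a crawl or checking Bing Webmaster Tools, the HTTP status returned by the submission itself gives a quick indication of whether a ping was accepted. The following is a minimal sketch: the function name `indexnow_status_meaning` is hypothetical, and the status codes are as recalled from the IndexNow documentation, so they should be checked against it before being relied on.

```shell
#!/bin/sh
# Sketch only: maps an IndexNow HTTP response status to a short description.
# Codes are as recalled from https://www.indexnow.org/documentation and
# are an assumption to verify there.
indexnow_status_meaning() {
    case "$1" in
        200) echo "OK: URL submitted" ;;
        202) echo "Accepted: key validation pending" ;;
        400) echo "Bad request: invalid format" ;;
        403) echo "Forbidden: key not valid" ;;
        422) echo "Unprocessable: URL does not match host or key" ;;
        429) echo "Too many requests: possible spam" ;;
        *)   echo "Unexpected status: $1" ;;
    esac
}
```

The status could be captured with something like `curl -s -o /dev/null -w '%{http_code}' "$URL"` in place of the makefile's `wget`, and then passed to this function for logging.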
2021-12-27: Race to Crawl
2021-12-13: Reviews Healing
2021-12-04: AMP Off
2021-12-03: Sitebulb 5.4.0
I was sent a canny "Would you like to try again?" marketing email from Sitebulb. So I did, and this time I have ponied up for a 'Lite' licence at least for now. I cannot possibly justify the expenditure as it is basically all my ad revenue, but I have found the tool helpful in its trial version for a number of things. So I think that Sitebulb ought to have at least a little of my money...
A number of observations:
- I cannot start a new project with my Mac's internal firewall turned on, which is ugly. But I can then (re)run projects with it on.
- The schema.org structured microdata parsing is currently broken; it does not understand multiple values in a single itemprop (or itemtype) attribute.
- Paying for a Lite licence terminates access for the Pro features usable in the trial licence: don't pay up too soon!
- Running a Sitebulb crawl against my Apache (2.4.25) MPM Event configuration caused Apache to stop responding after a while. I switched to MPM Worker, though it may need to be trimmed a little to conserve memory.
Product: Sitebulb Website Crawler 5.4.0
- Brand: Sitebulb
- MPN: 5.4.0
- InStock
- GBP12 for Lite single user monthly including VAT valid at/until:
Review summary
- 14-day free trial upgraded to Lite
- As in my previous (2.0.2) review, I found the Sitebulb desktop website crawler to perform a thorough crawl and cross-check of many aspects of a site's content and behaviour. I encountered a problem with incorrect parsing of schema.org multi-value itemprop attributes, and an inability to create a new project with the Mac's internal firewall enabled. Tested on x64 Mac OS X and x86 Windows 10 laptops. Support remains friendly and good. I upgraded to a paid (Lite) account.
- Rating: 4/5
- Published: