Earth Notes: On Website Technicals (2021-07)

Updated 2023-09-22 19:34 GMT.
By Damon Hart-Davis.
Tech updates: AMP be gone, going, HTTPS m-dot, WebP footling, AMP gone, WebP lo-fi, not much Save-Data, yak shaving, ate my hamster.
Screenshot 20210816 GSC AMP chart pages zero
I turned off AMP this month (AMP-ectomy!), which should be enough excitement for anyone. The AMP residue and GSC fall-out lingered well beyond month-end... I don't know when, beyond eg Google settling down, the SEO rubble will mostly have stopped bouncing.

2021-07-31: AMP Ate My Hamster

The page-experience / CWV Google Search Console oddness continues!

Screenshot 20210731 AMP stubbornly not gone
GSC is still reporting the AMP page count at 7. (Recent peak was 198 on 2021-05-06.)

The Page Experience headline is currently "Your site has 100% URLs with a good page experience". But this has been flip-flopping daily, and the graphs are very odd!

Screenshot 20210731 page experience
GSC Page Experience graph: note the 2-day gap.
Screenshot 20210731 Core Web Vitals oddity
GSC 'Mobile' Core Web Vitals chart: note the 2-day non-zero part (just shy of 70), which overlaps on the 27th with the 2-day gap above.

2021-07-27: AMP Be Going?

GSC is this afternoon reporting the AMP page count back at 9!

But after yesterday's "page experience" being all green, I'm back at: "Your site has no URLs with a good page experience".

The page experience graph simultaneously says 100% of my URLs are good, while saying that I only had 5 Total impressions from good URLs.

2021-07-23: AMP BE GONE

GSC is this afternoon reporting the AMP page count up from 9 to 10!

At least some of these seem to be where Google has decided that the HTTP version of the full-fat desktop www page is canonical, rather than the HTTPS version.

2021-07-22: Yak Shaving

I'm shaving a few more bytes from error pages such as 404. That reduces (marginally) the tax from broken bots and spiders and rogues scanning the site for vulnerabilities.

In the first instance I am dropping specialist 'print' media tweaks. Who is printing the 404 page and thus who cares?

I am also dropping the schema.org metadata, since nothing should care about it either in such noindex pages.

Before:

% ls -fl {,m/}404.html{,gz,br}
1536 404.html
 774 404.htmlgz
 564 404.htmlbr
1234 m/404.html
 660 m/404.htmlgz
 455 m/404.htmlbr

So far (65+ bytes or >10% saved for HTTP/2 Brotli-supporting clients):

% ls -fl {,m/}404.html{,gz,br}
1293 404.html
 670 404.htmlgz
 486 404.htmlbr
 991 m/404.html
 554 m/404.htmlgz
 390 m/404.htmlbr
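
The Brotli savings can be checked directly from the before and after sizes listed above. A minimal sketch, assuming a POSIX shell with awk available; pct_saved is a hypothetical helper name, not part of the site's tooling:

```shell
# Hypothetical helper: report bytes and percentage saved between two sizes.
# Usage: pct_saved BEFORE_BYTES AFTER_BYTES
pct_saved() {
  awk -v b="$1" -v a="$2" \
    'BEGIN { printf "%d bytes (%.1f%%)\n", b - a, 100 * (b - a) / b }'
}

pct_saved 564 486   # desktop 404.htmlbr, before and after
pct_saved 455 390   # m-dot 404.htmlbr, before and after
```

The m-dot Brotli file is the one that saves exactly 65 bytes; the desktop one saves a little more.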

Good and Bad

GSC is still stubbornly reporting 9 AMP pages. Meanwhile GSC has decided that almost none of my page impressions is 'good'. But as of this evening GSC says "Your site uses HTTPS", and I have an overall good page experience.

Screenshot 20210722 page experience bad impressions
Your site has 75% URLs with a good page experience

2021-07-18: Little Save-Data

I looked for Save-Data: on requests by checking for input='on' pattern='on' in the ErrorLog overnight, having turned on rewrite logging:

LogLevel alert rewrite:trace6

There was evidence of just one genuine third-party request, for an embedded intensity button in my profile at another site (Fieldlines). Not apparently a single direct article view though.

I was able to trigger such requests via WebPageTest.
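
Counting such trace lines can be sketched as below. This assumes the match string quoted above appears verbatim in the log; count_save_data_hits is a hypothetical helper, and the log path in the usage comment is a placeholder rather than the site's real location:

```shell
# Hypothetical helper: count mod_rewrite trace lines in which the Save-Data
# condition was tested against the value "on".
# Usage: count_save_data_hits /path/to/error_log
count_save_data_hits() {
  grep -c "input='on' pattern='on'" "$1"
}

# Example (path is an assumption; substitute the real ErrorLog location):
#   count_save_data_hits /var/log/apache2/error.log
```

Note that this only catches clients sending a lower-case "on"; adding -i to grep would also catch variants such as "On".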

2021-07-17: AMP Gone

GSC is reporting AMP impressions down from ~1000 per day to under 70.

As of now I am going to manually remove all the generated AMP pages. Nothing still seems to be (accidentally) updating them, so they should stay gone.

Note that I have to leave a home page (index.html) in place for AMP for now to avoid a bare directory listing. It is marked 'noindex'.

If I make no further changes then AMP article requests will get 302 (temporary) redirects to appropriate m-dot pages. Possibly that should become 301 (permanent) at some point.

Screenshot 20210717 GSC AMP chart
AMP impressions declining further...

I have also added support for serving .webpL files when there is a Save-Data (on) header.

Lo-fi WebP: webpL

I have also attempted to make the Save-Data value match case-insensitive:

# Serve smaller images/audio for Save-Data clients if possible.
# Ensure caches handle Save-Data correctly.
<FilesMatch "\.(jpg|png|webp|mp3|mp4)$">
  Header append Vary "Save-Data"
</FilesMatch>
# If client has Save-Data header set (to "on", case-insensitive).
RewriteCond %{HTTP:Save-Data} on [NC]
# ... and if the lo-fi image/audio exists...
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME}L -s
# ... then send .xxxL content instead of .xxx hi-fi original.
RewriteRule ^/(.+)\.(jpg|png|webp|mp3|mp4)$ /$1.$2L [L]
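
The decision these rules encode can be mirrored in a small shell function for offline testing against a local copy of the document root. This is only an approximation of the Apache behaviour under stated assumptions; lofi_target and its arguments are hypothetical and not part of the site's build:

```shell
# Hypothetical helper approximating the rewrite logic above.
# Prints the path that would be served for a given request.
# Usage: lofi_target DOCROOT URLPATH SAVE_DATA_VALUE
lofi_target() {
  docroot=$1 path=$2 savedata=$3
  # Only the listed media extensions are candidates for lo-fi substitution.
  case "$path" in
    *.jpg|*.png|*.webp|*.mp3|*.mp4) ;;
    *) printf '%s\n' "$path"; return ;;
  esac
  # Case-insensitive, unanchored match on "on", like the [NC] condition.
  lc=$(printf '%s' "$savedata" | tr '[:upper:]' '[:lower:]')
  case "$lc" in
    *on*) ;;
    *) printf '%s\n' "$path"; return ;;
  esac
  # -s: the lo-fi file must exist and be non-empty.
  if [ -s "$docroot${path}L" ]; then
    printf '%s\n' "${path}L"
  else
    printf '%s\n' "$path"
  fi
}
```

For example, lofi_target "$PWD" /img/x.png on prints /img/x.pngL only when a non-empty img/x.pngL exists under the current directory, and /img/x.png otherwise.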

2021-07-13: AMP Downslope Momentum

I'm looking for pages that are still showing as having AMP in GSC, and manually submitting reindexing requests. Only a handful, but it all helps.

I'm trying to pick pages that Google is likely to be relatively slow to refresh, eg less-well-ranked ones.

Screenshot 20210713 GSC AMP chart
AMP impressions declining even faster than indexed pages.

2021-07-12: M-Dot Higherer

The GSC crawl report for m-dot pages, with latest data for the 9th, shows ~120 per day compared to 20-something on previous days. AMP crawling is below 20 per day. Overall (and www) crawl rates are fairly constant. So the bot has apparently switched most AMP crawl budget to m-dot.

The Page Experience report is showing the impressions of good URLs ticking up from a low on the 9th (56/d vs 756/d on the 2nd).

All this lines up with adjusting the header and navigation 'lite' page link now to point to https://m. rather than http://m., as hoped!

Small even smaller

I'm trying to reduce the overhead on small pages further, to be more like lite pages:

% ls -alS m/OpenTRV-protocol-discussions-201412-3.html
4095 m/OpenTRV-protocol-discussions-201412-3.html
% ls -alS OpenTRV-protocol-discussions-201412-3.html*
5533 OpenTRV-protocol-discussions-201412-3.html
2053 OpenTRV-protocol-discussions-201412-3.htmlgz
1664 OpenTRV-protocol-discussions-201412-3.htmlbr

Avoiding 'extra' page image/video metadata and trimming invisible metadata precision for dates for small pages gets to:

% ls -alS OpenTRV-protocol-discussions-201412-3.html*
5076 OpenTRV-protocol-discussions-201412-3.html
1968 OpenTRV-protocol-discussions-201412-3.htmlgz
1593 OpenTRV-protocol-discussions-201412-3.htmlbr

Turning off SpeakableSpecification support for noindex pages gets close to brotli-compressed content being able to fit into a single TCP segment (~1400 bytes) like the 'lite' version:

% ls -alS OpenTRV-protocol-discussions-201412-3.html*
4839 OpenTRV-protocol-discussions-201412-3.html
1905 OpenTRV-protocol-discussions-201412-3.htmlgz
1524 OpenTRV-protocol-discussions-201412-3.htmlbr
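
A quick way to see how far a precompressed file is from the ~1400-byte target is sketched below. fits_in_packet is a hypothetical helper, and the 1400-byte default simply reuses the rough figure quoted above rather than a measured value:

```shell
# Hypothetical helper: does a precompressed file fit within a byte budget?
# Usage: fits_in_packet FILE [LIMIT_BYTES]
fits_in_packet() {
  # Arithmetic expansion strips any padding that wc adds on some systems.
  size=$(($(wc -c < "$1")))
  limit=${2:-1400}
  if [ "$size" -le "$limit" ]; then
    echo "$1: $size bytes fits within $limit"
  else
    echo "$1: $size bytes is $((size - limit)) over $limit"
  fi
}
```

Run against the 1524-byte .htmlbr file above, this would report it as 124 bytes over the budget.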

A bit more trimming of metadata not needed when noindex:

% ls -alS m/OpenTRV-protocol-discussions-201412-3.html*
4089 m/OpenTRV-protocol-discussions-201412-3.html
1617 m/OpenTRV-protocol-discussions-201412-3.htmlgz
1283 m/OpenTRV-protocol-discussions-201412-3.htmlbr
% ls -alS OpenTRV-protocol-discussions-201412-3.html*
4751 OpenTRV-protocol-discussions-201412-3.html
1870 OpenTRV-protocol-discussions-201412-3.htmlgz
1496 OpenTRV-protocol-discussions-201412-3.htmlbr

~10% weight reduction for the maximally-compressed desktop page!

2021-07-11: M-Dot Higher

Searching Google on my mobile for the same term as a few days ago, one that reliably brings up an EOU page prominently, now brought up the (HTTPS) m/lite page.

So Google is now maybe using https://m. as the preferred target for mobile searches.

2021-07-10: WebP Footling

Much as I am keen to use JXL (JPEG XL), I am first going to have a go at using WebP as a more compact (lossless) alternate for hero PNGs.

Hero images are provided in picture elements, and at a single resolution (though depending on wide/narrow viewport).

This enables folding in WebP versions of a PNG where smaller. I should fall back to PNG for older browsers, though most support WebP. It is even worth using an inline WebP image version, with a PNG out-of-line fallback, as few will need to fall back.

This should all only be done where the WebP image can be produced, and where that saves many more bytes than the overhead of the extra HTML needed!
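
The WebP-first, PNG-fallback arrangement might be emitted along the following lines. The site's actual markup is not shown in these notes, so picture_html, the example paths and the alt text are all illustrative assumptions; no HTML escaping is attempted:

```shell
# Hypothetical helper: emit a picture element that offers WebP first and
# falls back to the PNG for browsers without WebP support.
# Usage: picture_html PNG_URL WEBP_URL ALT_TEXT
picture_html() {
  printf '<picture><source srcset="%s" type="image/webp"><img src="%s" alt="%s"></picture>\n' \
    "$2" "$1" "$3"
}

picture_html img/train-fast.png img/train-fast.webp "Fast train"
```

The extra source element is the HTML overhead that the WebP saving has to beat.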

The following incantation seems to produce a smaller WebP than source PNG via ImageMagick on both Mac and RPi:

% convert train-fast.png -define webp:method=6 train-fast.webp
% ls -al
26736 train-fast.png
22584 train-fast.webp

Sometimes convert does better with -quality 100 also, but sometimes worse.

I should do this for the .pngL lo-fi versions too.

In each case, let the PNG version determine inlining etc, but if a smaller WebP version is available, put that in ahead, and use the PNG as fallback.
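
The "only if smaller by enough" test can be sketched as a shell predicate. webp_worthwhile is a hypothetical name, and the default overhead of 100 bytes is a made-up illustrative figure for the extra HTML, not a value from the site's build:

```shell
# Hypothetical helper: succeed only if the WebP exists, is non-empty, and
# saves more than OVERHEAD bytes compared with the PNG.
# Usage: webp_worthwhile PNG WEBP [OVERHEAD_BYTES]
webp_worthwhile() {
  [ -s "$2" ] || return 1
  png=$(($(wc -c < "$1")))
  webp=$(($(wc -c < "$2")))
  [ $((png - webp)) -gt "${3:-100}" ]
}

if webp_worthwhile train-fast.png train-fast.webp; then
  echo "put WebP first, PNG as fallback"
fi
```

With the train-fast sizes shown earlier (26736 vs 22584 bytes) the WebP would qualify; a WebP that is missing, empty or only marginally smaller would not.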

I'm having to add support for .webp with MIME type image/webp, and the .webpL and .webpLL suffixes too. Eventually the same Apache support to switch to the L version with Save-Data will need to be added.

This works in at least some (non-inlining) cases, but not for example when the source image is suitable (eg light enough) to use as-is.

It does potentially save a few hundred bytes for every single 'tools' hero load in these site-technicals pages...

% ls -al img/a/h/tools-1280w.l354283.*
2584 img/a/h/tools-1280w.l354283.640x80.l.png
2004 img/a/h/tools-1280w.l354283.640x80.l.png.webp
2043 img/a/h/tools-1280w.l354283.640x80.l.pngL
5399 img/a/h/tools-1280w.l354283.800x200.png
4486 img/a/h/tools-1280w.l354283.800x200.png.webp
4434 img/a/h/tools-1280w.l354283.800x200.pngL

All these results are on the Mac, where cwebp is version 1.2.0. On the RPi server, with version 0.5.2, output file sizes are much larger, and so the WebP images are not being deployed.

Much better results seem to be happening on the RPi side (and marginally less good on the Mac) with:

cwebp -lossless -m 6 input.png -o output.webp

Which suggests that ImageMagick is, unusually, not helping...

Adding -q 100 adds effort, and should generally result in smaller files.

2021-07-09: M-Dot HTTPS

The dropping of AMP pages has slowed right down, presumably as the better-ranked ones have now been re-digested with 'noindex'. I assume that others are polled/re-read less frequently, so there is likely to be a natural asymptotic-like decay. (Without me manually requesting/forcing re-indexing, anyway.)

I'm not now going to wait for all the dust to settle: I'm going to make the m-dot / lite official view HTTPS. The HTTP side will remain available, but I'm going to see if GSC gets happier.

A fairly gloomy day so forced a rebuild of the desktop pages (then the rest) containing the navigation links to the now HTTPS m-dot variants.

I have also added an AMPDEPRECATED flag to the makefile to parallel the one in wrap_art to gradually turn off parts of the AMP support. The first step was removing the references to AMP pages in sitemap.xml, but I have also removed most automatic AMP-page building. It's still possible to build AMP pages individually or en masse.

I note from the GSC crawl stats that the AMP page crawl rate more than halved on July 2nd to mid-30s pages per day.

Screenshot 20210709 GSC page experience FAILING
GSC unhappy mobile page experience...

2021-07-06: M-Dot Ascendant

Searching Google on my mobile for a term that reliably brings up an EOU page prominently has now brought up the (HTTP) m/lite page, whereas it always used to bring up the (HTTPS) AMP page. (I saw the (HTTPS) www page be brought up once, a couple of days ago.)

To make the warning about "too many http URLs" go away, I may have to link to lite and lite-http pages in the navigation bar, and make the https://m page the official alternate to the canonical. I'll wait for the AMP page removal to settle before trying that...

2021-07-04: AMP Be Going

GSC and thus Google is quickly purging EOU AMP pages from its index, down to 99 reported cf 187 before starting, and against ~300 actual!

In the GSC page experience section I'm getting a slightly alarming (red) "Insufficient HTTPS coverage on your site" warning ... "If your site has too high a ratio of HTTP URLs, you will see warning banner on your site, and the HTTPS section will show Failing." This is possibly because I had 3 sets of HTTPS pages (www, m, amp) and two HTTP (www, m), but I am now down to 2 and 2.

Screenshot 20210704 GSC AMP chart
Crawl rates a few days after turning AMP off (99 AMP pages reported), from GSC.

2021-07-01: AMP Be Gone

Starting at about noon today I put the AMP-be-gone programme in place. Today's step is updating the page-build script to mark all AMP pages as noindex. Also, removing the explicit and header amphtml cross-site links so as to orphan the AMP pages.
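
A spot check that rebuilt pages really carry noindex could look like the sketch below. has_noindex is a hypothetical helper; it assumes a robots meta tag with name= before content=, which may not match the site's real markup, and the directory in the usage comment is a placeholder:

```shell
# Hypothetical check: does an HTML file carry a robots noindex meta tag?
# Assumes name= precedes content= within the tag; quotes are optional.
# Usage: has_noindex FILE
has_noindex() {
  grep -Eiq '<meta[^>]*name="?robots"?[^>]*content="?[^">]*noindex' "$1"
}

# Example (directory name is an assumption):
#   for f in amp/*.html; do has_noindex "$f" || echo "missing noindex: $f"; done
```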

Just removed from the site guide:

As of December 2018 there is also an AMP site version, much like the mobile/"lite" version, but possibly faster for visitors from Google search for the first page at least, because of the AMP cache. Not every page can be reproduced for AMP because of restrictions that the AMP format imposes, and a few minor features may be missing from all AMP page versions.

I captured a screenshot to remind me of current crawl stats.

Screenshot 20210701 crawl stats
Crawl rates at point of turning AMP off, from GSC.