Earth Notes: On Website Technicals (2018-10)

Updated 2022-10-23 13:48 GMT.
By Damon Hart-Davis.
Tech updates: preparing for the new RPi3 with 256GB of microSD card and BBR, app inventory, Bing crawl efficiency, info image and AMP.
Amongst other noise this month: puzzling comments from a big cheese at Bing on the value, or even harm, of alternate (m-dot/AMP) pages. Though I note that at the turn of 2019 Google is now diverting ~25% of searches to my AMP pages, all at the cost of my m-dot pages...

2018-10-31: Informational Image Appearance and AMP

I'm attempting to allow the generation of AMP-compliant pages. I'm not sure yet whether AMP is in fact a good idea for this site.

One big problem with this is that AMP does not allow use of normal HTML(5) img tags, and my pages are indeed full of them, hand-crafted. More than a decade's worth!

I have created a more 'portable' and restricted EOU 'IMG' tag with essentially only src and class attributes. (I can use upper case to distinguish it, since I have all normal HTML tags in lower case.) The src must be a local image under src/. The class must be a single class that is one of a subset of the site's responsive image types. If these tight conditions are not met then the image is not inserted and an error is generated, though the raw tag should in fact always be valid (if not optimal) HTML5.
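As an illustration only, the checks described above might be sketched like this in Python. The class names, the fixed attribute order, and the function name are all assumptions for the example, not the site's actual build code.

```python
import re
from pathlib import PurePosixPath

# Hypothetical subset of responsive image classes (the real names are not given).
ALLOWED_CLASSES = {"respfloatL", "respfloatR", "respfull"}

# Upper-case IMG with exactly a src and a class attribute, in that order
# (the fixed order is a simplification for this sketch).
IMG_RE = re.compile(r'<IMG\s+src="([^"]+)"\s+class="([^"]+)"\s*/?>')


def parse_eou_img(tag):
    """Return (src, cls) if the tag meets the tight conditions, else raise ValueError."""
    m = IMG_RE.fullmatch(tag.strip())
    if m is None:
        raise ValueError("not a restricted IMG tag: " + tag)
    src, cls = m.groups()
    path = PurePosixPath(src)
    # Assumed meaning of 'local': a relative path with no scheme and no '..'.
    if "://" in src or path.is_absolute() or ".." in path.parts:
        raise ValueError("src is not a local image: " + src)
    # A value such as "a b" (two classes) fails this membership test too.
    if cls not in ALLOWED_CLASSES:
        raise ValueError("class is not a single permitted class: " + cls)
    return src, cls
```

A tag that fails raises an error rather than emitting an image, mirroring the behaviour described above.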

If the EOU IMG tag passes muster then the hero image autogeneration mechanism is used to spit out a suitably scaled image. It inserts an img tag with auto-generated width, height and alt attributes, extracted from the actual image's size and filename. (For AMP, an amp-img is used instead.)
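A rough sketch of that output step, in Python, might look as follows. The width and height are passed in rather than read from the image file, and the rule for deriving alt text from the filename (hyphens and underscores become spaces) is a guess for illustration; the function names and the example filename are hypothetical.

```python
from html import escape
from pathlib import PurePosixPath


def alt_from_filename(src):
    """Derive alt text from the file name (assumed rule, not the site's own)."""
    stem = PurePosixPath(src).stem
    return stem.replace("-", " ").replace("_", " ").strip()


def render_img(src, cls, width, height, amp=False):
    """Emit an img tag, or an amp-img element when generating the AMP page."""
    attrs = (
        f'src="{escape(src)}" class="{escape(cls)}" '
        f'width="{int(width)}" height="{int(height)}" '
        f'alt="{escape(alt_from_filename(src))}"'
    )
    if amp:
        # amp-img needs explicit dimensions and a closing tag.
        return f"<amp-img {attrs}></amp-img>"
    return f"<img {attrs}>"
```

For example, `render_img("img/KODA-House-front.jpg", "respfloatL", 320, 240)` yields an img tag with `alt="KODA House front"`.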

The IMG tag is quick and easy to write. It's also less error-prone than manually sizing a 'thumbnail' image and constructing the whole browser-friendly HTML5 img tag to use it...

See a "KODA House" simple example, floated left.

[Image: KODA House front]

These autogenerated images are space-efficient (relatively light-weight). There is also the possibility in future to use picture to have the browser fetch (say) an even lighter-weight WebP image if that format is understood by the browser.
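The picture idea could be sketched along these lines in Python. This is not implemented on the site as far as the text above says; the function name and parameters are hypothetical.

```python
from html import escape


def render_picture(fallback_src, webp_src, width, height, alt):
    """Wrap an img in a picture element offering a WebP alternative.

    Browsers that understand image/webp may pick the source element;
    others fall back to the plain img.
    """
    return (
        "<picture>"
        f'<source type="image/webp" srcset="{escape(webp_src)}">'
        f'<img src="{escape(fallback_src)}" width="{int(width)}" '
        f'height="{int(height)}" alt="{escape(alt)}">'
        "</picture>"
    )
```

Keeping the img as the last child means the markup degrades to the existing behaviour where picture is not supported.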

A feature already implemented is that the img tag inserted for the mobile/lite/AMP version refers to an image pre-scaled down to the maximum size that could responsively be seen on the presumed 640px-or-narrower screen. And the Save-Data lower-fi image version is auto-created for miserly browsers that request it, thus saving even more bandwidth!
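How a server-side choice on the Save-Data request header could work is sketched below in Python. The header name and its "on" value are the standard client hint; the function names and the idea of passing two ready-made paths are assumptions, and the site's real mechanism may well live in web server configuration instead.

```python
def wants_save_data(headers):
    """True if the request headers carry 'Save-Data: on' (names case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "save-data":
            # Ignore any parameters after a ';' and compare the first token.
            return value.split(";")[0].strip().lower() == "on"
    return False


def pick_image(headers, normal_path, lofi_path):
    """Choose the lower-fi variant only for clients that ask to save data."""
    return lofi_path if wants_save_data(headers) else normal_path
```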

One wrinkle is that these images are not purely decorative 'hero' images, but are intended to convey useful information, so the generation script has been tweaked to allow more bits per pixel for such images on mobile. Indeed potentially up to desktop image 'quality' settings. (The cap on image size remains for now lower than desktop even for these.)

This 'nicer' mobile image licence could be extended to the carousel images on the home page, for a better user experience there.

These informational image sizes should match the carousel images to try to allow (at least eg on desktop) the same images to end up being referenced in both cases, improving cache hits for visitors.

2018-11-01: follow-up note: I have segregated 'hero' from 'body' images, eg in separate cache directories, and allowed the latter to retain their aspect ratio.

It's evident from the logs that plenty of browsers narrower than 640px are arriving at the www/desktop site, so it would be worth extending the IMG output HTML to use srcset to have those use the mobile version of each image in that case for further bandwidth savings.

2018-11-03: I have added code to insert a srcset for both desktop and mobile image versions where both are present. This allows a small device arriving at the desktop site to fetch the smaller mobile image and save some bandwidth.
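A minimal Python sketch of building such a srcset attribute follows, assuming each image version is known as a (URL, pixel width) pair. The function name and example paths are hypothetical.

```python
def srcset_attr(desktop, mobile):
    """Build a srcset attribute from (url, width_px) pairs.

    Returns an empty string unless both versions are present, matching
    the 'where both are present' condition above.
    """
    if desktop is None or mobile is None:
        return ""
    # List the narrower candidate first, each with a width ('w') descriptor.
    candidates = sorted([mobile, desktop], key=lambda c: c[1])
    return 'srcset="' + ", ".join(f"{url} {w}w" for url, w in candidates) + '"'
```

With width descriptors the browser picks a candidate based on the expected display width, so a narrow device on the desktop page can fetch the mobile image.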

2018-10-24: Crawl Efficiency and Split Signals

Following on from a talk at SMX by Frédéric Dubut @CoperniX Safety PM @Bing (fighting webspam, malware and other bad stuff), Barry Schwartz @rustybrick said:

It is very important for crawling efficiency to reduce duplicate content according to Bing's @CoperniX #smx

Don't have lots of useless pages, 404 pages, etc. secondary files like JS, CSS, etc. m dot URLs are essentially duplicate URLs and impacts crawl budget.

To which I responded:

mm, I can't agree that m. pages are (always) dupes. For my key site [EOU] I have a pre-trimmed (but still responsive) experience for smaller devices on more expensive and higher-latency connections.

Then @CoperniX said:

Generally speaking, having both www. and m. means you split signal and crawlers have to crawl both URLs, which is suboptimal of the URL structures are otherwise the same. We do not recommend it but like everything on the web YMMV and it could still work for you.

I countered with:

... Note that the intention is to improve UX (eg everything key is fetched in first round-trip) for network/device constrained users, but the www. page is in each case marked link rel=canonical, and the m. as rel=alternate media=" ... max-width:640px)"

To which @CoperniX replied:

If you canonicalize m. to www. then you mitigate most of the signal split and a good part of the double crawl.

Phew! I'd rather optimise for the end user than the crawler...
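The canonical/alternate markup from the exchange above can be sketched in Python as below. The URLs are placeholders, and the media query is passed in as a parameter because the leading part of the real query is elided in the tweet and is not reconstructed here.

```python
from html import escape


def head_links(www_url, m_url, media):
    """Return the link tags tying a www. page to its m. alternate.

    www_url is declared canonical; m_url is the alternate for the given media query.
    """
    return [
        f'<link rel="canonical" href="{escape(www_url)}">',
        f'<link rel="alternate" media="{escape(media)}" href="{escape(m_url)}">',
    ]


# Illustrative values only: example.org and this media query are assumptions.
for tag in head_links("http://www.example.org/page.html",
                      "http://m.example.org/page.html",
                      "(max-width: 640px)"):
    print(tag)
```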

2018-10-23: RPi3 App Inventory

I am taking the opportunity of the (re)build to construct an application inventory for EOU and other uses. This way apps that are not actually used get implicitly 'garbage collected'... I also get to discover which scripts and so on fail badly when an expected app is missing. I can then choose to install the app or fix the script not to need it.
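One simple way to check an inventory against what is actually installed is sketched here in Python, using the standard library's `shutil.which` to look each command up on the PATH. The inventory contents and function name are hypothetical.

```python
import shutil


def missing_apps(inventory, which=shutil.which):
    """Return the sorted names from the inventory that cannot be found on PATH.

    'which' is injectable so the check can be exercised without touching the real system.
    """
    return sorted(app for app in inventory if which(app) is None)


# Hypothetical inventory for illustration; the real list is not given in the text.
if __name__ == "__main__":
    for app in missing_apps({"gzip", "rsync", "convert"}):
        print("missing:", app)
```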

2018-10-13: Preparing for the Raspberry Pi 3 Upgrade

Thinking about the logistics of bringing up the new RPi3 server, it seems to me that it will be tricky enough restoring the current capabilities (eg working out all the packages to reinstall and upgrade) and getting the new networking right (doing without the existing router to save ~8W and some downtime).

So the desired changes that provoked the upgrade (supporting HTTPS and HTTP/2, and probably Brotli) will have to wait until everything existing is back to where it was (though a little faster). It would be good even before then, and is apparently essential to make HTTP/2 work well, to switch on BBR and tcp_notsent_lowat etc.

(2018-10-22: I have added the BBR settings for TCP.)
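A small Python sketch for checking that such TCP settings have taken effect is given below. It reads the kernel's values from under /proc/sys on Linux. The two sysctl keys are real; the desired tcp_notsent_lowat value of 16384 is purely an assumed example, as the text does not say what value was used, and the function name is hypothetical.

```python
from pathlib import Path

# Desired values: 'bbr' is from the text; 16384 is an assumed example value.
DESIRED = {
    "net.ipv4.tcp_congestion_control": "bbr",
    "net.ipv4.tcp_notsent_lowat": "16384",
}


def sysctl_mismatches(desired, root="/proc/sys"):
    """Return {key: (current, wanted)} for settings that differ or cannot be read."""
    out = {}
    for key, want in desired.items():
        # net.ipv4.foo lives at <root>/net/ipv4/foo
        path = Path(root, *key.split("."))
        try:
            have = path.read_text().strip()
        except OSError:
            have = None
        if have != want:
            out[key] = (have, want)
    return out
```

An empty result from `sysctl_mismatches(DESIRED)` would suggest the settings are live; the root parameter exists only so the check can be pointed at a test directory.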

I have ordered storage for the new RPi3. 256GB of fast non-volatile storage in the size of a fingernail, around £60 retail, and requiring tiny amounts of power. When I tell the kids these days that my entire university's storage in 1986 was 1.5GB they don't believe me!
