Earth Notes: On Website Technicals (2018/12)
Updated 2019-04-22 12:01 GMT
2018/12/30: AMP Cache Oddities
As reported in "<amp-img> (at least via cdn.ampproject.org) inserts bizarre non-optimal srcset" (#20104), some of the things done in the AMP Cache seem unhelpful, even if most seem sensible.
Here is an extract from the report, made today:
What's the issue?
The AMP cache (at least for cdn.ampproject.org which I can observe) deoptimises image access if presizing has already been done. A srcset is added that specifies several (fictitious) image versions all larger than the original.
How do we reproduce the issue?
For example, in http://amp.earth.org.uk/note-on-survey-results.html the line:
<a href=http://gallery.hd.org/_c/mechanoids/UK-Millennium-Dome-voting-ticket-credit-card-sized-uniquely-coded-tweaked-1-DHD.jpg.html><amp-img src=http://www.earth.org.uk/img/a/b/UK-Millennium-Dome-voting-ticket-credit-card-sized-uniquely-coded-tweaked-1-DHD.l95176.211x330.jpg layout=intrinsic class=respfloatrsml width=211 height=330 alt="vote/survey" title="vote/survey"></amp-img></a>
gets expanded in the AMP cache in https://amp-earth-org-uk.cdn.ampproject.org/c/amp.earth.org.uk/note-on-survey-results.html to:
<a href=http://gallery.hd.org/_c/mechanoids/UK-Millennium-Dome-voting-ticket-credit-card-sized-uniquely-coded-tweaked-1-DHD.jpg.html target=_top><amp-img alt=vote/survey class=respfloatrsml height=330 layout=intrinsic src=https://www-earth-org-uk.cdn.ampproject.org/i/www.earth.org.uk/img/a/b/UK-Millennium-Dome-voting-ticket-credit-card-sized-uniquely-coded-tweaked-1-DHD.l95176.211x330.jpg srcset="https://www-earth-org-uk.cdn.ampproject.org/ii/w220/www.earth.org.uk/img/a/b/UK-Millennium-Dome-voting-ticket-credit-card-sized-uniquely-coded-tweaked-1-DHD.l95176.211x330.jpg 220w, https://www-earth-org-uk.cdn.ampproject.org/ii/w470/www.earth.org.uk/img/a/b/UK-Millennium-Dome-voting-ticket-credit-card-sized-uniquely-coded-tweaked-1-DHD.l95176.211x330.jpg 470w, https://www-earth-org-uk.cdn.ampproject.org/ii/w680/www.earth.org.uk/img/a/b/UK-Millennium-Dome-voting-ticket-credit-card-sized-uniquely-coded-tweaked-1-DHD.l95176.211x330.jpg 680w" title=vote/survey width=211></amp-img></a>
with these entirely spurious srcset entries all nominally larger than the 211x330 original.
It's a waste of HTML and processing time at best on the client, and at worst makes for images that are sent larger than necessary and require client CPU, memory and time to resize.
I had a quick response from 'Gregable' 2019/01/02:
None of these images are actually larger than the original. The /ii/w680 is indicating a maximum width, not the actual width. The cache doesn't actually return an image that large, it returns an image of min(original width, indicated width).
You are correct in that in this case, the srcset is not actually helping any. All of the images in the srcset are essentially the same. So in theory it's adding bytes to the HTML document and some very minimal CPU for parsing the srcset string. That said, I think these are probably not worth worrying much about. The extra bytes in the document are going to get gzip compressed away generally.
The reason for this is that the image and document are cached independently. When the srcset is generated, the cache doesn't know the image dimensions. This is done for a few reasons:
- It's possible that the image could be updated and change dimensions for example. Updating all referencing document when the image changes is possible, but increase cpu costs on the server.
- It also means that in a cold cache cache, the document cannot be returned to the user until the image has been fetched, which slows down delivery of the document. This has a far more significant affect on user experience than the CPU cost of unnecessary srcset parsing.
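The min(original width, indicated width) rule in the reply above can be illustrated with a small sketch. The function name, the shortened example file name, and the idea of reading the cap out of the /ii/wNNN/ path segment are assumptions for illustration only, not the AMP Cache's actual implementation.

```python
import re

def served_width(cache_url: str, original_width: int) -> int:
    """Width served under the min() rule quoted above.

    Assumes (hypothetically) that the cap appears as /ii/w<N>/ in the URL.
    """
    m = re.search(r"/ii/w(\d+)/", cache_url)
    if m is None:
        return original_width  # No cap in the URL: nothing to limit.
    return min(original_width, int(m.group(1)))

# Shortened, hypothetical file name; the host and path shape follow the report.
BASE = "https://www-earth-org-uk.cdn.ampproject.org/ii/w{}/www.earth.org.uk/img/a/b/x.211x330.jpg"
urls = [BASE.format(w) for w in (220, 470, 680)]

# Every srcset candidate resolves to the same 211px-wide image.
widths = [served_width(u, 211) for u in urls]
print(widths)
```

Under this reading the three srcset candidates differ only in URL, which matches the comment that they are "essentially the same".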
There is also a comment in this amp-img issue discussion suggesting that what should really happen is to have
It turns out that the only current way to suppress this inserted
srcset is to have one of my own already there. Apparently
srcset=" " should do, but I could actually insert a useful
srcset with a new smallest entry for both desktop and mobile, maybe the
size of the smallest carousel entry (200px wide), as long as it is (say) 20%
narrower than the existing smallest entry. And also benefiting from an
L' version for
Oddities notwithstanding, PageSpeed Insights rates the AMP (and m.) version of this page 100/100! (The www. version is 57/100 because of Google ads!)
I just became aware of the new
img tags, to try to help performance by deferring
the decode step. AMP apparently applies it to all
I am testing applying it to all body
IMGs, most of which
will not even be above the fold for mobile.
One of the older discussions gives the best description of apparent intent for this declarative usage, better than the standards! C/o vmpstr on 25 Oct 2016:
Give the async attribute three values: async, auto, and sync. sync behaves as today's image elements do without any attribute specified, where once an image has been loaded it will appear immediately (and be decoded synchronously) if inserted into the document. Images marked async can get loaded/decoded best-effort without janking. auto would involve browser heuristics to decide if the image could and should be loaded async or if it needs to be sync. async is therefore just a more aggressive version of auto in practice. sync is mostly just there as a safety valve for developers in case browser heuristics get it wrong.
2018/12/28: Featured Snippet!
Doing a Google search from mobile for one of my key terms, my "Why XXX?"
heading followed by the start of the following para, showed up as a
featured snippet. The same (m.) page shows up as a normal
SERP entry a little further down with rich text (for a review) and the
Note that there is no special (eg schema.org) markup around the snippet,
just a clear short question in the (h2) heading, and a simple
short and sweet para immediately underneath answering it.
2018/12/27: AMP Social Media Buttons, and Tests
Tentatively, I have added sharing with amp-social-share.
In all the recent upheaval, AMP support included, a number of things seem
to have been silently broken such as dropping the
I have added unit tests for the test page to cover variants of some of the issues found today.
2018/12/23: AMP Live Today
The amp.earth.org.uk site went live!
(Incidentally that home page can be accessed also via the Google AMP Cache.)
In such a case, no AMP page is created nor linked to, and any attempt
to access it will be redirected to the vanilla mobile/lite
An annoying issue: the AMP cache knows that it's loading from an
http: source, and rewrites relative links to absolute
(http:) links. But it doesn't rewrite protocol-relative
(scheme-relative?) links (eg
to absolute, so they will fail trying to reach a currently
https: server. So I'm now introducing
//WWW.earth.org.uk links to
match the existing
//STATIC.earth.org.uk links, that get
re-written to the primary absolute form, though remain syntactically
valid as raw HTML.
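A minimal sketch of that kind of rewrite, assuming the upper-case host names act as markers that a build step maps to an absolute form. The mapping targets (in particular the one for STATIC) and the function name are hypothetical; the actual build script is not shown here.

```python
import re

# Hypothetical marker -> primary absolute form. Only the marker spellings
# come from the text; the targets are assumptions for illustration.
PRIMARY = {
    "//WWW.earth.org.uk": "http://www.earth.org.uk",
    "//STATIC.earth.org.uk": "http://static.earth.org.uk",
}

# Match a marker only when it is not already preceded by a scheme's colon,
# so that an existing "http://WWW.earth.org.uk" is left untouched.
_MARKER = re.compile("(?<!:)(?:" + "|".join(re.escape(k) for k in PRIMARY) + ")")

def absolutise(html: str) -> str:
    """Rewrite marker-style protocol-relative links to absolute links."""
    return _MARKER.sub(lambda m: PRIMARY[m.group(0)], html)

print(absolutise("<a href=//WWW.earth.org.uk/note-on-survey-results.html>x</a>"))
```

The match is case-sensitive, so ordinary lower-case protocol-relative links pass through unchanged and only the deliberately marked ones are rewritten.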
2018/12/24: with some prodding by me, Googlebot is sucking in the AMP pages and starting to report in the Search Console. Interestingly I've just had a complaint about omitting some metadata (embedded video schema.org markup) from the AMP page that is in the desktop page. I've never had such a complaint about a linked m-dot/lite page which does the same. So I've fixed the build script to show all of that related metadata in all versions now.
Now all but 15 (out of ~207) pages have AMP versions. I manually fixed up nearly 600 links to deal with the protocol-relative issue also...
2018/12/21: AMP and Inline CSS Styles
I had somehow convinced myself that AMP did not allow inline CSS
style in the page body. So I did a lot of work to eliminate
common inline styling, partly because doing so can also make the page
smaller. But I was sure that it was going to be a big problem for many
However, inline styling seems not to be a significant problem since
only ~40 of the ~200 main pages are failing to validate in AMP form.
And many of those have simple/known
The validator uses the latest published set of rules to apply across
the network, which means that
are already passing validation (hurrah!). But it also means that
attempting to build and validate AMP pages fully off-line, as I can
with desktop and vanilla mobile, will result in:
ERROR: validation: Unable to fetch https://cdn.ampproject.org/v0/validator.js - getaddrinfo ENOTFOUND cdn.ampproject.org cdn.ampproject.org:443
I'm not sure that I want to be prodding the CDN for every single page rebuild, for a number of reasons.
Maybe amphtml-validator-rules would be part of a mechanism to help me work more locally.
Google's Search Console still objects to the
declaring the AMP page to be 'invalid'.
2018/12/20: Lighter Error Page
I have reduced the size of the custom 404 error page to 894 bytes of body when pre-GZIP-compressed, plus ~340 bytes of desktop HTTP/1.1 headers. Thus the HTTP response for it should be able to fit in a single TCP frame to most clients. The mobile/lite page is even smaller.
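As a rough check of that claim, the arithmetic can be sketched as below. The 1460-byte payload limit is an assumption (a 1500-byte Ethernet MTU less 20 bytes each of IPv4 and TCP headers, with no options); real paths may allow less.

```python
BODY_BYTES = 894      # compressed body size, from the text
HEADER_BYTES = 340    # approximate HTTP/1.1 header size, from the text
ASSUMED_MSS = 1460    # assumption: 1500-byte MTU minus 20 (IPv4) minus 20 (TCP)

response_bytes = BODY_BYTES + HEADER_BYTES
headroom = ASSUMED_MSS - response_bytes

print(f"response ~{response_bytes} bytes, ~{headroom} bytes spare")
```

Under these assumptions there are a couple of hundred bytes to spare, so small header variations should not push the response into a second packet.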
All informational footers, and social-media header support such as
twitter:card, are omitted for
such an error page. This saves ~200 bytes from the GZIPed size.
There is a little more that could be stripped out, eg a little residue of social media button support (~80 bytes uncompressed) that could go, and would benefit all desktop pages not needing such support.
2018/12/15: Speakable Markup
Though it's unlikely to be used any time soon (ie probably only for US-originated Google News searches for now), I'm starting to fold in some support for the 'pending' Schema.org 'speakable'.
I was partly spurred on by the relevant parts of the discussion in What to Expect in 2019 with Google's John Mueller.
This may in future help screen readers and voice searches. Being another site providing this data may in a tiny way speed its adoption.
Google's documentation has this firmly marked as BETA for now. (Also see "Add vocabulary to indicate which sections of a document are particularly 'speakable'".)
Note that Google's docs say not to use both xpath and
cssSelector, but I am picking out title and description with
the former and an optional intro para with the latter. I have split the
structured microdata for the latter into its own
and Google's Structured Data Testing Tool seems OK with that, showing
value for each item correctly.
All the new meta/structured data is at the very end of the document, so not in the CRP (Critical Rendering Path). Hurrah!
Note that HTML minification for the m-dot version rearranges (sorts) tag attributes to try to improve compression. However this seems to silently break extraction by Twitter of some header meta data such as description under some circumstances. So I have stopped doing that particular sorting.
The m-dot minifier also omits the inferrable
head tag, and
this minified HTML apparently defeated parsing when in full precise form:
All is happy again when I slightly generalise the xpath, with minimal risk of picking up stray tags later!
Schema.org SpeakableSpecification Example
Grabbed from the EOU home page (with some wrapping for readability):
<span itemprop=speakable itemscope itemtype=http://schema.org/SpeakableSpecification>
  <meta itemprop=xpath content="//meta[@property='og:title']/@content">
  <meta itemprop=xpath content="//meta[@property='og:description']/@content">
</span>
<span itemprop=speakable itemscope itemtype=http://schema.org/SpeakableSpecification>
  <meta itemprop=cssSelector content=.pgintro>
</span>
Note that this is largely fixed because it refers to existing pieces of text.
The .pgintro part is left out if the page doesn't have a
pgintro chunk of text. The title and description
are always present, however.
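The conditional nature of the markup can be sketched as a small generator. This is a hypothetical helper written for illustration, not the site's actual build script; only the emitted HTML follows the example above.

```python
SPEC = "itemprop=speakable itemscope itemtype=http://schema.org/SpeakableSpecification"

def xpath_meta(prop: str) -> str:
    """One xpath entry pointing at an existing Open Graph meta tag."""
    return f'<meta itemprop=xpath content="//meta[@property=\'{prop}\']/@content">'

def speakable_markup(has_pgintro: bool) -> str:
    """Build the speakable microdata; the .pgintro block is optional."""
    parts = [
        f"<span {SPEC}>",
        xpath_meta("og:title"),
        xpath_meta("og:description"),
        "</span>",
    ]
    if has_pgintro:
        # Kept in its own SpeakableSpecification, separate from the xpath one.
        parts += [
            f"<span {SPEC}>",
            "<meta itemprop=cssSelector content=.pgintro>",
            "</span>",
        ]
    return "\n".join(parts)

print(speakable_markup(has_pgintro=False))
```

Keeping the xpath and cssSelector entries in separate spans means the optional part can be dropped without touching the fixed part.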
2018/12/09: Random Page Build Order
At times there may be more than one make process running
to try to rebuild EOU. For example, while the battery charge is high
make -k all may be run every hour.
In particular, the rebuild of each page has a lock around it. Two or more make processes may end up continually trying to make the same page next, with one of them being excluded by the lock after a timeout, and moving on. The multiple processes tend to stay in lockstep, and all that lock contention wastes time and reduces parallelism.
In general make tries to be reasonably dependable and consistent,
and shaking that up is hard. A reasonable solution for my
on *nx is, given my list of main pages in a "simply expanded variable"
PAGES := pageA.html pageB.html ... another.html
... and given that each page's build is independent of the others, then adding this line afterwards mixes things up:
PAGES := $(shell echo $(PAGES) | xargs -n1 | sort -R | xargs)
This works fine with independent
make runs with or without
-j to add parallelism.
The cost is the execution of the shell command once per make invocation.
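The effect of the shuffle can be sketched outside make as follows. The function name and the page list are illustrative assumptions; the point is only that each process works through the same set of pages in its own random order, so concurrent processes are unlikely to keep meeting at the same per-page lock.

```python
import random

def shuffled_pages(pages, rng=None):
    """Return the same pages in a random order, leaving the input untouched."""
    if rng is None:
        rng = random.Random()  # Fresh generator, so each run is seeded differently.
    order = list(pages)
    rng.shuffle(order)
    return order

pages = ["pageA.html", "pageB.html", "pageC.html", "another.html"]

# Two independent "make" runs each get their own ordering of the same work.
run1 = shuffled_pages(pages)
run2 = shuffled_pages(pages)
print(run1)
print(run2)
```

As with the make version, this relies on each page's build being independent of the others, since no ordering between pages is preserved.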
2018/12/07: IMG Test Cases
It's a day off for me today, so of course what I do before breakfast
is add a couple of tricky
unit test cases
2018/12/05: IMG and gallery.hd.org
Let joy be unconfined!
IMG to be able to accept as
a (standard) thumbnail URL in my 'CMS'
at least for body images in the first instance.
This means that I need not copy, minify and check-in to the VCS body images for EOU if they are already being hosted in the Gallery.
(This also means that the IMG tag remains valid HTML if not translated, even if it will be non-optimal in a number of ways.)
The appropriately scaled images will still be served directly from
/img/a/ rather than by the Gallery,
and clicking the image will take the visitor through to the Gallery
Which means that I can drop almost any still raster image in, on whim!
2018/12/02: IMG Helps
IMG tag is helping me to spruce up existing pages,
eg in adding new images to them. Even if I never take AMP pages live,
the mechanism is useful. It helps to only need
class attributes. It is proving helpful to be
able to manually set
So, for example,
On Greening Christmas
had its rather poor lone image improved and a new one added,
and that second image was also added to
Low Carbon Family Holidays.
I automatically get smaller lower-weight versions for the mobile pages,
along with really-low-weight
and a link back to the source image if too large to be used directly,
so possibly containing other information of interest to the visitor.
What's not to like?
2018/12/01: Atom Feeds
I freed up a little space on the CRP (Critical Rendering Path) for the home pages (desktop and mobile) and so inserted a header link to the basic Atom site feed. My Firefox "Brief" plugin picks that up and shows a feed button in the URL bar, and I'm hoping that other browsers give similar signals.
Soon I shall drop both the RSS/Atom and G+ 'social media' buttons for each page, and so keeping the feed on the home page in this way may be useful.