Earth Notes: On Website Technicals (2019/02)
2019/02/18: Soft 404
I am puzzled by Google reporting (in GSC) files such as
with a MIME type in the HTTP header of
as "Soft 404". There's nothing '404' about it: it's clearly a data
file, present and behaving as expected, not (for example) a missing HTML
document.
Since Google+ is going away in March/April, I have removed its social media button from desktop/lite pages. (AMP uses a different mechanism.)
While I am having fun, and to save more page weight, I have also removed the RSS button, since I saw no evidence of it being used.
Page weight (on first load) should now have dropped by more than 180 bytes.
In due course I will probably tidy up the appearance of the float box that holds the now-shorter button bar...
2019/02/10: AMP 50% Indexed
The number of AMP pages marked as valid/indexed has been wobbling around the 100 (ie ~50%)
mark for many days. Note that only one residual AMP error is being reported
(apparently still from Google's internal "crawl issue" bug).
All main canonical pages as listed in
are reported as indexed, so it puzzles me why half the AMP versions aren't.
2019/02/09: Holding it Wrong: link rel= prev/next
I've been linking sets of pages together, such as in this sequence of
tech notes, with manual links in the page body and
next in the head. It's slightly
tiresome and error-prone work.
link rel part seems simply to be wrong, eg from
"Indicating paginated content to Google":
Note: You should not use this technique merely to indicate a reading list of an article series; you should use this to indicate a single long piece of content that is broken into multiple pages.
I've read various things on this topic, but this seems to be the clearest statement so far.
As a quick test and small improvement, I've removed a couple of manual prev/next pairs between individual article headers.
But I'd like to do something more systematic for the long series that I have, eg some fixed metadata that does the right thing in the body of the page, and whatever is appropriate (probably not prev/next) in the head.
Happily this may trim the head/CRP for all the affected pages. It should certainly save me some manual boilerplate hacking and maintenance over time!
Now for pages marked as
SERIES, I automatically insert
previous and next links, and
breadcrumb structured data, with a link to the head/unnumbered page if extant:
I'm still tweaking the appearance of the resulting early sidebar.
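A minimal Python sketch of the SERIES idea, under stated assumptions: the function name series_nav, the series-N.html page URLs and the plain "previous"/"next" link text are all hypothetical, and the BreadcrumbList microdata is one standard schema.org form rather than the exact markup EOU emits. It puts ordinary previous/next links in the page body and adds no link rel=prev/next to the head, in line with the guidance quoted above.

```python
from html import escape
from typing import List, Optional


def series_nav(pages: List[str], current: str, head: Optional[str] = None) -> str:
    """Build body navigation for one page of a (hypothetical) SERIES.

    pages   -- ordered list of page URLs in the series
    current -- URL of the page being generated (must be in pages)
    head    -- optional head/unnumbered page URL for the series
    """
    i = pages.index(current)
    prev_page = pages[i - 1] if i > 0 else None
    next_page = pages[i + 1] if i + 1 < len(pages) else None

    # Breadcrumb trail as schema.org BreadcrumbList microdata:
    # the head page (if any) followed by the current page.
    # For brevity the URL doubles as the display name; hrefs are quoted and escaped.
    trail = ([head] if head else []) + [current]
    crumbs = "".join(
        "<li itemprop=itemListElement itemscope itemtype=http://schema.org/ListItem>"
        f'<a href="{escape(url)}" itemprop=item><span itemprop=name>{escape(url)}</span></a>'
        f"<meta itemprop=position content={pos}></li>"
        for pos, url in enumerate(trail, start=1)
    )
    out = [f"<ol itemscope itemtype=http://schema.org/BreadcrumbList>{crumbs}</ol>"]

    # Plain previous/next links in the body only; nothing is added to the head.
    if prev_page is not None:
        out.append(f'<a href="{escape(prev_page)}">previous</a>')
    if next_page is not None:
        out.append(f'<a href="{escape(next_page)}">next</a>')
    return "\n".join(out)


if __name__ == "__main__":
    series = ["series-1.html", "series-2.html", "series-3.html"]
    print(series_nav(series, "series-2.html", head="series.html"))
```

Keeping this navigation in the body rather than the head means it costs nothing from the head/CRP byte budget.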
2019/02/03: Schema.org ImageObject isBasedOn
For hero images used in EOU and derived from external sources,
and for which I have a credit/discussion
I have made two enhancements.
.txt link now gets a
I'm not sure if the semantics are quite right, but it's close.
.txt file contains a line of the form
isBasedOn: URL then a 'src' link is made after the 'i' link
to the given URL with a
Here is a snippet from the foot of the desktop/canonical version of this page as of writing, with some whitespace added for readability:
<strong id=pgMedia>Page Media</strong>:
<span itemprop=image itemscope itemtype=http://schema.org/ImageObject>
<meta itemprop=width content=1280><meta itemprop=height content=1192>
<a href=img/tools-1280w.png itemprop=url>image</a>
(<a href=img/tools-1280w.png.txt itemprop=discussionUrl>i</a>/
<a href=https://pixabay.com/en/tool-pliers-screwdriver-145375/ itemprop=isBasedOn>src</a>)</span>.
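A minimal Python sketch of the second enhancement, assuming the .txt credit file is plain text and that the isBasedOn line looks exactly as described above. The function names find_is_based_on and page_media_span are hypothetical, and URLs are written unquoted as in the snippet above, so they are assumed to have been checked as safe for that. Given the image details from the snippet, it reproduces the span markup there, minus the whitespace added for readability.

```python
import re
from typing import Optional

# Matches a whole line of the form "isBasedOn: URL" in the image's .txt file.
IS_BASED_ON = re.compile(r"^isBasedOn:[ \t]*(\S+)\s*$", re.MULTILINE)


def find_is_based_on(txt: str) -> Optional[str]:
    """Return the source URL from an 'isBasedOn: URL' line, or None."""
    m = IS_BASED_ON.search(txt)
    return m.group(1) if m else None


def page_media_span(img: str, width: int, height: int, txt: str) -> str:
    """Build the ImageObject microdata span for a hero image.

    URLs are written unquoted, so they are assumed to be safe for that.
    """
    parts = [
        "<span itemprop=image itemscope itemtype=http://schema.org/ImageObject>",
        f"<meta itemprop=width content={width}><meta itemprop=height content={height}>",
        f" <a href={img} itemprop=url>image</a>",
        # The credit/discussion .txt file sits next to the image.
        f" (<a href={img}.txt itemprop=discussionUrl>i</a>",
    ]
    src = find_is_based_on(txt)
    if src is not None:
        # The 'src' link goes after the 'i' link.
        parts.append(f"/ <a href={src} itemprop=isBasedOn>src</a>")
    parts.append(")</span>")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical credit file contents for illustration.
    credit = ("Example credit text.\n"
              "isBasedOn: https://pixabay.com/en/tool-pliers-screwdriver-145375/\n")
    print(page_media_span("img/tools-1280w.png", 1280, 1192, credit))
```

If the .txt file has no isBasedOn line, the span simply ends after the 'i' link.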
Last month I managed to squeeze the head/CRP for a particular page under the limit to retain its Twitter video player card, etc.
This was partly achieved by assuming that the embedded player video URL, eg
https://www.youtube.com/embed/BAP56HIPBY8, would not need quoting
when used as an attribute value. For that to be safe it must not contain spaces,
quotes or a '>' closing angle bracket.
At the time I could not be sure that the URL would never end in a '/' (slash).
If one did, it would not be safe to use unquoted in an attribute at the
end of an HTML tag ie
So I rearranged the attributes so that the URL-containing one was not last. But that inconsistency in attribute ordering reduces compressibility.
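One way to express such a safety check, as a hypothetical Python helper (safe_unquoted is not a name from the text). It is a little stricter than the rule stated above: the HTML specification also disallows '=', '<', '`' (backtick) and the empty string in unquoted attribute values, and any ASCII whitespace rather than just spaces. The trailing-slash test simply encodes the caution above, and '&' character references are not checked at all.

```python
# Characters that may not appear in an unquoted HTML attribute value:
# ASCII whitespace (tab, LF, FF, CR, space), quotes, '=', '<', '>' and backtick.
UNSAFE_IN_UNQUOTED = frozenset("\t\n\f\r \"'=<>`")


def safe_unquoted(value: str, last_in_tag: bool = False) -> bool:
    """Return True if value can be written as an unquoted attribute value.

    last_in_tag: set when the attribute is the final one before '>',
    in which case a trailing '/' is also rejected (the caution in the text).
    """
    if not value:
        return False  # unquoted values may not be empty
    if any(ch in UNSAFE_IN_UNQUOTED for ch in value):
        return False
    if last_in_tag and value.endswith("/"):
        return False
    return True


if __name__ == "__main__":
    url = "https://www.youtube.com/embed/BAP56HIPBY8"
    # If this passes, the URL attribute can go last, keeping the usual order.
    print(safe_unquoted(url, last_in_tag=True))
```

Checking with last_in_tag=True is the conservative choice: a URL that passes can go anywhere in the tag, so the attribute order can stay the same as elsewhere.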
Today I added checks for raw and Twitter player URL safety, and put the
attributes back in the same order that I use elsewhere. The uncompressed form
of the page preamble/head/CRP has exactly the same size and semantic content,
but the gzip -8 and zopfli output is slightly smaller.
The pre-compressed version is made with zopfli, but the CRP size is tested
with gzip -8, and the desktop page threshold is currently 1260 bytes,
aiming to allow some meaningful body text into the first TCP frame sent,
after the HTTP/1.1 headers.
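A sketch of the CRP size test in Python, assuming the preamble/head/CRP has already been extracted as a string; the function names crp_gzip_size and check_crp are hypothetical. Python's gzip module compresses with zlib at the given level and writes no stored filename, whereas the gzip command-line tool may store one and uses its own deflate code, so counts may differ by a few bytes from gzip -8. Zopfli is not attempted here.

```python
import gzip

# Desktop head/CRP threshold from the text, in bytes of gzip -8 output.
DESKTOP_CRP_LIMIT = 1260


def crp_gzip_size(head_html: str) -> int:
    """Approximate 'gzip -8' size of the page preamble/head/CRP.

    mtime=0 keeps the output deterministic between runs.
    """
    return len(gzip.compress(head_html.encode("utf-8"), compresslevel=8, mtime=0))


def check_crp(head_html: str, limit: int = DESKTOP_CRP_LIMIT) -> bool:
    """True if the compressed head/CRP fits within the byte limit."""
    return crp_gzip_size(head_html) <= limit


if __name__ == "__main__":
    # Hypothetical, truncated head for illustration only.
    head = ("<!DOCTYPE html><html lang=en-gb><head><meta charset=utf-8>"
            "<title>Example</title><meta name=description content=Example>")
    print(f"{crp_gzip_size(head)} bytes gzipped; within limit: {check_crp(head)}")
```

Running the same check on two attribute orderings of otherwise identical markup is a quick way to see the compressibility effect described above.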
|Version||Uncompressed bytes||gzip -8 bytes||zopfli bytes|