Earth Notes: On Website Technicals (2024-03)

Updated 2024-04-16 19:22 GMT.
By Damon Hart-Davis.
Tech updates: METERCHANGE, micro-optimisation, podcast WebVTT transcript, CORS, Opus 16kbps, 429, time cues, mp3L, auto dark mode, lite-only text.
Hoping for a quiet month, maybe getting back to some must. Also, and related, extracting more from the data that I am already collecting. I did not anticipate the Opus-audio-with-everything (even video) fest! I am also gently trying to get everyone from Apple downwards to use RSS efficiently and think of the climate; not a small side project...

2024-03-31: Mobile-only View

I have added a new dmob desktop-only CSS class that hides content on wider-than-mobile screens.

@media screen and (min-width:640px) {
...
    /* Mobile-only (hidden on wider screens). */
    .dmob{display:none}
    }

I have done this so that I can drop in a hint just above the (first) audio or video player on a page, telling mobile (narrow-screen) users who are on the full-fat desktop pages that they may wish to switch to 'lite':

(Slow or expensive connection? Switch to the mobile/lite view.)

This dmob-class paragraph should disappear in full/desktop (non-mobile, non-offline) site view for wide-ish screens/viewports.

This is the first update to the desktop pages CSS in over three years.

2024-03-29: & of Doom

Apparently the world ends if an HTML &name; entity code is allowed into an RSS title tag. Such as in Repair Café ... for example.

So I have made an evil hack. Any entity in that case is rewritten from &Xyz; to X, which is often the correct unaccented form. It will get me by for the moment...

I have done the same for titles in sitemap.atom and other Atom files.
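As a rough illustration of the hack described above (not the actual EOU build code, which is not shown here), the rewrite could be sketched in Python as follows. The function name is hypothetical, and numeric entities such as &#233; are left alone since only the &name; form is mentioned.

```python
import re

# Matches a named entity such as &eacute; and captures its first letter.
_NAMED_ENTITY = re.compile(r"&([A-Za-z])[A-Za-z0-9]*;")


def crude_entity_strip(title: str) -> str:
    """Rewrite each &Xyz; as X (hypothetical helper, for illustration only).

    This is lossy by design: &eacute; becomes e, but &amp; becomes a.
    """
    return _NAMED_ENTITY.sub(r"\1", title)


print(crude_entity_strip("Repair Caf&eacute;"))  # prints "Repair Cafe"
```

The &amp; case shows why this is only a stop-gap: the first letter is usually, but not always, a sensible stand-in.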

Auto dark mode for dashboard and intensity page

I inserted the magic in-line CSS @media (prefers-color-scheme:dark){body{background-color:#000;color:#eee}img{filter:brightness(.9)}} for those two pages.

2024-03-27: mp3L

I am happy with my automatic ~16kbps Opus audio conversion. The ffmpeg flags are:

-codec:a libopus -ac 1 -b:a 16k -f opus

For the lo-fi mono .mp3L auto-generation I am trying lowest VBR bit-rate, mono, 10kHz low-pass, best (and slowest) compression:

-codec:a libmp3lame -qscale:a 9 -ac 1 -cutoff 10000 -compression_level 0 -f mp3

At least on my MacBook Air with a very recent ffmpeg the results sound acceptable and have a reasonable size on my first test, with a 3-second video source:

3124848 img/video/Welcome-1.mp4
  16174 img/a/a/Welcome-1.l3124848.48k.mp4.mp3L
   7551 img/video/Welcome-1.opusL

The file utility on the MBA reports for the .mp3L:

Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v1, 64 kbps, 48 kHz, Monaural

whereas a higher nominal bit rate (128kbps vs 64kbps) is claimed for one of the supposedly-equivalent .mp3L files generated by Audacity:

Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 128 kbps, 48 kHz, Monaural

I adjusted the RSS podcast generator to request auto-generation of .mp3L, but other than one edge case (now fixed, where I had checked in a smaller .mp3 than could be auto-generated as .mp3L) that does not change the RSS feed file/content.

Extending the AUDIO tag to auto-generate .mp3Ls where needed produced ~100 files.
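For anyone wanting to try the two flag sets quoted above from a script, a minimal Python sketch follows. The flags are copied from this section; the helper name and file names are hypothetical, the -y -i input ordering is assumed from the usual ffmpeg command shape, and this is not the real audioBuildLossy.sh logic.

```python
# Flag sets copied from the text above; everything else is illustrative.
OPUS_L_FLAGS = ["-codec:a", "libopus", "-ac", "1", "-b:a", "16k", "-f", "opus"]
MP3_L_FLAGS = ["-codec:a", "libmp3lame", "-qscale:a", "9", "-ac", "1",
               "-cutoff", "10000", "-compression_level", "0", "-f", "mp3"]


def ffmpeg_command(src: str, dst: str, flags: list) -> list:
    """Build an ffmpeg argument list (hypothetical helper).

    -y overwrites any existing output; the explicit -f in each flag set
    matters because .opusL and .mp3L are not extensions ffmpeg recognises.
    """
    return ["ffmpeg", "-y", "-i", src, *flags, dst]


cmd = ffmpeg_command("in.flac", "out.mp3L", MP3_L_FLAGS)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, check=True) on a machine with ffmpeg installed.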

2024-03-26: Low-carbon

On the GB Grid Intensity page I have renamed the zero-carbon fuels total to low-carbon because, as overall grid intensity falls, the difference stops being a rounding error!

OPPP

I am taking the stats service from Open Podcast Prefix Project for a quick spin to see its public stats.

2024-03-28: episode 60

I published the 60th podcast episode this afternoon. It will be interesting to see if human listeners show up in the stats at all.

(For this episode, for the first time, the .mp3L and .opusL lower-fi, lower-bandwidth audio variants were auto-generated.)

2024-03-25: Auto-generating Opus

The RSS feed should now auto-generate a very-low-bandwidth Opus audio file from the primary enclosure (or a lossless version if available) if one is not checked in.

This is done by a shiny new script script/audioBuildLossy.sh which borrows a lot of machinery from the lo-fi / hero image generation scheme.

I think the slightly older ffmpeg on sencha the EOU server may make slightly larger and/or less good Opus files than on my MacBook Air, and so the checked-in versions are not redundant. I can check in a hand-crafted .opusL at any time for best results, and it should be used in preference next time the RSS feed is rebuilt.

This can be extended to make the .mp3L in future, and then even the nominal primary .mp3 from a lossless master.

I have also taken the opportunity to add the lossless FLAC, if present, as an alternateEnclosure in its own right. It will not exist for video episodes. I may regret this (and undo it) if lots of people listen to the FLAC!

I have added the same facility to the AUDIO tag: a .opusL will be generated if one is not checked-in. So suddenly a lot of audio files will now have a smaller version to download, and that smaller one becomes the default for 'lite' pages.

Auto-generation of .opusLs for VIDEO tags is done too. That works somewhat differently inside when the source is the Gallery...

Next up will be .mp3Ls for RSS and AUDIO. That will require significant logic rejigging, and some testing that the achieved output (at ~48kbps nominal) is acceptable.

(Then will come .mp3 generation (at ~144kbps, from FLAC), with still more logic change and ear-based testing!)

2024-03-24: RSS Feed Files Update Less

I have adjusted the RSS feed files to be updated only when one of their HTML page source files does.

That also implies removing the channel lastBuildDate, since it would simply change every time.

In its place goes a channel pubDate with the timestamp of the newest HTML page source file.

The makefile has been updated to have no direct dependencies on the .rss files, but instead on .rss.built files touched whenever a rebuild is attempted even if the .rss does not change.

In part this is to give smart RSS readers a clue to poll less often, and to receive more 304 Not Modified responses when they do.
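The "only touch the feed when it really changes" idea above can be sketched in Python as below. This is an illustration under assumptions, not the actual makefile rule: the function name is hypothetical and the new feed content is assumed to be already built in memory.

```python
from pathlib import Path


def write_feed_if_changed(feed_path: Path, new_content: bytes) -> bool:
    """Rewrite the feed only if its bytes differ; always touch the stamp.

    Leaving an unchanged feed alone preserves its modification time, so
    the web server can keep answering conditional requests with 304.
    The .built stamp records that a rebuild was attempted.
    """
    stamp = feed_path.with_name(feed_path.name + ".built")
    changed = (not feed_path.exists()) or feed_path.read_bytes() != new_content
    if changed:
        feed_path.write_bytes(new_content)
    stamp.touch()
    return changed
```

A build tool can then depend on the .rss.built stamp rather than on the .rss file itself.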

skipHours

I currently have skipHours from 22h to 07h (UTC) inclusive, to cover times without incoming solar power for my server, and when I am likely to be asleep or at least less likely to be updating EOU.

It occurs to me that it would also be worth skipping 4pm to 7pm local time to avoid peak grid demand hours, typically also high carbon-intensity, at least towards this end of the connection even if some visitors are not in the UK.

Allowing for winter and summer time that could be another block of skipHours from 15h to 18h (UTC) inclusive. That would bring us to a total of 14 skipped hours.
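The hour arithmetic above can be checked with a few lines of Python, which also emit the RSS 2.0 skipHours element. The helper names are hypothetical.

```python
def hours_inclusive(start: int, end: int) -> list:
    """UTC hours from start to end inclusive, wrapping past midnight."""
    span = (end - start) % 24
    return [(start + i) % 24 for i in range(span + 1)]


def skip_hours_xml(hours) -> str:
    """Render a skipHours element with one hour child per skipped hour."""
    items = "".join(f"<hour>{h}</hour>" for h in sorted(set(hours)))
    return f"<skipHours>{items}</skipHours>"


night = hours_inclusive(22, 7)   # 22, 23, 0, ..., 7
peak = hours_inclusive(15, 18)   # 15, 16, 17, 18
print(len(night), len(peak), len(set(night) | set(peak)))  # prints "10 4 14"
print(skip_hours_xml(night + peak))
```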

Not that anything is paying attention to them at all so far...

There were 904 HTTP log entries (eg GET or HEAD) for /rss/podcast.rss.

2024-03-23: Time Cues in Transcripts

In order to make transcripts slightly easier to absorb, I am making time cues just a little less salient, but still accessible. I have slightly reduced opacity from a default:

[00:16]

to an opacity of 0.6:

[00:16]

It is subtle, but I think that it helps.

For visitors from the far future, this is the current cuetime styling, in case it has changed:

[00:16]

For now I have put this in its own CSS file, explicitly imported by the few (~10) files that use the cuetime style.

Bibliography haircut

With a view to making the 'lite' bibliography smaller, and being able to produce less huge desktop bibliography pages too, I have started chopping out less-important data and metadata from the generated HTML.

from:

498811 bibliography.html
482031 m/bibliography.html
 65001 bibliography.htmlgz
 59569 m/bibliography.htmlgz
 50099 bibliography.htmlbr
 45788 m/bibliography.htmlbr

to:

498788 bibliography.html
259093 m/bibliography.html
 64977 bibliography.htmlgz
 50124 bibliography.htmlbr
 47161 m/bibliography.htmlgz
 38086 m/bibliography.htmlbr

2024-03-07: Podcast Feeds and Transcripts, Opus Audio

Apple now supports podcast transcripts. Apparently transcripts are generated automatically by default, by Apple, but a podcast:transcript tag can point to a pre-made transcript. This can be in .vtt (WebVTT) format, eg as I have already provided for the OpenTRV movie mashup video episode at img/video/OpenTRV/OpenTRV-mashup-1.mp4.vtt. Where such a file exists, it is now being added this way to the RSS feed file.

A new namespace has been added to the RSS file to allow this tag: xmlns:podcast="https://podcastindex.org/namespace/1.0".

An HTML-ish format is also allowed, though the use of the time tag is not compatible with normal HTML5. Still, for where that is not a conflict, and as a little preparation, I have now added an ID mo-1-transcript to the episode's HTML page tag containing the transcript text, so it may be possible to link to that.

Adding .vtt transcripts where there was no full text transcript should help search engines. The cue time points should also help with accessibility (a11y) and usability generally.

It may be worth running a low-fidelity transcription just to get timing points for the HTML transcript.

2024-03-08: Transcobble and a11y

Using Transcobble local transcription in the browser, and a hacked together awk script, I have now added significant transcripts both as .vtt files (auto-linked into the RSS feed and the HTML pages now), but also in the body of the pages concerned. Hurrah!

Five podcasts now have .vtt captions files.

I should possibly generate these timestamped .vtt transcripts for all new episodes and some extant ones, to help accessibility, even where I have the words already, eg when I read from a script. I could copy over some of the key timestamps from .vtt to the HTML also. AI making things better!

Steno.fm displays transcripts with highlighted sections as one listens...

That is a few years-old to-do items crossed off!

Observations after completing the transcripts for all 60 extant episodes (yesterday).

  • A common failure mode of Transcobble / Whisper is to omit a chunk of text entirely, from a few words to a few sentences, without any indication.
  • It can also get 'stuck' after some interesting non-speech sounds.
  • It is not entirely consistent, eg sometimes transcribing the very same lossless audio clip as Earthnotes or Earth Notes.

2024-03-09: CORS

Testing https://www.earth.org.uk/rss/podcast.rss with the CORS Tester says that This URL will not work correctly with CORS.

Apparently I should add the header access-control-allow-origin with value * for at least that RSS file, so hereby new configuration:

<Location /rss>
    # Give podcast RSS and similar feed files an expiry time of 1h.
    ExpiresDefault "access plus 1 hour"
    # Allow CORS to work.
    Header set access-control-allow-origin *
</Location>

And now This URL will work correctly with CORS. Hurrah!

(CORS issues may explain why I have not been able to see captions when viewing videos in my pages in the filesystem.)

All transcript .vtt files need the CORS treatment too.

I have also added podcast:location to the RSS feed file.

I am also updating to allow text/vtt / .vtt files to be automatically offered DEFLATEd / GZIPped.

As of this evening both video episodes and five audio have WebVTT transcripts.

2024-03-11: extending expiry time overnight

Since most of the feed files (eg the podcast RSS) will only update when I am at the keyboard updating something, it should be possible to set a longer expiry time for them at night, eg:

    <If "%{TIME_HOUR} -lt 7 || %{TIME_HOUR} -gt 21">
        # Give podcast RSS and similar feed longer expiry out of work hours.
        ExpiresDefault "access plus 3 hour 7 minutes"
    </If>
    <Else>
        # Give podcast RSS and similar feed files an expiry time of 1h.
        ExpiresDefault "access plus 1 hour 7 minutes"
    </Else>

This seems to work!

This may reduce futile polling by more sophisticated clients, and save a little energy and bandwidth. (I do not think that my Firefox "Brief" RSS plugin will take any notice.)

I could also force longer cacheing when system power status is LOW.

I am also adding a ttl (maximum time to live in a client cache, in minutes) of 367 (ie 6h7) to the RSS file, to see what difference that makes, if any!

Amazon was polling every ~3 minutes after a ttl of 127 originally went in... (At least Amazon seems to be pulling it DEFLATEd/GZIPped.)

...
18.246.X.X - - [11/Mar/2024:17:17:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
34.210.X.X - - [11/Mar/2024:17:23:16 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
18.232.X.X - - [11/Mar/2024:17:25:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
54.214.X.X - - [11/Mar/2024:17:29:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
34.217.X.X - - [11/Mar/2024:17:35:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
107.21.X.X - - [11/Mar/2024:17:38:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
52.12.X.X - - [11/Mar/2024:17:41:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
34.222.X.X - - [11/Mar/2024:17:47:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
...

I have used 2h7 to be distinct from the default RSS and ExpiresDefault 1h poll/expiry, and the ExpiresDefault values above, and all the values are prime-ish to avoid clashes with other activity.

I can also try (hat-tip) the skipDays and skipHours tags, the latter being more directly relevant for a solar-powered system!

I am initially adding skipHours for 00h to 07h inclusive, as likely to be quiet (no updates) and off-grid battery relatively low.

None of which has slowed down Amazon, it seems...

34.236.X.X - - [12/Mar/2024:06:25:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
34.222.X.X - - [12/Mar/2024:06:29:28 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
35.166.X.X - - [12/Mar/2024:06:35:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
44.211.X.X - - [12/Mar/2024:06:38:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
35.162.X.X - - [12/Mar/2024:06:41:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
35.92.X.X - - [12/Mar/2024:06:47:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
3.236.X.X - - [12/Mar/2024:06:51:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
35.87.X.X - - [12/Mar/2024:06:53:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
44.242.X.X - - [12/Mar/2024:06:59:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
...

2024-03-12: alternateEnclosure

I have sent an email to Amazon, and a toot to OverCast, asking why no use is made of (eg) If-Modified-Since and skipHours to reduce bandwidth and (carbon) footprint. I may yet write to Apple, which seems equally oblivious to these signals.

2024-03-21: done: to Apple: Is there any way that I can set your RSS fetcher to honour Cache-Control (or Expires) and/or SkipHours? Currently it does not seem to.

2024-03-28: response from Apple: ... we do not provide any technical support for the implementation of your requested changes.

So I asked for contact details of their climate change director.

Meanwhile, I have added podcast:alternateEnclosure to my podcast RSS to list the ("Low bandwidth") 'L' version where available, which may help some end users save their bandwidth!

Note the suggested single line of code to produce a tiny 16kbps Opus (audio/opus) [valin2013high] version:

ffmpeg -y -i input.wav -c:a libopus -ac 1 -b:a 16k output.opus

On a couple of sample files it sounds acceptable, and I have captured one, so I am now providing it in general in the RSS and via standard AUDIO and VIDEO tag links also.

To support this, I have generated .opusL files for all podcast episode audio and video, and checked them in.

That gives another useful 3x reduction step in file size / bandwidth, eg:

17290019 img/audio/diary/20240128.flac
 4859775 img/audio/diary/20240128.mp3
 1515255 img/audio/diary/20240128.mp3L
  562889 img/audio/diary/20240128.opusL

A ~2kB/s (nominal 16kb/s) Opus file is ~1% of the size of the nominal 48ksps 16-bit stereo uncompressed (eg WAV) (~192kB/s) file that it encodes.
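The ~1% figure, and the file-size steps in the listing above, can be reproduced with a little arithmetic. This Python sketch uses only numbers quoted in this section; the variable names are just for illustration.

```python
# Uncompressed PCM: 48k samples/s, 16-bit (2 bytes), stereo (2 channels).
pcm_bytes_per_s = 48_000 * 2 * 2       # 192000 B/s, ie ~192kB/s
# Opus at a nominal 16kb/s (bits) is 2000 B/s, ie ~2kB/s.
opus_bytes_per_s = 16_000 // 8

ratio = opus_bytes_per_s / pcm_bytes_per_s
print(f"Opus is {ratio:.1%} of uncompressed PCM")

# File sizes in bytes from the 20240128 listing above.
sizes = {"flac": 17290019, "mp3": 4859775, "mp3L": 1515255, "opusL": 562889}
print(f"mp3L to opusL: {sizes['mp3L'] / sizes['opusL']:.1f}x smaller")
print(f"flac to opusL: {sizes['flac'] / sizes['opusL']:.0f}x smaller")
```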

2024-03-13: ByteDance

And the first spider to pull down a .opusL file is ... TikTok / ByteDance! Then Yandex and YaCy...

[13/Mar/2024:01:02:47 +0000] "GET /img/audio/meta2/meta2.opusL HTTP/2.0" 200 557403 "-" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
...
[13/Mar/2024:14:39:33 +0000] "GET /img/video/20201112/20201112-EcoHomeLab-talk-on-smart-thermostatic-radiator-valves-TRVs.opusL HTTP/1.1" 200 2121570 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
[13/Mar/2024:14:43:32 +0000] "GET /img/video/OpenTRV/OpenTRV-mashup-1.opusL HTTP/2.0" 200 190720 "-" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
[13/Mar/2024:14:44:44 +0000] "GET /img/audio/statscast/statscast-202004.opusL HTTP/2.0" 200 1332099 "-" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
[13/Mar/2024:14:48:35 +0000] "GET /img/audio/mkaudio/battery-sounds.opusL HTTP/2.0" 200 322273 "-" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
[13/Mar/2024:14:58:47 +0000] "GET /img/audio/statscast/statscast-202005.opusL HTTP/1.1" 200 1355681 "https://www.earth.org.uk/statscast-202005.html" "yacybot (/global; amd64 Windows 10 10.0; java 1.8.0_401; America/en) http://yacy.net/bot.html"

Bots/spiders including Google have made an appearance by the end of the day, but no actual humans/browsers other than me so far as I can see.

(2024-03-19: there is some slight evidence in the logs of a human user of one of the .opusL files today, via an explicit download link, for OpenTRV-mashup-1.opusL; hurrah!)

2024-03-14: lite Opus

The savings from Opus are so good, and the fidelity still good, that I am making it the default for non-desktop (eg mobile/lite) AUDIO when available.

CanIUse reports ~97% browser support.

To allow the Firefox and Chrome audio tag to play these Opus files I have had to change the declared MIME type to be audio/ogg and indeed more specifically I have used audio/ogg;codecs=opus. (They are Opus in an Ogg container.)

There is now often a nearly 1:100 size ratio from the lowest-fi .opusL up to the lossless .flac. The order etc of the downloads after AUDIO and VIDEO tags should be tidied up to be clearer and to help users pick the right one. Redo to be smallest to largest, with an indicator (eg meter) of how big they are, from lowest-fi 1/5 to highest 5/5, eg a logarithmic view. The default for the current view could be highlighted (eg bold).

As of ~1pm I have something for video that on a desktop page looks like:

1014s "20201112 EcoHomeLab talk on smart thermostatic radiator valves TRVs [VIDEO]" (poster) (captions) Uploaded . Downloads:

As of ~4pm and with the 'standard' object size normalised to half scale, and the scale linearised:

106s "OpenTRV mashup [VIDEO]" (poster) (captions) Uploaded . Downloads:

I have also written a slightly more stern rebuff for badly-behaved RSS fetchers:

    <If "%{TIME_HOUR} -lt 8 || %{TIME_HOUR} -gt 21">
        # Give podcast RSS and similar feeds longer expiry out of work hours.
        ExpiresDefault "access plus 3 hours 7 minutes"
        # For RSS files (which will have skipHours from 0 to 7 inclusive),
        # if there is no Referer and no conditional fetching, back off!
        RewriteCond %{HTTP_REFERER} ^$
        RewriteCond %{http:if-modified-since} ^$
        RewriteCond %{http:if-none-match} ^$
        RewriteRule "^/rss/.*\.rss$" - [L,R=429]
    </If>

In the wee hours of the morning when generally the feeds do not update, and for which some of the feeds also have explicit skipHours, then no Referer and no attempt to do a conditional fetch (ie only getting the feed file if it has actually changed), will result in a 429 status code: Too Many Requests.

2024-03-15: tweaked configuration

That did not work: another go!

<Location /rss>
    # Allow CORS to work.
    Header set access-control-allow-origin *
    <If "%{TIME_HOUR} -lt 8 || %{TIME_HOUR} -gt 21">
        # Give podcast RSS and similar feeds longer expiry out of work hours.
        ExpiresDefault "access plus 3 hours 7 minutes"
        # For RSS files (which will have skipHours matching the above),
        # if there is no Referer and no conditional fetching, back off!
        RewriteCond %{HTTP_REFERER} ^$
        RewriteCond %{HTTP:If-Modified-Since} ^$
        RewriteCond %{HTTP:If-None-Match} ^$
        RewriteRule "\.rss$" - [L,R=429]
    </If>
    <Else>
        # Give podcast RSS and similar feeds an expiry time of 1h.
        ExpiresDefault "access plus 1 hour 7 minutes"
    </Else>
</Location>

I have seen no sign of a human using Opus audio from the RSS podcast feed yet. I have changed the declared MIME type in the RSS feed of the Opus files to audio/ogg in line with the change made for Firefox and Chrome to be able to play them in VIDEO and AUDIO tags.

Oops: it seems that Location and rewrite rules do not play nicely: Although rewrite rules are syntactically permitted in <Location> and <Files> sections (including their regular expression counterparts), this should never be necessary and is unsupported.

It seems to work for now, but maybe I will have to move the rewrites out of the Location block and adjust the RewriteRule to start with /rss?

With all that in mind, some new Apache config:

# Allow CORS to work for RSS feeds and transcripts.
# This allows browsers to access them from non-EOU pages.
<IfModule mod_headers.c>
  <FilesMatch "\.(rss|vtt)$">
    Header set access-control-allow-origin *
  </FilesMatch>
</IfModule>
<If "%{TIME_HOUR} -lt 8 || %{TIME_HOUR} -gt 21">
    # Give podcast RSS and similar feeds longer expiry out of work hours.
    ExpiresByType application/rss+xml "access plus 7 hours 7 minutes"
    # For RSS files (which will have skipHours matching the above),
    # if there is no Referer and no conditional fetching, back off
    # when battery is low.
    RewriteCond %{HTTP_REFERER} ^$
    RewriteCond %{HTTP:If-Modified-Since} ^$
    RewriteCond %{HTTP:If-None-Match} ^$
    RewriteCond /run/EXTERNAL_BATTERY_LOW.flag -f
    RewriteRule "^/rss/.*\.rss$" - [L,R=429,E=RSS_RATE_LIMIT:1]
    Header always set Retry-After "25620" env=RSS_RATE_LIMIT
</If>
<Else>
    # Give podcast RSS and similar feeds an expiry time of ~4h.
    ExpiresByType application/rss+xml "access plus 4 hours 7 minutes"
</Else>
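The intent of the rate-limiting rules above can be modelled in plain Python, which may be easier to reason about than the Apache directives. This is only a sketch of the decision, not a replacement for the server configuration; the function and parameter names are hypothetical, and battery_low stands in for the existence of the flag file.

```python
import re

RSS_PATH = re.compile(r"^/rss/.*\.rss$")
RETRY_AFTER_S = 7 * 3600 + 7 * 60  # 25620 seconds, ie 7h7 as in the config


def should_send_429(hour: int, path: str, headers: dict, battery_low: bool) -> bool:
    """True when a fetch would be knocked back with 429 Too Many Requests.

    All of these must hold: it is out of work hours, the battery is low,
    the path is an RSS feed, and the client sent no Referer and made no
    attempt at a conditional fetch.
    """
    out_of_hours = hour < 8 or hour > 21
    h = {k.lower(): v for k, v in headers.items()}
    unconditional = not (h.get("referer")
                         or h.get("if-modified-since")
                         or h.get("if-none-match"))
    return bool(out_of_hours and battery_low and unconditional
                and RSS_PATH.match(path))


print(should_send_429(23, "/rss/podcast.rss", {}, battery_low=True))
print(should_send_429(23, "/rss/podcast.rss",
                      {"If-None-Match": '"ed04"'}, battery_low=True))
```

A well-behaved client that sends If-Modified-Since or If-None-Match is never caught by this, whatever the hour.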

2024-03-16: Amazon slower

To avoid conflict between the 10 skipHours (22h to 07h inclusive) and the ttl, I have pushed the ttl to 1507, ie ~25h. (Not exactly 24h, so as to help spread load around the day over time.)

When being knocked back with a 429 status, Amazon reduces polling to about once every 30 minutes rather than 3, so a ~10-fold reduction. It would just be better if the cache control (etc) was followed and If-Modified-Since was used properly. Other clients apparently do.

52.38.X.X - - [15/Mar/2024:22:04:20 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"
44.200.X.X - - [15/Mar/2024:22:11:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"
35.88.X.X - - [15/Mar/2024:23:05:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"
54.167.X.X - - [15/Mar/2024:23:19:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"

In that 2h slot at the end of yesterday there were apparently 332 RSS fetches, vs 383 in the previous 2h, and 730 for 10:00 and 11:00. 5133 for the whole day.

Way more than really makes sense given that I suspect I have few RSS followers and essentially no RSS podcast listeners.
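Counts like these can be pulled from an Apache combined-format access log with a short script. The sketch below is an assumption about how one might do it, not the method actually used for the figures above; the regular expression and function name are illustrative.

```python
import re
from collections import Counter

# Picks the date, hour, request path and status out of a combined-format line.
LOG_RE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):(?P<hour>\d{2}):\d{2}:\d{2} [+-]\d{4}\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def rss_fetches_per_hour(lines, path="/rss/podcast.rss") -> Counter:
    """Count requests for one path, keyed by (day, hour)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and m.group("path") == path:
            counts[(m.group("day"), int(m.group("hour")))] += 1
    return counts


sample = [
    '52.38.X.X - - [15/Mar/2024:22:04:20 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"',
    '35.88.X.X - - [15/Mar/2024:23:05:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"',
]
print(rss_fetches_per_hour(sample))
```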

I could add a Retry-After: 11220 or similar header to the 429 response, maybe converting the RewriteRule to:

RewriteRule "\.rss$" - [L,R=429,E=RATE_LIMIT:1]
Header always set Retry-After "11220" env=RATE_LIMIT

Which results in:

% wget -S -O /dev/null https://www.earth.org.uk/rss/podcast.rss
--2024-03-16 13:55:58--  https://www.earth.org.uk/rss/podcast.rss
Resolving www.earth.org.uk (www.earth.org.uk)... 79.135.97.78
Connecting to www.earth.org.uk (www.earth.org.uk)|79.135.97.78|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 429 Too Many Requests
  Date: Sat, 16 Mar 2024 13:55:58 GMT
  Server: Apache
  Retry-After: 11220
  Content-Length: 227
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html; charset=iso-8859-1
2024-03-16 13:55:58 ERROR 429: Too Many Requests.

A normal response looks like (not showing the body):

HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Sat, 16 Mar 2024 14:03:10 GMT
  Server: Apache
  Upgrade: h2
  Connection: Upgrade, Keep-Alive
  Last-Modified: Sat, 16 Mar 2024 13:34:51 GMT
  ETag: "ed04-613c7305e7ab1"
  Accept-Ranges: bytes
  Content-Length: 60676
  Vary: Accept-Encoding,Referer
  Cache-Control: max-age=4020
  Expires: Sat, 16 Mar 2024 15:10:10 GMT
  X-Frame-Options: DENY
  access-control-allow-origin: *
  Keep-Alive: timeout=5, max=100
  Content-Type: application/rss+xml
Length: 60676 (59K) [application/rss+xml]

I am also adding these top-level fields from the http://purl.org/rss/1.0/modules/syndication/ namespace:

<sy:updatePeriod>monthly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>

And:

<podcast:updateFrequency rrule="FREQ=MONTHLY">Monthly</podcast:updateFrequency>

Given that at least one RSS client claims to obey HTTP cacheing headers but seems not to use If-Modified-Since (or If-None-Match), I am increasing the 'normal' expiry time to 4h7 and the during-skipHours expiry time to 7h7 (the latter being more than half the skipHours blackout period of 10h).

2024-03-17: better audio download list and MIDI

I have now updated the way that audio downloads are listed (under the audio player widget) to approximately match what I did with the video player:

219s "about 16WW" Uploaded . Downloads:

For a couple of the podcast 'music' episodes in particular where I have a MIDI 'source' file available, that now appears in the download list.

I have also, in the RSS part of the EOU Apache site configuration, inserted:

RewriteCond /run/EXTERNAL_BATTERY_LOW.flag -f

just above:

RewriteRule "\.rss$" - [L,R=429,E=RSS_RATE_LIMIT:1]
Header always set Retry-After "11220" env=RSS_RATE_LIMIT

so no 429s are sent unless the battery is LOW. All the other hints and entreaties continue to be sent!

2024-03-28: SpaceCowboys

I received a meaningful response from the "Feeder" Android RSS reader author to my suggestion Have you considered support for these (RSS-feed-specified SkipHours tag, and server-supplied HTTP expiry time/date) to reduce bandwidth and CPU?:

regular http cache-control is already supported.

what's skiphours?

I pointed him at the skipHours definition in the spec. He noted that:

Regarding skipHours, any implementation would result in stochastic behavior for users. The feature was designed for servers which can pick when they sync, but Feeder is not not in control of when its background sync runs. This is determined by Android.

A thought: maybe during skipHours you could avoid actually doing a poll when woken when there have been no non-skipHours since your last poll. The source is telling you that you will not (likely) have missed any change in that time.
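That thought could be sketched for a client that is woken at arbitrary times, as below. This is a hypothetical illustration in Python (not Feeder's code, and not something any client is known to implement); it assumes timezone-aware UTC datetimes and treats skipHours as a set of UTC hours.

```python
from datetime import datetime, timedelta, timezone


def can_skip_poll(last_poll: datetime, now: datetime, skip_hours: set) -> bool:
    """True if every UTC hour touched since last_poll is a skip hour.

    If so, the feed has promised that nothing (likely) changed, so the
    client can go back to sleep without fetching.
    """
    if now < last_poll:
        raise ValueError("now must not be earlier than last_poll")
    if now - last_poll >= timedelta(hours=24):
        # A full day has passed: only skippable if all 24 hours are skipped.
        return set(range(24)) <= set(skip_hours)
    t = last_poll.replace(minute=0, second=0, microsecond=0)
    while t <= now:
        if t.hour not in skip_hours:
            return False
        t += timedelta(hours=1)
    return True


skip = {22, 23, 0, 1, 2, 3, 4, 5, 6, 7}
last = datetime(2024, 3, 12, 22, 30, tzinfo=timezone.utc)
woken = datetime(2024, 3, 13, 2, 10, tzinfo=timezone.utc)
print(can_skip_poll(last, woken, skip))
```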

2024-03-05: HTML Micro-optimisation

In order to make pages work properly on the m-dot domain, www domain, and locally in the filesystem (and offline), and the http: and https: online variants, and have the unprocessed source HTML be valid and usable, I have been replacing a prefix for non-top-level-page objects such as data/... in the source of //WWW.earth.org.uk/ with eg //www.earth.org.uk/ for m-dot pages. For desktop pages, so as to work off-line that replacement has been ./ so as to keep paths relative.

That waste of two bytes in most cases has been an annoyance. It has to be there for syntactic correctness when there is nothing after the prefix, eg a href=//WWW.earth.org.uk/ becomes a href=./ for desktop and a href=//www.earth.org.uk/ for m-dot.

I have now tweaked things so that when there is something starting with [0-9a-zA-Z_] after the prefix then for desktop pages the prefix can be removed entirely. A tiny 'minification'!

(This whole thing applies to //STATIC.earth.org.uk/img/... URLs too!)

I will have forgotten some subtle constraint I am sure, but the main HTML validates and looks OK...
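A minimal Python sketch of the prefix handling described in this entry follows. It covers only the //WWW.earth.org.uk/ prefix (not the //STATIC one), the function name is hypothetical, and the real page build may well do this differently.

```python
import re

WWW_PREFIX = "//WWW.earth.org.uk/"
# The prefix can be dropped when followed by a character from [0-9a-zA-Z_].
_DROPPABLE = re.compile(re.escape(WWW_PREFIX) + r"(?=[0-9a-zA-Z_])")


def rewrite_prefix(html: str, desktop: bool) -> str:
    """Rewrite the source-HTML placeholder prefix for one page variant."""
    if not desktop:
        # m-dot pages point back at the www domain.
        return html.replace(WWW_PREFIX, "//www.earth.org.uk/")
    # Desktop: drop the prefix where something usable follows it...
    html = _DROPPABLE.sub("", html)
    # ...and keep ./ wherever it is needed for syntactic correctness.
    return html.replace(WWW_PREFIX, "./")


print(rewrite_prefix("<a href=//WWW.earth.org.uk/>", desktop=True))
print(rewrite_prefix("<img src=//WWW.earth.org.uk/data/x.png>", desktop=True))
```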

2024-03-04: METERCHANGE

I am hoping to extract more from the data that I already have, such as allowing data analysis to be able to work through meter changes, eg extend current analyses back further.

Also, if we get a heat pump I would at least like the ability to cope gracefully with removal of a gas meter entirely.

To this end I have added a new METERCHANGE data record to my main 'weekly'(ish) data set.

First I will apply the notion to my 'yearly' data file and deltas. For every non-empty meter field it is an adjustment to accumulate. It is the final (old) value from the old meter minus the start (new, often near-zero) value for the new meter. The accumulated adjustments should be applied to each subsequent reading to make a new continuous 'virtual' reading. (An all zeroes METERCHANGE record has no effect, and an empty adjustment field is equivalent to a zero field.)
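The accumulation rule above can be sketched in Python for a single meter field. The record format here (a kind plus one value) is a made-up simplification for illustration, not the actual EOU data file layout.

```python
def virtual_readings(records):
    """Turn raw readings into one continuous 'virtual' series.

    records is an iterable of (kind, value) pairs, where kind is
    "READING" or "METERCHANGE" (hypothetical format). A METERCHANGE
    value is old-meter final minus new-meter start; None (an empty
    field) counts as zero.
    """
    offset = 0
    out = []
    for kind, value in records:
        if kind == "METERCHANGE":
            offset += value or 0
        else:
            out.append(value + offset)
    return out


records = [
    ("READING", 12000),
    ("READING", 12345),            # final reading on the old meter
    ("METERCHANGE", 12345 - 2),    # new meter starts at 2
    ("READING", 2),
    ("READING", 60),
]
print(virtual_readings(records))   # prints [12000, 12345, 12345, 12403]
```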

My excitement is tempered by realising that the electricity import meters were unratcheted until (thus running backwards for exports, though they did directly reflect net flow), and that the gas meter was in (100) cubic feet until .

Any new meters are at least likely to be in the same units (kWh and m^3), and will never run backwards!

I can reasonably push the 'weekly' data back to when the electricity import/export pair was installed: I have gas (m^3) and the first generation meter values for then.

References

(Count: 3)
