Earth Notes: RSS Podcast Efficiency

Updated 2024-04-17 08:14 GMT.
By Damon Hart-Davis.
The carbon cost of sloppy implementation for a service becoming fashionable again... #frugal #greenSoftware
RSS polling log sample

Abstract

Centralised social media systems are somewhat out of favour in 2024, for reasons ranging from fake news and privacy to the actions of single billionaire owners. Federated and more decentralised systems such as Mastodon and the Fediverse, plain old email, and RSS feeds including podcasts, are cool again. Much of the machinery is out of sight of ordinary users, and the system was designed before intermittent renewable power generation was a thing; podcasting and RSS in particular are unnecessarily wasting an appreciable portion of their bandwidth and CPU time, and so adding to climate change. Several technical mechanisms that could help already exist, but many participants are ignoring them. This paper suggests some simple sustainability improvements for various elements of the ecosystem that should be largely transparent to end users, including Cache-Control, conditional GET and skipHours.

Keywords

RSS, podcasting, efficiency, climate, skipHours, alternateEnclosure, Cache-Control

Introduction

IN PROGRESS


Working Notes

This describes work in progress.

Note that podping is not in scope for this work as it introduces a central service dependency and may simply hide poor behaviour further upstream.

2024-04: Size of the issue

For the EOU Web (off-grid, RPi) server hosting a mixture of static sites including EOU, over the 7 days from to 25,881,279,225 bytes (~26GB) (sum of column 11 in the logs) were served over 301,193 requests (eg GET and HEAD) ie log lines.

Filtering for requests for /rss/podcast.rss gives 134,263,853 bytes (~134MB, ~0.5%) over 8,618 requests (~2.9%).

The traffic to all of EOU in this interval is 8,927,622,485 bytes (~9GB) over 115,247 requests, so /rss/podcast.rss is ~7.5% of EOU hits and ~1.5% of EOU bytes.

Note that this podcast RSS file does not contain the body text of articles nor audio/video content, only summaries and links. Some RSS feed files (not at EOU) contain the full text for their entries.

134MB per week or ~600MB per month (and ~7.5% of all EOU server requests) to check for new entries in the RSS feed, which emerge less than once per month on average, is excessive. And this feed has a very small number of readers, including only a very small number of direct clients polling, eg from browser RSS readers or mobile phone podcast players.
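The headline percentages and the monthly projection follow directly from the raw figures above. A quick Python check, assuming an average month of 365.25/12 days for scaling up the weekly total:

```python
# Raw figures quoted above for the 7-day sample.
ALL_BYTES, ALL_REQUESTS = 25_881_279_225, 301_193   # whole server
EOU_BYTES, EOU_REQUESTS = 8_927_622_485, 115_247    # EOU site only
FEED_BYTES, FEED_REQUESTS = 134_263_853, 8_618      # /rss/podcast.rss only

def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Assumption: an average month is 365.25 / 12 days.
DAYS_PER_MONTH = 365.25 / 12
monthly_feed_mb = FEED_BYTES / 1e6 * DAYS_PER_MONTH / 7

print(pct(FEED_BYTES, ALL_BYTES), pct(FEED_REQUESTS, ALL_REQUESTS))  # 0.5 2.9
print(pct(FEED_REQUESTS, EOU_REQUESTS), pct(FEED_BYTES, EOU_BYTES))  # 7.5 1.5
print(round(monthly_feed_mb))                                        # 584
```

The ~584MB result is what rounds to the ~600MB per month quoted.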

This represents a waste of CPU and bandwidth, and thus energy, for all participants, and of battery life for mobile clients. Given that the system is not run entirely on zero-carbon energy, this in turn will be hurting the climate.

Top five consumers of the /rss/podcast.rss feed file by total access count 2024-03-24 to 2024-04-01 (06:25Z), plus 'ALL' total.
Count  Bytes      User-Agent ("-" means none, ALL is total)
8618   134263853  ALL
2769   30608880   "Amazon Music Podcast"
1458   39332327   "iTMS"
653    6895886    "Podbean/FeedUpdate 2.1"
437    8646182    "-"
254    2713382    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
Top five consumers of the /rss/podcast.rss feed file by total bytes 2024-03-24 to 2024-04-01 (06:25Z), plus 'ALL' total.
Count  Bytes      User-Agent ("-" means none, ALL is total)
8618   134263853  ALL
1458   39332327   "iTMS"
2769   30608880   "Amazon Music Podcast"
437    8646182    "-"
653    6895886    "Podbean/FeedUpdate 2.1"
100    4235406    "Podchaser (https://www.podchaser.com)"
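Tables like these are simple group-by tallies over the web server access log. A minimal Python sketch, assuming Apache's "combined" log format with UTC (+0000) timestamps; this is an illustration, not necessarily how the figures here were produced. The same function also gives per-hour totals by swapping the grouping key.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Tuple

# Tail of an Apache "combined" log line, from the [timestamp] onwards, so it
# also matches the trimmed log samples shown on this page.
LOG_RE = re.compile(
    r'\[\d{2}/\w{3}/\d{4}:(?P<hour>\d{2}):\d{2}:\d{2} [+-]\d{4}\] '
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def by_user_agent(m: re.Match) -> str:
    return m["ua"] or "-"

def by_hour_utc(m: re.Match) -> str:
    return m["hour"]  # only meaningful if the log is written in +0000 (UTC)

def tally(lines: Iterable[str], path: str,
          key: Callable[[re.Match], str]) -> Tuple[Counter, Counter]:
    """Count requests and sum response bytes for one URL path, grouped by key."""
    counts: Counter = Counter()
    total_bytes: Counter = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m is None or m["path"] != path:
            continue  # unparseable line, or a request for some other URL
        k = key(m)
        counts[k] += 1
        total_bytes[k] += 0 if m["bytes"] == "-" else int(m["bytes"])
    return counts, total_bytes

sample = [
    '[12/Mar/2024:05:33:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"',
    '[02/Apr/2024:19:01:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"',
]
counts, total_bytes = tally(sample, "/rss/podcast.rss", by_user_agent)
for ua, n in counts.most_common(5):
    print(n, total_bytes[ua], ua)
```

Ranking by total_bytes.most_common() instead of counts gives the by-bytes ordering.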

iTMS appears to be overwhelmingly Apple (Apple also has an itms agent), with a handful of hits from a feed validator.

So Apple and Amazon are clearly dominant in terms of traffic, together accounting for ~49% of requests and ~52% of bytes for this feed, and probably no one wants to complain too much because of their dominance in the market.

The anonymous (no User-Agent) traffic bears examination too.

Podbean appears to make about one request a day from each of tens of instances located in data centres (ie these are not end-user podcast player requests).

Podchaser appears high in the by-bytes list because, like iTMS, it does not accept compression and thus uses ~8x more bandwidth per fetch than a client that does.

Note that for this interval requests are fairly evenly spread over 24h, ranging from 303 requests in the 00Z hour to 444 in the 19Z hour, with a little more traffic in the UK day and evening.

Traffic from the /rss/podcast.rss feed file per hour UTC by total bytes 2024-03-24 to 2024-04-01 (06:25Z).
Count  Bytes    Hour UTC
303    4573230  00
340    5748777  01
328    5203708  02
336    5664193  03
354    5792703  04
349    6714477  05
330    5144024  06
338    5197583  07
331    5551755  08
316    4765563  09
345    5205566  10
348    5347440  11
435    6084557  12
345    5260004  13
393    5699269  14
395    5937681  15
370    5690353  16
404    7035478  17
437    6078302  18
444    6730083  19
340    5415327  20
389    5864647  21
335    4920618  22
313    4638515  23

Interactions with Technology Providers

Various providers of pieces of the technology puzzle (eg aggregators, mobile podcast app writers) were contacted to better understand the behaviour of their systems, and possibly to nudge them in a good direction.

Some of the interactions are summarised below.


Email and other content has been edited to preserve confidentiality, etc, as appropriate:

Amazon

The Earth Notes Podcast RSS has been registered with Amazon Music for Podcasters. Amazon serves as an aggregator and catalogue.

On I sent Amazon (UK) podcasting an email containing:

...

May I ask why you are polling my podcast RSS feed every few minutes when it usually updates only every few weeks? Probably more than all other users combined...

(See a sample of the log below.)

Also the skipHours in the RSS and the 3h+ Cache-Control / Expires HTTP headers that I have set seem to be ignored, and there appears to be no attempt to use If-Modified-Since or If-None-Match.

What am I doing wrong?

...

RSS file start:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:podcast="https://podcastindex.org/namespace/1.0" xml:lang="en-gb">
<channel>
<atom:link href="https://www.earth.org.uk/rss/podcast.rss" rel="self" type="application/rss+xml"/>
<title>Earth Notes Podcast</title>
<description>All things green and efficient @Home in the UK, cutting carbon and improving comfort.</description>
<link>https://www.earth.org.uk/SECTION_podcast.html</link>
<language>en-gb</language>
<itunes:author>Earth Notes / Damon Hart-Davis</itunes:author>
<itunes:owner><itunes:email>d@hd.org</itunes:email></itunes:owner>
<itunes:image href="https://www.earth.org.uk/img/wordcloud/podcast-1.png"/>
<itunes:category text="Education"/>
<itunes:category text="Technology"/>
<itunes:explicit>no</itunes:explicit>
<podcast:location geo="geo:51.406696,-0.288789,16">16WW, Kingston-upon-Thames, UK</podcast:location>
<ttl>367</ttl>
<skipHours><hour>0</hour><hour>1</hour><hour>2</hour><hour>3</hour><hour>4</hour><hour>5</hour><hour>6</hour><hour>7</hour></skipHours>
<item><title>2024-01-28 Diarycast - Year In Review (2023)</title><description>The rollercoaster thrills and spills of 2023 at EOU Towers... #podcast #yearInReview</description><link>https://www.earth.org.uk/diarycast-20240128.html</link><guid isPermaLink="false">img/audio/diary/20240128.mp3</guid><enclosure url="https://www.earth.org.uk/img/audio/diary/20240128.mp3" length="4859775" type="audio/mpeg"/><pubDate>Sun, 28 Jan 2024 13:51:53 GMT</pubDate><itunes:duration>271</itunes:duration></item>

Log sample:

[12/Mar/2024:05:33:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:35:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:41:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:46:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:47:16 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:53:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:59:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:59:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:05:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:11:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:12:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:17:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:23:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:25:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:29:28 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:35:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:38:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:41:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:47:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:51:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:53:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:59:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:07:04:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8350 "-" "Amazon Music Podcast"
[12/Mar/2024:07:05:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8350 "-" "Amazon Music Podcast"
[12/Mar/2024:07:11:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8350 "-" "Amazon Music Podcast"

Note that the Amazon requests come in from a large variety of IP addresses, with those checked being from within the compute.amazonaws.com zone.

Throwing 429 (Too many requests) codes at Amazon slows it down about 10-fold.
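How the 429 responses are generated is not described here. One possible approach, sketched in Python purely as an assumption, is a per-User-Agent minimum interval: the first request in each window is served normally and anything sooner gets 429 with a Retry-After hint. Keying on User-Agent rather than IP address is deliberate, since the Amazon requests arrive from many different addresses. The class name and the one-hour policy are hypothetical.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class FeedThrottle:
    """Hypothetical throttle for one feed URL, keyed on User-Agent.

    Serves at most one normal response per min_interval seconds to each
    User-Agent; anything sooner gets 429 Too Many Requests with Retry-After.
    """
    min_interval: float = 3600.0  # assumed policy: one full fetch per hour
    last_served: Dict[str, float] = field(default_factory=dict)

    def check(self, user_agent: str,
              now: Optional[float] = None) -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, extra response headers) for this request."""
        now = time.time() if now is None else now
        last = self.last_served.get(user_agent)
        if last is not None and now - last < self.min_interval:
            # Tell the client how many whole seconds until it may retry.
            wait = math.ceil(self.min_interval - (now - last))
            return 429, {"Retry-After": str(wait)}
        self.last_served[user_agent] = now  # only successful fetches reset the clock
        return 200, {}
```

A polite client would simply wait for Retry-After; the observation above is that Amazon slows down about 10-fold rather than stopping.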

After being prodded the US-Global support team replied :

...

Please note, this request goes beyond the scope of support our team offers, and therefore will take some time before we receive a response from the engineers.

...

Apple

The Earth Notes Podcast RSS has been registered with Apple's iTunes podcast catalogue. Apple serves as an aggregator and catalogue, and hosts the de facto canonical podcast catalogue.

On I contacted Apple via its Podcasts for Creators portal, including the following:

...

Is there any way that I can set your RSS fetcher to honour Cache-Control (or Expires) and/or SkipHours? Currently it does not seem to.

My server is off grid and I'd prefer polling to be minimised in the hours I include (23Z to 07Z).

Done right this could save a lot of bandwidth, CPU and carbon for you and the servers that you poll.

...

An initial response said: "I've received confirmation from our internal teams that we do not provide any technical support for the implementation of your requested changes."

I responded with:

...

This could be added to the other simple technical fixes that Apple already implements to reduce carbon emissions from unnecessary CPU and bandwidth use.

I note that your agent polls very frequently and often does not even use compression, ie is not compliant with even basic de facto etiquette.

...

Some example Apple fetches, including uncompressed GETs:

[02/Apr/2024:19:01:14 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 3599 "-" "iTMS"
[02/Apr/2024:19:01:14 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 412 "-" "iTMS"
[02/Apr/2024:19:01:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"
[02/Apr/2024:19:16:36 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 3599 "-" "iTMS"
[02/Apr/2024:19:16:36 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 412 "-" "iTMS"
[02/Apr/2024:19:16:37 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"
[02/Apr/2024:19:32:53 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 3599 "-" "iTMS"
[02/Apr/2024:19:32:53 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 412 "-" "iTMS"
[02/Apr/2024:19:32:53 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"
[02/Apr/2024:19:51:44 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 3599 "-" "iTMS"
[02/Apr/2024:19:51:44 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 412 "-" "iTMS"
[02/Apr/2024:19:51:44 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"

On I was provided with links to Apple Podcasts feedback, Environment, and the contact email for environment report feedback.

AntennaPod

AntennaPod uses conditional fetches for the RSS feed file. With a 12h refresh interval set, the log entries for the feed fetches are as follows (the underlying feed file changed between the first 304 and the 200 that follows it):

[01/Apr/2024:13:30:24 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11144 "-" "AntennaPod/3.2.0"
[02/Apr/2024:06:59:47 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 14443 "-" "AntennaPod/3.2.0"
[02/Apr/2024:19:07:46 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11266 "-" "AntennaPod/3.3.2"
[03/Apr/2024:08:12:00 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 3443 "-" "AntennaPod/3.3.2"
[03/Apr/2024:20:12:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 14445 "-" "AntennaPod/3.3.2"
[04/Apr/2024:08:14:20 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 280 "-" "AntennaPod/3.3.2"
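The tiny 304 responses above are what any poller gets from a conditional GET when nothing has changed. A minimal Python sketch using only the standard library, assuming the poller stores the ETag and Last-Modified values from its last 200 response; it also asks for gzip to avoid the uncompressed fetches noted earlier. The User-Agent string is a hypothetical placeholder, and this is not AntennaPod's code.

```python
import gzip
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple

def conditional_headers(saved: Dict[str, str]) -> Dict[str, str]:
    """Request headers for a polite poll, given validators saved from the last 200."""
    headers = {
        "Accept-Encoding": "gzip",                # ask for a compressed body
        "User-Agent": "example-feed-poller/0.1",  # hypothetical placeholder
    }
    if "ETag" in saved:
        headers["If-None-Match"] = saved["ETag"]
    if "Last-Modified" in saved:
        headers["If-Modified-Since"] = saved["Last-Modified"]
    return headers

def poll(url: str, saved: Dict[str, str]) -> Tuple[Optional[bytes], Dict[str, str]]:
    """Conditionally fetch url. Returns (body, validators); body is None on 304."""
    req = urllib.request.Request(url, headers=conditional_headers(saved))
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read()
            if resp.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)  # urllib does not decompress by itself
            fresh = {h: resp.headers[h] for h in ("ETag", "Last-Modified") if h in resp.headers}
            return body, fresh
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return None, saved
        raise
```

On each scheduled wake-up a client would call body, saved = poll(url, saved), starting from an empty dict, and only re-parse the feed when body is not None. Sending both validators is harmless: a server is expected to prefer If-None-Match when both are present.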

Feeder

Feeder is an open-source feed reader and podcast player for Android mobile devices. I noticed its user agent in the Earth Notes logs.

I asked (by logging an 'idea'): "Have you considered support for these (RSS-feed-specified SkipHours tag, and server-supplied HTTP expiry time/date) to reduce bandwidth and CPU?"

To which the author responded:

regular http cache-control is already supported.

what's skiphours?

I pointed the author at the skipHours definition in the RSS 2.0 spec [RAB2009RSS]. He noted that:

Regarding skipHours, any implementation would result in stochastic behavior for users. The feature was designed for servers which can pick when they sync, but Feeder is not not in control of when its background sync runs. This is determined by Android.

I added: "A thought: maybe during skipHours you could avoid actually doing a poll when woken when there have been no non-skipHours since your last poll. The source is telling you that you will not (likely) have missed any change in that time."
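That suggestion can be written as a small predicate: on wake-up, poll only if some of the time since the previous poll fell in an hour not listed in skipHours. A hedged Python sketch of the idea (not Feeder's code), assuming timezone-aware datetimes and skipHours as UTC hours 0 to 23, as the RSS 2.0 spec defines them:

```python
from datetime import datetime, timedelta, timezone
from typing import Set

def should_poll(last_poll: datetime, now: datetime, skip_hours: Set[int]) -> bool:
    """True unless every UTC hour touched from last_poll up to and including
    now is listed in skip_hours. Both datetimes must be timezone-aware."""
    end = now.astimezone(timezone.utc)
    if end.hour not in skip_hours:
        return True  # outside skipHours: normal polling rules apply
    hour = last_poll.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    while hour <= end:
        if hour.hour not in skip_hours:
            return True  # part of a non-skip hour has elapsed: feed may have changed
        hour += timedelta(hours=1)
    return False  # only skip hours since the last poll: nothing (likely) missed
```

For example, with the skipHours of 0 to 7 and 22 to 23 shown later, a phone last synced at 23:10Z and woken by the OS at 03:00Z would not poll, while one last synced at 21:30Z would.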

The author noted in the exchange that in version 2.6.20 (of ) "One quirk is that Feeder will revalidate the cache if last sync is older than 15 minutes." And in version 2.6.21 (of ) one of the fixes is "Tweaked Cache-Control headers to respect site headers even more."

2.6.21

I gave 2.6.21 a sneaky test run and the log showed:

[03/Apr/2024:18:42:04 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:18:42:05 +0000] "GET /SECTION_podcast.html HTTP/2.0" 200 11259 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:18:43:23 +0000] "GET /img/wordcloud/podcast-1.png HTTP/2.0" 200 71167 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:18:43:30 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:18:51:32 +0000] "GET /img/site/podcast/20200523-Ambient-haiku.png HTTP/2.0" 200 90726 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:19:12:31 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[04/Apr/2024:06:38:51 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[04/Apr/2024:13:49:26 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[04/Apr/2024:18:40:12 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"

After loading the new app version, telling it the feed URL and messing around, then forcing a Sync feeds (), the feed was not reloaded until I picked the phone up at . Feeder is set to the default nominal 1h between refreshes of the feed. (Though Feeder then did unconditional fetches (200) which should probably have been 304 given that the feed file was unchanged since , and ideally deferred until after skipHours ie .) Good progress!

A set of ~hourly interactions for the 2.6.20 Feeder version by another user for a different feed during a period where it was unchanged:

[03/Apr/2024:17:17:28 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"
[03/Apr/2024:18:24:03 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"
[03/Apr/2024:19:25:25 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"
[03/Apr/2024:20:25:40 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"
[03/Apr/2024:21:25:44 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"

: I have seen what appears to be one other user upgrade to 2.6.21 and RSS polling traffic is tiny, even if still unconditional. So I am recommending Feeder to my podcast page visitors.

Three different clients (the last three hits are from the same client) are all getting 304s since I also turned off ETag for RSS feed files (bad interaction with mod_deflate in Apache):

[15/Apr/2024:07:02:11 +0000] "GET /rss/saving-electricity.rss HTTP/2.0" 304 93 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[15/Apr/2024:07:02:29 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 304 93 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[15/Apr/2024:09:07:18 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 223 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[15/Apr/2024:09:07:18 +0000] "GET /rss/saving-electricity.rss HTTP/1.1" 304 223 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[15/Apr/2024:09:07:18 +0000] "GET /rss/note-on-site-technicals.rss HTTP/1.1" 304 223 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"

fyyd

The Earth Notes Podcast RSS has been registered with the fyyd directory. It seems to poll unconditionally for updates hourly: no 304 codes are returned even when the feed file is not changing.

I emailed a suggestion :

...

Would it be possible to support the RSS SkipHours tag in future, and/or respect the Cache-Control/Expires/ETag headers from the fetch?

...

TuneIn

TuneIn (RSS feed fetcher user agent TuneIn-Podcast-Checker) hosts a podcast directory.

It seems to poll faster than hourly, not respecting HTTP cache control or RSS skipHours.

[03/Apr/2024:04:59:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:05:52:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:06:25:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:07:02:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"

It seems to be hosted on AWS (Amazon Web Services).

I used the contact form to ask:

...

RSS feed polling excessively

...

Is there any way that I can get your RSS fetcher to honour Cache-Control (or Expires) and/or SkipHours? Currently it does not seem to, and is polling far more often than makes sense. I am concerned about climate impact.

...

After several miscommunications, including attempting to create me an account, I sent further explanation:

...

I am referring to how often you poll my RSS feed at https://www.earth.org.uk/rss/podcast.rss

It updates with new content about monthly.

You poll it about every 30 minutes, and don’t seem to pay any attention to Cache-Control, Expires, Last-Modified or ETag, nor the skipHours tag (or other update-hint tags) in the RSS feed itself, eg:

[03/Apr/2024:02:23:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:03:00:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:03:33:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:04:26:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:04:59:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:05:52:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:06:25:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:07:02:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:07:35:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:08:28:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:09:01:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:09:54:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:10:27:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:11:04:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:11:37:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:12:30:11 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:13:03:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:13:57:07 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:14:29:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:15:06:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:15:39:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11117 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:16:32:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11117 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:17:05:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11117 "-" "TuneIn-Podcast-Checker"

You are wasting a tremendous amount of your CPU time and bandwidth and feed providers’ (such as me), with an accompanying hit on all our bills and climate emissions. I only have a small off-grid server which is not updating the feed overnight for example.

Is there anything we can do to make this better?

I note that some of the other services polling the same feed are making use of at least some of those fields and hints.

...

The feed updates less than monthly according to Listen Notes ("Update frequency: every 52 days", "Average audio length: 9 minutes").

I received a response offering to extend the polling interval on my feed from 4h to 40h, because "... we've found that the headings are unfortunately not reliable across our directory so our system doesn’t take a look at them."

I accepted the increase to 40h, but asked:

...

But what do you mean by "headings are unfortunately not reliable across our directory”? HTTP cache control headers are very basic, and if you don’t trust them entirely, you can limit whatever cache life you see to (say) 1 day or even 12h, vastly reducing pointless polling traffic (and climate emissions) for many (slow) feeds.

...

... and my ticket was closed!

I asked anyway: When can I expect to see the change to 40h polling? It is still more than hourly: see the log fragment below.

[07/Apr/2024:06:40:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:07:12:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:08:06:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:08:23:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:09:16:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:09:49:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:10:42:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:11:14:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:12:08:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:12:25:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:13:18:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11610 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:13:18:11 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11591 "-" "TuneInRssParser/1.0"
[07/Apr/2024:13:51:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11610 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:13:51:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11591 "-" "TuneInRssParser/1.0"
[07/Apr/2024:14:44:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11610 "-" "TuneIn-Podcast-Checker"

Things are looking a little better. Note that each IP address (other than for TuneInRssParser) is unique in this log fragment:

[08/Apr/2024:20:55:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11715 "-" "TuneIn-Podcast-Checker"
[08/Apr/2024:20:55:17 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11696 "-" "TuneInRssParser/1.0"
[09/Apr/2024:14:38:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11965 "-" "TuneIn-Podcast-Checker"
[09/Apr/2024:14:38:28 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11946 "-" "TuneInRssParser/1.0"
[09/Apr/2024:21:20:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11965 "-" "TuneIn-Podcast-Checker"
[09/Apr/2024:21:20:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11946 "-" "TuneInRssParser/1.0"

Still not quite one poll every 40h (more like 6h!), but much better anyhow! I hope that TuneIn has also at least thought about how wastefully it is polling everyone else too...

Hints Dropped

In order to give remote entities polling the RSS feed file as much chance as possible to avoid polling when it is pointless, wasting CPU and bandwidth, I provide a suite of hints, at least some of which any poller could act on.

I also provide alternateEnclosure items, alternatives alongside the default MP3 (audio) or MP4 (video) file, that allow users to download much smaller versions if they wish, to save more bandwidth, data charges, CPU, etc. I have not seen evidence of any client using (or able to use) these.
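A client wanting to save bandwidth could simply pick the smallest alternative offered. A minimal Python sketch, assuming Podcasting 2.0 podcast:alternateEnclosure elements that carry a length attribute (in bytes) and a podcast:source child with a uri; the "smallest wins" policy and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PODCAST_NS = "https://podcastindex.org/namespace/1.0"

def smallest_alternate(item: ET.Element) -> Optional[str]:
    """Source URI of the smallest alternateEnclosure with a declared length,
    or None if the item offers none."""
    best_len: Optional[int] = None
    best_uri: Optional[str] = None
    for alt in item.findall(f"{{{PODCAST_NS}}}alternateEnclosure"):
        length = alt.get("length")
        source = alt.find(f"{{{PODCAST_NS}}}source")
        if length is None or source is None or source.get("uri") is None:
            continue  # cannot compare sizes (or fetch) without these
        if best_len is None or int(length) < best_len:
            best_len, best_uri = int(length), source.get("uri")
    return best_uri

item = ET.fromstring(
    '<item xmlns:podcast="https://podcastindex.org/namespace/1.0">'
    '<podcast:alternateEnclosure type="audio/mpeg" length="4859775">'
    '<podcast:source uri="https://example.com/ep.mp3"/></podcast:alternateEnclosure>'
    '<podcast:alternateEnclosure type="audio/opus" length="1200000">'
    '<podcast:source uri="https://example.com/ep-small.opus"/></podcast:alternateEnclosure>'
    '</item>'
)
print(smallest_alternate(item))
```

A real client would also check that it can play the offered type, falling back to the standard enclosure when nothing suitable is listed.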


2024-04-03: snapshot

In the RSS file itself are the following lines in the channel part:

<pubDate>Wed, 03 Apr 2024 12:58:31 GMT</pubDate>
<ttl>1507</ttl>
<skipHours><hour>0</hour><hour>1</hour><hour>2</hour><hour>3</hour><hour>4</hour><hour>5</hour><hour>6</hour><hour>7</hour><hour>22</hour><hour>23</hour></skipHours>
<sy:updatePeriod>monthly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<podcast:updateFrequency rrule="FREQ=MONTHLY">monthly</podcast:updateFrequency>

This says that updates are expected roughly monthly, and that updating once in that interval is OK; that this feed has a TTL (time to live) of 1507 minutes (~25h), ie can be cached that long; and that updates will generally not be happening from 22:00Z until 08:00Z (the listed skipHours are 22, 23 and 0 to 7 UTC, each covering a whole hour), so please do not poll then at all.
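A poller can read these channel-level hints with a standard XML parser. A minimal Python sketch, assuming the feed declares the sy prefix with the usual RSS 1.0 Syndication module URI (that declaration is not shown in the snippet above); the function name and returned keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# podcast URI as declared in the feed; the sy URI is the standard RSS 1.0
# Syndication module namespace, assumed to be what the feed declares.
SY_NS = "http://purl.org/rss/1.0/modules/syndication/"
PODCAST_NS = "https://podcastindex.org/namespace/1.0"

def channel_hints(rss_xml: str) -> Dict[str, Any]:
    """Extract polling hints from an RSS 2.0 channel."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        raise ValueError("no <channel> element")
    ttl = channel.findtext("ttl")
    freq = channel.find(f"{{{PODCAST_NS}}}updateFrequency")
    return {
        # <ttl> is in minutes in RSS 2.0.
        "ttl_minutes": int(ttl) if ttl else None,
        # <skipHours> lists UTC (GMT) hours 0-23 in which not to poll.
        "skip_hours": {int(h.text) for h in channel.findall("skipHours/hour") if h.text},
        "update_period": channel.findtext(f"{{{SY_NS}}}updatePeriod"),
        "update_frequency": channel.findtext(f"{{{SY_NS}}}updateFrequency"),
        "rrule": freq.get("rrule") if freq is not None else None,
    }
```

For the snapshot above, divmod(1507, 60) gives (25, 7), ie a TTL of 25h07m.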

(Possibly the TTL should be higher, up to a month...)

In the HTTP response headers for the feed file are the following relevant lines:

Date: Wed, 03 Apr 2024 18:15:19 GMT
Last-Modified: Wed, 03 Apr 2024 15:34:48 GMT
ETag: "133ff-61532f67e0edd"
Cache-Control: max-age=14820
Expires: Wed, 03 Apr 2024 22:22:19 GMT

The Last-Modified header allows an If-Modified-Since conditional fetch, and the ETag header allows an If-None-Match conditional fetch. So if a conditional fetch is used and the feed file has not changed, only a very small 304 (Not Modified) response is sent.
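On the server side the 304 decision is a short comparison. A simplified Python sketch (a web server such as Apache does this itself; this is only to show the logic), assuming header names have already been normalised and following the HTTP rule that If-None-Match, when present, takes precedence over If-Modified-Since:

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

def _strip_weak(tag: str) -> str:
    """Drop any W/ prefix so the comparison is a weak one."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag

def is_not_modified(request_headers: Mapping[str, str],
                    etag: Optional[str], last_modified: str) -> bool:
    """True if a GET may be answered with 304 Not Modified."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        if inm.strip() == "*":
            return True
        if etag is None:
            return False
        # Simplified: assumes no commas inside individual entity tags.
        return _strip_weak(etag) in {_strip_weak(t) for t in inm.split(",")}
    ims = request_headers.get("If-Modified-Since")
    if ims is None:
        return False
    try:
        return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(ims)
    except (TypeError, ValueError):
        return False  # unparseable or incomparable dates: send the full response

LM = "Wed, 03 Apr 2024 15:34:48 GMT"
TAG = '"133ff-61532f67e0edd"'
print(is_not_modified({"If-None-Match": TAG}, TAG, LM))                  # True
print(is_not_modified({"If-None-Match": TAG[:-1] + '-gzip"'}, TAG, LM))  # False
```

The second call shows the kind of mismatch a compression module can cause when it alters the ETag (Apache's mod_deflate can append "-gzip"), so that a client echoing it back never gets a 304; that is one reason to turn ETags off and rely on Last-Modified instead.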

The Cache-Control max-age and Expires values are extended from 4h07m for this daytime poll to 7h07m during skipHours. Paying attention to either header would cut polling frequency well below the typical default of once per hour. If conditional fetches are also used, almost all polls should produce only a slow trickle of tiny 304 responses, ideally with none at all during skipHours!
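A client can combine these headers with skipHours to schedule its next poll. A hedged Python sketch: it uses max-age if present (which overrides Expires in HTTP caching), else Expires, never polls more often than a 1h default, caps any advertised lifetime at 24h (in the spirit of the "limit whatever cache life you see to (say) 1 day" suggestion to TuneIn above), then defers to the first UTC hour not in skipHours. The floor and cap values are assumptions a client would tune.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Set

def next_poll(headers: Mapping[str, str], skip_hours: Set[int],
              floor: timedelta = timedelta(hours=1),
              cap: timedelta = timedelta(hours=24)) -> datetime:
    """Earliest sensible time for the next poll, as a UTC datetime."""
    date = parsedate_to_datetime(headers["Date"])
    m = re.search(r"\bmax-age=(\d+)", headers.get("Cache-Control", ""))
    if m:  # max-age takes precedence over Expires
        fresh = timedelta(seconds=int(m.group(1)))
    else:
        try:
            fresh = parsedate_to_datetime(headers["Expires"]) - date
        except (KeyError, TypeError, ValueError):
            fresh = floor  # no usable expiry information
    # Never poll more often than the default, nor trust a lifetime beyond cap.
    fresh = min(max(fresh, floor), cap)
    when = (date + fresh).astimezone(timezone.utc)
    # Defer to the start of the first UTC hour not listed in skipHours.
    for _ in range(24):
        if when.hour not in skip_hours:
            return when
        when = when.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return when  # every hour is listed: give up deferring
```

With the snapshot headers above (Date 18:15:19Z, max-age=14820) and the snapshot's skipHours, the raw expiry of 22:22:19Z falls in skip hour 22, so the next poll is deferred to 08:00Z the following morning.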

Also, I defer any rebuilding of the rss/podcast.rss file during skipHours, or when the GB grid has high carbon intensity, or when the local battery is low. This should help reduce GB-grid-powered network traffic at those times.
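That rebuild policy amounts to a simple guard, which also means the feed's Last-Modified time never falls inside skipHours, so a poller honouring skipHours misses nothing. A sketch with hypothetical inputs: how grid carbon intensity and battery state are measured is not described here, so they are passed in as booleans.

```python
from datetime import datetime, timezone
from typing import Set

def may_rebuild_feed(now: datetime, skip_hours: Set[int],
                     grid_intensity_high: bool, battery_low: bool) -> bool:
    """Hypothetical guard: rebuild only outside skipHours (UTC), and only when
    neither the grid-carbon nor the battery condition says to wait."""
    in_skip_hour = now.astimezone(timezone.utc).hour in skip_hours
    return not (in_skip_hour or grid_intensity_high or battery_low)
```

A deferred rebuild simply happens at the next check where all three conditions are clear.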

References

(Count: 7)
