Earth Notes: On Website Technicals (2021-01)

Updated 2021-01-14 08:56 GMT.
Tech updates: new year data capture, min.js, hosting, soft params, profile opt, hot pages.
New year, new lockdown: major works not planned for this month, but tweaking and brainwaves happen...

2021-01-11: Distribution of Page Hits

I took a look at page hits that appear to come from humans rather than bots, and that are not from me. The snapshot I looked at covers a window of roughly 9 days. Nearly 50% of hits go to the top 10 pages, and over 50% go to the top 15 pages. Most pages get no hits at all in this time window.

# Pages   % hits   Comment
10        49       The top-ten pages get nearly half the visits/hits.
15        56
20        63
140       100      All main pages with at least one hit.
298       100      Includes some pages not counted in stats above: ls *.html | wc -l
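
Cumulative figures like those in the table can be derived from per-page hit counts. The following is a minimal Python sketch of that calculation; the function name and the example counts are invented for illustration and are not the site's real data or tooling.

```python
def cumulative_share(hits, top_n):
    """Return the percentage of all hits that land on the top_n most-hit pages."""
    total = sum(hits)
    if total == 0:
        return 0.0
    ranked = sorted(hits, reverse=True)  # most popular page first
    return 100.0 * sum(ranked[:top_n]) / total


# Hypothetical hit counts, one entry per page (not real EOU data).
example_hits = [50, 30, 10, 5, 3, 2, 0, 0, 0, 0]

for n in (1, 2, 5):
    print(n, cumulative_share(example_hits, n))
```

Pages with zero hits still count towards the page total but contribute nothing to the share, which is why the percentage reaches 100 well before the last page.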

2021-01-10: Profile-driven Optimisation

Given that I have logs and know which pages and images are the most downloaded, I could make extra effort to, for example, (re)compress them or updated versions of them. Maybe use an extra -m option with zopflipng or more iterations in zopfli, or notch down the 'quality' slider one place for JPEG images for example.

It may also be possible to selectively omit some less-important content from popular pages, and spend more time choosing which related pages to link to.

It may also be sensible to treat warnings as errors on key pages and images, to encourage tuning them for optimal behaviour. I have set the desktop page build to do this initially, for top-ranked pages. This uses the last-archived popularity data if live data is not available, eg when working off-line.

This would reduce bandwidth for the site and for its clients, and possibly speed up page rendering and improve user experience a little, focussing CPU effort where it may gain most results.
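
One way to picture the idea is a small lookup that spends more effort on top-ranked items and falls back to archived popularity data when live data is missing. This is only a sketch under stated assumptions: the function names, the setting names, the top-10 cutoff and the iteration counts are all hypothetical, and it does not describe the actual build scripts.

```python
def load_ranking(live, archived):
    """Prefer live popularity data; fall back to the last-archived list
    when live data is unavailable (for example when working off-line)."""
    return live if live else archived


def build_settings(rank, top_n=10):
    """Return hypothetical build settings for an item of the given
    popularity rank (1 = most popular)."""
    if rank <= top_n:
        # Popular item: more compression effort, and be strict about warnings.
        return {"zopfli_iterations": 1000, "zopflipng_more": True,
                "warnings_as_errors": True}
    # Everything else: cheaper settings, warnings stay warnings.
    return {"zopfli_iterations": 15, "zopflipng_more": False,
            "warnings_as_errors": False}


# Hypothetical page names, purely for illustration.
ranking = load_ranking(None, ["a.html", "b.html", "c.html"])
for rank, page in enumerate(ranking, start=1):
    print(page, build_settings(rank, top_n=2))
```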

Not that EOU is exactly swamped with requests, but still...

Delete Facebook

Given recent events I'm even less keen than I was on Facebook and its tentacles. I have removed the WhatsApp social media button from the AMP site (taking effect when its pages are next rebuilt). In due course I will create a slimmed-down Share42 set of buttons, minus Facebook, for the lite and desktop sites.

I don't think that I was getting any significant traffic via those buttons in any case.

2021-01-09: Site Soft Parameters

I've added a set of 'soft parameters' to the site build process that don't change the visible logical content of pages. They can be changed easily without forcing page rebuilds.

These parameters include such values as how many days after the last edit of a page to inject ads into it if it doesn't otherwise qualify.

This also allows turning on run-time debugging in some key scripts.
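
As an illustration of how such parameters might be read at build time, here is a minimal Python sketch that parses a simple key=value file over a set of defaults. The file format, the parameter names and the default values are assumptions made for this example, not the site's actual scheme.

```python
# Hypothetical defaults; the real parameter names and values are not given here.
DEFAULTS = {"ad_inject_days_after_edit": "30", "debug_scripts": "0"}


def parse_soft_params(text, defaults=DEFAULTS):
    """Parse 'key = value' lines, ignoring blanks and '#' comments,
    and overlay them on the defaults."""
    params = dict(defaults)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip malformed lines that have no '='
            params[key.strip()] = value.strip()
    return params


example = "# tweak without forcing rebuilds\nad_inject_days_after_edit = 14\n"
print(parse_soft_params(example))
```

For changes not to force page rebuilds, the parameter file would presumably have to be left out of the pages' build dependencies, so that an edit takes effect only when a page is next rebuilt for some other reason.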

2021-01-06: Hosting

Today's equation is: 1 megabit / second = 0.3285 terabytes / month

Thus EOU's outgoing FTTC connection could (if not limited by the RPi etc) nominally serve more than 5TB/month, which was my bandwidth budget for another site hosted in the US.

I am testing out a VPS with 250Mbps unmetered bandwidth, ie ~82TB/month.
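
The arithmetic behind these figures can be checked with a few lines of Python. This sketch assumes decimal units (1 TB = 10^12 bytes) and an average month of 365/12 days, which is consistent with the 0.3285 factor quoted above; the function name is invented for the example.

```python
# Average month length in seconds, assuming a 365-day year.
SECONDS_PER_MONTH = 365 * 24 * 3600 / 12


def tb_per_month(mbps):
    """Convert a sustained rate in megabits/second to terabytes/month."""
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_MONTH / 1e12


print(tb_per_month(1))    # the per-Mbps factor
print(tb_per_month(250))  # the unmetered VPS
print(5 / tb_per_month(1))  # sustained Mbps needed to serve 5TB/month
```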

This host would accommodate a DNS secondary, and a gallery.hd.org mirror.

2021-01-05: Lockdown Reboot

Under the new England lockdown all four of us are at home (nearly) all day, the two kids on remote learning, and the adults WFH. Everyone will be at home for at least about seven weeks.

Today as I went out for my exercise walk a little after 1pm, the Internet connection dropped out, booting the three in the house off-line.

The Vigor2862 router is a bit flaky, so I have set up a regular weekly reboot of the router. I have also set up a calendar reminder for me to check that the older RPi server has come back on line after, since it often doesn't in this circumstance.

2021-01-04: JavaScript Minimisation

I had a small brainwave to (marginally) reduce the weight of the first page for each new visitor. The share42 JavaScript is compact, but not minified. So I ran it through codebeautify.org/minify-js, re-inserted a slightly-trimmed version of the copyright line, gave the file a slightly shorter name too, and generated the pre-compressed versions, to go from:

2787 share42.js
1180 share42.jsgz
 900 share42.jsbr

to:

2630 min.js
1114 min.jsgz
 852 min.jsbr

A whole 48 bytes (~5%) lopped off the brotli-compressed version, in fact a little over 5% saved from all versions!

Those are then copied with svn cp to the m-dot area, with slightly different names (not shorter in that case).

I should have done this ages ago!
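
The savings quoted above can be checked from the listed file sizes. This is a small Python sketch using only the byte counts given in this section; the dictionary keys and the function name are labels chosen for the example.

```python
# Byte counts as listed above for share42.js and min.js and their
# pre-compressed versions.
before = {"plain": 2787, "gzip": 1180, "brotli": 900}
after = {"plain": 2630, "gzip": 1114, "brotli": 852}


def saving(before_bytes, after_bytes):
    """Return (bytes saved, percentage saved)."""
    saved = before_bytes - after_bytes
    return saved, 100.0 * saved / before_bytes


for kind in before:
    saved, pct = saving(before[kind], after[kind])
    print(f"{kind}: {saved} bytes saved ({pct:.1f}%)")
```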

2021-01-01: Year-end and Month-end Data Munging

At the turn of the month and the turn of the year there's quite a lot of data collection and analysis to be done. I'm part-way through as I write.

Given the very grey dull day yesterday and today (I encountered a little gentle sleet while on my lockdown exercise walk) I think that few if any pages will get published until tomorrow's forced make all. Given that, and the "work storage" scheme, I've guessed the datePublished to be 2021-01-02T14:00Z, even though I'm writing it 24 hours ahead of that!

I note that I still haven't created canonical versions of some of the data files from the switch to the new RPi server around August, including (old server versions):

data/powermng/202008-old.log.gz
data/OpenTRV/pubarchive/localtemp/202008-old.log.gz
data/OpenTRV/pubarchive/remote/202008-old.json.gz
data/16WWHiRes/Enphase/202008-old.log.gz
data/16WWHiRes/Enphase/202008-old.daily.production.json.gz
data/SunnyBeam/202008-old.gz

I'm dealing with the SunnyBeam and main Enphase logs (data/16WWHiRes/Enphase/202008-XXX.log.gz, merging old and new with sort -u) while making annual xz logs.
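
For illustration, the effect of merging old and new logs with sort -u can be sketched in Python as below. The function names are invented for this example and the actual merge was done with sort itself; note that Python sorts by code point, which matches sort only under the C locale.

```python
import gzip


def merge_unique_sorted(old_lines, new_lines):
    """Union of both inputs with duplicates dropped, in sorted order."""
    return sorted(set(old_lines) | set(new_lines))


def merge_gz_logs(old_path, new_path, out_path):
    """Apply the same merge to two gzipped text logs and write a gzipped result."""
    with gzip.open(old_path, "rt", encoding="utf-8") as old, \
         gzip.open(new_path, "rt", encoding="utf-8") as new:
        merged = merge_unique_sorted(old, new)
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        out.writelines(merged)
    return len(merged)


# Hypothetical, abbreviated log lines; each begins with a sortable timestamp.
old = ["2020/08/20T10:00Z a\n", "2020/08/21T10:00Z b\n"]
new = ["2020/08/21T10:00Z b\n", "2020/08/22T10:00Z c\n"]
print(merge_unique_sorted(old, new))
```

This works for these logs because every line starts with a timestamp that sorts correctly as plain text, so sorted order is also chronological order.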

The powermng logs were merged by splicing them at the point that the Morningstar controller was moved from old to new RPi. This is evidenced by the AL -1 indicating when the Morningstar is not connected. The splice happens at these two lines:

2020/08/21T12:30:06Z AL 1334 B1 14057 B2 -1 P 7170 BV 13857 ST VH D V A1P 18058 B1T 23 UC 100
2020/08/21T12:40:06Z AL 921 B1 13575 B2 -1 P 7643 BV 13390 ST H D h A1P 12532 B1T 23 UC 100
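
A splice of this kind can be sketched in Python as follows: keep the old server's lines up to and including the last good timestamp, then the new server's lines after it. The function name and the abbreviated sample payloads are assumptions for illustration; only the two timestamps come from the lines above.

```python
def splice(old_lines, new_lines, last_old_stamp):
    """Keep old lines up to and including last_old_stamp, followed by
    new lines strictly after it. Timestamps in this format compare
    correctly as plain strings."""
    def stamp(line):
        return line.split(" ", 1)[0]  # first space-separated field

    kept_old = [l for l in old_lines if stamp(l) <= last_old_stamp]
    kept_new = [l for l in new_lines if stamp(l) > last_old_stamp]
    return kept_old + kept_new


# Hypothetical, heavily abbreviated records around the splice point.
old = ["2020/08/21T12:30:06Z AL 1334", "2020/08/21T12:40:07Z AL -1"]
new = ["2020/08/21T12:30:05Z AL -1", "2020/08/21T12:40:06Z AL 921"]
for line in splice(old, new, "2020/08/21T12:30:06Z"):
    print(line)
```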

For the main OpenTRV sensor logs data/OpenTRV/pubarchive/remote/202008-XXX.json.gz, the old and the new are simply concatenated. The splice happens at these two lines:

[ "2020-08-21T20:08:41Z", "", {"@":"FA97A8A7B7D2D3B6","+":3,"O":2,"H|%":56,"vac|h":0} ]
[ "2020-08-21T21:12:25Z", "", {"@":"96F0CED3B4E690E8","+":14,"tS|C":0,"vC|%":21,"gE":0} ]

For the older (local-only) OpenTRV sensor logs data/OpenTRV/pubarchive/localtemp/202008-XXX.log.gz, the old and the new are again simply concatenated. The splice happens at these two lines:

2020/08/21 20:06:51Z 26.5 =F@26C8;X0;T14 7 W255 0 F255 0 W255 0 F255 0;S 14 20 c;C5
2020/08/21 21:12:49Z 26.5625 =F@26C9;X0;T15 17 W255 0 F255 0 W255 0 F255 0;S 14 20 c;C5

The new server's log for 202008 is copied to use as the canonical version for the once-per-day snapshot of all Enphase values. This is on the basis that a reliable merge is hard given the file format, and that old and new should contain the same amount of information.

% svn cp 202008-new.daily.production.json.gz 202008.daily.production.json.gz

This completes the pending reconciliation work from September, I think!

The various -old and -new files should not be removed, to allow for other reconciliations and uses if desired.
