Earth Notes: On Website Technicals (2023-01)
Updated 2023-04-08 17:45 GMT. By Damon Hart-Davis.
2023-01-28: 410 Gone
Now that the Gallery is back, albeit skeletally, bots are going at it quite hard. Some URLs that they seem especially keen on, such as the 'pick a random entry' page, will not be coming back.
To try to make that clear a 410 status ("Gone") is now returned:
RewriteRule ^/_cat/doHTMLRandom.jsp - [L,G]
This rule is placed early to avoid any redirections of the request before the 410 is delivered, ie to minimise useless requests.
If 410 seems at all effective then I will apply it more widely.
2023-01-25: Bib GZip
I added MIME-type text/x-bibtex and suffix bib to the list of files that can be GZipped on the fly if the client accepts that content encoding. That may save a few bytes for anyone using my bibliography.
I have also atomised the monolithic 92-entry general.bib file into separate source files, one per entry, and now automatically reconstruct the monolith from them when they change.
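As a rough illustration of the split-and-rebuild idea (not the build code actually used for this site, which is not shown), here is a minimal Java sketch. The class and method names are hypothetical, and it assumes that every entry starts with '@' at the start of a line and that the citation key runs from the first '{' to the first comma.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: split a monolithic BibTeX text into entries and rebuild it. */
final class BibSplit {
    /** Maps each citation key to its entry text, preserving the original order.
     * Assumes each entry starts with '@' at the start of a line. */
    static Map<String, String> split(final String monolith) {
        final Map<String, String> entries = new LinkedHashMap<>();
        // Split (zero-width) just before each '@' that begins a line.
        for (final String chunk : monolith.split("(?m)^(?=@)")) {
            final String entry = chunk.trim();
            if (!entry.startsWith("@")) { continue; } // Skip preamble or blank text.
            final int open = entry.indexOf('{');
            final int comma = entry.indexOf(',', open);
            if (open < 0 || comma < 0) { continue; } // Not a recognisable entry.
            entries.put(entry.substring(open + 1, comma).trim(), entry);
        }
        return entries;
    }

    /** Rebuilds the monolith: entries separated by a blank line, newline-terminated. */
    static String join(final Map<String, String> entries) {
        return String.join("\n\n", entries.values()) + "\n";
    }

    public static void main(final String[] args) {
        final String bib = "@misc{a2020x,\n title={A}\n}\n\n@article{b2021y,\n title={B}\n}\n";
        final Map<String, String> entries = split(bib);
        System.out.println(entries.keySet());
        System.out.println(join(entries).equals(bib));
    }
}
```

In a real build each map value would be written to its own file named after the key, and the join step would only be rerun when one of those files changes.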
2023-01-27: Faster
I switched a couple of uses of the .bib files (checking for existence of a citation, and extracting a single terse citation) to the individual single-entry files, to stay nearer O(n) time than O(n^2)...
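A minimal Java sketch of why this helps, assuming a hypothetical layout in which each entry lives in a file named after its key (key.bib) in one directory. Checking a citation then becomes a single filesystem lookup rather than a scan of the whole monolith for every citation checked. The class and method names here are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: citation existence check against per-entry files. */
final class BibLookup {
    /** True if a single-entry file for this key exists; nothing is parsed or scanned. */
    static boolean hasCitation(final Path entryDir, final String key) {
        return Files.isRegularFile(entryDir.resolve(key + ".bib"));
    }

    public static void main(final String[] args) throws IOException {
        // Build a throwaway directory with one entry file for demonstration.
        final Path dir = Files.createTempDirectory("bib");
        Files.writeString(dir.resolve("bouckaert2021net.bib"), "@misc{bouckaert2021net,\n}\n");
        System.out.println(hasCitation(dir, "bouckaert2021net")); // true
        System.out.println(hasCitation(dir, "nosuchkey"));        // false
    }
}
```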
2023-01-24: Little Date Utility
A new handy and portable (at least between my Mac and RPi!) utility to print the current time plus n hours UTC:
% sh script/UTCplusH.sh
2023-01-24T21:46:56Z
% sh script/UTCplusH.sh 1
2023-01-24T22:47:00Z
% sh script/UTCplusH.sh -1
2023-01-24T20:47:03Z
% sh script/UTCplusH.sh 10
2023-01-25T07:47:06Z
% sh script/UTCplusH.sh 48
2023-01-26T21:47:11Z
The core, with ADJH holding the hour offset, is:
gawk -v ADJH="$ADJH" 'BEGIN{print strftime("%FT%TZ", systime()+3600*ADJH, 1)}'
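For comparison, much the same calculation can be sketched in Java with java.time. This is only an illustrative equivalent, not part of the script above, and the class and method names are made up for the example.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Illustrative sketch: print the current UTC time plus n hours in ISO 8601 form. */
final class UtcPlusH {
    /** Returns the given instant shifted by 'hours', as ISO 8601 UTC to the second. */
    static String utcPlusHours(final Instant now, final long hours) {
        return now.plus(hours, ChronoUnit.HOURS)
                  .truncatedTo(ChronoUnit.SECONDS) // Drop sub-second digits.
                  .toString();
    }

    public static void main(final String[] args) {
        // Offset defaults to zero hours when no argument is given.
        final long hours = (args.length > 0) ? Long.parseLong(args[0]) : 0;
        System.out.println(utcPlusHours(Instant.now(), hours));
    }
}
```

Passing the instant in as a parameter, rather than reading the clock inside the method, keeps the arithmetic easy to check against fixed inputs.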
2023-01-23: Sendmail confHELO_NAME
For all sorts of reasons ... mumble mumble ... and idleness and trepidation, it has been the case that the name by which my sendmail MTA (Mail Transfer Agent) greets the world when sending or receiving email has not matched the domain name that comes up when looking up the PTR record for its IP address in DNS.
This is naughty, or at least slack, and sometimes the smell of a SPAMmer, and has caused a few remote mail servers to reject outgoing mail from my system. They are right to do this in fact!
Because I no longer have a whole class C block of 256 addresses (or indeed three of them: those were the days) I do not directly control the PTR records, and have to go through whatever custom system the ISP provides. Things have drifted apart over some years, thus the discrepancy. (It seems that even my ISP does not directly manage the PTR records any more, so changes may be a doubly-manual, annoying and error-prone process!)
I will do a more thorough review of these mappings given how services have moved about, and I will also pin down outgoing mail to be from a single IP address with CLIENT_OPTIONS.
In the meantime, thanks to search engines and serverfault.com, I have discovered the confHELO_NAME configuration item, and I have set it to be the actual current PTR record domain name.
And my first outgoing email to one of the picky mail servers worked immediately. Hurrah!
Bibliography Lite
To roughly halve the size of the HTML for the m-dot lite site, I now make a version with abstract and keywords omitted.
Conversely, on the desktop page my notes now default to open, and are thus searchable with the browser search function.
2023-01-15: Lowering Sizes
Every time I encounter the ProGuard Java shrinker and optimiser I am newly impressed.
I was unhappy with the >600kB JAR file supporting the grid intensity page.
After a couple of hours wondering why ProGuard was blaring warnings at me, I was down to ~50kB fully obfuscated (but a little hard to debug)!
Done with ~20 lines added to my ant build.xml file.
I have now settled on a ~80kB unobfuscated, slightly statically optimised, and much cleaner and more robust JAR file. Hurrah!
684487 Jan 10 18:56 reutils-1.1.21.jar
 84974 Jan 15 19:30 edhMain.reutils-1.1.22.jar
Lower, Lower!
By being a bit bolder and allowing almost all obfuscation to reduce size, but retaining some information (source file names and line numbers) to help diagnose run-time exceptions, size is now ~10% of the original. (Removing unrelated functionality no longer used would cut that further!)
65114 16 Jan 15:04 edhMain.reutils-1.1.23.jar
Example exception output now:
% sh extraTweet.sh "testing 1 2 3"
FAILED command: extraTweet
java.lang.IllegalArgumentException
    at org.hd.d.edh.E.a(TwitterUtils.java:370)
    at org.hd.d.edh.Main.main(Main.java:146)
Naturally there were a couple of astonishments due to the obfuscation. But those were fixed and some overdue related code improvements done.
2023-01-14: Lowering Standards
A couple of pages on EOU, eg my PhD research page, are rated as "very difficult to read" by a Flesch–Kincaid scoring mechanism.
% reado --unfluff < PhD-research.html
...
score: 29.94
school level: college graduate
notes: Very difficult to read. Best understood by university graduates.
Pages tagged as 'technical' are allowed a lower readability than most. But the previous value (42) was way too high for these rogue pages, so that 'technical' threshold has been lowered to 25 for now!
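For reference, the standard Flesch reading-ease score is 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word), with higher values meaning easier text. The Java sketch below applies that formula and a minimum-score threshold. It assumes that the score reported above is this reading-ease value (the "very difficult" band suggests so) and that a page passes when its score is at least the threshold. How reado itself counts words, sentences and syllables is not covered here, and the names are hypothetical.

```java
/** Illustrative sketch: Flesch reading-ease score and a per-page-type threshold check. */
final class Readability {
    /** Standard Flesch reading-ease formula from raw counts; higher is easier to read. */
    static double fleschReadingEase(final int words, final int sentences, final int syllables) {
        return 206.835
            - 1.015 * ((double) words / sentences)
            - 84.6 * ((double) syllables / words);
    }

    /** True if the score is at least the minimum allowed for the page type (assumed rule). */
    static boolean readableEnough(final double score, final double threshold) {
        return score >= threshold;
    }

    public static void main(final String[] args) {
        // The PhD research page (score 29.94) against the old and new 'technical' thresholds.
        System.out.println(readableEnough(29.94, 42)); // false
        System.out.println(readableEnough(29.94, 25)); // true
    }
}
```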
2023-01-10: Undead URLs
It is fascinating that search engines (in this case apparently at Microsoft) are still polling for Gallery URLs that have been dead more than a decade, possibly two!
d.hd.org:80 40.86.XX.XX - - [11/Jan/2023:15:18:16 +0000] "GET /_I/cat/13/20yuxbmq0pc.HTM HTTP/1.1" 404 397 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15"
www.hd.org:80 40.69.XX.XX - - [11/Jan/2023:15:18:18 +0000] "GET /Damon/_I/cat/19/2g4emqm6t78.HTM HTTP/1.1" 301 583 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15"
d.hd.org:80 40.69.XX.XX - - [11/Jan/2023:15:18:18 +0000] "GET /_I/cat/19/2g4emqm6t78.HTM HTTP/1.1" 404 397 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15"
d.hd.org:80 52.173.XX.XX - - [11/Jan/2023:15:18:19 +0000] "GET /_I/cat/19/rs1t27q1u0.HTM HTTP/1.1" 404 397 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15"
2023-01-08: Posting/Tooting on Mastodon from Java
I did not see a nice copy-and-paste example elsewhere though it was not that hard in the end, so here is the core of my solution:
// Fetch the auth tokens, or silently abort if not available...
final String authtoken = getMastodonAuthToken();

// Send message...
// Here is how to do it with curl...
// (MAT is a file containing the access token.)
// % curl https://mastodon.energy/api/v1/statuses -H "Authorization: Bearer `cat $MAT`" -F "status=$1"
// See https://dev.to/bitsrfr/getting-started-with-the-mastodon-api-41jj

// Use URL encoding to force into ASCII (7-bit) encoding.
final String formEncodedBody = "status=" + URLEncoder.encode(statusMessage, StandardCharsets.US_ASCII);

final int timeout_ms = 10000;

final URL u = new URL("https", md.hostname, "/api/v1/statuses");
final HttpsURLConnection uc = (HttpsURLConnection) u.openConnection();
uc.setUseCaches(false);
uc.setAllowUserInteraction(false);
uc.setDoOutput(true);
uc.setDoInput(true);
uc.setConnectTimeout(timeout_ms);
uc.setReadTimeout(timeout_ms);
uc.setRequestMethod("POST");
uc.setRequestProperty("Authorization", "Bearer " + authtoken);
uc.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
uc.setRequestProperty("Content-Length", String.valueOf(formEncodedBody.length()));

final OutputStream output = uc.getOutputStream();
output.write(formEncodedBody.getBytes(StandardCharsets.US_ASCII));
output.close();

final int responseCode = uc.getResponseCode();
final String responseMessage = uc.getResponseMessage();
uc.disconnect();
(Note the embedded cURL example and link!)
2023-01-07: Author Lists
I have now put in place code to parse author lists in the bibliography (split by semicolon or "and"), truncate overly-long lists and tail them with et al. (the trailing "." makes it an unsexed abbreviation I understand), and standardise the separator in the HTML to be ";" for compactness. Each author gets their own itemprop=author section, which improves the metadata.
There are some potential potholes with ";"s in the HTML entity codes for accented letters, but so far that has been swerved around by insisting on a space after a ";" as separator.
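The parsing described above might be sketched in Java as follows. This is an illustration under stated assumptions rather than the site's own code: the class name, the method names, the maximum list length and the exact placement of "et al." are all hypothetical. The separator pattern requires whitespace after ";" so that an entity such as &uuml; inside a name is not split, though an entity at the very end of a word would still be mis-split by this approach.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: parse, truncate and re-join a bibliography author list. */
final class AuthorList {
    /** Splits on ";" followed by whitespace, or on the word "and" surrounded by whitespace. */
    static List<String> parse(final String raw) {
        final List<String> authors = new ArrayList<>();
        for (final String a : raw.split(";\\s+|\\s+and\\s+")) {
            if (!a.trim().isEmpty()) { authors.add(a.trim()); }
        }
        return authors;
    }

    /** Joins with "; ", keeping at most 'max' authors and tailing longer lists with "et al.". */
    static String format(final List<String> authors, final int max) {
        if (authors.size() <= max) { return String.join("; ", authors); }
        return String.join("; ", authors.subList(0, max)) + " et al.";
    }

    public static void main(final String[] args) {
        final List<String> authors = parse("Smith, J. and Jones, K.; M&uuml;ller, L.");
        System.out.println(authors.size());
        System.out.println(format(authors, 2));
    }
}
```

Each element of the parsed list could then be wrapped in its own itemprop=author markup before joining.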
2023-01-04: ASCII7 Bibliography
It turns out that some people have accents in their names! I am shocked, I tell you!
I want the bibliography .bib files to stay ASCII7, ie 7-bit ASCII, for robustness.
Accents don't fit in that reduced character set. HTML can get round that with entities such as "&eacute;" for "é". LaTeX, which is in effect the source language of BibTeX, uses backslash escapes such as \'{e} (or {\'e}).
I have set up my HTML conversion to recognise the single one of these that has so far appeared! [bouckaert2021net]
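A conversion of that kind can be sketched in a few lines of Java. The class and method names are hypothetical, and only the acute-accented "e" is handled, in both of the LaTeX spellings mentioned above. A fuller converter would need a table of escapes.

```java
/** Hypothetical sketch: convert the LaTeX acute-e escape to its HTML entity. */
final class TexAccents {
    /** Replaces \'{e} and {\'e} with &eacute;, leaving all other text untouched. */
    static String toHtml(final String s) {
        // String.replace() matches literally, so no regex escaping is needed.
        return s.replace("\\'{e}", "&eacute;")
                .replace("{\\'e}", "&eacute;");
    }

    public static void main(final String[] args) {
        System.out.println(toHtml("caf\\'{e}")); // caf&eacute;
        System.out.println(toHtml("caf{\\'e}")); // caf&eacute;
    }
}
```

Both the input and the output stay within 7-bit ASCII, which is the point of using escapes in the .bib files and entities in the HTML.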
2023-01-01: New Year New Stats
I have spent most of today gathering and archiving and commenting on energy stats for 16WW!
One improvement was to switch to the hour-by-hour carbon figure for electricity for the front-page 16WW carbon-footprint graph (a snapshot of which is shown here) where available, ie for 2020 onwards.
(A little time was spent updating copyright notices for 2023, too!)
References
(Count: 1)