Date: Tue, 19 Jan 2016 10:20:09 +0000
Subject: Fwd: COHEAT REV2/REV9/ELV options
From: Marko Cosic
To: Damon Hart-Davis

Hi Damon,

Please go with this version for publication. Photo/plans attached if useful. I put all (except the photo of their original plans) in the public domain.

---------- Original sent on 2016/01/12, lightly edited, below...

COHEAT's publishable notes on operating the REV2/REV9/ELVs, and ideas for improving performance, follow:

*Today:*

In practice every REV2 is within earshot of every REV9 on the site. (Attached: a photo taken by COHEAT and released into the public domain, and a plan of dubious copyright status.)

We allocate dedicated 3-second slots across the whole site for polling each REV9 and co-ordinate these centrally, so there are zero collisions. The cycle is 10 minutes (200 slots available).

The system always communicates from a fixed REV2 to/from each REV9. It retries once if it does not receive a response from the REV9 and there is still time within its allocated 3-second slot.
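To make that slot arithmetic concrete, here is a minimal sketch in C++ of the kind of bookkeeping involved. All names (slotFor, sendPollAndAwaitResponse, etc.) are invented for illustration - this is not OpenTRV or COHEAT code:

    #include <cstdint>

    static const uint16_t SLOT_SECONDS  = 3;                            // one dedicated slot per REV9 poll
    static const uint16_t CYCLE_SECONDS = 600;                          // 10-minute cycle
    static const uint16_t SLOT_COUNT    = CYCLE_SECONDS / SLOT_SECONDS; // 200 slots available

    // Placeholder primitives standing in for the real radio and site-wide clock.
    bool sendPollAndAwaitResponse(uint16_t rev9) { /* radio I/O here */ return false; }
    uint16_t secondsIntoCycle() { /* centrally co-ordinated clock */ return 0; }

    // Each REV9 is assigned a fixed slot centrally at commissioning,
    // so no two REV9s on the site are ever polled in the same 3 seconds.
    uint16_t slotFor(uint16_t rev9) { return rev9 % SLOT_COUNT; }

    bool inSlot(uint16_t rev9, uint16_t t) { return (t / SLOT_SECONDS) == slotFor(rev9); }

    // Poll once; retry at most once, and only while the slot has time left.
    bool pollRev9(uint16_t rev9)
    {
        if (!inSlot(rev9, secondsIntoCycle())) return false; // not our slot
        if (sendPollAndAwaitResponse(rev9)) return true;     // first attempt
        if (inSlot(rev9, secondsIntoCycle()))                // still in slot?
            return sendPollAndAwaitResponse(rev9);           // single retry
        return false;
    }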
Sometimes the messages don't get through because of interference with communications between REV9s and ELVs. We have installed 121 REV9s but ~40 have crashed due to brownout issues, so not all are broadcasting to ELVs. When these are restored, and ~20 more arrive in the office, this problem will become worse. There's nothing we can do about this with the current code.

Sometimes the buffers overflow: devices are reading messages addressed to other devices - then ignoring them - but the messages arrive too fast to process. Your mainline code with a longer buffer would help here.

Sometimes non-crashed REV9s are in a state where they don't respond to incoming messages. Knowing more about this might be helpful.

The REV9 messages to the ELV are staggered by the random (person inserting battery) start time of the REV9 board, by the drift due to variable message lengths to the ELV, and by crystal drift on the REV9.

All told, we expect the current 50% success rate for messages to/from REV9s (a wavy-finger estimate from live data - I haven't calculated it from logs) to drop to 30% once all are up and running, for an expected time between messages of 20-30 minutes. We don't know what to expect from REV9-to-ELV reception, but expect that it won't be pretty either.

*Tomorrow:*

We keep the 3-second slots for polling REV9s, and have a few more tricks to deploy to mitigate the impact of failed messages on our control system. These won't improve the intrinsic success rate though.

We are NOT going to try to track REV9<>ELV sync to avoid these transmissions/collisions, because (1) it's hard work on paper as it stands and (2) timer drift makes it near impossible in practice. I say near impossible: you could do it if you re-synced regularly so that crystal drift didn't get too far out of hand, but it doesn't buy enough to be worth the effort.

*OpenTRV change priority:*

1) Move back to mainline code with the increased message buffer length. This improves maintainability and the usefulness of the codebase to OpenTRV, and the increased buffer length should improve the success rate. Note: we'll need the power-on-self-test modifications to ignore stuck buttons in your mainline code.

2) Channel shift. The REV9-ELV communications can't move because we're committed to the naff ELVs. The REV2-REV9 communications can move. Could we try to make that piece of code happen please? For starters it can be hard-coded channels: all REV9s receive on Channel X, broadcast to REV2s on Channel Y, and broadcast to ELVs on Channel Z. More adventurous would be putting different groups of REV2s and REV9s on different channels. More adventurous yet would be commanding a channel change over the air if a channel is found to be noisy locally. For now, moving all REV2<>REV9 traffic off the ELV channel would more than likely yield the success rate that we need for our control system.

3) Re-sync command. Include an over-the-air option for forced REV9<>ELV re-sync. There will be times when REV9<>ELV transmissions overlap: if power cuts don't do this then crystal drift will. REV2<>REV9 will still work, but REV9<>ELV will become unavailable for certain ELVs. The option to trigger a re-sync remotely fixes this. In the grand scheme of things this isn't a priority: a few radiator valves that don't immediately do as they're told don't totally break our control logic. Losing temperature readings if we can't poll REV9s is more problematic: fallback mode works, but we can't demo fancy control.

4) Increase the baud rate for REV2<>REV9 communications. Half the battle with the ELV devices is that they use OOK on the same channel, and at an ultra-low baud rate. REV2-REV9 could even use FSK, JeeNode/OpenEnergyMonitor style. (We had no issues with 30 of their devices broadcasting at 10-second intervals.) Google says you're already thinking this... ...but maybe we don't ask you to do it today? ;-)

https://opentrv.atlassian.net/browse/TODO-690
http://www.earth.org.uk/OpenTRV/OpenEnergyMonitorProtocolNotes.txt

*Deployment*

We have a duplicate setup (server, Raspberry Pis, REV2s, REV9s) in the office that we can experiment on. Our current control code falls back to a default behaviour when we lose data from all REV9s in a property, and one-off access for non-plumbing works isn't too problematic. As such we don't need backwards compatibility with the existing REV9s: assume that they can all be field-reprogrammed at leisure. Backwards compatibility with the existing REV2s would be very helpful (adding to the API rather than rewriting it). Swapping out the devices in the field isn't difficult if needs be.

*Notes on power supplies*

Power quality has proven challenging.

1) There is an abnormally high number of power outages on this site. (I've been on site and witnessed three in person in the past 6 months.)

2) There's a fault with the design of the battery<>mains switchover circuit on the REV9 boards. If the mains voltage sags slowly, the COHEAT-designed power supply circuitry will brown out the OpenTRV circuitry before the battery backup kicks in.

3) The OpenTRV circuitry doesn't always recover from brownouts.

4) Not all (official) Raspberry Pi power supplies are created equal.

We have built a "guillotine board" to address the design fault with the REV9 boards. If the voltage drops below 4.75V it will immediately disconnect the power and short across the supply rails to force a blackout before we brown out the REV9 boards. There's an override for initial startup.

Some Raspberry Pi power supplies will turn on, rise to >4.75V, and maintain >4.75V even when 10 OpenTRV boards and their input power stage are instantaneously connected to the Raspberry Pi power supply. Others rise to >4.75V but instantly drop below 4.75V when 10 OpenTRV boards are connected. They're all official supplies from Farnell: this is variability between the supplies. This may explain some of the behaviour observed on site, with certain sets of REV9s more prone to disappearing than others. I'll be addressing this with capacitors on the output of the Pi power supply/input to the "guillotine board" on site.

5) In future we're never using wall-wart power supplies - we'll build our own from components such that we know what the actual spec is. Everything also gets on-board brownout/blackout protection external to the micro/CPU.
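On the firmware side of point 3 (boards not recovering from brownout), one standard belt-and-braces measure - complementary to the external protection in point 5 - is the AVR watchdog plus reset-cause capture. A minimal Arduino-style sketch, assuming an ATmega328-class part and nothing about the actual OpenTRV power-management code:

    #include <avr/io.h>
    #include <avr/wdt.h>

    // Survives reset (not zeroed by the C runtime) so the cause can be logged.
    uint8_t resetCause __attribute__((section(".noinit")));

    // Runs before main(): capture MCUSR (BORF = brownout, WDRF = watchdog),
    // clear it, and stop the watchdog so it can't fire again during startup.
    void captureResetCause(void) __attribute__((naked, used, section(".init3")));
    void captureResetCause(void)
    {
        resetCause = MCUSR;
        MCUSR = 0;
        wdt_disable();
    }

    void setup()
    {
        // ... report resetCause over serial/radio, then normal initialisation ...
        wdt_enable(WDTO_8S);    // hard reset if the main loop ever wedges
    }

    void loop()
    {
        wdt_reset();            // pat the watchdog once per healthy pass
        // ... poll radio, service buffers, drive valves ...
    }

The early .init3 handler matters: after a watchdog reset the ATmega keeps the watchdog enabled at its shortest timeout, so a board that doesn't clear WDRF promptly can reset in a loop and look exactly like one that "didn't recover".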
*Notes on ELVs*

They do appear to re-sync even after the 30-minute period has expired and they've moved to a fixed 30% position.

*Notes on logs*

COHEAT have been tracking every message sent and received from REV2s/REV9s, and will work out how to extract this in a useful format for you.

--
Marko Cosic
Technical Director

COHEAT Ltd is a company registered in England and Wales. Registered number: 08583328. Registered office: Future Business Centre (RS10), Kings Hedges Road, Cambridge CB4 2HY