taw's blog

The best kittens, technology, and video games blog in the world.

Tuesday, September 27, 2016

Some ideas for building a computer from scratch

Jasper's computer by evhoffman from flickr (CC-SA)
A very long time ago I tried to build a computer from 74LS parts. I didn't get too far: basically I had a 40-bit switch input board, a 40-bit LED output board, and a somewhat reasonable ALU.

I had plans to add register file and instruction decoder, and zero clue how to go from there to memory, clock, and I/O - but when I realized how painful soldering has been, and how much worse register file is going to be, I dropped the whole project.

It was still fairly cool way to learn electronics, and you can check the pictures and schematics.

So building entire computer still seems overwhelming, but here's a simpler idea.


  • I could skip soldering completely by just getting a stack of breadboards. It's going to be less compact, and I'll need to set up some kind of frame for them, but anything beats soldering.
  • I wondered about wirewrapping as another alternative to soldering, but nobody uses that.
  • PCBs seem even harder than soldering.
  • 74HC are apparently the new 74LS

Design tools

  • I've heard rumors that there are better languages than Verilog for modelling hardware. It would be nice to investigate one of them.
  • In addition to Verilog-level simulation, I'd like to do some wiring estimations before I start building it
  • Compiler for that architecture shouldn't be too hard to build.

What I actually want to build

  • I want ALU, register file, and instruction decoder
  • a Raspberry Pi (or Arduino) will hold memory contents and interface with I/O - it will still need some kind of controller for it
  • Interface between RPi and computer will itself be somewhat complex
  • I'm not sure if I'll have separate clock, or if I'll use RPi as clock source
  • At some point I'd like to get memory - but on every bootup the RPi will still send contents to that memory before my computer starts running
  • Even if I get memory and clock off-RPi, it will still handle I/O, networking etc. It's nothing unusual, today every disk, every network card, every wifi dongle etc. have tiny computer on board.

Design ideas

  • The fancy multiport register file needs to go, as it's wiring hell, and I can have classic CISC style specialized registers instead. It doesn't seem like a big difference when you look at Verilog, but there's a reason all old architectures did it this way, and trying to build one taught me that lesson.
  • Completely separate code and data will simplify a lot over shared memory architecture.
  • With separate code vs data, I can make code words as wide as I'd like - it sort of offloads part of decoder duties to compiler.
  • I don't think it will need any fancy microcode, then again any instruction will inevitably take multiple stages. I could start by having extremely wide words, and then moving parts of this logic gradually to instruction decoder.
  • It feels really hard not having 32bits. Maybe going 32bit is reasonable - just making registers wider is simple, ALU won't get any harder (ALU won't do multiply or anything crazy like that, we'll do that in software) except for some zero checks etc., so it's just a matter of wiring all that to memory interface.
  • 32bit architecture would definitely require RPi as controller, Arduino or small 74HC chip just won't have enough memory.
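
To make the wide code word idea a bit more concrete, here's a minimal Ruby sketch of a machine where every field of the instruction word drives one part of the datapath directly, so there's almost nothing left to decode. The field names (alu_op, src_a, src_b, dest, imm), the register names, and the tiny operation set are all made up for illustration; it's not a design from this post.

```ruby
# Hypothetical wide instruction word: every field maps straight onto a
# control signal, so the "decoder" is just field extraction.
WideWord = Struct.new(:alu_op, :src_a, :src_b, :dest, :imm, keyword_init: true)

# ALU without multiply; results are truncated to 32 bits.
ALU_OPS = {
  add:     ->(a, b) { (a + b) & 0xFFFF_FFFF },
  sub:     ->(a, b) { (a - b) & 0xFFFF_FFFF },
  bit_and: ->(a, b) { a & b },
  bit_or:  ->(a, b) { a | b },
}.freeze

# Runs a program (an array of WideWord) against a register hash.
# A source of :imm reads the immediate field instead of a register.
def run(program, registers)
  program.each do |word|
    a = word.src_a == :imm ? word.imm : registers.fetch(word.src_a)
    b = word.src_b == :imm ? word.imm : registers.fetch(word.src_b)
    registers[word.dest] = ALU_OPS.fetch(word.alu_op).call(a, b)
  end
  registers
end

program = [
  WideWord.new(alu_op: :add, src_a: :imm, src_b: :imm, dest: :acc, imm: 20),
  WideWord.new(alu_op: :add, src_a: :acc, src_b: :imm, dest: :acc, imm: 2),
  WideWord.new(alu_op: :sub, src_a: :acc, src_b: :imm, dest: :tmp, imm: 43),
]
result = run(program, { acc: 0, tmp: 0 })
```

The compiler would be responsible for filling in every field, which is the "offloading decoder duties" part. Narrowing the word later means moving some of that field selection into hardware.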

Where to start

  • I think the first step would be to see if Raspberry Pi is a viable memory controller. The RGB LED experiment with software PWM was sort of sanity checking that, but actual memory interface would be much more complicated.
  • If it can run at semi-reasonable speed, I can proceed to designing the rest of the system, and building it part by part. If it doesn't, the whole approach will need rethinking.

Monday, September 26, 2016

The Next Ruby

śpie by der_cat from flickr (CC-SA)

Ruby turned out to be the most influential language of the last few decades. In a way that's somewhat surprising, as it didn't come up with that many original ideas - it mostly extracted the best parts of Perl, Lisp, Smalltalk, and a few other languages, polished them, and assembled them into a coherent language.

The Great Ruby Convergence

Nowadays, every language is trying to be more and more like Ruby. What I find most remarkable is that features of Perl/Lisp/Smalltalk which Ruby accepted are now spreading like wildfire, and features of Perl/Lisp/Smalltalk which Ruby rejected got nowhere.

Here's some examples of features which were rare back when Ruby got created:
  • Lisp - higher order functions - Ruby accepted, everyone does them now
  • Lisp - everything is a value - Ruby accepted, everyone is moving in this direction
  • Lisp - macros - Ruby rejected, nobody uses them
  • Lisp - linked lists - Ruby rejected, nobody uses them
  • Lisp - s-expression syntax - Ruby rejected, nobody uses them
  • Perl - string interpolation - Ruby accepted, everyone does them now
  • Perl - regexp literals - Ruby accepted, they're very popular now
  • Perl - CPAN - Ruby accepted as gems, every language has it now
  • Perl - list/scalar contexts - Ruby rejected, nobody uses
  • Perl - string/number unification - Ruby rejected, nobody uses them except PHP
  • Perl - variable sigils - Ruby tweaked them, they see modest use in Ruby-style (scope indicator), zero in Perl-style (type indicator)
  • Smalltalk - message passing OO system - Ruby accepted, everyone is converging towards it
  • Smalltalk - message passing syntax - Ruby rejected, completely forgotten
  • Smalltalk - image based development - Ruby rejected, completely forgotten
You could make a far longer list like that, and correlation is very strong.

By using Ruby you're essentially using future technology.

That was 20 years ago!

A downside of having a popular language like Ruby is that you can't really introduce major backwards-incompatible changes. Python 3 release was very unsuccessful (released December 2008, today it's about even split between Python 2 and Python 3), and Perl 6 was Duke Nukem Forever level fail.

That's true even when we know for certain that something would be an improvement, and usually there's a good deal of uncertainty before we try. But let's speculate on some improvements we could make if we weren't constrained by backwards compatibility.

Use indentation not end

Here's some Ruby code:

class Vector2D
  attr_accessor :x, :y
  def initialize(x, y)
    @x = x
    @y = y
  end
  def length
    Math.sqrt(@x**2 + @y**2)
  end
end

All the ends are nonsense. Why can't it look like this?

class Vector2D
  attr_accessor :x, :y
  def initialize(x, y)
    @x = x
    @y = y
  def length
    Math.sqrt(@x**2 + @y**2)

It's much cleaner. Every lexical token slows down code comprehension. Not character - it really makes no difference between end vs }, but all the extra tokens need to be processed even if they're meaningless.

Ruby dropped so much worthless crap like semicolons, type declarations, local variable declarations, obvious parentheses, pointless return statements etc., it's just weird it kept pointless end.

There's a minor complication that chaining blocks would look weird, but we can simply repurpose {} for chainable blocks, while dropping end:

ary.each do |item|
  puts item


  item.price > 100
  puts name

This distinction is fairly close to contemporary Ruby style anyway.

If you're still not sure, HAML is a Ruby dialect which does just that. And Coffeescript is a Ruby wannabe, which does the same (while going a bit too far in its syntactic hacks perhaps).

Autoload code

Another pointless thing about Ruby are all the require and require_relative statements. But pretty much every Ruby project loads all code in a directory tree anyway.

As Rails and rspec have shown - just let it go, load everything. Also make the whole standard library available right away - if someone wants to use Set, Pathname, URI, or Digest::SHA256, what is the point of those requires? Ruby can figure out just fine which files are those.

Files often depend on other files (like subclasses on parent classes), so they need to be loaded in the right order, but Rails autoloader already solves this problem.

That still leaves out files which add methods to existing objects or monkeypatch things, and they'll still need manual loading, but we're talking about 1% of use cases.

Module nesting needs to die

Here's some random Ruby code from some gem, rspec-expectations-3.5.0/lib/rspec/expectations/version.rb:

module RSpec
  module Expectations
    module Version
      STRING = '3.5.0'
    end
  end
end

That's an appalling ratio of signal to boilerplate.

It could seriously be simply:

module Version
  STRING = '3.5.0'

With the whole fully qualified name being simply inferred by autoloader from file paths.
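
Here's a rough sketch of what that inference could look like. The helper name constant_name_for and the lib root are my own assumptions for illustration, not an existing Ruby API:

```ruby
# Naive path-to-constant inference, similar in spirit to what an
# autoloader does. Hypothetical helper, not part of Ruby itself.
def constant_name_for(path, root: "lib")
  relative = path.sub(%r{\A#{Regexp.escape(root)}/}, "").sub(/\.rb\z/, "")
  relative.split("/").map { |segment|
    segment.split("_").map(&:capitalize).join
  }.join("::")
end

inferred = constant_name_for("lib/rspec/expectations/version.rb")
```

Note that this naive version infers Rspec::Expectations::Version rather than RSpec::Expectations::Version, so a real implementation would need some way to declare acronyms and other irregular names.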

The first line is technically inferrable too, but since it's usually something more complex like class Foo < Bar, it's fine to keep this even when we know we're in foo.rb.

Module nesting based constant resolution needs to die

As a related thing - constant resolution based on deep module nesting needs to die. In current Ruby:

Name = "Alice"
module Foo
  Name = "Bob"
end

module Foo::Bar
  def self.say_hi
    puts "Hi, #{Name}!"
  end
end

module Foo
  module Bar
    def self.say_hello
      puts "Hello, #{Name}!"
    end
  end
end

Foo::Bar.say_hi     # => Hi, Alice!
Foo::Bar.say_hello  # => Hello, Bob!

This is just crazy. Whichever way it should go, it should be consistent - and I'd say always fully qualify everything unless it's in the current module.
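
The difference comes from the lexical scope each form creates, which Module.nesting exposes. Here's a small self-contained sketch (the method names are mine, chosen just for this demonstration):

```ruby
Name = "Alice"

module Foo
  Name = "Bob"
end

module Foo::Bar
  # Compact form: only Foo::Bar is in the lexical scope.
  def self.compact_nesting
    Module.nesting
  end

  def self.compact_name
    Name
  end
end

module Foo
  module Bar
    # Nested form: Foo::Bar and Foo are both in the lexical scope.
    def self.nested_nesting
      Module.nesting
    end

    def self.nested_name
      Name
    end
  end
end

compact = Foo::Bar.compact_nesting  # [Foo::Bar]
nested  = Foo::Bar.nested_nesting   # [Foo::Bar, Foo]
```

Since Foo never appears in the compact form's nesting, Foo::Name is skipped and lookup falls through to the top-level constant.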

New operators

Every DSL is abusing the handful of predefined operators like <<, [], and friends.

But there's seriously no reason not to allow them to create more.

Imagine this code:

class Vector2DTest
  def length_test
    v = Vector2D.new(30, 40)
    expect v.length ==? 50

That's so much cleaner than assert_equal or monkeypatching == to mean something else.

I expect that custom operators alone would go halfway through making rspec style weirdness unnecessary.

Or when I have variables representing 32-bit integers for interfacing with hardware, I want x >+ y and x >! y for signed and unsigned comparisons instead of converting it back and forth with x.to_i_signed > y.to_i_signed and x.to_i_unsigned > y.to_i_unsigned.

This obviously will be overused by some, but that's already true with operator overloading, and yet everybody can see it's a good idea.

We don't need to do anything crazy - OCaml is a decent example of fairly restrictive class of operator overloading that's still good enough - so any operator that starts with + parses like + in expressions etc., and parsers don't need to be aware of which library it uses.

a +!!! b *?% c would always mean a.send(:"+!!!", b.send(:"*?%", c)), regardless of those operators meaning anything or not.
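
Current Ruby can't define +!!! or *?%, but the built-in operators already work exactly this way, which is the property the proposal would extend. A minimal sketch with a made-up Vec class:

```ruby
# Existing operators are plain method calls with fixed precedence:
# * binds tighter than +, whatever the receiver's class defines them to do.
class Vec
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def +(other)
    Vec.new(x + other.x, y + other.y)
  end

  def *(scalar)
    Vec.new(x * scalar, y * scalar)
  end

  def ==(other)
    other.is_a?(Vec) && x == other.x && y == other.y
  end
end

a = Vec.new(1, 2)
b = Vec.new(3, 4)
with_operators = a + b * 2
with_send = a.send(:+, b.send(:*, 2))
```

Both expressions build the same vector, and the parser never needed to know what Vec means by + or *.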

Real keyword arguments

Ruby hacks fake keyword arguments by passing extra Hash at the end - it sort of works, but really messes up more complex situations, as Hashes can be regular positional arguments as well. It will also get messed up if you modify your keyword arguments, as it will happily modify Hash in the caller.

We don't check if last argument is a Proc, we treat them as a real thing. Same should apply to keyword arguments.

Ruby is currently built around send operation:
  object.send(:method_name, *args, &block_arg)

we should make it:
  object.send(:method_name, *args, **kwargs, &block_arg)

It's a slight incompatible change for code that relied on previous hacky approach, and it makes method_missing a bit more verbose, but it's worth it, and keyword arguments can help clean up a lot of complex APIs.
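
For what it's worth, Ruby 3.0 later separated keyword arguments from positional ones, so a forwarding method_missing in this style can be written and run on modern Ruby. A minimal sketch, with Greeter and LoggingProxy as made-up names:

```ruby
class Greeter
  def greet(name, greeting: "Hello")
    "#{greeting}, #{name}!"
  end
end

# Hypothetical forwarding proxy: positional, keyword, and block arguments
# are captured and passed on as three separate things.
class LoggingProxy
  attr_reader :calls

  def initialize(target)
    @target = target
    @calls = []
  end

  def method_missing(name, *args, **kwargs, &block)
    return super unless @target.respond_to?(name)
    @calls << [name, args, kwargs]
    @target.send(name, *args, **kwargs, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

proxy = LoggingProxy.new(Greeter.new)
plain = proxy.greet("Alice")
custom = proxy.greet("Bob", greeting: "Hi")
```

This is the "a bit more verbose" method_missing mentioned above: one extra **kwargs in the signature and one in the forwarding call.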

Kill #to_sym / #to_s spam

This is somewhat of a cultural rather than technical problem, but every codebase I've seen over the last few years is polluted by endless #to_sym / #to_s, and hacks like HashWithIndifferentAccess. Just don't.

This means {foo: :bar} syntax needs to be interpreted as {"foo" => "bar"}, and seriously it just should. The only reason to get anywhere close to Symbols should be metaprogramming.

The whole nonsense got even worse than Python's list vs tuples mess.
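
A tiny example of where the spam comes from. Data that arrives from outside, like parsed JSON, has String keys, while hash literals in code tend to use Symbols, so something has to convert:

```ruby
require "json"

config = { foo: :bar }
parsed = JSON.parse('{"foo": "bar"}')

symbol_lookup = parsed[:foo]   # nil, because the keys are Strings
string_lookup = parsed["foo"]  # "bar"

# The usual workaround: convert everything by hand.
normalized = config.map { |key, value| [key.to_s, value.to_s] }.to_h
```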

Method names should not be globally namespaced String

This is probably the biggest change I'd like to see, and it's somewhat speculative.

Everybody loves code like (2.hours + 30.minutes).ago because it's far superior to any alternatives, and everybody hates how many damn methods such DSLs add to common classes.

So here's a question - why do methods live in global namespace?

Imagine if this code was:

class Integer
  def time:hours
  def time:minutes
  def time:ago
    Date.now - self

and then:

  (2.time:hours + 30.time:minutes).time:ago

This would let you teach objects how to respond to as many messages as you want without any risk of global namespace pollution.

And in ways similar to how constant resolution works now with include you could do:

class Integer
  namespace time
    def minutes
    def hours
    def ago
      Date.now - self

and then:

  include time
  (2.hours + 30.minutes).ago

The obvious question is - how the hell is this different from refinements? While it seems related, this proposal doesn't change object model in any way whatsoever by bolting something on top of it - you're still sending messages around - it just changes object.foo() from object.send("foo".to_sym) global method namespace to object.send(resolve_in_local_lexical_context("foo")), with resolution algorithm similar to the current constant resolution algorithm.

Of course this is a rather speculative idea, and it's difficult to explore all consequences without trying it out in practice.

Unified matching/destructuring

Here's a feature which a lot of typed functional programming languages have, and which Ruby sort of has just for Strings and regular expressions - you can test for a match and destructure in a single expression:

case str
when /c:([wubrg])/
  @color = $1
when /t:(\S+)/
  @type = $1
end

Doing this kind of matching on anything else doesn't work because $1 and friends are some serious hackery:
  • $1 and friends are accessing parts of $~ - $1  is $~[1] and so on.
  • $~ is just a regular local variable - it is not a global, contrary to $ sigil.
  • =~ method sets $~ in caller's context. It can do it because it's hacky C code.
Which unfortunately means it's not possible to write similar methods or extend their functionality without some serious C hacking.
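
You can see the frame-local behavior directly in plain Ruby. In this sketch (method names are mine), a match inside a called method doesn't disturb the caller's $1:

```ruby
def inner_match(str)
  str =~ /t:(\S+)/
  $1
end

def match_scope_demo
  "c:w" =~ /c:([wubrg])/
  before = $1
  same_thing = $~[1]                   # $1 is just shorthand for this
  inner = inner_match("t:creature")    # sets $~ in its own frame only
  after = $1                           # caller's match data is untouched
  { before: before, same_thing: same_thing, inner: inner, after: after }
end

result = match_scope_demo
```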

But why not add a mechanism to set the caller's $~? Then we could create our own matchers:

case item
when Vector2D
  @x = $~x
  @y = $~y
when Numerical
  @x = $0
  @y = $0

To be fair, there's a workable hack for this, and we could write a library doing something like:

case s = Scanner(item)
when Vector2D
  @x = s.x
  @y = s.y
when Numerical
  @x = s.value
  @y = s.value
end

and StringScanner class in standard library which needs just a tiny bit extra functionality beyond what String / Regexp provide goes this way.

But even that would still need some kind of convention with regards to creating scanners and matchers - and once you have that, then why not take one extra step and fold =~ into it with shared syntactic sugar?

Let the useless parts go

Here's an easy one. Ruby has a lot of crap like @@class_variables, protected visibility (pop quiz: what it actually does, and how it interacts with method_missing), Perl style special variables like $=, method synonyms like #collect for #map, flip flop operator, failed experiments like refinements etc.

Just let it all go.

Wait, that's still Ruby!

Yeah, even after all these changes the language is essentially Ruby, and backwards incompatibility shouldn't be that much worse than Python 2 vs 3.

Tuesday, September 13, 2016

Adventures with Raspberry Pi: RGB Led Take 2

Once upon a time I tried to do software PWM to set different colors in a RGB Led. It failed.

LEDs are generally either on or off, with no intermediate states - so to get LED at half the intensity, you just turn it fully on for half the time, and blink it fast enough that human eye won't be able to tell the difference.

The problem was that the blinking wasn't fast enough. So now it's time for the long overdue debugging.

First, what the hell is the gem doing? Apparently it's simply writing to files like /sys/class/gpio/gpio17/value. So what if we just write to this file in a loop, skipping the gem? It turns out that's also just not fast enough.
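
Here's roughly what that experiment looks like, as a sketch rather than the exact script I ran. The helper name toggle_rate is made up, and the path is a parameter so it can be pointed at any file:

```ruby
# Toggles a sysfs-style GPIO value file and reports writes per second.
# Every write opens, writes, and closes the file, which is the overhead
# this experiment is trying to measure.
def toggle_rate(value_path, toggles)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  toggles.times do |i|
    File.write(value_path, i.even? ? "1" : "0")
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  toggles / elapsed
end
```

On the Pi this would be called as toggle_rate("/sys/class/gpio/gpio17/value", 10_000), assuming the pin has already been exported and set as an output.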

So fallback plan, let's get wiringPi library (git clone git://git.drogon.net/wiringPi) and write it in C:

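#include <stdio.h>
#include <stdlib.h>
#include <wiringPi.h>

int main(int argc, char **argv)
{
  // Expect three intensities in the 0-255 range: red, green, blue.
  if (argc < 4) {
    fprintf(stderr, "Usage: %s R G B\n", argv[0]);
    return 1;
  }

  int r = atoi(argv[1]);
  int g = atoi(argv[2]);
  int b = atoi(argv[3]);

  printf("Raspberry RGB %d %d %d blink\n", r, g, b);

  if (wiringPiSetup () == -1)
    return 1;

  pinMode(0, OUTPUT); // R
  pinMode(2, OUTPUT); // G
  pinMode(3, OUTPUT); // B

  // Each pass turns a pin on with probability of roughly value/256,
  // so the average brightness follows the requested intensity.
  for (;;) {
    digitalWrite(0, rand() % 256 <= r);
    digitalWrite(2, rand() % 256 <= g);
    digitalWrite(3, rand() % 256 <= b);
  }
  return 0;
}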

Then compile with gcc -o rgbled rgbled.c -lwiringPi, and run like sudo ./rgbled 255 127 0
And it works!
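
To check that the random-threshold loop above really gives the expected brightness, here's a small Ruby simulation of it. There's no hardware involved, and the function name and the fixed seed are just assumptions for the sketch:

```ruby
# Simulates the random-threshold loop: the pin is on whenever a random
# byte is <= level, so the expected on-fraction is (level + 1) / 256.
def simulated_duty_cycle(level, iterations: 100_000, rng: Random.new(42))
  on_count = iterations.times.count { rng.rand(256) <= level }
  on_count.to_f / iterations
end

half = simulated_duty_cycle(127)
full = simulated_duty_cycle(255)
off  = simulated_duty_cycle(0)
```

One thing this makes visible: with <= a level of 0 is still on about 1/256 of the time, so switching to < would be needed if fully off matters.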

Now obviously I don't want to write C programs for every trivial thing, so next step would presumably be using ffi interface to wiringPi instead of what PiPiper does with file-based interface.

Monday, September 12, 2016

Using Trello for GTD

I IZ SAD. NO CHEEZBURGER 4 ME by stratman² (2 many pix and busy) from flickr (CC-NC-ND)
The big problem with GTD is that no software solution really matches the ideal workflow, and using post-it notes for it has its own problems.

I tried a lot of different software solutions. For a fairly long time I tried using a bunch of files in a Dropbox folder for it. The big upside was how easy it was to integrate with it - just crontab a script to put a file into inbox if I need to be notified of something. But plain text files are a really poor format for anything.

So as another tool in the long list I tried Trello. Here's the setup.

GTD board

I have one main board with a lot of columns:
  • Today (5) - just a place to highlight whatever I'm currently working on, or plan to work on if top item gets blocked. It gets empty by either finishing things or moving them back to action lists about daily.
  • Next Actions - I don't really feel like there's much value in using crazy number of contexts, most of which would contain no or very few items most of the time, so most actions go here.
  • Code Me - This is pretty much the only context which is constantly filled and clearly distinct from non-code actions.
  • Waiting For - what I'm waiting on to happen. Trello has advantage over plain text files, as I can put links, dates etc.
  • Someday/Maybe - a fairly vague list of ideas
  • Projects to Plan - these are sort of next actions, any project with no obvious next action goes there; the idea is that they'd go to Projects list once more actionable. It could be seen as another next actions column with "Plan Me" context tag.
  • Projects - any projects bigger than one action go here. Actions and projects should generally be linked, but usually it's obvious enough that I don't bother. Trello doesn't have easy way of showing projects with no associated actions, so I wanted to write a script to tag them, but I never got to it (Trello API isn't too bad).
  • Done - any recently finished action or project
  • Areas of Responsibility - mostly for reference during reviews. Anything bigger than a project.

GTD Archive board

About once a week I move Done column there, and add a proper date. It's mostly a feel-good board, with fairly little functionality.

Trello labels

Any long running project or area of responsibility gets its own label, as labels are the only easy way to tag trello cards. I use Card Color Titles for Trello Chrome extension, as otherwise Trello labels are fairly useless (you can see before and after in that link).

The only other label is red "blocked" label, which can be quickly applied and unapplied to action cards.

Off-Trello parts

Once upon a time I used to have "Buy Me" list, but nowadays I just throw things into my Tesco groceries or Amazon basket right away, and actually buy them weekly or so - and things not purchasable in either are rare enough they can go into generic action list.

Inbox is still a Dropbox folder, mostly with plain text files, so existing crontab scripts can still use it.

How Well Does it Work?

It all sort of works, but it's not exactly amazing. I don't plan to return to plaintext files, but I'll probably try something else eventually.

It's really annoying that I can't use it when offline, for example in London Underground - Dropbox had far better online/offline integration.

Friday, September 02, 2016

Modern Times mod for Crusader Kings 2 - Reaper's Due release

Lucy by hehaden from flickr (CC-NC)
Here's new release of Modern Times mod, now updated for 2.6.1, and mostly containing bugfixes, such as American invasion no longer accidentally being theocracy.

By accident infectious diseases were all gone in previous versions of the mod. While this could accurately show modern medicine, it's more fun to keep them, so now you can get them all depending on your game rules choices.

You'll only get Black Death if you set it to random, as on historical settings it will be long gone. Minor diseases happen just as in vanilla.

There's no SARS / HIV / bird flu or anything like that.

Thursday, September 01, 2016

How to teach coding

Book cat by raider of gin from flickr (CC-BY)

I've been helping people learn coding for a while now. Here are some notes.

Free resources

  • there's a lot of free resources out there
  • nearly all of them are of poor quality
  • it's very difficult to make good resources for someone very different than you - and by the time you can write a tutorial you're long past beginner phase
  • very often resources spend far too much time on pointless distractions, have huge difficulty spikes, present material in order where current lesson depends on something that will only be explained in the future etc. It's clear they're not adequately tested on actual beginners.

How to learn coding

There's absolutely no reason for anyone to ever do anything else than:
  • stay in-browser as much as possible
  • learn basics of HTML and CSS
  • learn basics of jQuery
  • only then progress to anything else
As far as I can tell that's the only way beginners can actually create something interesting and useful.

If you start by teaching people ruby or python, the best they can do is some completely artificial terminal programs like guess-a-number or such.

Even if someone needs to learn ruby/python, the best way is to first teach them web technologies, and then thanks to some framework like Ruby on Rails they can build something useful.

I'd very strongly recommend against teaching people "Javascript" as such. What people need is just bare minimum to be able to do simple jQuery style manipulations. Non-jQuery Javascript is better left for far later.


A lot of resources try to teach beginners how to use terminals, text editors like Atom, git, github etc. before they get to any coding. Crazy ones even try things like vim.

It's mindboggling why anybody would consider it appropriate to start with this. It's a massive distraction from the goal of learning programming and writing useful programs.

Fortunately there's a powerful environment even absolute beginners are comfortable with, and that's the browser.
  • repl.it - run simple program and repls in almost every programming language
  • codepen.io - experiment with HTML/CSS/Javascript and related technologies
  • most online courses have in-browser editors and tests
It's useful for every beginner to have a github account and to download Atom, but these shouldn't be the focus.

For people who use OSX, going off-browser is tolerable, but for people with Windows laptops that's huge amount of pain, so it's especially important to stay in-browser as much as possible.

Free resources reviews for web development

They're fairly good, and you can do a lot in-browser:
  • freecodecamp - this is the best beginner resource for web technologies I found - it covers a lot of content, it's well structured, and contains low amount of nonsense; there's a bunch of stuff that's "coming soon"
  • codecademy - it has a lot of content (web and non-web), but a lot of it has serious issues like random difficulty spikes and chapters with poor explanations
  • codebar tutorials - they're OK, but they suffer from having to download files and do everything locally - I found that in-browser lets beginners focus on the subject much better and be less confused by tooling
It's important that beginners can use minimum of unfamiliar tools for it, and mostly stay in-browser.

It's also great that github.io offers free and very easy to set up hosting for such apps.

Free resources for non-web development

I'm much less happy with these resources compared with web development resources:
  • ruby in 100 minutes - it seems to take people about twice as long. Whenever anyone wants to do it, I generally tell them to go through chapters 2, 3, 5, 7, 8, 6, 9, 10, 11 and use repl.it.
  • Learn Ruby the Hard Way - I don't like this book, as it teaches Ruby as if it was Python, which feels like it completely misses the point.
  • codewars - good practice for intermediate level if you set the filters correctly (8kyu only, unsolved only, sort by popularity), as the defaults are completely wrong for beginners. It's much more useful for people who can already program and simply want practice in new language.
  • try ruby - a nice in-browser introduction. It suffers from minor distractions like symbols (I wish ruby just killed them completely) and ruby 1.9 leftovers.
  • udacity - I've been generally rather unhappy with quality of that, and they completely ignore all reported errors
  • books - just not worth it for beginners - in-browser environment and immediate feedback are just far superior
  • everything that you need to download to solve like rubykatas, exercism etc. - they're ok, but best left for later
It's much harder to set up hosting for your ruby/python programs, and it usually costs money.

Free resources for tools

Tools I'd recommend teaching:
  • stay in browser as much as possible - that's what everybody already knows
  • browser's development tools - this is generally fairly straightforward progression from basic browser skills everybody already has
  • codepen.io - far easier to get started than creating a bunch of files and keeping them synchronized etc.
  • repl.it - this should be default repl, not any kind of in-terminal irb/ipython/etc.
  • Atom - from what I've seen beginners have little trouble with it, unlike with some complex editors. It has ton of plugins, works on everything, and it's perfectly adequate for serious programming as well.
  • github - the browser side of it is reasonably approachable, terminal side much less so, and I'm not sure if there are any good client-side programs to make it easier.
  • github.io hosting - to keep people motivated
  • terminal basics - it's fairly painful, and I wish Atom did more of it, so terminal would be needed less.
  • git basics - it really pains me, as this is extremely unfriendly towards beginners, but there's no alternative, and at some point they'll need to learn it - at least there's immediate payoff in github and github.io.
Unfortunately I haven't found great tutorials for any of the tools.

Wednesday, August 31, 2016

Let's Play Hearts of Iron 4 as Poland

I played Poland once before, in 1.0 version. It was my first real campaign after short Iran game to figure out game controls, and mostly thanks to AI being horrible I conquered Germany by 1938.

Now AI is somewhat less dumb, so it would probably be harder. The rush Germany strategy probably still works, but I wanted to try some alternative, in case they ever make Germany too strong for Poland to take. (there's even stronger strategy of taking advantage of Sudetenland glitch, but this is exploit-free campaign)

The strategy I wanted to try:

  • rush revanchism focus to be able to fabricate at 10% world tension, before guarantee spam starts
  • conquer all 3 Baltic states for extra factories
  • give Germans Danzig when presented with ultimatum
  • focus exclusively on Soviet Union
  • after Soviet Union falls, get Danzig back, and while at it Berlin as well

The series also tries to answer the question of just how good the build of 6 mountaineers, 2 artillery, and 1 medium tank is, but the conclusion of that is only in the last episode.

Here's episode 1. The rest will be published once a day.

Sunday, August 28, 2016

Let's Play Crusader Kings 2 as Islamic State with Modern Times mod

Here's a fun campaign I played on twitch as ISIS in Modern Times 2016 start. It should hopefully suffer from fewer technical problems than my HOI4 Nationalist China campaign, which started with poor microphone positioning (but eventually got better).

The campaign was fairly short, as after death of the first caliph my backup system failed me, so I couldn't continue - but his life was definitely eventful, and it should be fairly fun to watch, and in a way this gives it some kind of closure.

It's all using Modern Times mod I wrote, which allows playing any time from 1815 Congress of Vienna to 2016 today. It's still on 2.5.2 so if you want to enjoy the diseases in Modern Times you'll have to wait a few days for mod to update.

The whole playlist is on youtube, with episodes coming once a day as usual.

Here's the first episode:


Friday, August 26, 2016

Data loss postmortem

Flash Fail? by E V Peters from flickr (CC-NC)

I just lost a lot of data, and I'm extremely annoyed, to put it mildly.

Here's my backup setup:
  • OSX laptop as primary
  • Gaming Windows 7 box as secondary, with cygwin installed
  • (in the past I also had a few more boxes to which this system was extended)
  • status script automatically checks all boxes - every file or folder is inspected according to some set of rules:
    • system files are considered safe
    • all git repos are considered safe if they're pushed to master with no extra files
    • everything that's in Dropbox folder is treated as safe
    • for things too big for Dropbox there's a pair of backup drives - everything on them is considered safe as long as both drives contain the same files (for obvious performance reasons I'm only checking directory listings, not TBs of content)
    • symlinks pointing to safe locations are safe
    • there's a whitelist of locations to ignore, for various low value data, applications' folders with nothing I care about etc.
    • everything else is automatically flagged as TODO
  • to prevent data loss in shell, rm command is aliased away (safe trash is used), mv and cp are aliased to -i to prevent accidental overwriting, and I'm very strict about always using >> and never under any circumstances > in shell redirects
  • Dropbox offers 30 day undelete, so anything deleted locally can still be recovered
  • and just to be super extra sure, various cloud contents are snapshotted every now and then to backup drives; list of installed software is snapshotted etc.
  • phones, tablets etc. all sync everything with the cloud, and contain nothing valuable locally
  • MP3 player and Kindle are mirrored on Dropbox, and synchronized automatically by scripts whenever they're connected
This system is really good at dealing with hardware failures, system reinstalls, and random human error. All files are protected from single failure, and in some cases from multiple failures.
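
The rule list above boils down to "first matching rule decides". Here's a minimal Ruby sketch of that idea, not the actual status script: entries are plain hashes instead of real filesystem checks, the paths and field names are invented, and the symlink rule is left out to keep it short.

```ruby
# Hypothetical locations; a real script would read these from its config.
DROPBOX_ROOT = "/Users/me/Dropbox/"
WHITELIST = ["/Users/me/Library/Caches/", "/Users/me/Games/saves/"].freeze

# Each rule is [verdict, name, predicate]; the first matching rule wins.
RULES = [
  [:safe,    :system,    ->(e) { e[:system] }],
  [:safe,    :git,       ->(e) { e[:git_repo] && e[:pushed] && !e[:extra_files] }],
  [:safe,    :dropbox,   ->(e) { e[:path].start_with?(DROPBOX_ROOT) }],
  [:safe,    :mirrored,  ->(e) { e[:on_backup_drive] && e[:listings_match] }],
  [:ignored, :whitelist, ->(e) { WHITELIST.any? { |prefix| e[:path].start_with?(prefix) } }],
].freeze

# Returns [verdict, rule_name]; anything no rule claims is flagged as TODO.
def classify(entry)
  rule = RULES.find { |_verdict, _name, predicate| predicate.call(entry) }
  rule ? rule.first(2) : [:todo, nil]
end

dropbox_file = classify({ path: "/Users/me/Dropbox/notes.txt" })
dirty_repo   = classify({ path: "/Users/me/code/x", git_repo: true, pushed: true, extra_files: true })
old_saves    = classify({ path: "/Users/me/Games/saves/campaign1" })
```

Notice that the whitelist rule is a plain prefix match, so it has no way to tell saves from a finished campaign apart from saves for an ongoing one.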

Unfortunately there are two huge holes in the system:
  • configuration which doesn't live in user-accessible files - like /etc on OSX, Windows registry etc. This is less of an issue nowadays than it used to be.
  • the manually created whitelist of locations to ignore. You can guess where this leads.
It also offers limited protection from any kind of hacking or ransomware attack, but in any realistic threat model they're not terribly important.

Video Games

For casual gamers it's enough to just install games with Steam or whatever, and enjoy.

This unfortunately is absolutely unacceptable if you're into any kind of serious gaming. Steam autoupdates both game and all its mods, with no way to roll back, so if you had any kind of long running campaign, it will get wrecked.

As far as I can tell, that's what caused death of my Let's Play Civilization V as Germany series - I probably mindlessly pressed buttons to update 3UC/4UC mods, and that resulted in unfixable save game corruption.

So to protect against this, if possible I'm not playing using Steam - instead I install every version to separate folder. All versions of same game unfortunately share same user data folders, so if I ever want to go back I need to do some folder reshuffling, but as long as I don't run that game in Steam, mods won't get overwritten by newer versions, so I can safely play even campaign that takes months.

And I'm perfectly aware that for Paradox games it's possible to revert to previous versions as betas, but that does absolutely nothing whatsoever to deal with mods irreversibly autoupdating without my consent, and in HOI4 (and apparently Stellaris, but I never played that) it's even worse as mods are saved deep in Steam user data, so I had to write some script to even have mod folder I can safely backup.

Now here's where first part of the problem begins - I added all folders with save games to the whitelist. This is mostly reasonable, as I don't need long term backups of them, and if I lose saves from campaigns I already finished, it's no big deal.

Unfortunately whitelist has no good way to tell them apart from saves (and mod folder) for any ongoing campaigns, so here's failure number one.


I've noticed that I had way too many old versions of various games installed, so I decided to clean them up - there's zero risk in deleting installed applications, so it was a routine thoughtless operation.

While uninstalling some old version of Crusader Kings 2, yet another confirmation popup appeared, I automatically answered yes, and then it deleted my whole user directory with all my saves and everything else.

This is unacceptable UX on so many levels:
  • Surprise popups should never ask to delete user data - it should either never happen, or be a checkbox user must explicitly choose. It is completely unacceptable.
  • if you ever actually delete user data, use system trash. It is completely unacceptable to use hard delete like it's 1980s and we learned nothing in last 30 years of computing.
If your software does it, just stop writing software until you learn better, because you're causing more harm than good.

So we had 3 failures in a row (one my fault, other two the fault of whoever wrote that uninstaller), but that was still sort of recoverable with undelete process which existed since days of DOS.

I downloaded some software for it - the first one was bait and switch bullshit which would display files it found, but wouldn't actually recover anything. If you write that kind of software, please just kill yourself, there's no hope for you.

Second I found some legitimate recovery software, it recovered the files to second drive, so I thought 4th level of protection worked... and unfortunately they were all filled with zeroes. That confused me, but then I noticed that it was all on an SSD and TRIM command was indeed enabled, which completes the explanation: with TRIM the drive is told which blocks a delete freed and can discard them right away, so there was nothing left to undelete.

Next actions

Historical saves from my past campaigns were nice to have for testing some tools, but I don't care about them terribly much. Recovering settings and mod folder from scratch will take maybe an hour, as it contained a mix of mods from Steam Workshop, downloaded separately, and my own. Annoying, but not a big deal.

What I lost were mostly saves for my ongoing Let's Play CK2 as Islamic State [Modern Times mod] campaign I've been playing on twitch. It got up to the point where the first caliph died, and his underage son inherited Islamic State. It was still quite fun, and I have all the video saved, so I'm going to upload that to youtube soon enough - and in the meantime all 3 sessions are available on twitch.

Even after this loss, I still have 22GB of save files in my folder. If this was OSX, I could just move them to Dropbox and symlink back (size to value ratio is not great, but often doing this brute force is good enough), but that's not terribly reliable on Windows, so I'll probably just delete old ones manually, remove save folders from the whitelist and instead tell the script to copy them all over to Dropbox.
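
A minimal sketch of what such a copy step might look like, assuming Python 3.8+ and a plain one-way copy. The function name, the labels, and any paths passed in are hypothetical, not taken from the real script.

```python
import shutil
from pathlib import Path

def mirror_saves(save_dirs, backup_root):
    """Copy each save folder to backup_root/<label>.

    save_dirs is a mapping of label -> save folder; labels are used as
    destination names because many games call their folder "save games".
    Returns the list of destination folders that were written.
    """
    backup_root = Path(backup_root)
    copied = []
    for label, src in save_dirs.items():
        src = Path(src)
        if not src.is_dir():
            continue  # game not installed on this box, nothing to copy
        dest = backup_root / label
        # dirs_exist_ok lets repeated runs refresh an existing copy
        shutil.copytree(src, dest, dirs_exist_ok=True)
        copied.append(dest)
    return copied
```

This only ever adds or overwrites files in the backup, it never deletes from it, so a save wiped locally by some uninstaller would still be there afterwards. The cost is that old saves pile up in the backup until they're cleaned manually.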

The upside is that this is the biggest data loss I had in something like 10 years. The only other incident was losing about two days' worth of git commits to one repository I apparently forgot to push before formatting old laptop, which also annoyed me greatly.

Two incidents in a decade is pretty much nothing compared to the kind of massive data loss I suffered (to hardware failure) before that twice, and which made me the level of anal about backups you can see.

Monday, August 22, 2016

CSS is uniquely impossible to test

Cat by Adrian Midgley from flickr (CC-NC-ND)

Times change. Back when I started this blog ten years ago serious automated testing was something people had generally heard of, but very few actually did. It was very common for even big projects to have literally zero tests, or if they did they were token tests or at best some regression checks.

Then TDD's shaming campaign happened, and it was even more effective than shaming campaigns against smoking, and now not testing is the exception, and most fights are over what kind of testing is most appropriate.

It was mostly cultural change. Ruby or Java were pretty much just as testable 10 years ago as they are now, but underlying technology changed considerably as well. Just some of such changes:
  • Very low level languages like C/C++ where any bug just corrupts memory at random are extremely hard to test - they're far less popular than they used to be (and the ones that still exist usually have nonexistent or very shitty tests)
  • Languages like Perl which didn't even have working equality and had a lot of context dependence are much less popular - Perl was still possible to test, but it was a bit awkward
  • Headless browsers made it possible to reasonably test javascript
  • jQuery and greater compatibility between browsers made cross-browser javascript testing basically unnecessary
  • Web-based user interfaces are far easier to test than most native interfaces
  • Going all web made cross-OS testing unnecessary, and if you really need them VMs are far easier to setup than ever
  • Application logic in database paradigm mostly died out, and much easier to test application logic in application paradigm is clearly dominant now
  • Complex multithreading never got popular, and it's more common to have isolated services communicating over HTTP or other messaging
  • Cloud makes it much easier to replicate production setup in test environment for reliable system-level testing
  • All languages have a lot more testing libraries, so things like mocking network or filesystem communication which used to be massive pain to setup are now nearly trivial.
  • There are now ways to test with multiple browsers at once, even if it's still not quite as simple.
And yet, one technology from dark days before testing is still with us, and shows no sign of either going away or becoming testable. CSS.

Let's just cover a few things which in theory ought to be possible to validate automatically, but for which there are no good ways to do that right now:
  • Site works with no major glitches on different browsers. Any major difference should be flagged, but what counts as "major" difference would probably need somewhat complex logic in testing library.
  • Site looks reasonable on different screen sizes. There will be differences, and testing library would need to contain a lot of logic to determine what's fine and what's not. Some examples would be maximum/minimum element sizes, no content missing unless specifically requested to be hidden, no content cut by overflow, no horizontal scrollbars etc.
  • All CSS rules in your application.css are actually used. It seems everybody's CSS accumulates leftovers after every refactoring, and with some browser hooks it ought to be possible to automatically flag them.
  • When you do CSS animations, start and end state show what they ought to. Even disregarding transitions. Some kind of assertions like "X is fully visible and not covered by any other element or overflow: hidden", "Y cannot be seen" would be great, but they're not easy to do now.
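
To make the unused rules idea from the list above a bit more concrete, here's a very crude static sketch in Python. It only looks at class selectors, and only at classes present in the HTML source, so anything added by javascript at runtime would be wrongly reported as unused - which is exactly why a real solution would need browser hooks. The function and class names are made up for illustration.

```python
import re
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collects every class name used in class="..." attributes."""
    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())

def unused_css_classes(css, html_pages):
    """Return class names that appear in CSS selectors but in no page."""
    used = set()
    for page in html_pages:
        collector = ClassCollector()
        collector.feed(page)
        collector.close()
        used |= collector.classes
    # Drop comments and declaration blocks so only selectors remain
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)
    selectors = re.sub(r"\{[^{}]*\}", " ", css)
    declared = set(re.findall(r"\.(-?[_a-zA-Z][-\w]*)", selectors))
    return declared - used
```

For example, given CSS with `.used` and `.stale` rules and a page that only has `class="used"`, this returns a set containing just `stale`. Things like `@import "foo.css"` would confuse the regexp, so treat it as an illustration of the check rather than a usable tool.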
As far as I can tell there's been minimal progress in ten years. There are some attempts at CSS testing, but their tests are far too low level, don't address real needs, and as a result nearly nobody uses them.

I don't have any solutions. Hopefully over next few years it will get better or we'll replace CSS with something more testable.