UTF-8

Russ Cox has a pretty good post about UTF-8. I basically knew the rough concepts but didn’t know that it can be treated as ASCII for many transformations, so that stuff like splitting text into lines, etc., can be done really fast.

What Programmers Should Know About DNS

I’m not an expert about DNS, but I learned a few things over the years by talking to people like Spider, and have noticed that there are a lot of misconceptions among programmers. Here’s a quick brain dump:

  • DNS is distributed and fault-tolerant. It’s not very likely to fail.
  • In order to be faster, DNS queries are cached by your ISP and your computer. If you don’t want your DNS name to be cached because you want to point it to another IP very quickly, you can set the TTL to one second.
  • Most ISPs have crappy DNS servers.
  • DNS supports different record types. The record type you resolve when surfing the internet is an A record. Mail servers, on the other hand, ask for MX records. There are a lot of different record types.
  • DNS names can be specific, as in www.example.com, or wildcard, as in *.example.com. If no specific name is found for a DNS query, then you get the results for the longest matching wildcard.
  • A DNS query can return many IP addresses for a single name. This can be used for load-balancing, by having a web server return a bunch of IPs in random order for a single domain name, in which case your browser will try to connect to them one at a time until it succeeds.
  • A DNS query for SRV record can return a list of other DNS names and ports. This gives you a layer of indirection that can be used to hardcode a DNS name in your client program, which can then be used at run-time to figure out the real server and port to which it will connect. This is why you don’t have to enter the server address in Skype, Yahoo Messenger or Windows Messenger.

Good Windows SSH Client

I like my SSH clients to be console applications, not GUI programs, so that I can script them easily. I’ve been using OpenSSH for Windows for a while now, and while it works, it has one major problem: its support for public key authentication is broken on Vista+. (Don’t ask.)

I recently found CopSSH. It’s basically a continuation of OpenSSH for Windows which hasn’t seen new builds since 2004. And while CopSSH is a server-oriented release (cannot install just the client), if you install it and then delete the service and the user account it creates, what you are left with is a perfectly good SSH client.

The only major drawback I found is that it looks for your settings directory (.ssh) in it’s install location instead of your user profile folder, but since I’m the only user of my computer that’s easily fixed with a single symbolic link.

Crafting Subtle & Realistic User Interfaces

Awesome post from Mike Rundle about how to design visual elements for your programs. I like the fact that he breaks things down so that even esthetically-challenged programmers like me can replicate these visual tricks.

Java GIT Performance

Beating a dead horse here, but here’s another text about about the woes of data structures done in a language without value types. If I were Sun I’d swallow my pride and start copying features from other languages and runtimes. They have an absolutely awesome JIT compiler and a very good garbage collector, but it’s time to come out of hibernation and start evolving both the runtime and the language.

Windows Team Needs to Celebrate

Irradiated Software has implemented Aero Snap from Windows 7 on a Mac. I believe this is the first time I’ve seen somebody copy a Windows UI element on a Mac.

If people on the Windows team are looking for confirmation that they’ve done an awesome job, this is it.

Java/C++ Performance, STL, Strings

Many years ago I wrote Why Java Will Always Be Slower than C++. As any article that says something “bad” about somebody’s favorite programming language, this one was controversial, resulting in a torrent of email spit being sent my way.

The problem mostly boils down to the Java bytecode not having support for “value” types (that’s a wrong name, a better one would be “embedded” types) except primitive types like int and double. Closely related to that is JVM’s lack of support for generics. Both things were proposed by Gosling in his (now deleted) paper on numerics, but were never implemented.

I got reminded of my article yesterday why reading the page Why We Chose C++ Over Java in the Hypertable wiki. They needed some large data structures and Java came out 2-3 times slower than C++ on their tests. A comment on that page about microbenchmarks by vicaya hits the nail right on the head here.

One thing I got reminded about while reading that text is how awesome STL is if you are obsessed with performance. Yes, the begin()/end() thing is full of pain. Yes, iterators need to be replaced [pdf] with ranges. But if you have a large data structure that you want to pack as tightly as possible in order to better use your caches and access it as “lightweightly” as possible, it’s hard to beat STL.

(Example: I find myself writing the following pattern of code over and over: You have a large dictionary. If a key exists in that dictionary, you want to tweak the corresponding value slightly. If it doesn’t, you want to create a new key/value pair. With .NET Dictionary this is a TryGetValue call followed by a write to the dictionary. With STL if the value exists the result of the find is an iterator that points to it and you can tweak it in-place instead of traversing the path to it with an independent write.)

And right after reading the Hypertable article I looked at some C++ code, and got reminded how much C++’s strings suck (talking about performance here, but they do suck on so many different levels). Java and .NET basically get it right by making strings immutable and then giving you a StringBuffer/StringBuilder class that represents a mutable string. C++ strings are mutable with copy-on-write, and have an absolutely awful performance profile. Back in ‘90s there was a proposal to make strings immutable and use Ropes for mutable string buffers, but the C++ committee didn’t want to break compatibility. Big mistake.

Free Belgrade Exchange Data on TickTail.com

I provided some seed capital for TickTail, so it was awesome to see it launch yesterday.

For those of you interested in financial markets, TickTail provides free real-time data from the Belgrade Exchange.

Unlike the exchanges in the developed world, Belex is fairly thinly traded. Nevertheless it does provide some awesome opportunities for a small(-ish) investor.

I’ll write a post at some later time about the underlying technology, as I think it’s pretty slick.

Muscle Injury and Healing

Pretty good article. Loads of references.

From C# to F# – Part 10

Today we’ll talk about a super-awesome feature of F# called Discriminated Unions.

From time to time you want to put N types under a single cap. The standard way to do this in object-oriented languages to is to create a common interface and then implement that interface in a number of classes. That’s perfect if N is an open-ended number and if those types have something in common. But sometimes the types are completely unrelated and OO doesn’t help.

Here are some examples:

1. Binary tree. Some nodes are parent nodes, some are leaves. The two have nothing in common. Sure you can tack on the visitor pattern to make the tree walk easier, but if you have to actually manipulate the tree in some way then interfaces don’t help and you need to know whether a node is a parent or a leaf.

2. XML. Your node may have a text children, or other nodes, etc. Not much in common.

3. Language syntax trees. What does an if node have in common with a variable declaration node?

Etc., etc. In those cases what you really want is to have a collection of types (LeafNode and ParentNode in the case of a binary tree) placed under a single type (let’s call it BinaryTreeNode) that can be one or the other, but nothing else.

The way to do that in F# is using discriminated unions. Say our tree is supposed to store integers. Each ParentNode will contain two references to child BinaryTreeNodes, and each LeafNode will contain an integer. The way to write all that in F# is:

type BinaryTreeNode = LeafNode of int | ParentNode of BinaryTreeNode*BinaryTreeNode

(In case you forgot BinaryTreeNode*BinaryTreeNode is a type signature for a pair of BinaryTreeNodes.)

This should be enough for you to try and tease apart the general syntax:

type <union name> =
    <union name member1> of <argument type> |
    <union name member2> of <argument type> |
    ...

To construct a variable of type BinaryTreeNode, you simply write the union member name followed by the argument:

let (x : BinaryTreeNode) = LeafNode 5

of course, the type declaration in the above example is unnecessary, I just wanted to make it clear that when you write “LeafNode 5” the resulting type is BinaryTreeNode. Usually you just write:

let x = LeafNode 5

To tease apart a variable that’s a discriminated union, we use pattern matching, demonstrated below with a function that computes the sum of all integers in a tree:

let rec sumOfAllNodes node =
    match node with
    | LeafNode x -> x
    | ParentNode (child1, child2) -> sumOfAllNodes child1 + sumOfAllNodes child2

and then we can test this pup:

let test1 = sumOfAllNodes (LeafNode 5)

let test2 = sumOfAllNodes (ParentNode (LeafNode 5, LeafNode 6))

Of course, as it is right now our tree needs to be perfectly balanced, with each parent having two children. We can correct that by adding a third member of the union, call it NoNode, which we’ll use to specify that a node is a missing:

type BinaryTreeNode =
     LeafNode of int
     | ParentNode of BinaryTreeNode*BinaryTreeNode
     | NoNode

As you can see from NoNode, an union member doesn’t have to take arguments, so NoNode is not decorated with of <type>. Thus we can construct an empty tree simply by saying:

let emptyTree = NoNode

With the above addition to BinaryTreeNode, the compiler will complain that we are not handling NoNode in our sumOfAllNodes function, so we need to correct that:

let rec sumOfAllNodes node =
    match node with
    | LeafNode x -> x
    | ParentNode (child1, child2) -> sumOfAllNodes child1 + sumOfAllNodes child2
    | NoNode -> 0

And now you know pretty much everything about discriminating unions.

Discriminating unions are so delicious that most functional languages have managed to be completely free of NullPointer exceptions. Pointers, by default, are never null. If you want to express the fact that a value may be missing, you use a discriminating union between your type and None. When you want use this type, the language forces you to pattern match between the two possibilities and handle all the cases properly. More here.