I am Hungry

By Ivan Gevirtz

created: Thursday, November 17, 2005
updated: Wednesday, September 16, 2009

The only grandparent I ever knew was a little old lady who lived in the Queens borough of New York City.  "Nana," as I affectionately called her, was short, slight, and very old world.  Or at least that was how she seemed to my childhood self.  She came to America when she was 12 via Ellis Island in the early 1900's.  She came alone, and her family remained behind in their sweet little village in Hungary.  "Aha," you say, "That explains it.  You always do eat a lot!"

Hungary is also the birthplace of Microsoft's former Chief Architect, Charles Simonyi.  I don't know if he shares my appetite, but I can safely infer he shares my passion for well-constructed, defensively coded sources.  Famously, he's also the inventor of Hungarian Notation -- the practice of prefixing variable names with letters indicating what kind of variable it is.  In my post on Coding Standards, I discuss why I'm not a believer in decorating variables with compiler storage type information.

In weakly typed languages such as C and C++, the amount of storage allocated to intrinsic types such as int varies from system to system.  This can lead to bugs and interoperability problems.  At my last company, this was a keen problem, as we supported x86, PowerPC, XScale, ARM, and MIPS processors on 16-, 32-, and 64-bit platforms running various flavors of Windows and Linux, as well as WinCE/PocketPC.  Our core technology was a communications protocol, so not only did we have to deal with sizeof(int) differences, we also had to understand "Big Endian", "Little Endian", "Host Byte Order", and "Network Byte Order" issues.
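
To make this concrete, here's a small sketch (not from our codebase) showing how the same 32-bit value sits in memory on big-endian versus little-endian machines, and how intrinsic sizes drift between platforms:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t value = 0x11223344;
    unsigned char *bytes = (unsigned char *)&value;

    /* A little-endian machine (x86) prints 44 33 22 11;
       a big-endian machine (classic PowerPC) prints 11 22 33 44. */
    printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);

    /* And sizeof(long) may be 4 or 8 bytes, depending on the platform. */
    printf("sizeof(int) = %zu, sizeof(long) = %zu\n", sizeof(int), sizeof(long));
    return 0;
}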

Network and Host

In order to be sure our code handled type sizes and endian-ness carefully and consistently, we defined and exclusively used our own types.  We used the usual nomenclature of u8, s8, u16, and so on.  Everything internally was handled using host byte order, and was converted to network byte order before going on the wire.  Converted data was only stored locally, and never exported by a function.  While we didn't use Hungarian Notation to differentiate between local variables storing wireline or host data, our naming conventions did.
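
The pattern looked something like the sketch below, paraphrased from memory, using the standard htonl/ntohl helpers (available on POSIX systems) in place of our internal names:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Serialize a 32-bit host-order value into a wire buffer in network
   (big-endian) byte order.  The converted value lives only in the
   local 'wire' temporary and is never handed back to the caller. */
static void put_u32_wire(unsigned char *buf, uint32_t host_value)
{
    uint32_t wire = htonl(host_value);
    memcpy(buf, &wire, sizeof wire);
}

/* Deserialize: read network byte order off the wire, and convert back
   to host byte order before returning. */
static uint32_t get_u32_host(const unsigned char *buf)
{
    uint32_t wire;
    memcpy(&wire, buf, sizeof wire);
    return ntohl(wire);
}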

Anorexic Appetite

We were minimal about the kinds of types allowed in our applications.  We only allowed the aliases defined in our types.h file, along with standard C modifiers.  Typedefs indicated precise storage class and nothing else.  Because of this, it was always clear if the compiler was doing a widening operation for us, like expanding a u16 into a u32.  And, conversely, we always knew if something was being truncated, and had special #defines to signal that we were aware of the down-cast.  This strict type convention made it easy for us to produce portable wire and file formats.  The size of a struct was never in doubt.
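
Our actual types.h is long gone, but the idea looked roughly like this; the aliases match the names above, while the down-cast macro name is illustrative rather than the original:

/* types.h -- exact-width aliases; nothing else is allowed in application code. */
#include <stdint.h>

typedef uint8_t   u8;
typedef int8_t    s8;
typedef uint16_t  u16;
typedef int16_t   s16;
typedef uint32_t  u32;
typedef int32_t   s32;
typedef uint64_t  u64;
typedef int64_t   s64;

/* Truncations must be spelled out, so a reader (and a grep) can find
   every deliberate down-cast.  The macro name here is illustrative. */
#define DOWNCAST_U32_TO_U16(x)  ((u16)(x))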

Going Negative

While I believe that types should be more descriptive than just storage class, I'm quite certain that types should be precise in their signedness.  C's basic types default to signed: int is signed, and char is signed on most platforms (strictly speaking, the signedness of plain char is implementation-defined).  And because programmers are lazy, most things are defined to be int and char, not unsigned int or unsigned char.  This is unfortunate, because much of the world is not at all negative.  Some astute developers have pointed out that if your values are ever getting close to using that last bit, you should just use a bigger type.  And, indeed, they are often correct.  However, there are cases where you can't do that, including fixed structure or wire sizes.  But, more importantly, they are missing the point.  The point is that the code is not being clear about what it is doing.  Someone can't have a negative age.  The syntax does not match the semantic.

At this point you may want to argue that negative values are often used to indicate errors:  -1 means the person is dead, -2 means the baby hasn't been born yet, -3 means age is meaningless when you're a vampire.  Talk about layering on opacity!  A better solution would be to use enumerated types:

enum
{
    AgeDead    = MAXUNUM - 0,
    AgeNotBorn = MAXUNUM - 1,
    AgeUndead  = MAXUNUM - 2
};

And in the code, while doing the math to determine my tree's age, I can ASSERT(computedAge < (MAXUNUM - TooCloseForComfort)).  That way, I can catch it while debugging if I ever get too close to the top of the range, and can increase the storage type.
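
Here's a sketch of that check; ASSERT, MAXUNUM, and TooCloseForComfort are stand-ins for whatever your project actually defines:

#include <assert.h>
#include <limits.h>

#define ASSERT(cond)        assert(cond)
#define MAXUNUM             UINT_MAX   /* max of the unsigned age type */
#define TooCloseForComfort  16         /* margin reserved for sentinel values */

unsigned int compute_tree_age(unsigned int rings)
{
    unsigned int computedAge = rings;  /* stand-in for the real math */

    /* Trip in debug builds if the value drifts into sentinel territory,
       a hint that the storage type needs to grow. */
    ASSERT(computedAge < (MAXUNUM - TooCloseForComfort));
    return computedAge;
}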

To int or not to int...

One day, a particularly pugilistic coder, thirty3, was working on our logging system.  He found that people would often put variable values, such as memory addresses and counters, in their debugging messages.  And he discovered that this broke his logging system, especially on 64-bit systems.  Thirty3 realized that people would develop for the 8-bit ARM platform, and would set their types accordingly.  When he tested on 64-bit systems, the values passed in were too big for the fixed size buffers!  Puzzled, Thirty3 asked me, "Wasn't our strict use of user-defined and well specified types supposed to prevent this?"

In praise of size_t

We discussed the problem, and realized that there were times when what we really wanted was a plain ol' int.  We wanted to be able to just use the machine's native word size, especially as a counter or to hold a pointer.  What we needed was size_t.
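
In the logging path, that looks something like the sketch below, assuming a printf-style logger; %zu is the standard length modifier for size_t, and %p keeps pointers honest:

#include <stdio.h>
#include <stddef.h>   /* size_t */

void log_progress(const void *buffer, size_t bytes_written)
{
    /* size_t follows the machine's native word size, so this one line
       is correct on 16-, 32-, and 64-bit builds alike; %zu and %p keep
       the formatting honest instead of assuming an int-sized value. */
    printf("wrote %zu bytes into buffer at %p\n", bytes_written, buffer);
}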

My grandmother would have appreciated size_t.  She spoke Hungarian, Yiddish, and English -- languages that vary widely in their syntax, compactness, and expressive power.  She would quickly mutter something in Yiddish, and then spend several minutes paraphrasing the complex, nuanced thought in English.  And in her native Hungary, no one she knew had much money.  They could store their money in an unsigned char.  But here in America, everyone was rich, and you needed an unsigned int or unsigned long.  I bet she never would have realized that, for an increasing number of Americans today with credit cards, you'd actually need a signed long to store their net worth!