testing with gcc 4.6.3 on x86, -Os, the old version does a duplicate null byte check after the first loop. this is purely the compiler being stupid, but the old code was also stupid and unintuitive in how it expressed the check.