The gist is that the ustring::strhash(str) function is modified to
strip out the MSB from Strutil::strhash. The rep entry is filed in
the ustring table based on this hash. So effectively, the computed
hash is 63 bits, not 64.
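
As a rough illustration of the MSB stripping described above, here is a minimal sketch. The names full_hash64, strhash63 and kMsb are hypothetical, and 64-bit FNV-1a is only a stand-in for Strutil::strhash; it is not claimed to be the hash the library actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Mask selecting the most significant bit of a 64-bit value.
constexpr uint64_t kMsb = uint64_t(1) << 63;

// Hypothetical stand-in for Strutil::strhash: plain 64-bit FNV-1a.
inline uint64_t full_hash64(const std::string& s) {
    uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;             // FNV prime
    }
    return h;
}

// Hypothetical analogue of ustring::strhash: the same hash with the
// MSB cleared, so only 63 bits of the computed hash are ever used.
inline uint64_t strhash63(const std::string& s) {
    uint64_t h = full_hash64(s) & ~kMsb;
    assert((h & kMsb) == 0);  // invariant: computed hashes never set the MSB
    return h;
}
```

Clearing the bit rather than shifting keeps the low 63 bits identical to those of the underlying hash, which leaves the MSB free to carry a flag.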
But the rep->hashed field holds the computed hash in its lower 63
bits, and its MSB indicates whether this is the 2nd (or later) entry
in the table with the same 63-bit hash.
ustring::hash() then is modified as follows: If the MSB is 0, the
computed hash is the hash. If the MSB is 1, though, we DON'T use that
hash, and instead we use the pointer to the unique characters, but
with the MSB set (that's an invalid address by itself). Note that the
computed hashes never have the MSB set, and the char*+MSB values always
have the MSB set, so ustring::hash() will never return the same value
for two different ustrings.
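
The scheme above can be sketched with a toy intern table. This is only an illustration under stated assumptions: Rep, ToyTable, intern and the injectable hash function are hypothetical names, not the real ustring internals, and the hash is passed in so that a deliberately weak one can force a collision.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

constexpr uint64_t kMsb = uint64_t(1) << 63;

// Hypothetical rep: low 63 bits of `hashed` hold the computed hash,
// the MSB flags a later entry that collided on those 63 bits.
struct Rep {
    std::string chars;
    uint64_t hashed;
};

class ToyTable {
public:
    explicit ToyTable(std::function<uint64_t(const std::string&)> h)
        : strhash_(std::move(h)) {}

    // Return the unique rep for s, creating it on first use.
    const Rep* intern(const std::string& s) {
        auto it = reps_.find(s);
        if (it != reps_.end())
            return it->second.get();
        uint64_t h = strhash_(s) & ~kMsb;            // 63-bit computed hash
        assert((h & kMsb) == 0);
        bool collided = !claimed_.insert(h).second;  // hash already taken?
        std::unique_ptr<Rep> rep(new Rep{s, collided ? (h | kMsb) : h});
        const Rep* p = rep.get();
        reps_.emplace(s, std::move(rep));
        return p;
    }

    // MSB clear: the computed hash is the hash.
    // MSB set: use the address of the unique characters with the MSB set.
    static uint64_t hash(const Rep* r) {
        if ((r->hashed & kMsb) == 0)
            return r->hashed;
        auto addr = reinterpret_cast<std::uintptr_t>(r->chars.data());
        return static_cast<uint64_t>(addr) | kMsb;
    }

private:
    std::function<uint64_t(const std::string&)> strhash_;
    std::unordered_map<std::string, std::unique_ptr<Rep>> reps_;
    std::unordered_set<uint64_t> claimed_;  // 63-bit hashes already in use
};
```

With a hash that returns only the string length, interning "ab" and then "cd" gives the first string its computed hash and the second the address-based value, so the two results differ even though the 63-bit hashes collide.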
But -- please note! -- that ustring::strhash(str) and
ustring(str).hash() will only match (and also be the same value on
every execution) if the ustring is the first to receive that hash,
which should be approximately always. Probably always, in practice.
But in the very improbable case of a hash collision, one of them (the
second to be turned into a ustring) will use the alternate hash based
on the character address, which neither matches
ustring::strhash(chars) nor is expected to be the same constant on
every program execution.
Signed-off-by: Larry Gritz <lg@larrygritz.com>