Click to See Complete Forum and Search --> : pk's - integers vs GUID's - Part 2


Greg Longtin
05-09-2000, 09:10 AM
To All,

I finally converted a system to GUID's instead of integers for pk's.
The main test procedure involved seven SQL statements, at least six
containing joins.

For those unfamiliar with GUID's (globally unique identifier), they are
a 16 byte code generated by the OS. They should be unique across all
computers in the world. I wanted to use them instead of integers to
allow easier 'off-line/replicated' use, since using integers typically
requires all sorts of allocation by user/system/server hocus pocus.
Also, no db call is needed to retrieve the next id.

Bottom line - query time was essentially the same. Over the years,
since integers were supported, I've seen many references that implied
that they are much faster than strings. Depending on the application, I
think that is far from the truth.

Greg Longtin


Two notes -

1. The integer version used Set collate to "General". I've seen
articles that "General" can take longer than "Machine". The GUID
version had to use "Machine", and character fields were indexed using
Upper(). This was my first foray into "strings for pk's" (2.5 was along
time ago...), and I was surprised that collate affects joins. IOW, even
if pk/fk indexes are created with "machine" collate, and the fields
created with NOCPTRANS, collate still affects their use in the SQL join
'on' clause. This seems stupid. I hope the VFP team is considering
changing this in v7. IOW, any field created using NOCPTRANS should use
a collate sequence of "Machine".

2. Since I am returning an XML resultset, I also needed to convert
GUID's to an XML friendly string. I used a modified Base64 encoding
with BitAnd and BitShift functions. BTW, this results in a 24 byte xml
friendly id. This was an additional burden that the integer version
didn't have, since a UDF existed in the 'GUID' SQL statements. The
integer version converted the integer to a string, then appended a
character to it because xml id's cannot start with a numeric character.
This was be done inline (vs UDF).

David Frankenbach
05-09-2000, 10:00 PM
Greg,

You left a couple of SIGNIFICANT factors out of your post. What are your
table sizes? Are you dealing with hundreds of thousands to millions of row
tables? Are these tables on the machine disk drive or on the network?

As far as the XML goes it sounds like you are using the field value as the
tag name. Why? I think it makes far more sense to just use
<PrimaryKeyFieldName>PK Value</PrimaryKeyFieldName> as one tag of the "row"
in the document.
--
df - (Microsoft FoxPro MVP) http://www.geocities.com/df_FoxPro/

"Greg Longtin" <longtin@att.net> wrote in message
news:39180e3f$1@news.devx.com...
> To All,
>
> I finally converted a system to GUID's instead of integers for pk's.
> The main test procedure involved seven SQL statements, at least six
> containing joins.
>
> For those unfamiliar with GUID's (globally unique identifier), they are
> a 16 byte code generated by the OS. They should be unique across all
> computers in the world. I wanted to use them instead of integers to
> allow easier 'off-line/replicated' use, since using integers typically
> requires all sorts of allocation by user/system/server hocus pocus.
> Also, no db call is needed to retrieve the next id.
>
> Bottom line - query time was essentially the same. Over the years,
> since integers were supported, I've seen many references that implied
> that they are much faster than strings. Depending on the application, I
> think that is far from the truth.
>
> Greg Longtin
>
>
> Two notes -
>
> 1. The integer version used Set collate to "General". I've seen
> articles that "General" can take longer than "Machine". The GUID
> version had to use "Machine", and character fields were indexed using
> Upper(). This was my first foray into "strings for pk's" (2.5 was along
> time ago...), and I was surprised that collate affects joins. IOW, even
> if pk/fk indexes are created with "machine" collate, and the fields
> created with NOCPTRANS, collate still affects their use in the SQL join
> 'on' clause. This seems stupid. I hope the VFP team is considering
> changing this in v7. IOW, any field created using NOCPTRANS should use
> a collate sequence of "Machine".
>
> 2. Since I am returning an XML resultset, I also needed to convert
> GUID's to an XML friendly string. I used a modified Base64 encoding
> with BitAnd and BitShift functions. BTW, this results in a 24 byte xml
> friendly id. This was an additional burden that the integer version
> didn't have, since a UDF existed in the 'GUID' SQL statements. The
> integer version converted the integer to a string, then appended a
> character to it because xml id's cannot start with a numeric character.
> This was be done inline (vs UDF).
>

Greg Longtin
05-10-2000, 10:05 AM
David,

Your post strikes me as confrontational. Why? Have you ever tried what
I mentioned? In advance, please accept my apology if I misread your
post. I am simply trying to point out that the somewhat commonly held
view that integers are the only field type usable for pk's maybe
incorrect. I wanted GUID's for their 'globally' guaranteed uniqueness.


> You left a couple of SIGNIFICANT factors out of your post. What are
your
> table sizes? Are you dealing with hundreds of thousands to millions of
row
> tables? Are these tables on the machine disk drive or on the network?

I tried to not state anything in *absolute* terms. No, I'm not using
millions of rows. I'm not sure what that has to do with the fact that
my queries weren't slower using GUID's. I hoped to make the point that
it didn't cause me a performance hit, although others may find it does.
My drives are on the machine. I suspect my system has enough RAM to
load all the tables and indexes. If it didn't, U2W 10K SCSI drives
would probably remove that variable's effects. Why would anyone bench
something like this using remote/networked drives? Did I say something
stupid?

It's a server app returning XML data sets. My query set involved seven
tables, multiple SQL statements, and thousands of records. It returns
XML consisting of two main heiarchical data sets joined many-to-many by
a third set which contains the pk pairs and also data. If there's any
other info you'd like, please inquire.


> As far as the XML goes it sounds like you are using the field value as
the
> tag name. Why?

I didn't mean to imply that and I am not.


> I think it makes far more sense to just use
> <PrimaryKeyFieldName>PK Value</PrimaryKeyFieldName> as one tag of the
> "row" in the document.

I generally declare pk's as id, and fk's as idref or idrefs, or
undeclared. Most XML implimentations require id's to be attributes, not
elements as you've shown. I've seen some discussions about that
changing, but I'd don't live to keep up with W3C docs and/or
discussions. I would declare pk's as such to quickly filter/join on the
client using XPath, XSLT, and/or DOM. For this app, the 'server' is
VFP, the clients are whatever.


Greg Longtin