Getting a large number of identities


DevX Home    Today's Headlines   Articles Archive   Tip Bank   Forums   

Results 1 to 13 of 13

Thread: Getting a large number of identities

  1. #1
    Colin McGuigan Guest

    Getting a large number of identities

    Here's the situation:

    I have two tables (Orders and Payments), that look like this (summarized
    version, obviously):

    Orders
    ------
    OrderID (identity)
    PaymentID (foreign key)

    Payments
    ------
    PaymentID (identity)
    OrderID (foreign key)


    There are ~100,000 rows in the Orders table; Payments is blank. What I want
    to do is create a payment record for each order, and then update the
    Orders.PaymentID with the appropriate Payment.PaymentID, so my code is like
    this:

    INSERT INTO Payments(OrderID)
    SELECT OrderID FROM Orders
    WHERE PaymentID IS NULL

    UPDATE Orders
    SET Orders.PaymentID = Payments.PaymentID
    FROM Orders
    INNER JOIN Payments ON Orders.OrderID = Payments.OrderID

    The code is fine; it works. The problem is that it works very, very slowly.
    The INSERT is relatively quick; the UPDATE (because it joins two 100,000+
    tables) can take hours or days.

    Anyone know of any way I can speed this up, or any alternate way of doing
    this that might be faster?

    --
    Colin McGuigan



  2. #2
    Joe \Nuke Me Xemu\ Foster Guest

    Re: Getting a large number of identities

    "Colin McGuigan" <cmcguigan@imany.com> wrote in message <news:3cb30f37@10.1.10.29>...

    > Here's the situation:
    >
    > I have two tables (Orders and Payments), that look like this (summarized
    > version, obviously):
    >
    > Orders
    > ------
    > OrderID (identity)
    > PaymentID (foreign key)
    >
    > Payments
    > ------
    > PaymentID (identity)
    > OrderID (foreign key)
    >
    >
    > There are ~100,000 rows in the Orders table; Payments is blank. What I want
    > to do is create a payment record for each order, and then update the
    > Orders.PaymentID with the appropriate Payment.PaymentID, so my code is like
    > this:
    >
    > INSERT INTO Payments(OrderID)
    > SELECT OrderID FROM Orders
    > WHERE PaymentID IS NULL
    >
    > UPDATE Orders
    > SET Orders.PaymentID = Payments.PaymentID
    > FROM Orders
    > INNER JOIN Payments ON Orders.OrderID = Payments.OrderID
    >
    > The code is fine; it works. The problem is that it works very, very slowly.
    > The INSERT is relatively quick; the UPDATE (because it joins two 100,000+
    > tables) can take hours or days.
    >
    > Anyone know of any way I can speed this up, or any alternate way of doing
    > this that might be faster?


    Why have orders point to payments when payments already point to orders?
    Isn't that redundant? Could you have problems later if there might ever
    be multiple payments for a single order? (Of course, you'd be equally
    damned if payments later might need to apply to multiple orders...) Is
    the first query below really that much faster than the second or third?

    select orderid
    from orders
    where paymentid is null

    select orderid
    from orders left join payments on orders.orderid = payments.orderid
    where payments.orderid is null

    select orderid
    from orders
    where not exists
    (select * from payments where payments.orderid = orders.orderid)

    --
    Joe Foster <mailto:jlfoster%40znet.com> Got Thetans? <http://www.xenu.net/>
    WARNING: I cannot be held responsible for the above They're coming to
    because my cats have apparently learned to type. take me away, ha ha!



  3. #3
    Colin McGuigan Guest

    Re: Getting a large number of identities


    "Joe "Nuke Me Xemu" Foster" <joe@bftsi0.UUCP> wrote in message
    news:3cb3a571@10.1.10.29...
    > "Colin McGuigan" <cmcguigan@imany.com> wrote in message

    <news:3cb30f37@10.1.10.29>...
    > Why have orders point to payments when payments already point to orders?
    > Isn't that redundant? Could you have problems later if there might ever
    > be multiple payments for a single order? (Of course, you'd be equally
    > damned if payments later might need to apply to multiple orders...) Is
    > the first query below really that much faster than the second or third?


    Because it's a spurious example that I concocted for this post. It mirrors
    the problem I'm really dealing with, but is easier to explain. =P

    > select orderid
    > from orders
    > where paymentid is null


    That is quick (though displaying the results isn't).

    > select orderid
    > from orders left join payments on orders.orderid = payments.orderid
    > where payments.orderid is null


    That is quick before Payments is filled, very, very slow afterwards.

    --
    Colin McGuigan



  4. #4
    Joe \Nuke Me Xemu\ Foster Guest

    Re: Getting a large number of identities

    "Colin McGuigan" <cmcguigan@imany.com> wrote in message <news:3cb59737$1@10.1.10.29>...

    > "Joe "Nuke Me Xemu" Foster" <joe@bftsi0.UUCP> wrote in message
    > news:3cb3a571@10.1.10.29...
    > > "Colin McGuigan" <cmcguigan@imany.com> wrote in message

    > <news:3cb30f37@10.1.10.29>...
    > > Why have orders point to payments when payments already point to orders?
    > > Isn't that redundant? Could you have problems later if there might ever
    > > be multiple payments for a single order? (Of course, you'd be equally
    > > damned if payments later might need to apply to multiple orders...) Is
    > > the first query below really that much faster than the second or third?

    >
    > Because it's a spurious example that I concocted for this post. It mirrors
    > the problem I'm really dealing with, but is easier to explain. =P


    It seems strange that joining those two tables should be so slow, though.
    The indexes are built properly, UPDATE STATISTICS runs after you populate
    or make radical changes to the tables, etc.?

    --
    Joe Foster <mailto:jlfoster%40znet.com> "Regged" again? <http://www.xenu.net/>
    WARNING: I cannot be held responsible for the above They're coming to
    because my cats have apparently learned to type. take me away, ha ha!



  5. #5
    Joe \Nuke Me Xemu\ Foster Guest

    Re: Getting a large number of identities

    "MarkN" <m@n.com> wrote in message <news:3cb6d48d$1@10.1.10.29>...

    > >> select orderid


    Oops, I misspelled "orders.orderid" here...

    > >> from orders left join payments on orders.orderid = payments.orderid
    > >> where payments.orderid is null


    > And you got something back from this? It is the same as
    >
    > select orders.orderid
    > from orders, payments
    > where orders.orderid = payments.orderid
    > and payments.orderid is null
    >
    > Which should never return anything.


    The "left join" is supposed to make it equivalent to this:

    select orders.orderid
    from orders, payments
    where orders.orderid *= payments.orderid
    and payments.orderid is null

    I think that's how the old-style outer-join syntax worked, anyway.

    --
    Joe Foster <mailto:jlfoster%40znet.com> Sign the Check! <http://www.xenu.net/>
    WARNING: I cannot be held responsible for the above They're coming to
    because my cats have apparently learned to type. take me away, ha ha!



  6. #6
    MarkN Guest

    Re: Getting a large number of identities



    >
    >> select orderid
    >> from orders left join payments on orders.orderid = payments.orderid
    >> where payments.orderid is null

    >


    And you got something back from this? It is the same as

    select orders.orderid
    from orders, payments
    where orders.orderid = payments.orderid
    and payments.orderid is null

    Which should never return anything.

  7. #7
    MarkN Guest

    Re: Getting a large number of identities


    "Joe \"Nuke Me Xemu\" Foster" <joe@bftsi0.UUCP> wrote:
    >"MarkN" <m@n.com> wrote in message <news:3cb6d48d$1@10.1.10.29>...
    >
    >> >> select orderid

    >
    >Oops, I misspelled "orders.orderid" here...
    >
    >> >> from orders left join payments on orders.orderid = payments.orderid
    >> >> where payments.orderid is null

    >
    >> And you got something back from this? It is the same as
    >>
    >> select orders.orderid
    >> from orders, payments
    >> where orders.orderid = payments.orderid
    >> and payments.orderid is null
    >>
    >> Which should never return anything.

    >
    >The "left join" is supposed to make it equivalent to this:
    >
    >select orders.orderid
    >from orders, payments
    >where orders.orderid *= payments.orderid
    >and payments.orderid is null
    >
    >I think that's how the old-style outer-join syntax worked, anyway.
    >


    Then for that statement you will not have a joining statement and will probably
    get more records back then you expected (unless that is Oracle's way of outer
    join - then you will get nothing).

    If you meant outer join (and depending on the database - this is the SQL
    discussion - you might have to say 'left outer join') then you it would have
    to be the other way (other than Oracle) without the null. You still would
    get nothing back if you use the 'payments.orderid is null' statement.

  8. #8
    Joe \Nuke Me Xemu\ Foster Guest

    Re: Getting a large number of identities

    "MarkN" <m@n.com> wrote in message <news:3cb6e101$1@10.1.10.29>...

    > "Joe \"Nuke Me Xemu\" Foster" <joe@bftsi0.UUCP> wrote:


    > >The "left join" is supposed to make it equivalent to this:
    > >
    > >select orders.orderid
    > >from orders, payments
    > >where orders.orderid *= payments.orderid
    > >and payments.orderid is null
    > >
    > >I think that's how the old-style outer-join syntax worked, anyway.


    > Then for that statement you will not have a joining statement and will probably
    > get more records back then you expected (unless that is Oracle's way of outer
    > join - then you will get nothing).
    >
    > If you meant outer join (and depending on the database - this is the SQL
    > discussion - you might have to say 'left outer join') then you it would have
    > to be the other way (other than Oracle) without the null. You still would
    > get nothing back if you use the 'payments.orderid is null' statement.


    Thanks for reminding me why I /hated/ the old *= syntax... Anyway,
    I tried my queries out on some small tables in SQL Server, and here's
    what it spit back:


    select Orders.* from Orders

    OrderID PaymentID
    ----------- -----------
    1 (null)
    2 (null)
    3 (null)
    4 (null)
    5 (null)
    6 (null)
    7 (null)
    8 (null)
    9 (null)
    10 (null)
    11 (null)
    12 (null)
    13 (null)
    14 (null)
    15 (null)
    16 (null)
    17 (null)
    18 (null)
    19 (null)
    20 (null)
    21 (null)

    (21 row(s) affected)


    select Payments.* from Payments

    PaymentID OrderID
    ----------- -----------
    1 8
    2 9
    3 10
    4 11
    5 12
    6 13
    7 14
    8 15
    9 16
    10 17
    11 18
    12 19
    13 20
    14 21
    15 (null)
    16 (null)
    17 (null)
    18 (null)
    19 (null)
    20 (null)
    21 (null)

    (21 row(s) affected)


    select Orders.*, Payments.*
    from Orders left join Payments
    on Orders.OrderID = Payments.OrderID
    where Payments.OrderID is null

    OrderID PaymentID PaymentID OrderID
    ----------- ----------- ----------- -----------
    1 (null) (null) (null)
    2 (null) (null) (null)
    3 (null) (null) (null)
    4 (null) (null) (null)
    5 (null) (null) (null)
    6 (null) (null) (null)
    7 (null) (null) (null)

    (7 row(s) affected)


    select Orders.*, Payments.*
    from Orders, Payments
    where Orders.OrderID *= Payments.OrderID
    and Payments.OrderID is null

    OrderID PaymentID PaymentID OrderID
    ----------- ----------- ----------- -----------
    1 (null) (null) (null)
    2 (null) (null) (null)
    3 (null) (null) (null)
    4 (null) (null) (null)
    5 (null) (null) (null)
    6 (null) (null) (null)
    7 (null) (null) (null)
    8 (null) (null) (null)
    9 (null) (null) (null)
    10 (null) (null) (null)
    11 (null) (null) (null)
    12 (null) (null) (null)
    13 (null) (null) (null)
    14 (null) (null) (null)
    15 (null) (null) (null)
    16 (null) (null) (null)
    17 (null) (null) (null)
    18 (null) (null) (null)
    19 (null) (null) (null)
    20 (null) (null) (null)
    21 (null) (null) (null)

    (21 row(s) affected)

    --
    Joe Foster <mailto:jlfoster%40znet.com> L. Ron Dullard <http://www.xenu.net/>
    WARNING: I cannot be held responsible for the above They're coming to
    because my cats have apparently learned to type. take me away, ha ha!



  9. #9
    MarkN Guest

    Re: Getting a large number of identities


    Guess I was somewhat wrong. Although out join and *= aren't the same.

    Ok, now try

    select orderid
    from orders where orderid not in(select distinct orderid from payments where
    payments.orderid is not null)

    Should be the same # of rows as the outer join.


    >> >where orders.orderid *= payments.orderid
    >> >and payments.orderid is null"Joe \"Nuke Me Xemu\" Foster" <joe@bftsi0.UUCP>

    wrote:
    >"MarkN" <m@n.com> wrote in message <news:3cb6e101$1@10.1.10.29>...
    >
    >> "Joe \"Nuke Me Xemu\" Foster" <joe@bftsi0.UUCP> wrote:

    >
    >> >The "left join" is supposed to make it equivalent to this:
    >> >
    >> >select orders.orderid
    >> >from orders, payments
    >> >where orders.orderid *= payments.orderid
    >> >and payments.orderid is null
    >> >
    >> >I think that's how the old-style outer-join syntax worked, anyway.

    >
    >> Then for that statement you will not have a joining statement and will

    probably
    >> get more records back then you expected (unless that is Oracle's way of

    outer
    >> join - then you will get nothing).
    >>
    >> If you meant outer join (and depending on the database - this is the SQL
    >> discussion - you might have to say 'left outer join') then you it would

    have
    >> to be the other way (other than Oracle) without the null. You still would
    >> get nothing back if you use the 'payments.orderid is null' statement.

    >
    >Thanks for reminding me why I /hated/ the old *= syntax... Anyway,
    >I tried my queries out on some small tables in SQL Server, and here's
    >what it spit back:
    >
    >
    >select Orders.* from Orders
    >
    >OrderID PaymentID
    >----------- -----------
    >1 (null)
    >2 (null)
    >3 (null)
    >4 (null)
    >5 (null)
    >6 (null)
    >7 (null)
    >8 (null)
    >9 (null)
    >10 (null)
    >11 (null)
    >12 (null)
    >13 (null)
    >14 (null)
    >15 (null)
    >16 (null)
    >17 (null)
    >18 (null)
    >19 (null)
    >20 (null)
    >21 (null)
    >
    >(21 row(s) affected)
    >
    >
    >select Payments.* from Payments
    >
    >PaymentID OrderID
    >----------- -----------
    >1 8
    >2 9
    >3 10
    >4 11
    >5 12
    >6 13
    >7 14
    >8 15
    >9 16
    >10 17
    >11 18
    >12 19
    >13 20
    >14 21
    >15 (null)
    >16 (null)
    >17 (null)
    >18 (null)
    >19 (null)
    >20 (null)
    >21 (null)
    >
    >(21 row(s) affected)
    >
    >
    >select Orders.*, Payments.*
    >from Orders left join Payments
    > on Orders.OrderID = Payments.OrderID
    >where Payments.OrderID is null
    >
    >OrderID PaymentID PaymentID OrderID
    >----------- ----------- ----------- -----------
    >1 (null) (null) (null)
    >2 (null) (null) (null)
    >3 (null) (null) (null)
    >4 (null) (null) (null)
    >5 (null) (null) (null)
    >6 (null) (null) (null)
    >7 (null) (null) (null)
    >
    >(7 row(s) affected)
    >
    >
    >select Orders.*, Payments.*
    >from Orders, Payments
    >where Orders.OrderID *= Payments.OrderID
    > and Payments.OrderID is null
    >
    >OrderID PaymentID PaymentID OrderID
    >----------- ----------- ----------- -----------
    >1 (null) (null) (null)
    >2 (null) (null) (null)
    >3 (null) (null) (null)
    >4 (null) (null) (null)
    >5 (null) (null) (null)
    >6 (null) (null) (null)
    >7 (null) (null) (null)
    >8 (null) (null) (null)
    >9 (null) (null) (null)
    >10 (null) (null) (null)
    >11 (null) (null) (null)
    >12 (null) (null) (null)
    >13 (null) (null) (null)
    >14 (null) (null) (null)
    >15 (null) (null) (null)
    >16 (null) (null) (null)
    >17 (null) (null) (null)
    >18 (null) (null) (null)
    >19 (null) (null) (null)
    >20 (null) (null) (null)
    >21 (null) (null) (null)
    >
    >(21 row(s) affected)
    >
    >--
    >Joe Foster <mailto:jlfoster%40znet.com> L. Ron Dullard <http://www.xenu.net/>
    >WARNING: I cannot be held responsible for the above They're coming

    to
    >because my cats have apparently learned to type. take me away,

    ha ha!
    >
    >



  10. #10
    Joe \Nuke Me Xemu\ Foster Guest

    Re: Getting a large number of identities

    "MarkN" <m@n.com> wrote in message <news:3cb709e2$1@10.1.10.29>...

    > Guess I was somewhat wrong. Although out join and *= aren't the same.


    I'm sure it didn't help that I don't really understand *= ...

    > Ok, now try
    >
    > select orderid
    > from orders where orderid not in(select distinct orderid from payments where
    > payments.orderid is not null)
    >
    > Should be the same # of rows as the outer join.


    Both of these return the same rows as the left outer join:

    select OrderID
    from Orders
    where OrderID not in
    (select distinct OrderID
    from Payments
    where Payments.OrderID is not null)

    select OrderID
    from Orders
    where not exists
    (select *
    from Payments
    where Payments.OrderID = Orders.OrderID)

    In fact, SQL Server appears to generate the same query plan for
    both of these but does something different for the outer left
    join query. Of course, all of this is meaningless without lots
    of test data, since SQL Server is known to use table sizes and
    index distributions to choose its querying strategies.

    --
    Joe Foster <mailto:jlfoster%40znet.com> Sacrament R2-45 <http://www.xenu.net/>
    WARNING: I cannot be held responsible for the above They're coming to
    because my cats have apparently learned to type. take me away, ha ha!



  11. #11
    Colin McGuigan Guest

    Re: Getting a large number of identities


    "Joe "Nuke Me Xemu" Foster" <joe@bftsi0.UUCP> wrote in message
    news:3cb5ef81@10.1.10.29...
    > It seems strange that joining those two tables should be so slow, though.
    > The indexes are built properly, UPDATE STATISTICS runs after you populate
    > or make radical changes to the tables, etc.?


    Yes. Ah, well, looks like its time to shell out some money for a
    consultant...

    --
    Colin McGuigan



  12. #12
    Joe \Nuke Me Xemu\ Foster Guest

    Re: Getting a large number of identities

    "Colin McGuigan" <cmcguigan@imany.com> wrote in message <news:3cb73510$1@10.1.10.29>...

    > "Joe "Nuke Me Xemu" Foster" <joe@bftsi0.UUCP> wrote in message
    > news:3cb5ef81@10.1.10.29...
    > > It seems strange that joining those two tables should be so slow, though.
    > > The indexes are built properly, UPDATE STATISTICS runs after you populate
    > > or make radical changes to the tables, etc.?

    >
    > Yes. Ah, well, looks like its time to shell out some money for a
    > consultant...


    I wonder if, in the meantime, it might help to break your monster
    UPDATE up into more manageable sections, like this:

    UPDATE Orders
    SET Orders.PaymentID = Payments.PaymentID
    FROM Orders
    INNER JOIN Payments ON Orders.OrderID = Payments.OrderID
    WHERE Orders.OrderID Between 80000 And 99999
    AND (Orders.PaymentID Is Null Or Orders.PaymentID <> Payments.PaymentID)

    UPDATE Orders
    SET Orders.PaymentID = Payments.PaymentID
    FROM Orders
    INNER JOIN Payments ON Orders.OrderID = Payments.OrderID
    WHERE Orders.OrderID Between 60000 And 79999
    AND (Orders.PaymentID Is Null Or Orders.PaymentID <> Payments.PaymentID)

    etc.

    I don't know whether that "AND (...)" part would help you or not.
    Which indexes are clustered, if any? You might have hot-spots.

    --
    Joe Foster <mailto:jlfoster%40znet.com> Sign the Check! <http://www.xenu.net/>
    WARNING: I cannot be held responsible for the above They're coming to
    because my cats have apparently learned to type. take me away, ha ha!



  13. #13
    MarkN Guest

    Re: Getting a large number of identities


    Thanks Joe. I thought they would return the same thing (once I got my mind
    back on track). It would seem the subselect makes it more clear what the
    query is trying to do (at least it does to me).

    I looked it up and *= is for left join and =* is for right join. They are
    'non-standard, product specific'. Which shouldn't matter - what other database
    is there? Well I guess it would to this person - http://news.devx.com/cgi-bin/dnewswe...item=370&utag=

    Mark



Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
HTML5 Development Center
 
 
FAQ
Latest Articles
Java
.NET
XML
Database
Enterprise
Questions? Contact us.
C++
Web Development
Wireless
Latest Tips
Open Source


   Development Centers

   -- Android Development Center
   -- Cloud Development Project Center
   -- HTML5 Development Center
   -- Windows Mobile Development Center