I have an array with 10,000 items :-
For each item in the array I insert it into the database using a for loop :-
for (i = 0; i<= array.size(); i++)
// insert into database array[i]
This takes a long time for such a big array so I built 10 mini-arrays each one having 1000 items. The first one 1-1000, second : 1001 - 2000 and so on.
My question is, how do I make the code read all arrays at the same time but each array independently ? Do I need to use threads which each one will kick-off the insert into the database independently of the arrays?
In other words, I wish for items 1-1000, 1001-2000, 2001-3000 etc. to all be inserted into the database at once, i.e. parallelism of the insert. The order of sequence is not important.
Any ideas please ?
Last edited by ami; 07-20-2009 at 06:52 PM.
You got a slight problem here. Threads are very likely not to give you any benefit at all here unless you have a multicore machine *AND* your threads are dispatched on those different cores. So if you have ten cores and split your array into 10 equal parts you could possibly achieve a speedup of ten (minus the overhead you will encounter!). If you run on a single core you most likely will actually have a performance decrease due to overhead and context-switches.
Also bear in mind, that there are other factors that influence your perfomance:
- network speed/availability
- database load
If you have a multicore system, then you can try the TBB library (Threaded Building Blocks). Your problem seems to be a good fit for the use of parallel containers and parallel execution primitives, but you gotta read up on that. I believe you can download the library from IBM.
I guess the main point I wanna make is that "multiple threads equals performance improvement" is actually not always true (in my experience more often not).
I suspect your bottleneck is your Database anyway so any clever threading won't do much good, but without more information I can't tell more...
Life is a short warm moment -
Death is the long cold rest.
right, what could be optimal is a batch insert command on the database side so you can sent it 1 message with like 100 items in it, to reduce both bandwidth and database processing time. IF you have access to the database at that level.
Or if you could send the file to the server and command the database to load the file directly, without the network overhead of one packet at a time (very likely its the thousands of packets that are slow).
Thanks for the information.
I cannot do BCP because this causes all page locking and other processes are also inserting to the same page at run-time.
I can only insert one row at a time as I want to have row-level locking. I'm using Sybase.
Any other suggestions please ?
Last edited by ami; 07-21-2009 at 05:52 AM.
First, I would start with a single-threaded design and see how long it takes to insert 10,000 rows. It shouldn't take very long, unless there are performance bottlenecks that can be optimized. If 10,000 rows take to long, I'd consider splitting them to two threads (certainly not 10) and see if this makes any difference. Threads alone will not change much unless you have a muticore machine, as others have noted. In addition, you may well discover that the latencies, if any, have nothing to do with parallelization. They usually stem from network bottlenecks, encryption, security checks and system load (on either side).
By letianrong in forum Java
Last Post: 09-19-2005, 06:37 AM
By martinfr79 in forum Java
Last Post: 05-15-2005, 11:18 AM
Last Post: 02-16-2002, 11:51 AM
By Dhiraj Nilange in forum Java
Last Post: 12-10-2000, 03:59 PM
By Sam Goldberg in forum Java
Last Post: 09-26-2000, 10:52 AM
Top DevX Stories
Easy Web Services with SQL Server 2005 HTTP Endpoints
JavaOne 2005: Java Platform Roadmap Focuses on Ease of Development, Sun Focuses on the "Free" in F.O.S.S.
Wed Yourself to UML with the Power of Associations
Microsoft to Add AJAX Capabilities to ASP.NET
IBM's Cloudscape Versus MySQL