Real Time application
I need to write a real-time application which listens on a particular queue; whenever there are new entries on the queue, the messages need to be uploaded to the database.
A couple of questions, please:
1. Can someone please suggest a starting point? Assuming I have connected to the queue, how do I do the constant listening? Can I loop forever (until the process is killed manually)?
2. Since this needs to be a real-time application, performance is crucial. At any given time there will be at most 10,000 messages on the queue and at the very least 0 (empty queue). Assuming there were 10,000 messages to process, is there a better approach than looping over them and getting each in turn? I'm trying to remove the need for a loop. Is there a way to take all 10,000 as one object and insert them into the database, possibly using something like: AllMessages << DB.insert() Can someone please help?
First, what kind of "queue" are you referring to?
Usually, real-time apps don't use a loop that spins forever. That's a recipe for performance bottlenecks. The more common approach is to use signals, interrupts (if you're dealing with hardware directly) or events. The signal wakes up the processing thread, informing it that there's a new item in the queue that needs to be processed.
Similarly, when you have many items waiting in the queue, the common approach is to launch new working threads dynamically so that the load will be split among multiple threads (which are hopefully assigned to different CPUs) and then kill those threads when the queue has been empty for a certain time limit.
So basically, you need to look into the following issues:
timers, signals (or events, which are the Windows equivalent of signals), signal handlers, and concurrent execution (threads, lightweight processes etc.).
Last edited by Danny; 10-31-2008 at 12:19 PM.
If you can find a way to read and send the entire buffer, or some portion of it (1/2, 1/3, etc) at a time, this is very likely the best way to go (instead of spawning 1000 threads that each handle 1 item or something). Thread/process creation takes some time for the OS to organize and context switching (if more threads than CPUs) is costly and can degrade performance if not managed carefully. Also, since you echo the data, you will have to take care to not have a bunch of processes all trying to grab the same ethernet socket at the same time, and other race conditions may crop up.
As said, you never want to run a tight loop that has no bound; the only use for a tight loop is finite processing (do this 1000 times and stop). A thread is great. A tight loop (really, a thread that has a tight loop) is good if and only if you have a blocking read function (for example, cin blocks until the user types text and presses Enter), because a blocked thread does not hog resources and the code only runs when there is something to do. If the read is non-blocking, you need to limit it.
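A minimal sketch of the blocking-read idea, assuming the messages can be handed to the consumer through an in-process queue. `BlockingQueue` is a hypothetical helper, not part of any queueing product mentioned in this thread, and it uses the C++11 standard library (`std::mutex`, `std::condition_variable`) rather than raw POSIX calls. The consumer's `pop()` sleeps until a producer calls `push()`, so a "loop forever" around `pop()` costs no CPU while the queue is empty.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical in-process queue; stands in for whatever real queue is used.
template <typename T>
class BlockingQueue {
public:
    // Producer side: store one item and wake one waiting consumer.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }

    // Consumer side: blocks (sleeps, no spinning) until an item exists.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

    // Number of items currently waiting.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};
```

A worker thread would then run something like `for (;;) { process(q.pop()); }`, where `process` is whatever handles one message; the loop is unbounded, but it only runs when there is work.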
You have bounds on the read as well, for example there is a finite rate at which your queue can fill, and if you can find out what that is (ethernet bandwidth, filling application delay, or other factors) you can find some amount of time to sleep the thread(s).
In Windows, it really helps to launch threads or even the whole program in real-time mode, but beware: if you have more real-time threads than processors, you will lock out everything else on the system, including the mouse, and the OS will appear to "lock up" (it is not locked up, it's busy, but you can't prove it). So if you have the luxury, on a quad-core machine use 3 real-time threads, for example, and your application will run very fast while your PC continues to function. (You can have more, non-real-time threads as well; picking and choosing with care is the key.)
All that to say, I do not know much about databases. I would try to find a way to do the mass send and receive first, and work back from there if it is not possible to do this. I can also comment that if your bandwidth is low, it may be faster to compress the data, send it, and decompress than to send a lot of bytes over a slow link (odds are however you do not have a link that is this bad, we are talking like 9600 baud slow).
Other thoughts... this queue, what is it really? A serial port? Ethernet? Internally generated? Can you set up a DMA for it (basically, drop the data into RAM at a known location for fast access; it's a hardware/software interface under Windows)? Can you send it directly without the computer (some routers can be "hacked" to do simple tasks such as this by modifying the firmware, and this sounds simple: wrap the data in an insert command and pass it on)?
Last thing, do you have a hard time requirement you can share with us? Round trip under 10ms or the like?
Thanks for all the information.
The queue consists of XML documents consisting of approx 50 tags each. The queue is another team in the company which will have all the XMLs ready on the queue.
Currently, we have a process that reads each XML and BCPs it directly into the database. This runs at approx. 1 XML per second. However, if you have 10,000 XMLs, since we're using a loop, it takes a long while before you get to the last one...
Thus, this improvement : we want to be able to see the very last one (i.e. the 10,000th XML document) within 5 minutes if we can.
The OS is Unix Sun Solaris.
We wrote something in Java, but since Java isn't really designed for multi-threaded applications, we are thinking about C++ now. In Java it takes approx. 1 hour to process all 10,000 trades.
Any more advice please ?
Java is actually ideal for multi-threaded applications. Threading is a first-rate component of the language. Only now is C++ adding threading to the language itself.
But Java isn't good at all for real time applications. There is a Java derivative for real time.
Good C++ code should give you much better performance over the regular java.
Use POSIX threads to create a threaded application. You can implement a blocking queue fairly easily (you can find examples on the web) to let your thread sleep when the queue is empty.
You'll have to implement the queue yourself, so you can control how the messages are stored in it and retrieved. Use a contiguous section of memory to store the messages and copy them off using memcpy in one fell swoop.
You have to implement a circular buffer for the queue, though, so it might take up to two memcpy.
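A minimal sketch of why a circular buffer can need two memcpy calls, assuming a plain byte buffer used from a single thread. `RingBuffer` and its members are hypothetical names, and locking is deliberately left out. When the stored data wraps past the end of the storage, the first copy takes the bytes up to the end and the second takes the bytes that wrapped around to the start.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Fixed-capacity byte ring buffer (hypothetical helper, not thread-safe).
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : data_(capacity) {
        assert(capacity > 0);
    }

    // Copies up to len bytes in; returns how many were actually stored.
    std::size_t write(const char* src, std::size_t len) {
        len = std::min(len, data_.size() - count_);
        const std::size_t tail = (head_ + count_) % data_.size();
        const std::size_t first = std::min(len, data_.size() - tail);
        std::memcpy(data_.data() + tail, src, first);         // up to the end
        std::memcpy(data_.data(), src + first, len - first);  // wrapped part
        count_ += len;
        return len;
    }

    // Copies up to len bytes out in at most two memcpy calls.
    std::size_t read(char* dst, std::size_t len) {
        len = std::min(len, count_);
        const std::size_t first = std::min(len, data_.size() - head_);
        std::memcpy(dst, data_.data() + head_, first);        // up to the end
        std::memcpy(dst + first, data_.data(), len - first);  // wrapped part
        head_ = (head_ + len) % data_.size();
        count_ -= len;
        return len;
    }

    std::size_t size() const { return count_; }

private:
    std::vector<char> data_;
    std::size_t head_ = 0;   // index of the oldest stored byte
    std::size_t count_ = 0;  // number of bytes currently stored
};
```

For use between threads, the same class would need a mutex (or a lock-free design) around `write` and `read`.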
Now that you've given us a more detailed description of the problem, I suggest that before you jump into threads, you try to locate the bottlenecks. They may well be lurking in completely unexpected places, such as extensive usage of Java's String objects, parsing using suboptimal regex libraries and suboptimal I/O operations. So what I suggest: try to implement a single-threaded C++ app (or a skeleton of that app, which merely processes the XML data and writes it to a database). I'm not familiar with the DB you're using, but there should be methods for writing a bulk of data instead of writing every record separately.
Java is really a bad choice for fast and efficient processing of data. It's good for portability, GUI and other stuff but when you have a CPU intensive task that doesn't require portability and fancy GUI, C++ is simply a better choice.
Thanks Danny for the information.
I'm using a Sybase database. Are you saying that even with a single-threaded program I can read all 10,000 messages off the queue, bulk-write them into the database (which would normally take a few seconds, obviously depending on the number of XMLs) and do a 'sleep' when the queue is empty?
If I can bulk-write them into the database, do I need to bother with multi-threading at all ?
I suppose maybe I should do multi-threaded because I need to validate each record before it is inserted into the database.
Can you please advise ?
Something is broken in your implementation, be it the language, a logic problem, a brute-force algorithm, or something else.
There is no way it should take 1 full second to process 1 record of this size, unless I missed something.
Just to make sure, can you drop 1 record here for us to see? And give a quick rundown of your validation if you are allowed? Because 1 second is an eternity on a modern machine and something is not adding up.
Odds are you can easily do 10k items in a dumb loop and in under 5 min, or, there is a very complex requirement that you have not fully explained and need to share with us so we can help streamline it.
Your 5 min gives us 30 ms per record. Let's do it!
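The arithmetic behind that figure, as a small sketch. The function names are hypothetical; the numbers (5 minutes, 10,000 records, roughly 1 second per record today) are the ones quoted in this thread.

```cpp
// Per-record time budget in milliseconds for a given overall deadline.
constexpr double budget_ms(double deadline_seconds, long records) {
    return deadline_seconds * 1000.0 / static_cast<double>(records);
}

// Total seconds for a strictly serial loop at a given per-record cost.
constexpr double serial_seconds(double seconds_per_record, long records) {
    return seconds_per_record * static_cast<double>(records);
}

// 5 minutes = 300 s spread over 10,000 records is 30 ms each.
static_assert(budget_ms(300.0, 10000) == 30.0, "5 min over 10,000 records");
```

By the same arithmetic, `serial_seconds(1.0, 10000)` is 10,000 seconds, so the current rate is more than 30 times over the budget.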
Thanks Jonnin for the details.
Maybe I didn't explain myself very well...
The issue is that we have a queue and I need to read each record from it. Currently, we have a loop which goes through the queue until it's empty. Once a record is retrieved, we build an XML document by reading each character (we assume that the record on the queue is in the right order). We don't have any way to read the XML at once, so we read one character at a time until we end up with the proper XML document, for example: <Date>01/11/2008</Date>.
Only once we have built the whole XML do we do some validation on it: for a start, is it valid XML, and if so, do we have a 4-digit year, for example. If there are only two digits (e.g. 08), then we pad with 20. We have a total of 50 to 70 tags in each XML and we check each one in turn.
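A minimal sketch of the year-padding check described above, assuming the date text has already been extracted from the `<Date>` tag and is laid out as two digits, a slash, two digits, a slash, and a 2- or 4-digit year, as in the `01/11/2008` example. `normalize_date` is a hypothetical name, and the function only checks the shape of the string, not whether the day and month are in range.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if `date` looks like NN/NN/YY or NN/NN/YYYY.
// A two-digit year is padded in place with "20" ("08" becomes "2008").
bool normalize_date(std::string& date) {
    if (date.size() != 8 && date.size() != 10) return false;
    for (std::size_t i = 0; i < date.size(); ++i) {
        const bool want_slash = (i == 2 || i == 5);
        const unsigned char c = static_cast<unsigned char>(date[i]);
        if (want_slash ? (c != '/') : !std::isdigit(c)) return false;
    }
    if (date.size() == 8) date.insert(6, "20");  // pad the short year
    return true;
}
```

A check like this is a handful of character comparisons per tag, so 50 to 70 such checks per document should cost far less than a second.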
As a result, you can see that reading one character at a time, building the XML, doing the validation and then doing the 'prepare' and 'execute' of a stored procedure which inserts into the database all adds up to about 1 second per record.
I am trying to find a way to speed this up. I will have to build the XML character by character, but if I can find a way to do the validation and then bulk-insert into the database (instead of having to execute a stored procedure once for each of the 10,000 records), that would be good.
I know that, in general, reading a record, validating it and inserting it into a table is very fast. The problem is that at the moment we're doing a loop, so if each record takes 1 second, or even 1/2 a second, then by the time the 10,000th record is in the database, at least 5,000 seconds (i.e. over 1 hour) will have been spent.
Does this make sense ?
Last edited by ami; 11-01-2008 at 08:53 PM.
ok, yea that makes sense.
So are you truly locked into 1 char at a time, or is that just how it currently is?
Reading one character at a time is very slow, but I suspect that the validation itself is the biggest resource hog. This task is exactly where concurrency would be useful: you can have one thread read a raw record while another thread validates the previous record at the same time.
If you can somehow buffer the validated records in a vector or some other RAM buffer, and then bulk-write all of them (or say, 500 records each time), you should also notice a performance improvement.
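A minimal sketch of buffering validated records and writing them in batches, assuming each record is held as a string. `BatchWriter` is a hypothetical helper, and the `Sink` callback stands in for whatever bulk-insert call the database library actually provides; no real Sybase API is shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Collects records in RAM and hands them to a sink one batch at a time.
class BatchWriter {
public:
    // Called once per batch; assumed to perform the actual bulk write.
    using Sink = std::function<void(const std::vector<std::string>&)>;

    BatchWriter(std::size_t batch_size, Sink sink)
        : batch_size_(batch_size), sink_(std::move(sink)) {
        assert(batch_size_ > 0);
        pending_.reserve(batch_size_);
    }

    // Buffers one validated record; writes when the batch is full.
    void add(std::string record) {
        pending_.push_back(std::move(record));
        if (pending_.size() >= batch_size_) flush();
    }

    // Writes whatever is buffered (call once more when the queue drains).
    void flush() {
        if (pending_.empty()) return;
        sink_(pending_);
        pending_.clear();
    }

private:
    std::size_t batch_size_;
    Sink sink_;
    std::vector<std::string> pending_;
};
```

With a batch size of 500, 10,000 records mean 20 round trips to the database instead of 10,000.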
Here's what I suggest: try to read the records without any sort of validation and see how much time you save by that (for this purpose you want to create a special staging table in the database that will be deleted immediately after testing). This will enable you to corroborate my suspicion that it's the validation which takes most of the time, and if that's correct, you should focus your attention on optimizing the validation procedure first.
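A minimal sketch of how each stage could be timed separately, using `std::chrono::steady_clock` from the C++11 standard library. `elapsed_ms` is a hypothetical helper; the stage passed in would be the read, the validation or the database write, each measured on its own.

```cpp
#include <chrono>

// Runs `stage` once and returns how long it took, in milliseconds.
template <typename F>
double elapsed_ms(F&& stage) {
    const auto start = std::chrono::steady_clock::now();
    stage();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Summing the result per stage over a few hundred records should show which of the three dominates the 1 second per record.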
Thanks for all the information guys.
I'll look into your suggestions and get back to you.
I do however, have one other question please: in order to see whether the queue has any messages on it, I need to override onMessage() method.
The thing is, that is written in Java. Can I override a Java method in C++? Is it JNI I need to use?
You can make C++ and Java work together, but I have not done it. You should be able to embed a call to a C++ library or assembly language segment (we are not there yet, but the time may come that you need a piece of assembly for this) into Java. If not, you may have to use the Java to spawn a full process and hook them up, but there *has* to be a way to call other languages/libraries in Java, else the language would be useless.
I may be missing something: why do you need to override that method? Doesn't it fire automatically when the queue has new messages?
Also, is the XML validation separate from the code that reads the character stream and inserts the XML page into the database? As I said, you need to isolate the validation from the rest of the operations to see where the performance bottleneck is.