Store and instantly access a Billion xml records (HOW TO)
These are my preliminary results obtained storing 'many' XML files whose record
size matches the drive cluster size.
Format as big a hard drive as you need (or can) at 512 byte clusters with
NTFS and turn the compression on. In the event your xml files exceed this,
adjust the cluster accordingly, just be sure to sample your files to find
the best size needed to store the vast majority of the records in a single
Next, path encode your primary keys to distribute your files in a hierarchical
tree. In my case, I had two different kinds of records:
F:\Mailboxes\ZIP\ZIP4\Street (notice that 'ZIP\ZIP4\Street' is a primary
F:\Records\abc\def\ghi\jkl ('abc\def\ghi\jkl' is the primary key)
Initial sampling results in 63MB I had 56MB on disk with 51,000 files. In
other words you should be able store as many records as there are clusters
on the disk--that's right, billions with a 'B'.
Creating record objects that persist thier data using MSXML2.DOMDocument,
then spawn PublicNonCreatable classes that persist based on an element of
the xml tree, for example:
Friend Sub Constructor(hElement As MSXML2.IXMLDOMElement)
Your business logic/interface then writes to that DLL and never (directly)
touches XML. Best part, you can persist and instantly access a record among
billions. Also, it's possible for an end user to search the text from the
the top '\\share\xml\' directory with explorer, and then be able to view
the record in an understandble way. Binary files could accompany the xml
data file in the directory.
The downside is that you have to forget about sequential access altogether
since the primary keys must be stored hierarchically.
Instead, code new objects that keep lists of the things you need. If you
still have to go the RDBMS route to SQL a recordset, make sure the primary
keys can be encoded and decoded. If your primary keys are random long integers,
format them at a fixed length (Format$(lnPrimaryKey&, "000000000000"), then
split up into directories like
Top DevX Stories
Easy Web Services with SQL Server 2005 HTTP Endpoints
JavaOne 2005: Java Platform Roadmap Focuses on Ease of Development, Sun Focuses on the "Free" in F.O.S.S.
Wed Yourself to UML with the Power of Associations
Microsoft to Add AJAX Capabilities to ASP.NET
IBM's Cloudscape Versus MySQL