Database Architecture - Handling Data for Different Customer Needs
I'm very anal about trying to maintain development standards when designing applications. This puts me in a difficult situation because every company I've worked for let's standards slide (or doesn't care) and especially in regards to data. This compromise is usually attributed to the lack of time that's available but other times due to the complexity of the data.
So, I am again faced with a project that may compromise the relational database model and I would like input from other people who've been in my situation. The situation is that my company currently has several stand alone applications geared towards different customers. Each customer has similar yet different data requirements but we would like to merge all of our separate applications into one and thus house all of this data under one database. I am perfectly o.k. with this concept but what is the best approach for designing the central database?
For example, we have a table for entering time sheets. Company A's table may look like:
Customer Number, Employee Name, Hours, Equipment Used, Description, Work Expense, Back Charge, Additional Charge.
However, Company B's table may look like:
Customer Number, Employee Name, Hours, Equipment Used, Description, Additional Charge.
My boss would like to make an aggregate table that looks like:
Customer Number, Employee Name, Hours, Equipment Used, Description, Desc1, Desc2, Desc3, Charge1, Charge2, Charge3.
Obviously, doing it his way would mean that all of the last columns for Company A would mean something entirely different for company B within the same table(entity). I've ran into this situation before and it became quite confusing when fixing data issues. Also, this situation requires a lot of upfront data checking by the UI and prohibits using solid abstract layering standards. On the other hand, this lowers the amount of tables needed and makes updating a cinch (possibly).
Is there a better way to design this?
I worked with one company that did something similar where they had attribute 1 to attributeX columns with a type field. Depending on the type, attribute1 meant something different in record 1 than 2. This was a mess because all of these fields were integers and required complex joins in order to bring in the descriptions. Then, if we needed to add another bit of info for company A, we had to add another attributeX field and re-program all of the UI's to handle this change.
Thank You for your input.
An option could be maintaining an index i.e. table with all the field names and a separate table containing the client specific name and the corresponding master field id. Use a third table to store the actual data corresponding to the client specific field id.
Similar issue - how to make it efficient & flexible
I am in a similar situation, yet a little more tricky maybe. I am working for several organizations, who have several databases with people. From their internal workers, to members, newsletter subscribers, partners,... all in all, quite a mess since everyone is using different tools, and no one is really happy about they have.
So I went on a quest of finding an address book sort of web app, that would be simple to use, and extremely flexible/scalable to reach all the needs. Mission Impossible, I didn't find anything. We've decided to launch our own Open Source project, and I am currently designing the web app.
But I am running in the same problem as you are, the whole point of it is that each administrator can build his own data structure from the user interface, without any coding. But my plan is to go further, allowing the use of plugins to define data types, and means of exporting the data. This would allow you to build all kinds of systems fitting the needs.
If you want to know more about it, you can read about it on our sourceforge project page, project named "adres"
But here is the design challenge !
To keep it flexible, I would like to let the plugin be responsible for his set of data. The plugin becomes therefore a black box. It can process a search, return a string to display it's value in a grid,... plugins are PHP classes that will extend an Interface called Field.
Now, it's much better to let MySQL do the filter and the sorting rather than trying to do it in PHP since MySQL is optimized for that. But plugins are PHP black boxes. So if I wanted to display all the data, I have to go through each contact, and for each field call the getString function.... That's plenty of SQL requests.
Better, I can ask a plugin to send me all the values for a certain Filter. In that way I can extract the datas column by column. But it makes sorting a little difficult... But so far, it seemed to me that this was probably the best solution. Giving a lot of freedom to the plugin developers.
But then I had an other idea, what about having the plugins sending back the information required to build one SQL request. They would send me the name of the table, the connecting field name, the parts that goes in the SELECT",... That would make searching and sorting a lot easier, and could be processed in one request. But is that really a good idea ?!? It sounds a little pulled by the hair in one way, but maybe it's the best way.
It feels like I am really mixing the logic and data layer, which is normal in one way since I want a dynamic data structure. But I just wonder if I'm not missing a point there.
Any suggestion ???????
If you wanna dig more into that question, check out the document called documentation.pdf on Sourceforge. I am discussing some of these options in a deeper way. But please forgive me for the quality of the writing, it's a working document, not a publication ;-)
Jobzesage can you please provide the link to your article?
With pleasure Cyrus ;-)
However, that documentation isn't really up to date I must admit... I've had very little time this past two months for this project. But I managed to land a functional solution :-) but there is still some graphical interface work to do before it's really usable.
At the end, I dropped the idea of asking the field object for a string (the cleanest POO version). The reason is that MySQL is a lot better at sorting and filtering than my objects in PHP. The last concept I implemented is an object ContactSet that builds an SQL query based on the information provided by each field (table name, JOIN SQL expression, and SELECT SQL expression). Then I have an object Filter that creates the filtering SQL expression.
Put it all together, and you get your query :-)
I mentioned this option in the documentation, it's a little bit more detailed I think. But back then I wasn't considering that option so much. But at the end, I think it makes it a lot simplier, and the speed is pretty good (I haven't tried heavy loads, but the goal is rather medium/small companies/organizations as an internal tool).
Good luck !
After 20 years of working on large enterprise systems, I retired and, just to keep my hand in, I started developing applications for small businesses, incorporating some enterprise concepts (metadata and suchlike). My particular targets were startups and businesses in a state of volatile change. I came across the problems you mention and one solution I have looked at was to adopt the anonymous field names (attribute1 or whatever) and to arbitrate between different implicit structures, I used queries with terms like "Select attribute1 As EmployeeLastName" and so on. I used serial number primary keys (hence independent of structure) and a code indicating the implicit structure. In a form for instance, you would first enter the code, which would change the recordsource to be the appropriate query and you could even switch the labels on the fields to reflect the implicit structure. The underlying control name (I am talking MS Access here) would of course be attribute1 but the user never sees that. The Save button would call up the appropriate VBA code to put everything away nicely.
Users could quite happily generate reports and subqueries, simply using the appropriate master query as a filter and naming device.
I hope this is useful.
thanks for your input, and sorry for taking that long to answer, I've been pretty busy ;-)
Your solution is nice in the way that it's fast to process, and easy to code/handle. Sadly for this project, it lakes flexibility. Since we're designing a system that lets the user design his own data structure, I have no idea on before hand on how many fields he will need. Further more, beside simple text fields, we want to be able to work with other types such as dates. If you store dates in a text field, it makes it more complicated to handle, sort, filter,...
And since we want plugins to create new available types, they might have all kinds of needs. For instance a plugin creating a types email and used for bulk emails might want to store how many bounces came back for each email, and maybe disable an email. Therefore a column might have several fields to store.
That's why at the end each column type (plugin) has a table, I have a table that describe the data structure of each data type defined by the user. When we want to display the data, from the table that has the data structure I extract the information that I need to build an SQL request that joins all the tables of the plugins together for each column of the type defined by the user.
That creates a big SQL request ;-) but it seems fairly fast to evaluate, and I can use SQL to sort, filter,... which is very convenient !
That's the solution we choose for a maximum flexibility and to use SQL for the sorting and the filtering. It works well but the database is a little messy if you look at it directly... harder to manage.
I think we have a "horses for courses" situation here, in that we are looking to provide solutions for somewhat different marketplaces. You are providing a "black box" solution which will perform certain explicit tasks for for IT-savvy clients (or those who can afford a sysadmin or programmer) who will provide the data to be processed in formats convenient for them. I, on the other hand, am providing a fairly comprehensive business model to owner/operators who need rapid response times to changes they have identified in the way they run their business. What we have in common is that we both need to use metadata (data which determines which way the engine runs) to achieve our objectives.
Here are a few details of my model which hopefully strike sparks in your mind and help give you a direction. Again, I am thinking in terms of the MS Access/Jet architecture, which allows you to have a front-end application program where tables can be linked to multiple back-end databases. The core backend module consists of basic real-world entity information (organisations, persons, comms channels, addresses, etc.) plus relationship tables to link them together and role tables to define the nature of those relationships. Some obvious roles are predefined (customer, supplier) but users can define their own. These are used mainly to drive extraction queries, but can also act as code switches. In separate databases, we have customer-defined information specific to their business which links to the relevant entities. Some tables map one-on-one to the entity tables and provide local variants to the data object. All application modules have their own VBA, but they all share the libraries which run the entities module. The aim is to be able to zero in to the points where the customer has requested changes.
Here is one example. A function caterer wishes to provide different services for corporate and private clients. He adds the appropriate roles to the entity relationships. Multiple multi-course menus are available for selection by customers and these may be priced on a per-dish or per-course basis. Then the client decides to offer a deal whereby if you select two course, a third course is free. I come in and do some minimal work, modifying lookup tables as required and adding some VBA code. The lookup code contains a pricing model switch which directs the application to the appropriate section of code. The client can provide variants by entering (for different menus) different prices, number of courses and number of dishes. I only come back if he needs a new pricing model.
Sorry for being a bit verbose, but that's the way it goes with abstract ideas !
Hi Him, interesting how your detailed the way you customize your software for the customer. It sounds like an easy way to bring custom logic on the top of a set structure.
You're right in the way that we're in two different situations, but in our case the goal is actually to avoid any programming work for the customization. Or if required, then under the form of a plugin that leaves the core untouched.
In our case, the user defines the entities, the relations between them and their data structure... all from the GUI. Our primary coal is to store contacts, as we found out that per organization I'm working with there are 3-5 databases of people (the one of the accountancy, the one of the administration, marketing,...) and I want to take it down to one where each department can customize the information they want to store about a contact.
Now since nothing is defined for a contact but an id, it can store more or less anything. It becomes sort of a web based MS Access. OK, not even close but I mean by the fact that the user designs his system.
Again, there is a price to pay, the core system will remain quite simple and be extended by plugins. So more than a real database system, we should probably call it a "list manager" where lists have relationships, and custom fields.
But interesting discussion :-)
For me the limitation of your model relies in the fact that further extension are limited by the data structure you set, or involves a change in the data structure.
In our case, the plugin comes with his piece of datastructure, and links to the core.
But your solution fits your needs much better than ours in the way that your databases are much more readable even by someone who has basic knowledge of MS Access. And that's interesting...
Common Driver for Processes
This is getting more interesting by the minute ! All interactions in business are driven by the relationship between the various entities and what I find intriguing is that we both seem to be trying to attack this basic business premise from different perspectives. You are endeavouring to leave as much to the user as possible and simply provide processes (plug-ins) to manage whatever he/she decides to specify. This may be the way to go in the web/cloud world. I, on the other hand have taken the view that inter-entity relationships are real, but that the nature and extent of these relationships is defined by the parties. Hence, in my server/client system, the de-facto real world relationships each carry with them one or more codes (which I call roles and can be user-defined), which are used to define the different manners of engagement. The user can write the code (or access a library) using the entity keys and the role key as parameters. For instance, one module may process orders, and another payments (which may cross-reference orders through their repsective role keys.)
Actually, all my stuff originated in some ideas for bringing sophisticated decision-support capability to small business and the entity model was my means of connecting what may be fairly remote areas in the overall business model.
For your interest, I attach an entity/relationship diagram and notes which may (or may not) be useful to you.
Back to the original question
You know, some of us have been having such an interesting conversation about a few ideas of interest that we seem to have forgotten about the problem posed by our colleague illumna ! Now, the premise seems to be that he has a number of versions all describing different versions of the same entity and he would like to have an overarching structure that would enable him to have a common extraction/processing mechanism that would handle all variants.
My line of thought that I have expressed in my contributions is to have a core entity table which essentially contains identification and relationship data (common to all versions) linked one on one to different information tables which contain the variable client data. The entity table would identify the variant and the type ID, if you will, would be a switch to invoke dedicated processing for each variant. This approach derives from my work on decision support systems, which need to reach the extremes of very large systems.
A simpler alternative (depending upon other constraints not yet offered by illumna) would be to create a database table where some fields would be common to several of the contributing client tables and others would be specific to some clients who were the only ones collecting the data in question. Any processing functions would know how to process all fields.
The requirements would be queries which will draw on the contributing databases and transfer the data to the appropriate fields in the new database. if you wanted to retain all of the individual reports in the different databases, you would need complementary queries to extract the data and present it as an alias (or put it directly into a table which would be referred to by the reports.
As I understand it, jobzesage favours providing functions (called plugins here) which will read the structure of each client table presented and then, through some switching mechanism, either process the data in a way consistent with the structure which has been read, or refer to some code provided by the user.
The solution selected, I would suggest, depends on where you would want to go next, illumna.
I hope that our earlier contributions or this summary are useful to you.
Last edited by jim_from_oz; 03-22-2010 at 03:58 AM.
Reason: spelling errors
This is a really excellent read for me. Must admit that you are one of the best bloggers I ever saw. Thanks for posting this informative article.
work at home on the internet
Last edited by oracle; 05-17-2010 at 02:48 AM.
good work.............keeep it up all the good work............
buy zhu zhu pets
Last edited by oracle; 05-17-2010 at 02:48 AM.
By City of LA in forum Careers
Last Post: 01-21-2009, 04:36 PM
By software_develo in forum Database
Last Post: 11-21-2005, 10:18 AM
Last Post: 06-05-2002, 09:07 AM
By Paul Klanderud in forum ASP.NET
Last Post: 02-04-2001, 04:24 PM
By Dan in forum VB Classic
Last Post: 03-17-2000, 05:14 AM
Top DevX Stories
Easy Web Services with SQL Server 2005 HTTP Endpoints
JavaOne 2005: Java Platform Roadmap Focuses on Ease of Development, Sun Focuses on the "Free" in F.O.S.S.
Wed Yourself to UML with the Power of Associations
Microsoft to Add AJAX Capabilities to ASP.NET
IBM's Cloudscape Versus MySQL