Thread: In dire need of advice...

  1. #1
    Volker Held Guest

    In dire need of advice...

    Hi,

    I'm working on a project where I have to
    load XML source files into a database
    (MySQL) by writing a suitable Java app.
    The DTD defines nearly a hundred
    elements, most of them with attributes,
    and in most cases the nesting is
    flexible: an element does not have one
    fixed shape but can contain various
    sub-elements allowed by the DTD.

    I know how to parse the XML, how to get
    at all the attributes and parsed
    character data, and how to write that
    data to the DB. What I'm struggling with
    is structuring the code to handle what
    I think is a rather complex structure.

    The problem is that a single element can
    be spread over multiple tables and vice
    versa...

    Maybe I'm overcomplicating this, but if
    you have any tips, ideas, or pointers to
    tutorials on how to handle such things
    in Java, it would be great if you could
    help me in any way. I'm pretty desperate
    here.

    Thanks for any helpful ideas you might have.

    Volker Held


  2. #2
    MarkN Guest

    Re: In dire need of advice...


    Can you give an example of -
    "The problem is that a single element can
    be spread over multiple tables and vice
    versa..."

    With open-source tools (Castor and Hibernate) you should be able to do this
    with little code.






  3. #3
    Volker Held Guest

    Re: In dire need of advice...

    <!ELEMENT sequenceset (( length?,
    starterror?, comment?, creation?,
    sequenceset+, data* ) |
    ( length?, starterror?, comment?, creation?,
    sequence+, data* ) |
    ( length?, starterror?, comment?, creation?,
    sequenceset+, sequence*, data* ) |
    ( length?, starterror?, comment?, creation?,
    sequenceset*, sequence+, data* ))>

    <!ATTLIST sequenceset id ID #REQUIRED
    name CDATA #REQUIRED
    start CDATA #IMPLIED
    stop CDATA #IMPLIED>

    <!-- starterror is used by sequenceset
    to give a possible deviation from the
    given starting point of the sequenceset.
    This information, if it could be
    estimated, can be used in assembly
    processes -->
    <!ELEMENT starterror (#PCDATA)>


    <!-- sequence is a container for nucleic
    acid sequence information together with
    all kinds of features that can be
    located on this sequence container. It
    stores the sequence together with
    reliability information.
    Start and stop coordinates, if given in
    the attributes, refer to the surrounding
    container element.
    Counting of bases and amino acids starts
    at 1, as always in molecular biology.
    Counting is always relative to the
    container element. If start and/or stop
    elements are missing, the element is
    treated as a top-level container -->

    <!ELEMENT sequence ( comment?, dna_seq?,
    reliability?, feature*, protein*,
    genemodel*, repeat*, rna*, site*,
    transcript*, snp*, indel*,
    translocation*, expression*,
    creation?, data* )>
    <!ATTLIST sequence id ID #REQUIRED
    name CDATA #REQUIRED
    start CDATA #REQUIRED
    stop CDATA #REQUIRED
    generation CDATA #IMPLIED>


    <!-- tags representing features to be
    recognized on sequences.
    All features in an mbml document should
    be sorted by increasing start
    coordinates.
    The tag "feature" is a generic tag
    without defined substructure to hold
    some special features.
    The attribute "strand" denotes the
    strand where a feature is located.
    The attribute "generation" gives
    features inside "version" tags the
    actual valid version of the feature.
    The strand attribute must be given at
    the highest nesting level and must not
    be given at deeper nesting -->
    <!ELEMENT feature ( comment?, feature*, (
    dna_seq | rna_seq | aa_seq )?,
    creation?, data* )>
    <!ATTLIST feature id ID #REQUIRED
    name CDATA #REQUIRED
    type CDATA #REQUIRED
    start CDATA #REQUIRED
    stop CDATA #REQUIRED
    strand ( watson | crick | both ) #IMPLIED
    generation CDATA #IMPLIED>



    Hi MarkN,

    The above is part of the DTD (maybe
    8-12% of it).

    I'll try to give an example.

    A <sequenceset> maps to a single table,
    but some of its sub-elements, such as
    <sequence>, are stored as rows in
    another table.
    A <sequence> contains <feature> elements
    (yet another table), and I need the ID
    attribute of the <sequence> for the
    feature table, as part of a
    concatenated key.
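
    A minimal sketch of one way the <sequence> ID could be carried down
    to nested <feature> elements while streaming with SAX. It is not the
    poster's code: the class name ParentKeyHandler, the helper keysOf,
    and the "sequenceId/featureId" string (a stand-in for a real INSERT
    with a two-column key) are all assumptions made for illustration.
    Only the ids of the currently open <sequence> elements are kept, so
    memory use does not grow with file size.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: remembers the id of each open <sequence> so that
 *  nested <feature> rows can use it as part of their key. */
class ParentKeyHandler extends DefaultHandler {
    // ids of the <sequence> elements that are currently open, innermost first
    private final Deque<String> openSequences = new ArrayDeque<>();
    // stand-in for database writes: one "sequenceId/featureId" entry per feature
    final List<String> featureKeys = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("sequence")) {
            openSequences.push(atts.getValue("id"));
        } else if (qName.equals("feature")) {
            // peek() is the innermost enclosing <sequence>; nested features share it
            featureKeys.add(openSequences.peek() + "/" + atts.getValue("id"));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("sequence")) {
            openSequences.pop();
        }
    }

    /** Parses an XML string and returns the collected feature keys. */
    static List<String> keysOf(String xml) throws Exception {
        ParentKeyHandler handler = new ParentKeyHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.featureKeys;
    }
}
```

    For real files the same handler would be passed to the parser with a
    File or InputStream instead of a string, and the list would be
    replaced by batched inserts.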

    I'm using a SAX parser (Xerces) and I
    have to keep the memory footprint as low
    as possible: file sizes range from a few
    hundred MB into the GB range, and the
    full amount of data to be converted is
    around 80 GB.
    My idea was to write each element to
    the DB once I reach its end tag, but
    I'm unsure whether that is a good
    approach.
    I have tried a few things, none of which
    could handle the complexity so far, and
    I ended up with "code monsters": lots of
    code that works similarly but with
    slight differences. Maybe I'm wrong, but
    I see that as redundant code, and so not
    a good design.
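
    One possible way to avoid near-duplicate handler code is to drive a
    single generic handler from an element-to-table map and emit each row
    at its end tag, as described above. The sketch below is only an
    illustration under stated assumptions: the class TableDrivenLoader,
    the table names, the parent_id column, and the string output
    (standing in for JDBC inserts) are all hypothetical. Elements missing
    from the map, such as <comment>, are simply skipped here; a real
    loader would fold them into the enclosing row.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical generic loader: one code path for every mapped element. */
class TableDrivenLoader extends DefaultHandler {
    /** A row that has been started but whose end tag has not been seen yet. */
    static final class Row {
        final String table;
        final Map<String, String> columns = new LinkedHashMap<>();
        Row(String table) { this.table = table; }
        @Override public String toString() { return table + columns; }
    }

    private final Map<String, String> tableFor;          // element name -> table name
    private final Deque<Row> open = new ArrayDeque<>();   // rows awaiting their end tag
    final List<String> written = new ArrayList<>();       // stand-in for DB inserts

    TableDrivenLoader(Map<String, String> tableFor) { this.tableFor = tableFor; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        String table = tableFor.get(qName);
        if (table == null) return;                         // unmapped element: ignore
        Row row = new Row(table);
        Row parent = open.peek();
        if (parent != null) {
            // carry the enclosing mapped element's id into the child row
            row.columns.put("parent_id", parent.columns.get("id"));
        }
        for (int i = 0; i < atts.getLength(); i++) {
            row.columns.put(atts.getQName(i), atts.getValue(i));
        }
        open.push(row);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (tableFor.containsKey(qName)) {
            written.add(open.pop().toString());            // row is complete: emit it
        }
    }

    /** Parses an XML string with the given mapping and returns the emitted rows. */
    static List<String> load(String xml, Map<String, String> tableFor) throws Exception {
        TableDrivenLoader handler = new TableDrivenLoader(tableFor);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.written;
    }
}
```

    Note that writing at the end tag emits child rows before their
    parents, because a child's end tag comes first. If the tables use
    foreign-key constraints, the parent row may need to be inserted at
    its start tag instead, or the constraints checked after loading.
    The stack only ever holds one row per nesting level, so the memory
    footprint stays small regardless of file size.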





  4. #4
    MarkN Guest

    Re: In dire need of advice...


    Sounds like a mess. Can you send me the XML? Scramble any data you don't
    want me to see if you need to, but leave the elements as they are. As long
    as it isn't anything personal it should be fine. You can send the DTD too,
    but DTDs usually just add to the confusion. I usually don't use them.


    My domain is sprynet.com and my email name is mnuttall. Put them together.

    Mark
