-
In dire need of advice...
Hi,
I'm working on a project where I have to load XML source files into a database (MySQL) by writing the appropriate Java application.
The DTD behind the XML has nearly a hundred elements, most of them with attributes, and in most cases the elements can be nested flexibly. That means there is not just one way an element can "look": it can contain various sub-elements specified in the DTD.
I know how to parse the XML, how to get at all the attributes and parsed character data, and how to write that data to the database. What I'm having difficulty with is setting up the code that handles the (I think) rather complex structure.
The problem is that a single element can be spread over multiple tables and vice versa.
Maybe I'm overcomplicating this, but if you have any tips, ideas, or tutorial sources on how to handle such things in Java, it would be great if you could help me in any way. I'm pretty desperate here.
Thanks for any helpful ideas you might have.
Volker Held
-
Re: In dire need of advice...
Can you give an example of this?
"The problem is that a single element can be spread over multiple tables and vice versa..."
With open-source tools (Castor and Hibernate) you should be able to do this with little code.
-
Re: In dire need of advice...
<!ELEMENT sequenceset (( length?, starterror?, comment?, creation?,
                         sequenceset+, data* ) |
                       ( length?, starterror?, comment?, creation?,
                         sequence+, data* ) |
                       ( length?, starterror?, comment?, creation?,
                         sequenceset+, sequence*, data* ) |
                       ( length?, starterror?, comment?, creation?,
                         sequenceset*, sequence+, data* ))>
<!ATTLIST sequenceset id    ID    #REQUIRED
                      name  CDATA #REQUIRED
                      start CDATA #IMPLIED
                      stop  CDATA #IMPLIED>
<!-- starterror is used by sequenceset to give a possible derivation from
     the given starting point of the sequenceset. This information, if it
     could be estimated, can be used in assembly processes. -->
<!ELEMENT starterror (#PCDATA)>
<!-- sequence is a container for nucleic acid sequence information together
     with all kinds of features that can be located on this sequence
     container. It stores the sequence together with reliability information.
     Start and stop coordinates, if given in the attributes, refer to the
     surrounding container element.
     Counting of bases and amino acids starts at 1, as always in molecular
     biology. Counting is always relative to the container element. If start
     and/or stop elements are missing, the element is treated as a top-level
     container. -->
<!ELEMENT sequence ( comment?, dna_seq?, reliability?, feature*, protein*,
                     genemodel*, repeat*, rna*, site*, transcript*, snp*,
                     indel*, translocation*, expression*, creation?, data* )>
<!ATTLIST sequence id         ID    #REQUIRED
                   name       CDATA #REQUIRED
                   start      CDATA #REQUIRED
                   stop       CDATA #REQUIRED
                   generation CDATA #IMPLIED>
<!-- Tags representing features to be recognized on sequences.
     All features in an mbml document should be sorted by increasing start
     coordinates.
     The tag "feature" is a generic tag without defined substructure to hold
     some special features.
     The attribute "strand" denotes the strand where a feature is located.
     The attribute "generation" gives features inside "version" tags the
     actual valid version of the feature.
     The strand attribute must be given at the highest nesting level and
     must not be given at deeper nesting. -->
<!ELEMENT feature ( comment?, feature*, ( dna_seq | rna_seq | aa_seq )?,
                    creation?, data* )>
<!ATTLIST feature id         ID    #REQUIRED
                  name       CDATA #REQUIRED
                  type       CDATA #REQUIRED
                  start      CDATA #REQUIRED
                  stop       CDATA #REQUIRED
                  strand     ( watson | crick | both ) #IMPLIED
                  generation CDATA #IMPLIED>
Hi MarkN,
the above is part of the DTD (maybe 8-12% of it).
I'll try to make up an example.
A <sequenceset> maps to a single table, but some of its sub-tags, such as <sequence>, are stored as records in another table.
A <sequence> contains <feature> elements (another table), but I need the ID attribute from <sequence> for the feature table (concatenated key).
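One way to carry the enclosing ID down to the child rows is to keep a stack of IDs in the SAX handler. The following is only a minimal sketch under assumptions: the class name `ParentKeySketch`, the `element|parentId|ownId` row format, and the in-memory list standing in for the database are all made up for illustration, and it treats the nearest enclosing element with an `id` attribute as the parent.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParentKeySketch {

    static class Handler extends DefaultHandler {
        // Stand-in for database rows, formatted as "element|parentId|ownId".
        final List<String> rows = new ArrayList<>();
        // One entry per open element: the id of the nearest element
        // (itself or an ancestor) that carries an id attribute.
        private final Deque<String> nearestIds = new ArrayDeque<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            String parentId = nearestIds.isEmpty() ? "" : nearestIds.peek();
            String id = atts.getValue("id");
            if (id != null) {
                rows.add(qName + "|" + parentId + "|" + id);
            }
            // Elements without an id (e.g. <comment>) pass the enclosing id through.
            nearestIds.push(id != null ? id : parentId);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            nearestIds.pop();
        }
    }

    /** Parses the given XML text and returns the collected rows. */
    static List<String> parse(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.rows;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sequenceset id='ss1' name='demo'>"
                + "<sequence id='seq1' name='a' start='1' stop='90'>"
                + "<comment>no id here</comment>"
                + "<feature id='f1' name='b' type='t' start='5' stop='9'/>"
                + "</sequence></sequenceset>";
        parse(xml).forEach(System.out::println);
    }
}
```

The stack never holds more entries than the current nesting depth, so it should not matter for memory even with very large files.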
I'm using a SAX parser (Xerces) and I have to keep the memory footprint as low as possible: file sizes range from a few hundred MB into the GB range, and the full amount of data to be "transmuted" is around 80 GB.
My idea was to write a given tag to the DB once I reach its end tag, but I'm pretty unsure whether that is a good approach.
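The write-on-end-tag idea could look roughly like the sketch below: each open element gets a small frame holding its attributes and text, and the frame is handed to a sink when the end tag arrives. The names `EndTagFlushSketch`, `Frame`, and `RowSink` are hypothetical, and the sink here just collects strings; a JDBC-backed sink could instead fill a `PreparedStatement` and send rows in batches.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EndTagFlushSketch {

    /** State kept for one open element; dropped once its end tag is seen. */
    static class Frame {
        final String name;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final StringBuilder text = new StringBuilder();
        Frame(String name) { this.name = name; }
    }

    /** Hypothetical sink for completed elements (a database writer in practice). */
    interface RowSink {
        void write(Frame completed);
    }

    static class Handler extends DefaultHandler {
        private final Deque<Frame> open = new ArrayDeque<>();
        private final RowSink sink;
        Handler(RowSink sink) { this.sink = sink; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            Frame frame = new Frame(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                frame.attrs.put(atts.getQName(i), atts.getValue(i));
            }
            open.push(frame);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver text in several chunks, so append rather than assign.
            if (!open.isEmpty()) {
                open.peek().text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The element is complete: hand it over and forget it.
            sink.write(open.pop());
        }
    }

    /** Parses the XML text and returns one string per completed element. */
    static List<String> parse(String xml) {
        List<String> out = new ArrayList<>();
        RowSink sink = f -> out.add(f.name + " " + f.attrs + " text=[" + f.text.toString().trim() + "]");
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new Handler(sink));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        parse("<sequence id='s1'><starterror>3</starterror></sequence>")
                .forEach(System.out::println);
    }
}
```

One thing to keep in mind with this design: children reach the sink before their parent, so if the tables enforce foreign keys, the parent row may have to be written at the start tag and completed at the end tag instead.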
I tried a few things (none of which could handle the complexity so far) and ended up with "code monsters": lots of pieces that work similarly but with slight differences. Maybe I'm wrong, but I see this as redundant code, and that's not good design.
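A possible way out of the near-duplicate code is to describe each element's storage in a small mapping table and let one generic routine do the work. This is only a sketch under assumptions: `MappingTableSketch`, `ElementMapping`, the table names, and the `parent_id` column are invented for illustration and are not taken from the real schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MappingTableSketch {

    /** Describes where one element type is stored (all names are made up). */
    static class ElementMapping {
        final String table;
        final List<String> attributes;  // XML attributes copied 1:1 into columns
        final boolean needsParentId;    // true if the row also stores the enclosing id

        ElementMapping(String table, List<String> attributes, boolean needsParentId) {
            this.table = table;
            this.attributes = attributes;
            this.needsParentId = needsParentId;
        }

        /** Builds a parameterized INSERT statement for this element type. */
        String insertSql() {
            List<String> columns = new ArrayList<>(attributes);
            if (needsParentId) {
                columns.add("parent_id");
            }
            StringBuilder names = new StringBuilder();
            StringBuilder marks = new StringBuilder();
            for (String column : columns) {
                if (names.length() > 0) {
                    names.append(", ");
                    marks.append(", ");
                }
                names.append(column);
                marks.append("?");
            }
            return "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";
        }
    }

    // One entry per element name; the handler looks the element up here
    // instead of having a hand-written code path for every tag.
    static final Map<String, ElementMapping> MAPPINGS = new HashMap<>();
    static {
        MAPPINGS.put("sequenceset",
                new ElementMapping("sequenceset", Arrays.asList("id", "name"), false));
        MAPPINGS.put("sequence",
                new ElementMapping("sequence", Arrays.asList("id", "name", "generation"), true));
        MAPPINGS.put("feature",
                new ElementMapping("feature", Arrays.asList("id", "name", "type", "strand"), true));
    }

    public static void main(String[] args) {
        for (String element : Arrays.asList("sequenceset", "sequence", "feature")) {
            System.out.println(MAPPINGS.get(element).insertSql());
        }
    }
}
```

With this layout, the slight differences between elements live in the data rather than in the code, and elements that need special handling can still get their own code path.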
-
Re: In dire need of advice...
Sounds like a mess. Can you send me the XML? Scramble any data you don't want me to see, if you need to, but leave the elements as they are. As long as it isn't anything personal it should be fine. You can send the DTD too, but DTDs usually just add to the confusion; I usually don't use them.

My domain is sprynet.com and my email name is mnuttall. Put them together.
Mark