  1. #1
    Join Date
    May 2004

    java.net question: Size of downloaded document

    Hey Guys,
    Got a question about java.net

    I am trying to create a simple application that connects to a URL, downloads an HTML page and prints the size of the page.

    Seemed pretty simple when I started out.

    Using URLConnection.getInputStream(), I read in the document.

    Then, I find out the length of the document using URLConnection.getContentLength(), which returns the size of the HTML doc in bytes.
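A side note on getContentLength(): it reports the Content-Length response header, and it returns -1 when the server does not send one, so counting the bytes actually read from the stream is usually more dependable. A minimal sketch of that idea follows; the class name PageSize and the helper countBytes are made up for illustration and are not part of java.net.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;

class PageSize {
    // Reads the stream to the end and returns how many bytes came through.
    static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageSize <url>");
            return;
        }
        URLConnection conn = URI.create(args[0]).toURL().openConnection();
        // Header value only; -1 means the server did not declare a length.
        long declared = conn.getContentLengthLong();
        try (InputStream in = conn.getInputStream()) {
            long actual = countBytes(in);
            System.out.println("declared=" + declared + " actual=" + actual);
        }
    }
}
```

This still measures only the HTML document itself, not anything the page references.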

    So far, so good.

    Now the only problem here is that the size returned is just the size of the file containing the HTML text. However, the page can have lots of images in it referenced by <img src ="....">, which increase the actual size of the downloaded page.

    Question 1: Is there any way to get the TOTAL SIZE of the page (which includes the size of all images as well)?

    (Otherwise, the only option left for me is to parse the downloaded HTML text, search for <img src> and download each image separately using the image URL.)
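The manual approach described above could be sketched roughly like this. It is a quick regular-expression pass, not a real HTML parser, so it will miss unquoted attributes and other edge cases; the class name ImageLinks and the method extract are hypothetical names chosen for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImageLinks {
    // Matches <img ... src="..."> or <img ... src='...'>, case-insensitively.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every image URL found in the HTML, resolved against the page URL.
    static List<URI> extract(String html, URI base) {
        List<URI> found = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            try {
                found.add(base.resolve(m.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // Skip src values that are not valid URIs (e.g. containing spaces).
            }
        }
        return found;
    }
}
```

Each resolved URI can then be fetched in turn and its byte count added to the size of the HTML document to get a total.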

    Another question:

    I am also timing the download, by starting a timer before establishing the connection and stopping it after reading in the input stream. Curiously, the first time I run the application with some URL, it gives a download time that looks real. But after that, for every subsequent run for the same URL, the download time gets reduced by almost 1/8th. I assumed some kind of caching was going on, so I called URLConnection.setUseCaches(false).

    But to no avail. Unless I kill the application, and start it again, some kind of caching is going on, which gives a much smaller download time for every run on the same URL after that first one.
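One way to narrow this down is to time the connection step and the read step separately over a few runs inside the same JVM. The sketch below assumes nothing about the cause; the class name DownloadTimer and the helper timeMillis are invented for this example. Note that for HTTP, connect() only opens the connection, and the request itself goes out when getInputStream() is called, so the "read" figure includes the server's response time.

```java
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.util.concurrent.Callable;

class DownloadTimer {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Callable<Void> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java DownloadTimer <url>");
            return;
        }
        URL url = URI.create(args[0]).toURL();
        for (int run = 1; run <= 3; run++) {
            URLConnection conn = url.openConnection();
            conn.setUseCaches(false);
            // Phase 1: DNS lookup and socket setup.
            long connectMs = timeMillis(() -> { conn.connect(); return null; });
            // Phase 2: send the request and drain the response body.
            long readMs = timeMillis(() -> {
                try (InputStream in = conn.getInputStream()) {
                    byte[] buf = new byte[8192];
                    while (in.read(buf) != -1) { /* discard */ }
                }
                return null;
            });
            System.out.printf("run %d: connect %d ms, read %d ms%n", run, connectMs, readMs);
        }
    }
}
```

If only the connect figure shrinks after the first run, the speed-up is probably coming from connection setup rather than from cached page content.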

    Would somebody help me out and make my life easier?


  2. #2
    Join Date
    Feb 2004
    The presence of image tags does not increase the size of a downloaded page. Even a web browser like IE must download the HTML, read through it, pull out the image links, load the images, then display them in place. The presence of an img tag does not magically increase the size of the original document.

    If you want to give a "total bytes left to download" counter then yes, you must pull all the img links, fire off requests for them, and add their sizes to the total.

    Did you ever notice that IE only says "20 items left to download" and doesn't say their sizes? That's because it knows from the original document that there were 20 images, but it doesn't know their sizes.


    As for the timing: there are more factors than just the downloading of the page. The time it takes to start Java drops too, because your computer is caching anyway. Everything these days has caches: you have a web cache, your network card has a buffer, the hard disk has a cache, and Windows caches the hard disk again in memory.
    too much cold hard cache.. and not enough cold hard cash..

  3. #3
    Join Date
    May 2004
    Thanks CJ. Guess I'll have to sweat it out.

    By the way, I guess I wasn't very clear about my problem. I know the presence of an <img> tag does not increase the size of the page. (C'mon man, I might type slow but I ain't dumb.) My question was: does Java provide a method which can parse the downloaded HTML code for me, pick out the image URLs and download them as well, so that the final downloaded document includes everything?
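For the parsing half of that question, the JDK does ship an HTML parser in the Swing packages (javax.swing.text.html). It only parses; it does not download the images for you. A rough sketch of using it to collect img src values is below, where the class name SwingImgScanner and the method imageSources are hypothetical names for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class SwingImgScanner {
    // Parses the HTML and returns the raw src attribute of every <img> tag.
    static List<String> imageSources(Reader html) throws IOException {
        final List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no closing tag, so the parser reports it as a simple tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        sources.add(src.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the document.
        new ParserDelegator().parse(html, callback, true);
        return sources;
    }
}
```

The returned values are whatever the page wrote in src, so relative paths still need to be resolved against the page URL before they can be fetched.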

    Guess Java doesn't.

    (On a philosophical note, cache can't buy you happiness. Neither can cash.)

  4. #4
    Join Date
    May 2004
    Regarding the cache problem, I checked out the Sun forums. There were quite a few posts there describing the same problem. No one had a solution.

    This thread for instance:

    One of the posters offered the following resolution:
    URLConnection connect = myurl.openConnection();

    I tried, but it still doesn't work.

    Come on Java gurus... help me out with this. How do I get rid of this caching?
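A possible angle, offered as a guess rather than a confirmed fix: setUseCaches(false) only controls caching at the URLConnection level, while the JDK's HTTP handler can also reuse keep-alive connections within a running JVM, which would disappear on restart just as described. The sketch below switches keep-alive off through the JDK's http.keepAlive networking property and asks intermediaries not to serve cached copies. The class name FreshConnection and its methods are made up for illustration, and whether this accounts for the speed-up in this thread is an assumption.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

class FreshConnection {
    // Call once at startup, before the first HTTP request is made,
    // so that each request opens a new connection instead of reusing one.
    static void disableKeepAlive() {
        System.setProperty("http.keepAlive", "false");
    }

    // Opens (but does not yet connect) a connection that avoids cached content.
    static URLConnection open(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setUseCaches(false);
        // Ask proxies and the server not to answer from a cache.
        conn.setRequestProperty("Cache-Control", "no-cache");
        conn.setRequestProperty("Pragma", "no-cache");
        return conn;
    }
}
```

Even with both in place, later runs may still be somewhat faster because of DNS lookups cached by the JVM and code that has already been loaded and compiled.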
