simulating synchronous HTTP transfers
Hello,
I am interested in crawling websites automatically. Ideally, I would be able to get an image capture and the HTML source of a page every few minutes.
However, some sites resist asynchronous HTTP transfers. One example is the Yahoo front page. If you access it through a Java script using a method like 'readRawSource' or 'loadStrings', you are making an asynchronous request. The response you get back never matches what the web page is showing at the time you make the request.
Is it simply impossible to crawl a website like this? Or could some kind of browser emulation be performed in a Java applet that makes a synchronous request to the problem website, saves the source, and saves an image capture?
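For reference, the kind of synchronous request I have in mind could be sketched roughly as below. This is only a sketch under assumptions, not code I have working against Yahoo: it assumes Java 11 or later (for java.net.http.HttpClient), and the class name BlockingFetch, the User-Agent string, the example.com default URL, and the snapshot file name are all made up for illustration. It only saves the HTML the server returns; it does not run any scripts on the page and does not capture an image.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

public class BlockingFetch {
    // Hypothetical browser-like User-Agent (an assumption, not a known requirement of any site).
    static final String USER_AGENT = "Mozilla/5.0 (compatible; SnapshotCrawler/0.1)";

    // Build a plain GET request carrying the User-Agent header and a timeout.
    static HttpRequest buildRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .header("User-Agent", USER_AGENT)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Synchronous fetch: send() blocks the calling thread until the whole body has arrived.
    static String fetch(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(uri), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; pass the real target as the first argument.
        URI uri = URI.create(args.length > 0 ? args[0] : "https://example.com/");
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        String html = fetch(client, uri);

        // Save the returned source under a timestamped (made-up) file name.
        Path out = Path.of("snapshot-" + System.currentTimeMillis() + ".html");
        Files.writeString(out, html);
        System.out.println("Saved " + html.length() + " characters to " + out);
    }
}
```

Running this every few minutes (for example from a scheduler) would give the periodic source snapshots; the image capture would still need something that actually renders the page.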
Thanks.