C# Developers' Journal

12:11p

Capturing web page as an image

This subject has me utterly flummoxed and frustrated.

My goal: To capture the contents of a web page as displayed in the .NET 2.0 WebBrowser control. That is, the contents of the entire page -- not only what is being displayed currently on the desktop, but the contents of the entire Document.Body.ScrollRectangle.

I have three different potential solutions to this problem using three different methodologies. They all work on most web pages, but some web pages cause them to capture a blank (all white) image of the proper size.

I'll say it again, for those people who want to stop reading at this point and tell me it's not possible -- it works for most cases, but I want it to work in all cases.

My methods are as follows -- I'll go in order of increasing complexity. Note that in all cases, prior to capturing the image, I resize the window and the web browser control to the size of the .Document.Body.ScrollRectangle, such that no scroll bars are displayed.

METHOD 1: Unsupported framework methods! This one's the simplest -- I just call the .DrawToBitmap() method of the WebBrowser control, which is not officially supported for that control. It works in ~50% of cases, otherwise returning a blank white image of the proper size.
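For reference, here is a minimal sketch of what Method 1 might look like, including the resize step described above. The class and method names are made up for illustration, and it assumes the call happens after the page has finished loading (for example from a DocumentCompleted handler) and that the control is not docked, so setting its Size actually takes effect. If the control is docked or anchored, the parent form would need resizing as well.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

// Hypothetical helper, not the original poster's code.
static class DrawToBitmapCapture
{
    public static Bitmap CaptureFullPage(WebBrowser browser)
    {
        if (browser.Document == null || browser.Document.Body == null)
            throw new InvalidOperationException("No document is loaded yet.");

        // Full scrollable extent of the page, not just the visible viewport.
        Rectangle full = browser.Document.Body.ScrollRectangle;

        // Grow the control to the page size so no scroll bars are needed.
        browser.ScrollBarsEnabled = false;
        browser.Size = full.Size;

        Bitmap bitmap = new Bitmap(full.Width, full.Height);

        // Not officially supported for WebBrowser; may produce a blank image.
        browser.DrawToBitmap(bitmap, new Rectangle(Point.Empty, full.Size));
        return bitmap;
    }
}
```

The caller owns the returned Bitmap and should dispose of it after saving, e.g. `bitmap.Save("page.png")` followed by `bitmap.Dispose()`.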

METHOD 2: Sending a WM_PRINT message to the window or the control, telling it to paint itself onto a Graphics object created from a blank Bitmap of the proper size. I use the Handle property of the window or WebBrowser control to get its window handle as an IntPtr, and the GetHdc() method of the Graphics.FromImage(theBitmapOfTheProperSize) object to get an IntPtr to the device context. Then I call SendMessage() in User32.dll with the handle of the control, WM_PRINT, the device context handle, and COMBINED_PRINTFLAGS. Then I call ReleaseHdc() on the Graphics object with the device context handle I got earlier. This method seems to work everywhere but www.yahoo.com, where I get a blank white image of the proper size.
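The steps in Method 2 can be sketched roughly as follows. This is not the original source: the class name is hypothetical, and since the post does not say which flags COMBINED_PRINTFLAGS contains, the combination below (AssumedPrintFlags) is only a guess. The WM_PRINT and PRF_* values are the ones defined in WinUser.h.

```csharp
using System;
using System.Drawing;
using System.Runtime.InteropServices;
using System.Windows.Forms;

// Hypothetical helper, not the original poster's code.
static class PrintMessageCapture
{
    public const int WM_PRINT = 0x0317;

    // PRF_* drawing options for WM_PRINT.
    public const int PRF_CHECKVISIBLE = 0x01;
    public const int PRF_NONCLIENT = 0x02;
    public const int PRF_CLIENT = 0x04;
    public const int PRF_ERASEBKGND = 0x08;
    public const int PRF_CHILDREN = 0x10;
    public const int PRF_OWNED = 0x20;

    // Assumption: stands in for the post's COMBINED_PRINTFLAGS, whose value is not given.
    public const int AssumedPrintFlags =
        PRF_CLIENT | PRF_NONCLIENT | PRF_ERASEBKGND | PRF_CHILDREN | PRF_OWNED;

    [DllImport("user32.dll")]
    static extern IntPtr SendMessage(IntPtr hWnd, int msg, IntPtr wParam, IntPtr lParam);

    public static Bitmap Capture(Control target, Size size)
    {
        Bitmap bitmap = new Bitmap(size.Width, size.Height);
        using (Graphics g = Graphics.FromImage(bitmap))
        {
            // Device context that draws into the bitmap.
            IntPtr hdc = g.GetHdc();
            try
            {
                // Ask the window to render itself into that device context.
                SendMessage(target.Handle, WM_PRINT, hdc, (IntPtr)AssumedPrintFlags);
            }
            finally
            {
                // The Graphics object is unusable until the HDC is released.
                g.ReleaseHdc(hdc);
            }
        }
        return bitmap;
    }
}
```

PRF_CHILDREN matters here because the WebBrowser control hosts its rendering surface in child windows. Whether those child windows honor WM_PRINT is up to them, which may be one reason results vary from page to page.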

METHOD 3: I get an mshtml.IHTMLElementRender object from the Document.DomDocument object. Then I perform a BitBlt operation from the WebBrowser control into an empty Graphics object of the proper size. This method works 100% perfectly in Windows Vista, but unfortunately our target environment is Windows Server 2003, and in that OS (probably due to some differences in GDI) there are numerous clipping problems. If the control's drawing area is obscured by other windows on top of it, those windows are captured too, and if the window extends beyond the viewable desktop area, any portion of the window that is not viewable is replaced by a black area in the generated image. Again, none of that happens in Vista -- whether or not the window is visible, and whether or not it is obscured, I get a perfect image of the web page every time.
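A rough sketch of only the BitBlt half of Method 3 is below. It leaves out the IHTMLElementRender step, the class name is hypothetical, and it may well differ from the actual code. It copies pixels from the control's window device context into a bitmap. One possible explanation for the OS difference, which the post does not confirm: a window device context on Windows Server 2003 reads from the shared screen surface, so overlapping windows and off-screen regions end up in the copy, whereas Vista with desktop composition enabled keeps each top-level window on its own off-screen surface.

```csharp
using System;
using System.Drawing;
using System.Runtime.InteropServices;
using System.Windows.Forms;

// Hypothetical helper, not the original poster's code.
static class BitBltCapture
{
    // Raster operation: copy source pixels to the destination unchanged.
    public const int SRCCOPY = 0x00CC0020;

    [DllImport("gdi32.dll")]
    static extern bool BitBlt(IntPtr hdcDest, int xDest, int yDest, int width, int height,
                              IntPtr hdcSrc, int xSrc, int ySrc, int rop);

    public static Bitmap Capture(Control source)
    {
        Bitmap bitmap = new Bitmap(source.Width, source.Height);
        using (Graphics src = source.CreateGraphics())
        using (Graphics dest = Graphics.FromImage(bitmap))
        {
            IntPtr hdcSrc = src.GetHdc();
            IntPtr hdcDest = dest.GetHdc();
            try
            {
                // Copies whatever pixels currently back the control's client area.
                BitBlt(hdcDest, 0, 0, source.Width, source.Height, hdcSrc, 0, 0, SRCCOPY);
            }
            finally
            {
                dest.ReleaseHdc(hdcDest);
                src.ReleaseHdc(hdcSrc);
            }
        }
        return bitmap;
    }
}
```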

I can post source code if anyone's interested, but I am very curious as to why www.yahoo.com presents such a problem for methods 1 and 2. Originally I suspected Shockwave Flash, but I can capture www.bored.com and www.ford.com flawlessly, both of which use more Flash than Yahoo.

Any information is greatly appreciated.