#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
'''
webGobbler 1.2.8

http://sebsauvage.net/python/webgobbler/

=== Purpose ====================================================================

Purpose:
    This program creates pictures by assembling random images from the web.
    Think of it as an attempt to capture the chaos of human activity, of
    which the internet is a partial and subjective snapshot.

Motivation:
    I recently discovered WebCollage (http://www.jwz.org/webcollage/)
    and debris (http://www.badmofo.org/debris/).
      - What's wrong with WebCollage : Not especially pretty, and written
        in perl. I hate perl.
      - What's wrong with debris : Sources not available. Only works under
        Windows. Does not support proxies.

    I created gossyp some time ago (http://sebsauvage.net/python/gossyp/).
    I told myself I could do the same for images.

    I also wanted to get better at multi-threaded programming.

    I wanted to be able to feed those images into a desktop background
    changer, a screensaver or whatever I want.

Authors:
    Sebastien SAUVAGE, webmaster of http://sebsauvage.net
    Kilian, webmaster of http://thesermon.free.fr/

=== Features ===================================================================

webGobbler:
    * creates images by assembling random images.
    * can get random images from the internet or from a directory of your
      choice.
    * can apply various effects to images (rotation, inversion, mirror,
      re-superposition, emboss...).
    * can generate images of any size (Want to create a 10000x10000 image ?
      No problem !).
    * can output many file formats (JPEG, BMP, PNG, TGA, TIFF, PDF, PCX, PPM,
      XBM...)
    * can work as a simple image generator, a webpage generator, a wallpaper
      changer, a screensaver...
    * can run in command-line mode or GUI mode.
    * runs under Windows (all flavors), Linux, MacOS X and any other OS where
      Python and the PIL library are available.
    * can save/load its configuration to/from the registry or a simple
      configuration file in your home directory.
    * supports proxies, with or without password.
    * is opensource !
    * is free !

=== Disclaimer =================================================================

IMPORTANT - READ

This program downloads random images from the internet, which may include
pornography or any morally objectionable or illegal material.

Due to the random nature of this program, the author of webGobbler cannot be
held responsible for any URL this program has tried to reach, nor the images
downloaded, stored or displayed on the computer.

In consequence:
    - this program may not be safe for kids.
    - this program is definitely NSFW (not safe for work).

Use at your own risk !
You are warned.

You are advised this program may use copyrighted images. Thus the images
generated by webGobbler are only suitable for private use. If you want to use
them for non-private purposes, you may have to request permission from the
original image rights owners for each image composing the whole image.
(The URLs of the last pictures used to generate the current image can be found
in the last_used_images.html file in the image pool directory.)

=== License ====================================================================

This program is distributed under the OSI-certified zlib/libpng license.
http://www.opensource.org/licenses/zlib-license.php

This software is provided 'as-is', without any express or implied warranty.
In no event will the authors be held liable for any damages arising from
the use of this software.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it freely,
subject to the following restrictions:

    1. The origin of this software must not be misrepresented; you must not
       claim that you wrote the original software. If you use this software
       in a product, an acknowledgment in the product documentation would be
       appreciated but is not required.

    2.
       Altered source versions must be plainly marked as such, and must not be
       misrepresented as being the original software.

    3. This notice may not be removed or altered from any source distribution.

=== Requirements ===============================================================

    * Python 2.3
    * PIL (Python Imaging Library)
    * Optional: For the Windows wallpaper changer and screensaver: ctypes module.
    * Optional: For the Gnome wallpaper changer: ctypes module.
    * Optional: For the KDE wallpaper changer: python-dcop module.
    * Optional: For the configuration GUI: Pmw (Python MegaWidgets)
      (provided with webGobbler source)
    * Optional: Psyco (to speed up webGobbler)

=== Platforms supported ========================================================

Any platform capable of running Python 2.3 and PIL.
For the screensaver: Windows 95/98/ME/NT/2000/XP/2003 or X-Windows
For the wallpaper changer: Windows 95/98/ME/NT/2000/XP/2003 or Linux with
Gnome or KDE.

webGobbler has been successfully run on Windows, Linux and MacOS X.

=== Technical details ==========================================================

There are 4 different kinds of objects in webGobbler:

    * The collectors are in charge of spidering the web and downloading
      images. They put the downloaded images in the pool.
    * The image pool manages the local image collection and ensures a minimal
      number of images. If the image pool is running low, it will ask the
      collectors to get more images.
    * The assemblers take images from the pool and assemble them in various
      ways: simple image output as it is, mosaic of images, superposition of
      images...
    * These assemblers can be used by different programs to produce images
      for HTML page generation, screensavers, desktop background...

Each collector and the pool run in their own thread, so that the assemblers
and other objects can continue to work while the web is spidered.

The design is modular. For example, it's easy to write a new collector to
spider a specific website.
It's also very easy to write new assemblers. And assemblers are easy to use
in programs. Still, there is room for improvement (and refactoring...).

Currently, existing modules are:

    * collector_deviantart: This collector gets random images from
      http://deviantART.com, an excellent collaborative art website.
      Anyone can post their creations, and visitors can comment.
      The site contains photography, drawings, paintings, computer-generated
      images, etc.
    * collector_randomimagesus: http://randomimage.us shows a random,
      user-submitted picture on its homepage.
      (This collector is currently deactivated.)
    * collector_askjeevesimages uses the Ask Jeeves Image search engine
      (http://pictures.ask.com) by querying with randomly created words
      (I will later use a real word list). This search engine even has a
      "bad" image filter which should filter most pr0n away.
    * collector_yahooimagesearch: This is also an image search engine
      (http://search.yahoo.com/images), but with a different database than
      AskJeeves.
    * collector_googleimages uses the Google Image search engine
      (http://images.google.com)
    * collector_flickr uses random images from the famous Flickr.com website
      (http://flickr.com)
    * collector_local: If you do not have an internet connection, or have a
      slow one, or do not want to eat bandwidth, this collector can scan the
      local hard disk to find images (use the --localonly command-line option
      to use it). Surprisingly enough, this gives not-so-bad results.

    * assembler_simple simply outputs a single image, resized to the desired
      dimensions (with antialiasing).
    * assembler_mosaic creates a mosaic of images (a grid of images).
      You can change the desired final resolution and the number of images to
      put in the mosaic.
    * assembler_superpose is currently the most complex one: It superposes
      the images with transparency and does some miscellaneous stuff
      (compensate for poorly contrasted images, resize images larger than the
      screen, try to detect "too white" pictures and invert them, rotate
      images, paste them with transparency, etc.).

Applications are:

    * image_saver uses the assembler_superpose and saves the image as a
      simple BMP file every 60 seconds (configurable). This image_saver is
      available through the command-line or through a GUI.
    * htmlPageGenerator generates an auto-refresh HTML page and an image.
    * windowsWallpaperChanger changes the desktop wallpaper under Windows.
    * windowsScreensaver is a Windows screensaver.
    * gnomeWallpaperChanger changes the desktop wallpaper under Gnome (Linux).
    * kdeWallpaperChanger changes the desktop wallpaper under KDE (Linux).
    * x11Screensaver is a screensaver for X-Windows.
    * There are also other uses (Gnome & KDE wallpaper changer, etc.)

Program source code is full of "FIXME" comments: There is a lot of work
remaining.

=== Examples ===================================================================

Command-line examples:

    * python webgobbler.py --tofile webgobbler.bmp
          Generate a new image every 60 seconds in 1024x768
          (You will have to wait a few minutes until it gives interesting
          results.)

    * python webgobbler.py --tofile image.png --resolution 640x480 --every 30
          Generate a new image at 640x480 every 30 seconds.

    * python webgobbler.py --towindowswallpaper --norotation --emboss
          Generate a new wallpaper every 60 seconds. Disable rotation and
          emboss the generated image.
          No use specifying the resolution: the wallpaper changer will
          automatically pick up the screen resolution.

    * python webgobbler.py --towindowswallpaper --proxy netcache.myfirm.com:3128 --proxyauth "John Smith:booz99"
          Generate Windows wallpaper, and connect to the internet through
          the proxy netcache.myfirm.com on port 3128 with the login
          "John Smith" and the password "booz99".
    * python webgobbler.py --every 120 --invert --saveconfreg
          Save the options in the Windows registry for later use with
          --loadconfreg or /s (Windows screensaver)

    * python webgobbler.py --loadconfreg
          Run webGobbler using options saved in the registry.

    * python webgobbler.py /c
          Call the webGobbler configuration screen. You can tweak all the
          options and click the "Save" button. These options will be used by
          the screensaver (see /s below) or by --loadconfreg/--loadconffile.

    * python webgobbler.py /s
          Call webGobbler as a Windows screensaver. Options will be read from
          the registry. (The DOS window will still appear.)
          To create the registry settings with default values, run:
          python webgobbler.py --saveconfreg

    (Note that if you use the Windows binary, replace "python webgobbler.py"
    with "webgobbler_cli.exe" or "webgobbler.exe".)

=== Ideas, Todo, notepad, other stuff... =======================================

IDEA IDEA: Record all actions (image URL, rotation angle, paste coordinates,
etc.) in order to be able to save them in a file and replay them in order to
reconstruct the image !!! :-)

In webgobbler_app: See if image can be scrolled by dragging it (Like a 'hand'
tool).

In webgobbler_app: offer the possibility to select which collectors to use in
the GUI ?

webgobbler_app: See how to display a tray icon.
                See how to change the program icon in taskbar.
                See how to mask program in taskbar.
                See how to set an icon on the cxFreeze exe.
                (ctypes/api Win32 ?)
See:
http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/d3fd71b3c2424746/5de432c60503c608?q=tkinter+set+icon&rnum=3&hl=en#5de432c60503c608

In webgobbler_app: Save application preferences in the applicationConfig
object. (set as wallpaper, minimise to tray, show activity...)

FIXME: implement /p /a Windows-screensaver-specific command-line options.

Idea: Why not create a tray icon to start/stop/configure the wallpaper
changer ?
Idea: Use the transparency mask of the picture to superpose, and darken the
image with this mask a few pixels down and left. This may give a nice "shadow"
effect on each pasted picture. --> to experiment.

How to write a screensaver for Windows:
http://www.christiancoders.com/cgi-bin/articles/show_article.pl?f=briant05292003004836.html

Improve assembler_superpose: non-square transparency, Fractint-like plasma
transparency... ? Change contrast, brightness, run through external program...

Collectors should implement a delay-between-each-request attribute, or any
other mechanism to be gentle with bandwidth.
--> How could I implement a bandwidth limitation shared by all collector
    threads ? By centralizing downloads ?

Utility methods to develop (for all the assembler modules and/or collectors)
    - better white image detection
    - banners/spacers/etc. detection (according to URL (see AdBlock),
      file SHA1, image dimensions (see Proxomitron), other ?)
    - pr0n image detector ? (using flesh-tones detection ?)

Random links: See
http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Searching_the_Web/Search_Engines_and_Directories/Random_Links/

Image search engines:
    http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/
    http://dmoz.org/Computers/Internet/Searching/Search_Engines/Specialized/Images/

Add imageshack.us, others ?

Collector to add: http://www.getty.edu/art/ (Art gallery)

Find other sources (the more the better)

Maybe useful later: http://wwwsearch.sourceforge.net/ClientCookie/
This handles cookies automatically.

---------- Building instructions for the Windows application and screensaver -------------------

This involves some manual work.

STEP 1: Get cxFreeze for your version of Python.
        http://starship.python.net/crew/atuining/cx_Freeze/

STEP 2: Bundle Pmw.py using bundlepmw.py (provided with Pmw).
        Copy Pmw.py (roughly 300 kb), PmwBlt.py and PmwColor.py into the
        directory of webgobbler sources.
STEP 3: Run cxFreeze to "compile" the program:
        FreezePython.exe --install-dir dist_freeze --target-name=webgobbler_cli.exe webgobbler.py
        FreezePython.exe --install-dir dist_freeze --target-name=webgobbler.exe --base-binary=Win32GUI.exe webgobbler.py

STEP 4: Copy those two DLLs into the dist_freeze directory:
        copy C:\Python23\DLLs\tcl84.dll .\dist_freeze\
        copy C:\Python23\DLLs\tk84.dll .\dist_freeze\

STEP 5: Copy the whole directory C:\Python24\tcl\tcl8.4 to dist_freeze\libtcltk84\tcl8.4
        Copy the whole directory C:\Python24\tcl\tk8.4 to dist_freeze\libtcltk84\tk8.4
        (with subdirectories)

STEP 6: Remove extraneous tcl/tk scripts (demos, http, etc.)

At this point, you have a full-fledged working webGobbler program.

STEP 7: Using AutoIt v3, compile the following script webGobbler.au3 into
        webGobbler.exe:

-----SCRIPT STARTS HERE--------------------------------------------------------
; webGobbler caveat: Usage of the tcl/tk library in webGobbler imposes that
; when webgobbler.exe is run, the current directory is the same as
; webgobbler.exe.
; Therefore I would have to put all the webGobbler files (including tcl/tk lib)
; in the c:\windows\system32 directory, along with webgobbler.scr.
; This is not good practice.
;
; This stub (placed in the Windows system folder) runs the real webGobbler
; program in its own directory with the right options.

; Hide the tray icon.
Opt("TrayIconHide", 1)

; Read webGobbler installation path from registry.
$regpath = "HKEY_CURRENT_USER\Software\sebsauvage.net\webGobbler"
$regname = "installation_directory"
$installdir = RegRead($regpath,$regname)
If $installdir = "" Then
    ; If not found in HKCU, try to read from HKEY_LOCAL_MACHINE instead:
    $regpath = "HKEY_LOCAL_MACHINE\Software\sebsauvage.net\webGobbler"
    $installdir = RegRead($regpath,$regname)
EndIf
If $installdir = "" Then
    MsgBox(16,"webGobbler screensaver","webGobbler installation path could not be found in registry."&@CRLF&"Please reinstall webGobbler."&@CRLF&@CRLF&"(Key "&$regname&" in "&$regpath&")")
    Exit(1)
EndIf

; Make sure installation path ends with a backslash (\)
If StringRight($installdir,1) <> "\" Then $installdir = $installdir & "\"

; Make sure webGobbler is installed in this directory.
If NOT FileExists($installdir & "webgobbler.exe") Then
    MsgBox(16,"webGobbler screensaver","webGobbler.exe could not be found in directory " & $installdir & @CRLF & "Please reinstall webGobbler.")
    Exit(1)
EndIf

; If no command-line parameter is provided, exit.
If $CmdLine[0] = 0 Then Exit

; Get the command-line option
$opt = StringLower($CmdLine[1])

; Call webGobbler
; PS: Looks like Windows sometimes calls /c with a handle ("/c:651484").
;     What's the purpose of that ???
If StringLeft($opt,2) == "/s" Then RunWait($installdir & "webgobbler.exe /s", $installdir)
If StringLeft($opt,2) == "/p" Then Exit(0)  ; Preview mode - FIXME: To implement
If StringLeft($opt,2) == "/l" Then Exit(0)  ; Preview mode - FIXME: To implement
If StringLeft($opt,2) == "/a" Then Exit(0)  ; Change password (Win95/98 only) - FIXME: To implement

; In all other cases, display the configuration screen.
; (For example, right-clicking the .scr and choosing "Configure" will call
; with no command-line option.)
RunWait($installdir & "webgobbler.exe /c", $installdir)

; FIXME: implement /p with handle.
-----SCRIPT ENDS HERE----------------------------------------------------------

Once compiled into an .exe with AutoIt, rename it to webGobbler.scr

CAVEAT: the AutoIt stub has to be patched to support the /s option
(because the default AutoIt stub hooks the /s option.)

webGobbler.scr needs to know where webGobbler.exe is installed. It reads the
registry (key installation_directory in
HKEY_CURRENT_USER\Software\sebsauvage.net\webGobbler)

=== FAQ ========================================================================

* Why is it called webGobbler ?
    Because it gobbles anything it finds on the web.
    (Well, I should have named this something like "Chaos Tapestry" or
    "The beautiful trashbin" or "Shreddage". Whatever. Too late.)

* Why choose Python ?
    Efficiency, readability, portability, large standard libraries, coolness.

* Why don't you use [AltaVista image search][FastPath image search]
  [Insert-your-image-search-engine-name-here] ?
    Because most of these search engines have the same database as Yahoo and
    Jeeves. Give it a try: search the same word in all those engines: you
    will find the same pictures in the same order.

* What's the largest image size webGobbler can generate ?
    I don't know, but this should be fairly large (depending on how much
    memory your computer has). It's bound to the PIL library.
    I managed to create a 10000x10000 image with no problem. It just ate an
    awful lot of memory.

* How much memory does webGobbler use ?
    It depends mainly on the size of the image to generate. The larger the
    final picture, the more memory used.
    The GUI version uses more memory, of course.
    Hint: If you want to create large images, use the command-line version.

* How much CPU does webGobbler use ?
    When only spidering the web, almost nothing (usually below the 1%
    threshold). When assembling images, slightly more, but over a short
    period.
    If you want webGobbler to never slow you down, don't forget you can
    change its process priority so that it will *never* slow other processes.
    Under Windows NT/2000/XP, bring up the task list (CTRL+SHIFT+ESC),
    right-click on Python.exe or webGobbler.exe, "Set priority" > "Low".
    Under *nixes, use nice to set the priority to 19.
    Anyway, webGobbler is usually nice on the CPU.

* What image formats are supported by webGobbler ?
    webGobbler will only download the following image types from the
    internet: jpeg, gif, png, tiff and bmp.
    webGobbler could be easily extended to support any format supported by
    the PIL library. (For the list of supported formats, see
    http://www.pythonware.com/library/pil/handbook/formats.htm)
    For output, webGobbler can write all formats supported by PIL:
    As of 2004-09-16, PIL can write: PNG, BMP, JPEG, GIF, PDF, TIFF, PCX,
    PPM, XBM, EPS, IM and MSP.
    To choose the output format, you just need to use the desired extension
    on the command line (eg. --tofile mypicture.tiff)

* Will there be porn in images generated by webGobbler ?
    There may be. I haven't developed anything to block porn.
    Flickr may churn out some porn and DeviantArt.com also has some nudes
    (rare). Other collectors are not likely to output porn, because the
    default behaviour of search engines is to block porn.
    If you want to reduce the risk of seeing porn, deactivate (in code) the
    two following collectors: collector_flickr, collector_deviantart.
    But there is no guarantee ! The disclaimer of webGobbler is still
    relevant.

* What's this imagepool directory ?
    webGobbler stores in this directory the images it has downloaded from
    the internet. Once in a while, it picks an image from this directory in
    order to mix it and removes it from the imagepool directory.
    webGobbler will try to keep a constant number of images in this
    directory, so it will not grow out of control.

* Why do the files in the image pool have those strange long names ?
    webGobbler ignores the name of the original image on the internet.
    The name is derived from the content of the image itself.
    This gets rid of duplicates (two identical images, even under different
    names).
    This also ensures two different images with the same name will not clash.
    (This is much like what most P2P programs do to identify files whatever
    their name.)
    If you open an image from the pool with a hex editor, you will see the
    original image URL and file name at the end of the file
    ("--- Picture taken from...").
    If you download the image and compute its SHA1 (with sha1sum for
    example), you should find the same SHA1 as in the filename (WG*.*)

* How can I know which images were used to compose the image ?
    Look into the image pool directory (./imagepool): There is a file named
    last_used_images.html. It contains the URLs of the latest images used to
    create the current image. Most recent images are at the bottom of the
    list. This file will be kept to a maximum of 1 Mb.

* How can I participate ?
    What I need most now is a webGobbler logo. Ideas or images are welcome.
    I prefer 2D vector work to 3D C.G.
    If your work is integrated into webGobbler, your name will of course
    appear in the credits. Don't forget this work will go under the
    zlib/libpng license.
    Right now, I do not seek direct contributions to code.
    If you have ideas (about image collection, assembling or any other
    feature), I would be most pleased to hear about them !

* Why not put webGobbler on SourceForge.net ?
    I have no time to administrate such a thing (CVS, bug tracking, etc.).
    This project is too small to benefit from this.

* What is this 'psyco' thing ?
    Psyco accelerates Python programs on x86-compatible processors (Pentium).
    Acceleration ranges from x2 to x100 without a single modification in
    code. If psyco is installed, this program will automatically use it to
    run faster.
    Don't worry if you don't have psyco: webGobbler will still be fully
    functional and will run as usual.

* WebGobbler does not create a nice collage of my photos !
    WebGobbler is *NOT INTENDED* to create a nice collage of your photos.
    It's designed to be a random modern-art generation program.
    The "local directory" spider is here only for convenience.

* Thief ! You steal images.
    No. I do not steal images.
    webGobbler does not steal more images than your average browser either:
    They both download images and display them on the computer screen.
    Respecting the work of others and their copyrights is *YOUR*
    responsibility, not mine, webGobbler's or your browser's.
    If you are creating art based on the work of others, YOU'RE the person
    responsible, whatever tool you use (webGobbler, The Gimp or any other).

* I want to be able to click on the image and be redirected to the original
  image.
    Not in the near future, I fear.
    (This is tricky, because a single pixel on the image is the result of the
    superposition of dozens of different images. This feature would not be
    relevant.)

* I want to take only images from a single website.
    It's not possible with the current version of webGobbler, and this
    feature is not planned in the near future.
    As a workaround, you can download the website with tools like HTTrack,
    then ask webGobbler to use only images from this directory.

=== History ====================================================================

1.0 beta 3 (2004-xx-xx):
    - First public release. zlib/libpng license.
    - Code was somewhat cleaned (Lots of work remaining)
    - I chose a global config (CONFIG) instead of passing parameters to each
      constructor.
    - Detailed command-line help is now displayed.
    - Added more documentation (license, FAQ, etc.)
    - Currently, only the image generator (--tofile) and Windows wallpaper
      changer (--towindowswallpaper) are implemented and active.
    - implemented persistence for assembler_superpose.
      Still need to add persistent directory in command-line.

1.0 beta 4 (2004-09-10): (not released)
    - Yahoo Image search "not found" message has changed.
    - deviantArt.com link to full view image has changed.
    - In order to be more portable, I changed collector_local default start
      directory from "C:\" to "/" ("/" is also accepted under Windows)

1.0 beta 5 (2004-09-13):
    - Changes in assembler_superpose: New mode which does not darken the
      image but uses the Equalize operation to even out channel values.
      This gives overall better pictures:
          - fewer dark areas
          - fewer grey areas
          - more saturated colors, even if all source images are not very
            saturated.
          - more contrast
          - better image mixing
          - fewer visible rectangular edges.
          - much more detail
          - some details can last longer in the final image and shift colors.
      That's closer to what I intended to do. A more chaotic picture.
      Thinking of it, I should have named this program "Chaos Tapestry".
      This program is an attempt to capture the chaos of human activity, of
      which the internet is a partial and subjective snapshot.
      This mode is now the default mode for assembler_superpose.
      The old mode (beta 4 and previous) is available through the new
      "--variante 1" command-line option.
    - I chose to de-activate Psyco by default. You will have to uncomment the
      psyco code to use it (My old Pentium 200 with 64 Mb of RAM does not
      seem to appreciate psyco under heavy load).
    - randomimages.us collector deactivated because it often gives the same
      images. You will have to uncomment it to re-enable it.
      This seems to give better overall pictures.
    - new Emboss filter (--emboss).

1.0 beta 6 (2004-11-01):
    - When search engines are overloaded, the delay has been extended from 10
      to 60 seconds to be more gentle with them.
    - last_used_images.txt is now last_used_images.html so that it's easier
      to view remote images without the hassle of copy-pasting URLs
      (thanks to Kilian for suggesting this.)
    - webGobbler image branding (lower right corner) font size is a bit
      larger. I still need to find a logo for webGobbler (maybe a 2D vector
      gobbler with a rainbow comb&tail and a vacuum cleaner in hand ? ;-)
    - Added new answers in the FAQ.
    - "--norotation" argument added.
      This disables image rotation.
    - "--proxy" argument added. Now you can properly configure the proxy from
      the command-line (without having to touch the code.)
    - Also supports Basic proxy authentication (for proxies which require a
      login/password): "--proxyauth" argument has been added.
      You do not HAVE to provide the password on the command-line. If the
      password is not provided, you will be prompted to enter it.
      Example (with password)   : --proxyauth "foo bar:mysecretpassword"
      Example (without password): --proxyauth "foo bar"
    - When downloading an image, its MIME type (Content-Type) is now checked
      against a fixed list of known MIME types. This prevents the download of
      exotic image formats which would not be supported by PIL.
      (See ACCEPTED_MIME_TYPES in code.)
      Furthermore, the correct extension will be added to the image file
      according to the MIME type so that images in the imagepool will be
      saved correctly even if the original URL does not have the correct
      extension (such as images provided by CGI).
    - Slightly reduced message verbosity so that it fits a bit nicer on
      screen when using --debug.
    - Better handling of deviantArt.com particularities (poetry pages, etc.)
      This will slightly reduce the number of outgoing requests to this site.

1.0 beta 7 (2005-01-14):
    - corrected a bug in collector_local which would reset its directory pool
      a bit too early in some situations.
    - collector_deviantart changed to adapt to deviantart.com website
      changes.
    - For the sake of Netiquette, webGobbler now properly sends its
      User-Agent "webGobbler/1.0b7" in HTTP headers instead of the standard
      "Python/urllib".
      (But for the sake of Netiquette, should I respect robots rules ?
      webGobbler does not technically 'spider' websites.)
    - socket timeout set to 15 seconds for the whole program so that the
      collector threads are not stuck trying to download an image from a site
      which does not respond (or does not respond in a reasonable time).
    - In all collectors, sleep() was replaced by self.waituntil so that
      collectors will react more quickly to shutdown commands.
      FIXME: I still need to take care of some possible time-warping risks
      when using time.time().
    - revamped all collectors so that their _getRandomImage() method returns
      more quickly while not slowing down the spidering process. This way,
      the threads will die more quickly when requested to shut down.
      This is better for the screensaver.
    - Windows Wallpaper changer now automatically uses the current screen
      resolution. Command-line specified resolution will be ignored.
    - I wrote the core of the Windows Screensaver, using ctypes only.
      (Phew ! Win32 API programming sucks.)
      (No dependency on Mark Hammond's win32 module, nor pyGame, nor Tkinter,
      nor pyScr...). Resulting binaries will be smaller.
      With py2exe+UPX, I managed to have the whole webGobbler below 1.3 Mb.
      Right now, only the /s (start) option of the Windows screensaver is
      implemented. I still have to implement /p /c and /a.
      You'll have to configure the screensaver from the command-line with
      --saveconfreg (see below).
    - The screensaver has a separate thread to handle its window, so that it
      will immediately turn off if the mouse is moved, even if the other
      threads are still downloading or crunching data.
      Better responsiveness, happier user.
    - I chose to put the screensaver in a separate file
      (wgwin32screensaver.py). This may change later.
    - Implemented the applicationConfig class which will ease the
      storage/retrieval of program configuration. It can import/export
      to/from XML, file or Windows registry. The screensaver (/s)
      automatically uses the Windows registry configuration.
    - Added command-line options
      saveconfreg/loadconfreg/saveconffile/loadconffile to save/load options
      to/from registry/file.
      Usage example: specify all your options on the command-line and use
      --saveconfreg
      Then you will just have to call webGobbler with --loadconfreg to recall
      all the options.
      The screensaver will automatically use options saved with --saveconfreg
    - I stumbled upon this: http://www.scroogle.org/gscrape.html
      The author is quite right. After all, Google makes multi-million dollar
      profits by indexing and using *our* sites. I don't earn a single penny
      out of Google, so why should I feel guilty about using Google in
      return ?
      So I decided to include a collector for Google Image Search.
      webGobbler being only for private, non-commercial use, I invoke the
      "fair use" right.
    - Better error handling (that's why the code is so verbose).

1.0 beta 8 (2005-01-16):
    - changes for the deviantart.com website.

1.0 beta 9 (2005-01-19):
    - Added the --singleimage command to generate a single image and exit.
    - Added the --tohtml command which generates an auto-refreshing HTML page
      and its corresponding JPEG image. Simply open the html page in your
      browser and the image will automatically refresh.
      You can also generate directly in the directory of your webserver
      (--tohtml "c:\wwwroot\fun\webgobbler_current.html")
    - Branding was redesigned with a new font and the small eye logo of my
      website.
      (The font is '04B-11' from http://www.dsg4.com/04/extra/bitmap/)
    - When starting a new image, a message is displayed: "Please wait while
      the first images are being downloaded..." (just to acknowledge that
      webGobbler is up and running, because the first image can appear 60
      seconds after starting.)
    - Changed the name of some configuration parameters.
    - Switched from XML to plain .INI file (It's easier for the users to
      edit.) Types are checked against default values on reading.
    - In consequence: saveToFileInUserHomedir() and
      loadFromFileInUserHomedir() now save .INI-structured files instead of
      XML.
    - saveToRegistryCurrentUser() and loadFromRegistryCurrentUser() now use a
      different value in the registry for each parameter (It's easier to edit
      with RegEdit, and will ease the creation of the screensaver
      configuration GUI - probably in Delphi.)
      Note that all parameters are saved as text (REG_SZ) in the registry.
      Types are checked on reading.
    - The proxy password is now garbled when saved to INI/file/registry.
      IT IS NOT ENCRYPTED and can still be recovered. But at least it's not
      stored in plaintext.
    - Image download now aborts immediately if the image size announced in the
      HTTP response headers is too big (in class internetImage).
    - The internetImage object now returns the textual reason why the image was
      discarded (self.discardReason): URL blacklisted, not an image, image too big, etc.
      This is displayed when --debug is used.
    - URL blacklisting was implemented (see BLACKLIST_URL).
      (URL filters use an AdBlock-like syntax.)
    - blacklist.imagesha1 and blacklist.url are now exported/imported in
      .INI files/registry so that they can be user-customized (instead of hard-coded).
      (They still cannot be customized through the command line: the configuration
      (file or registry) needs to be updated manually.)
      Values are separated by |. Every % must be escaped as %%.
      Example: http://*.doubleclick.net|*/adserver/|http://*.xiti.com
    - A global application logging mechanism is in place.
      (Wow... the logging module is really *great*!)
    - Cleaner shutdown (I enforced the threads' shutdown order)
    - Prevented simultaneous calls to superpose() in each assembler_superpose
      instance in order to avoid wasting CPU and the image pool.
      (self.currentlyassembling attribute, but not read/changed in a critical
      section because it's not worth it.)
    - --debug mode now also writes the log to a file (webGobbler.log)
      Now I can catch almost any unexpected exception and log it to this file
      (even in the console-less (screensaver) version of webGobbler)
    - psyco was re-enabled (gives a good performance boost, especially for the screensaver).
    - The psyco warning is now caught and silenced.
    - The code was adapted to run both in console and console-less mode
      (win32gui.exe in cx_Freeze). It's now possible to 'compile' webGobbler
      with cx_Freeze and get rid of the DOS window (It's better for the screensaver).
      You can still use --debug in the console-less version to see what's
      going on (in the webGobbler.log file).
    - Side effect: You can run the console-less version with:
          wg.exe --towindowswallpaper
      to have a background process which will change your background.
      The process is nice enough not to fail if the internet connection drops.
      It will resume downloading and generating images when the internet
      connection is available again.
      You can put this executable in your startup menu.
      (But to stop it, you will have to kill the process.)
    - Still no binary this time: there's work remaining
      (screensaver configuration GUI, installer, etc.)

  1.0 beta 10 (2005-03-20):
    - Added the variante 2 (--variante 2) which mirrors and re-superposes the
      final image. It creates a quasi-symmetry in the image.
    - Small bug corrected (session saving).

  1.0 beta 11 (2005-06-29):
    - Option --variante 2 changed to --resuperpose
    - Spurious exceptions trapped on some PIL calls.
    - Oops... in beta 10, I forgot to update the version number in User-Agent.
    - collector_deviantart changed to accommodate DeviantArt.com changes.

  1.0 beta 12 (2005-06-30):
    - I got a *lot* of exceptions with the new version of PIL
      (hence all the new try/except).
    - collector_askjeevesimages changed to adapt to website changes.
    - AT LAST! A configuration GUI has been developed. You can now access the
      webGobbler configuration with the /c or --guiconfig option.
      The GUI is developed in Tkinter, which makes it portable.
      The configuration GUI automatically picks up the registry or the .ini file
      according to what's available.
      Though... it's not completed yet. (For example, the help does not display
      help, and I still need to tidy up the widgets (alignment, resizing,
      data controls, etc.).) I think I will also add some icons to illustrate
      the different options.
    - Therefore: the /c option for the Windows screensaver is now working!
    - Still no binary this time (I have to implement help and also get rid of
      some packaging & path issues (Is psyco dead???)).
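
  Illustration (not part of webGobbler itself) for the URL blacklist described
  in the 1.0 beta 9 notes above: a minimal sketch of how |-separated wildcard
  patterns could be matched against image URLs. The function names are
  hypothetical and the translation of '*' into regular expressions is an
  assumption; the actual webGobbler code may work differently.

```python
import re

def parse_blacklist_value(value):
    # Split the stored "pattern|pattern|..." string into a list of patterns.
    return [pattern for pattern in value.split("|") if pattern]

def compile_blacklist(patterns):
    # Translate each wildcard pattern into a compiled regular expression.
    compiled = []
    for pattern in patterns:
        # Escape every literal chunk, then join the chunks with ".*" for each "*".
        chunks = [re.escape(chunk) for chunk in pattern.split("*")]
        # re.match() anchors only at the start of the URL, so anything may
        # follow the pattern: this plays the role of an implicit trailing "*".
        compiled.append(re.compile(".*".join(chunks)))
    return compiled

def is_blacklisted(url, compiled):
    # True if the URL matches at least one blacklist pattern.
    return any(regex.match(url) for regex in compiled)

# The example value given in the changelog above.
stored = "http://*.doubleclick.net|*/adserver/|http://*.xiti.com"
blacklist = compile_blacklist(parse_blacklist_value(stored))
```

  With these patterns, is_blacklisted("http://ad.doubleclick.net/banner.gif", blacklist)
  is true, while a URL containing none of the patterns is accepted.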

  1.0 beta 13 (2005-07-02):
    - Corrections for Python 2.4.1 (You were getting an exception in the GUI
      on the "Save" button).
    - Tested successfully with Python 2.4.1, PIL 1.1.5, ctypes 0.9.6, Pmw 1.2
      and cxFreeze 3.0.1.
    - assembler_superpose() now closes more quickly when asked to shutdown()
      (Previously, it finished processing its 10 images before dying.)
      It's much better for the screensaver.
    - When the resolution is changed, the previous image is not trashed anymore:
      it's resized. This way, the user does not needlessly lose previously
      spent CPU cycles and bandwidth.
    - Debug option added to the configuration GUI.
    - Added some icons in the GUI.
    - Removed the Help area from the GUI.

  1.0 beta 14 (2005-07-02):
    - collector_askjeevesimages changed again to adapt to website changes.

  1.0 beta 15 (2005-07-05):
    - AT LAST, a working binary for Windows. No command-line hassles. Rejoice!
    - webgobbler_config now derives from Tkinter.Toplevel so that it can be
      used as a dialog window in an application.
    - Corrected a bug in --guiconfig which would not display the window (!)
    - assembler_superpose refactored: it does not derive from the abstract class
      assembler anymore. The new assembler_superpose is more efficient.
      (Most methods of this class are now non-blocking.)
    - Method assembler_superpose.saveSessionImage() was removed: this assembler
      now always saves the session state once it has completed assembling images.
    - The default behaviour of webgobbler.py when no command-line options are
      specified is to run in GUI mode. All command-line options are still available.
      The command-line help is available through the new --help option.
    - As the application runs in a separate thread from image downloading and
      crunching, it should be fairly responsive.
      (Well... maybe except when shutting down, due to network timeouts.)
    - CONFIG is no longer global: it is passed to each constructor.
      (It was a mess, really.)

  1.0 beta 16 (2005-07-06):
    - Oops...
      I completely fucked up the distribution of beta 15 because I picked up
      the wrong directory. Sorry for that.
    - ask.com changed again: collector_askjeevesimages was changed accordingly.
    - Corrected a bug in the GUI which would trigger the Tkinter timer twice.
    - Corrected a bug when using a proxy with password with the GUI
      (didn't work in beta 15).

  1.0 beta 17 (2005-07-08):
    - In the beta 16 binary, I had to include the whole tcl/tk library :-/
      In beta 17, it's still the case, but I removed some useless parts (demo, http...)
    - Added a handler for the "pr0n" warning message of Yahoo.
    - I used AutoIt to create the .scr which runs the main webGobbler.exe
    - The Windows binary is now nicely packaged into an installer using InnoSetup
      (Great program, really! And very easy to use.)
    - To help the Windows installer detect running instances of webGobbler,
      webGobbler creates a mutex under Windows.
    - The screensaver seems to be a bit too sensitive to Windows events and
      it turns off too easily. I saw this behaviour under Windows 2000 but could
      not reproduce it under Windows XP. I will have to investigate this.

  1.0 beta 18 (2005-07-14):
    - In the GUI, added a confirmation box for "Start new image from scratch"
    - Added a status in the main window (to see collectors activity).
    - Now you do not have to restart the application if you change proxy settings.
    - The initial message "Please wait while the first images are beeing downloaded..."
      is now removed as soon as the first images are superposed.

  1.0 beta 19 (2005-07-18):
    - Requested feature: in the GUI, added the ability to choose the start
      directory when only the local disk is used to get images.
    - The local disk collector status is now also displayed.
    - Corrected the status display ('Off') when a collector is deactivated
      (Previously the status was not updated after collector deactivation).

  1.0 beta 20 (2005-07-19):
    - Improved the random word generator.
    - Removed the forced GUI update (was useless).
    - In the GUI, added the status of the assembler (Now you can see when it's working).
    - In the GUI, the Tkinter timer (.after()) which triggers image assembling
      was replaced by an internal variable.
    - In the GUI, the timer is now re-armed with the proper delay when the user
      selects "Update current image now".
    - I got rid of a major plague of webGobbler: sharp edges!
      webGobbler now smooths borders before superposing images
      (see the _darkenImageBorder() method). This gives much better results.
      Border smoothing is enabled by default (with a 30 pixel border).
      If you want to disable border smoothing, set the border size to 0.
      The border smoothing parameter is available from the command line
      (--bordersmooth) and from the GUI.

  1.0.0 (2005-09-23):
    - Yahoo "no result found" message changed.
    - Corrected a bug in the screensaver which would display the configuration
      screen when the screensaver stops.
    - The application seems stable enough to leave the beta stage.
      Welcome to version 1.0.0!

  1.0.1 (2005-10-30):
    - In the Windows screensaver, I lowered the sensitivity to the WM_MOUSEMOVE message.
    - Changes in collector_askjeevesimages to adapt to search engine changes.

  1.2.0 (2006-02-03):
    - Added gnomeWallpaperChanger and kdeWallpaperChanger,
      contributed by Kilian (http://thesermon.free.fr/)
      Thanks a lot for the contribution!
      gnomeWallpaperChanger requires gconf 2 to work.
      kdeWallpaperChanger requires python-dcop, which should be installed
      by default with KDE.
    - Added a new collector: Flickr
    - Changes in collector_yahooimagesearch to adapt to search engine changes.
    - Changed the name of debug*.html files when a search engine change is detected.

  1.2.1 (2006-02-04):
    - Added in the GUI:
        - A "Save image" button (same as the "Save" menu option)
        - An "Auto-save" checkbox which saves the image after each update
          as yyyymmdd_hhmmss.bmp (eg."20060204_2142.bmp").
        - An "Update image" button (same as the "Update" menu option)

  1.2.2 (2006-02-05):
    - In gnomeWallpaperChanger, correction for the libgconf-2 library path.
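
  Illustration (not part of webGobbler itself) for the border smoothing
  introduced in 1.0 beta 20 above: a minimal sketch of one way to fade an image
  towards black near its edges. The function name and the linear ramp are
  assumptions made for this example; the real _darkenImageBorder() method may
  use a different curve. Only the 30 pixel default and the "0 disables it"
  convention come from the notes above.

```python
def border_factor(x, y, width, height, border=30):
    # Brightness multiplier in [0.0, 1.0] for the pixel at (x, y).
    if border <= 0:
        return 1.0  # A border size of 0 disables smoothing.
    # Distance in pixels to the nearest edge (0 on the outermost row/column).
    distance = min(x, y, width - 1 - x, height - 1 - y)
    # Linear ramp (assumed): black on the edge, full brightness
    # once the pixel is at least 'border' pixels inside the image.
    return min(1.0, float(distance) / border)
```

  Multiplying each pixel value by this factor before superposing would make an
  image blend into the canvas instead of showing a hard rectangular edge.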

  1.2.3 (2006-02-26):
    - At last, the X-Windows screensaver!
      Contributed by Kilian (thanks for the work!)
    - collector_local now avoids the /mnt, /proc and /dev directories.
    - Improved the ImportError messages a *lot*. This helps to spot import problems
      (such as nested imports, eg. ctypes imported in wgx11screensaver imported
      in webgobbler.)

  1.2.4 (2006-02-28):
    - New --scale option to allow the images to be scaled up or down before
      being superposed. This can be used to create images with more or less
      detail. For example, use --scale 0.5 to create more detailed images.
    - This version was submitted to several download websites
      (clubic.com, snapfiles.com, uptodown.com, softpedia.com, download.com, etc.)

  1.2.5 (2006-05-02):
    - Added keyword search. This feature was requested by several users.
      You can use keyword search from the command line with the option --keywords
      (eg. --keywords cats or --keywords "cats dogs").
      This option is also available in the GUI.
    - In GUI mode, the main window now closes immediately even if a download
      is in progress. (This was confusing some users in 1.2.4).
      (I'm still trying to find a way to kill a thread in Python... :-/ )
    - Corrected a bug which would needlessly waste an image from the imagepool
      when the image was larger than the screen.
    - Corrected a bug in the get_unix_lib() library search function.
    - Corrected a bug in the X11 screensaver.
    - Changes to accommodate ask.com search engine changes.
    - Other minor corrections.
    - msvcr71.dll is now bundled with the Windows installer.
      (This is the Microsoft Visual Studio runtime the Python virtual machine
      depends on, and some people do not seem to have this DLL on their system.)
    - Version tested with Python 2.4.3, ctypes 0.9.9.6, PIL 1.1.5, psyco 1.5.1
      and PMW 1.2.

  1.2.6 (2006-08-08):
    - collector_deviantart changed to adapt to the new version of the website.

  1.2.8 (2013-04-08):
    - checked against Python 2.6
    - upgraded Pmw to 1.3.3
    - psyco removed (not maintained anymore).
    - sha module replaced with hashlib.
    - askjeeves and randomimages.us crawlers removed.
    - small refactoring
    - flickr and deviantArt crawlers updated.
    - flickr keyword search now uses Google Images search engine
      (because flickr search sucks big time.)

'''
# ==============================================================================
import sys

# FIXME: Assign an icon to the EXE. (Use http://www.angusj.com/resourcehacker/ ?)
# FIXME: Change the default Tk icon, too.

# When this program is frozen into an EXE with cx_Freeze with the no-console version (Win32GUI.exe)
# stdout, stderr and stdin do not exist.
# Any attempt to write to them (with print for example) would trigger an exception and
# the program.exe would display an exception popup.
# We trap this and create dummy stdin/stdout/stderr so that all print and log statements
# in this program will work anyway.
# This is needed when bundling webGobbler with cx_Freeze with the console-less stub.
try:
    sys.stdout.write("\r")
    sys.stdout.flush()
except IOError:
    class dummyStream:
        ''' dummyStream behaves like a stream but does nothing. '''
        def __init__(self): pass
        def write(self,data): pass
        def read(self,data): pass
        def flush(self): pass
        def close(self): pass
    # and now redirect all default streams to this dummyStream:
    sys.stdout = dummyStream()
    sys.stderr = dummyStream()
    sys.stdin = dummyStream()
    sys.__stdout__ = dummyStream()
    sys.__stderr__ = dummyStream()
    sys.__stdin__ = dummyStream()

import os
import stat
import threading
import Queue
import socket
import urllib
import urllib2
import re
import StringIO
import time
import hashlib
import random
import glob
import getopt
import base64
import binascii
import getpass
import ConfigParser
import copy
import logging

# Set default timeout for sockets.
# urllib2 and all other libraries will use this timeout.
# We keep this short so that when we ask collectors to shut down they are
# stuck no more than 15 seconds waiting for network data.
# This should be okay for most websites.
socket.setdefaulttimeout(15)

try:
    import Image
    # Note: For cx_freeze or py2exe, we need to import each image plugin individually:
    import ArgImagePlugin
    import BmpImagePlugin
    import CurImagePlugin
    import DcxImagePlugin
    import EpsImagePlugin
    import FliImagePlugin
    import FpxImagePlugin
    import GbrImagePlugin
    import GifImagePlugin
    import IcoImagePlugin
    import ImImagePlugin
    import ImtImagePlugin
    import IptcImagePlugin
    import JpegImagePlugin
    import McIdasImagePlugin
    import MicImagePlugin
    import MpegImagePlugin
    import MspImagePlugin
    import PalmImagePlugin
    import PcdImagePlugin
    import PcxImagePlugin
    import PdfImagePlugin
    import PixarImagePlugin
    import PngImagePlugin
    import PpmImagePlugin
    import PsdImagePlugin
    import SgiImagePlugin
    import SunImagePlugin
    import TgaImagePlugin
    import TiffImagePlugin
    import WmfImagePlugin
    import XbmImagePlugin
    import XpmImagePlugin
    import XVThumbImagePlugin
    import ImageFile
    import ImageOps
    import ImageEnhance
    import ImageFilter
    import ImageChops
    import ImageDraw
except ImportError, exc:
    raise ImportError, "The PIL (Python Imaging Library) is required to run this program. See http://www.pythonware.com/products/pil/\nCould not import module because: %s" % exc

CTYPES_AVAILABLE = True
try:
    import ctypes
except ImportError:
    CTYPES_AVAILABLE = False

# If ctypes is available and we are under Windows,
# let's put a mutex so that the InnoSetup uninstaller knows
# when webGobbler is still running.
# (This mutex is not re-used in any other part of webGobbler).
WEBGOBBLER_MUTEX = None
if CTYPES_AVAILABLE and sys.platform=="win32":
    try:
        WEBGOBBLER_MUTEX = ctypes.windll.kernel32.CreateMutexA(None, False, "sebsauvage_net_webGobbler_running")
    except:
        pass  # If any error occurred, never mind the mutex: it's not critical for webGobbler.
              # (It's only an installer issue.)

# FIXME: Code some unit-testing !
# FIXME: Pychecker the code often !
# FIXME: Profile the code with the profile module, then optimize.

# === Globals ================================================================== # webGobbler logo # FIXME: How can I automatically paste GIF/PNG with its transparency ? # (See assembler_superpose.saveImageTo()) # "IMAGE CREATED WITH WEBGOBBLER - HTTP://SEBSAUVAGE.NET/PYTHON/WEBGOBBLER/" # Font is '04B-11' from http://www.dsg4.com/04/extra/bitmap/ WEBGOBBLER_LOGO = Image.open(StringIO.StringIO(base64.decodestring(""" R0lGODlh9gEaANX/AP//////AOfn597e3t7eAM7Ozs3NzcbGxrq6urW1tbS0tK2traqqqqWlpZyc nJSUlISEhIKCgnx8fHt7e3V1dXNzc25ubmtra2pqamlpaWVlZWNjY2FhYWBgYFlZWVJSUlFRUUFB QUBAQDm9ADc3NzQ0NDIyMjGUADGTADExMS8vLy4uLi0tLSsrKyFzACFyACEhISAgIBoaGhhSAA8P DwgICAUFBQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACwAAAAA9gEaAAAG/8CbcEgsGo/IpHLJ bDqf0Kh0Sq1ar9isdsvter/gsHhMLpvP6LR6zW673/C4fE6v2+/4vH7P7/v/gIGCg4SFhoeIiVEQ Do0PEIqRkpOUlWQOBQUHHy4uLxKWoaKjpKKYmhMuIwQEKBalsLGys3enmR8nATM3BCcVtMDBwsNc EJkCB5u5u7wfxM/QNwDTANLVQtPY1EPU2d3ZRd3a29bi5dff1enn10Te6vDv4uDg3PDW7PH13+Po 5vz43PEDSOTUgAMFUo0IUKMGARdH6OmTt42gkXra+k2caI/cP38BJe6bZ1FivpMC2WXEt87kuZUY Wd6jSLOjSH8eP4YbaC5lEv+R5ULe8xkRpMmYLgMqhdku47uVUJM6ndlU6FKrR41WRQqyoCaEBRK4 OLEwwIgXFC52vTpyKlGmO+PKlPt0rt269pBUHCrXZ9ao7f7K5IuVKl3Chd3azeu3aUy1GmlS5dgX LsB1gHNCdSt1ZE/OlC/HK5xTs0DTMW1lOvBg7IkTLi6k3dmTK+PHtm1q3N0vstWXJ+WpPL2X3Gal Lj973ju1dmDDojUzH3xzLe+kjyGzvDsUr/aabNe2bUxbNFGp+bwn5r5ePVfjQ1RrKuCgk4sPCTB8 f97XfHi9iKFX14A4dUXgf8Jlh9GCym3FH3vCLZWOe1opdqBgjNn0oBIk5ZXOHEmf9TZYcOI9B6KC Bho3Hm4pnmjiTdd95OJkRTCSyY03LpAAfRrslyFgAFY2XEsyfigikEIVx1xWzv141XErFmhZVb8R qWKRMZam5FPTReOlEwoYsNqNYBUAAghfpqnmmqJkgMB8OB6QwAUiyMDmnXjmOYgHDOCoyQIfqKCC noQWamgdHXAQQQMLPLABDCuQUMOhlFZqqRkmlMBCDDGoEIIKNlwq6qikYkFDCymo0IKdpbbq6quw xirrrLTWauutuOaq66689urrr8AKEgQAOw=="""))) WEBGOBBLER_LOGO_TRANSPARENCY = Image.open(StringIO.StringIO(base64.decodestring(""" R0lGODlh9gEaAPf/AP////Pz8/Ly8vHx8fDw8O/v7+7u7u3t7ezs7Ovr6+rq6unp6ejo6Ofn5+bm 5uXl5eTk5OPj4+Li4uHh4d/f397e3t3d3dzc3Nra2tnZ2djY2NfX19bW1tXV1dTU1NPT09LS0tHR 
0dDQ0M/Pz87Ozs3NzczMzMvLy8rKysnJycjIyMfHx8bGxsXFxcTExMPDw8LCwsHBwcDAwL+/v76+ vr29vby8vLu7u7q6urm5ubi4uLe3t7a2trW1tbS0tLOzs7KysrGxsbCwsK+vr66urq2traysrKur q6qqqqmpqaioqKenp6ampqWlpaSkpKOjo6KioqGhoaCgoJ+fn56enp2dnZycnJubm5qampmZmZiY mJeXl5aWlpWVlZSUlJOTk5KSkpGRkZCQkI+Pj46Ojo2NjYyMjIuLi4qKiomJiYiIiIeHh4aGhoWF hYSEhIODg4KCgoGBgYCAgH9/f35+fn19fXx8fHt7e3p6enl5eXh4eHd3d3Z2dnV1dXR0dHNzc3Jy cnFxcW9vb25ubm1tbWxsbGtra2pqamlpaWhoaGdnZ2ZmZmVlZWRkZGNjY2JiYmFhYWBgYF9fX15e Xl1dXVxcXFtbW1paWllZWVhYWFdXV1ZWVlVVVVRUVFNTU1JSUlFRUVBQUE9PT05OTk1NTUxMTEtL S0pKSkhISEdHR0ZGRkVFRURERENDQ0JCQkFBQUBAQD8/Pz4+Pj09PTw8PDs7Ozo6Ojk5OTg4ODc3 NzY2NjU1NTQ0NDMzMzIyMjExMTAwMC8vLy4uLi0tLSwsLCoqKikpKSgoKCcnJyYmJiUlJSQkJCMj IyIiIiEhISAgIB8fHx4eHh0dHRwcHBsbGxoaGhkZGRgYGBcXFxYWFhUVFRQUFBMTExISEhERERAQ EA8PDw4ODg0NDQwMDAsLCwoKCgkJCQgICAcHBwYGBgUFBQQEBAMDAwICAgEBAQAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACwAAAAA9gEaAAAI/wDfCRxI sKDBgwgTKlzIsKHDhxAjSpy4sN26c+K4XZP2jNmyZcycRaumDVy5dOzcUVzJsqXLlzBjypxJs6bN mzhz6hzorp26ct6oKQuWaxYsV65eybrVq9gzbODMrWu3s6rVq1izauXprmu7r2DDih07tqvZs2bJ ql3Ltizat23bvk0bt67dtXO9smWnjty2Z8Fijbr0SBEiRIoaUeqU6laxad7KqWt39q5YtAXz0rXM ubPnz5o/g84rujRe0m41q+TZjt26derSyZ5Nu7bt2+nUqXvNW3ds3MCDC5/tm/dr3cOH+16evLlz 4Mt9D8eojdmuU5IAxTkj5suXMGXY4P8xlKlVMGjdTkp/Xlv3a3ZgXcOOjpy9/fv479P/nV8//f4A QvcfbsUZt9s68PW0DjrmjBMOON94I+GEFFZooYXfgBOOOON0KM6DEF4o4ogkYgiOhhx6+GCEJY74 TYbhxBgjhCy2aOONGMIo44o1TvgihNxYowwuoiACxxZLAKEDDjjo4EMRUYiRRySpAAPNNhquiGOF MHJYjjnnmFPOOB+eaOaJL26p5ppstvnjmWj22OaNb8KZ5px45piljDSaiKI4gMYoDjnmpAObOeBo Qw00zSyjzKOQRirppJMu08wz0UxDDTXSQOOMR5SGKuqolYYEjTSbThPNM42SOipIl0L/Ew00rILq 6q24UgrrM7LS2oytkcLajDLD0AJKIWo8kYMKADTrLAkwBIEFHZGsAswy0azaaq7BmjqNNdhkg401 03jaTDPOpPvrR9y26+678H7EDLrqeuQovO3KS68z696L77+S7tprrf5CKnC20CQ8UknmmOPNNMj8 ckssrrTSCisYZ6zxxhxjbLErsNCCyy699KKLLbK8UvHFHbfs8susfKyULbqUvAsutBxlMcwcfwyL LLXcgssttciiM8s8J620x62ADLTQRBu9MtNJxWJ0KpoQkgYTM3zg7NfNmsBDFnhUogouvOhSC8U7 
Ly1zLLXo4oswwwjjy8mzxGK1LLLE/6KyxUgvLfjghA8OeNV89/1324Unffgreyu+cuCNVx5z0z8H PXTRR1MN+SxB32JLLbXkAgwykIEzDTGzjGKJI4kccoghtNdu++240y57IoxEcgknn3iiCSWFISJ7 7sgnr7zuhyD2CCWaePIJJ5dEwkjssy9/++6LPH8JJpcQvwj22pdvvu3cPzKJJZdYMskj4x9viOyI NQIJJIr4gYYTNHgA9v8nEEIY/kAJUISCE5RoBPlqJ7sGOpCBzeOd7zoRilGEohOXkIQjGMGIRniQ EYZp4PlGSMISmnB+9FNEBz8YQvmdUHsNRMwKGwFC47nwhThM3/ra9774ZY9+vHuEJP8mQQlJRGIS mSDFLIgxDW0gYxabIMQc0BCGL3jhiljMoha3qEXwrGEOeugDH+zwBu5YkYtoTKMatfgFMZzhDXbg Qx/0MIc1lCEMa1QjGMiQBjjUwQ51gEMayACGPBrykGjcIxrcIIc5yMENaCDkFttohjW4YQ1hUJYI /sfJGDhBDX4wBCH4AIczVBGN3jnjJMPARzjqgQ96IKMazECGWpaBDGM4JSJ3ycte+nKVYyDDLXOp yl/y8jvBHKYujcnMLSqSkY6EpCTZCIYxmCENa1iDGtTQhjsYghOzSAY1fjEKQqBBCkPYAQ5uwM52 uvOd8HRnk35wBChggQtamMIShMD/g3XG858ADeg7ccADISxhClrgAhagcIQfLEmg/5ynEZpABStQ oQlGcKg/IcrRjspTBz0YQhKcAAUnJGEIPXioPHcQBCQ0wQlI0AEKOMnJEvwAC2l4gxq64AQhqBOe OMiBUIW60XYGlQdBOEITojCFKDThCELwAQ940IOq8kAHOfCoVrfK1a7GMwc6oKpVsepVr4JVrD24 ag6KWta2srNJIR1pSU+aUrYGdQc+CMIQiLBXJVyhDYgwxXlucQk6SOFrGWgWBharWAx8bbGOfez/ atCDIPzgBi14bGSdxViwdbax/9ssZ8Hmghv8IAg+qEGzOKDY1o7Ws83SAGsBMAOQ/+pgBqt1rWhd y9nPgra3nt1tBjbQrBXAIAYwWEFowbaCGdygBizwH03BxgEaLEELXaACEZrVAQ00K7Fgky5iwRYC E6igBS9ogQpKEALudsADHZhtBsD7Wslqdre+7ax+f5tf3wJgt/+FLH+XS933xve7YJvvd8FL39fq V7SdVTAAOGDgDiAYAOCNLIDx69/+BjfBxAWAcZGrXNh+zQMhEMEISHCCZnGhD57oRTNk8Qg0EGGm AEgAApx1AB436wA9BkCQgUzkZiWAARLIAAlYAAMXmKBZDUjAj388ZCoXWchDBrKVrazlHDcgbC6I QQtI0CwGTBnLXN5yjs0MABGcV/8FIyjzmbN85S7T2c5VRrOQqdwsBTTLAhrYgAYs0GcEIKDLR27W CFKQghEAeLotEMITonAE1QIgys0ytJQB8IAISCACDzCyoY3sZwBI4M8VmEAEmrWAVrda1Jkm8p2z PGdZ83nPeqY1ru3M5Vlfede9RnQCFOBqWOd406VOALIVsGlD6/rZztoxAIi9ADYzO9Nz7vWtZ73t buu51IAWNKGn7ewgI2DTDXgABCbQrBxcYQ8xZgYsGFEGIGwSAAMIACf1DTZ+8xsA/24WBTxwghWg wGsAMEC+A+Bvf/c74M1ieMMhzvBmDcAAzQJBCliQAhA0C+MAj/jEJ/6/DpggBSb/kC7IHf61irec 5SEHOMxFDjYBDADKEJAABL78couDvAMkKAEILjDd/6nAB0pgQhBe8L8A2BznOuf5AAQggGYRAOQM aEADGLCATReAAGAngNUHMPWem91ZEj872mkuc7VH/O1wR7vEIR5zubuc6mQXOwAIQHarF+Djf3eW 
AQYfeAG4vOVsp/sAwA54vVc95hSPfNwbHne2O+vpl875zh8e8ccXwACbNsEPviAIUwxjxjW+sbF7 nGde59rISFYyk50M5U0HmcfczjWec58Anp8ABjOAQYs5uXs6w17RTG5BCWiae9c7v/XQN3KzOv3p UOe43J4NAQpQ4Oiigy0FPVhC/xOIQIPaZxoB4A70oAstbWdRoALNioADysz1VwPA1czWsbSb33xv 4/lr/7dr0FdnZyaAsmZumsZqrVZqzhIBEPAAPOcA6gaB5wdtr4drOVZqDeAADsBz/8N/tRaCAiiC m0Z9oCZqh3Z7RpYAC9AsImADUSAHkyALygANhGVY4xVgGsZf+DVZlXVZmQVc/1VfDqZhD7ZhYPMC OjAERjAEOsB0AKABGQZhRxhbs0VZloVZsTWFBNZbHKZZH4ZYDZZbJuZdzQIDNUADKyBe3gcAKgAE T1AFTyAE3GWG8xViI5ZczSJo4AUCJZACK7ACKHdv7wVfFgYAHWBgsyWE9sWIPP/4W41VhQEGibzl YAMWXJE1X+BFYYaYcSNgAidgAiPQXgAQAis2iltohJSoXwzWLB+QYiIQAtIlYf41hPcVhpQ4iWL4 P4v4WvRlhiqgA1DABolwCsGwKeRkTuj0U251A/NUT/eUT/vUT82oVQQlBExwBW30BVfABPzEVh31 jPaET/r0jdXoVkE1VGsVUU5CBEqwBEewAyrwaJzkAkaQBWMgBlngjczojCAlUiRlUii1JEcVBEaw BE7gBPAYVVNVVWm1AzuAVmp1jhRZkQJ1Vg45VXlVBEiQBEhQBELwA6e1V0MABHXVVkeVV0MwBKi1 A+tokVyVjkMFjkC1A0CwBFv/EAeJQAq78Aze4ERQJEVUVEzM5EVgJEZkZEbNdExigAZxoAeBIAiB oAdxgAbd8UtGGUZjVEZXuZRemUprpEhrkAZgAAU7sHxtCAA0AAVnYAex5AZmsExe8EyN9EiRBAbe wUo5RQd3cAd08AZpQEu2VEtjEEzChEty6ZWKuZhqhEyHaUtp0AZwIAdw0AZpcAZngAZqoE2RlJjH VE3XlE2BOQZ4yZjMBJaG1EZnEAeAIAmnsAvNsA3loDqs4zqwc0MmtDu98zvBMzzFg5s4lDz00z2W 0AmjUAqj0AmWAD82NEK66TvAIzzE00LBmUMO9EDbcwiJsAiNADt+sAZScAMI/1d0I/ADXpAHi1AJ liAJCnQ8OsQ+7gM/iXAYEmQJm+AJnrAJGbRBM8RBHORBNNRC2VOdBFqgwplC/ckIjhAJlYAJmoAJ lSAJj/AIkCAJFvoIjUCdJAREjCBEkzAJEno9zWmgG3qdIoQ8KfQIlzAKsRAMz7AN5KAODgMxEsM2 lEM4PiMyJGMyKLM4lqM0PjMLt8ALwCAMwMALt2AUU+M2mKOjJXMyKbOkPzqlMCMzsPAKqMAJhsAG TjADvchJMOAEb8AIo/AKs5A3U+MzT7M5UtM0kFM6vfALv9ALuWALeRM5eqM3ieM3k0OlfvqnLfM4 eGo1tTAyvHAztXCmtEA6if/Kp4zDpFUDOrdANHnjo4B6qRsTpLkQDMpADZExGTKaKIuyLQDzKJaC KZrCKZ4CLKV6KyAhEtRgDddgDdQQDZ9SMNxyqpmyKZ1yq636q67KDMyQDMNgC6JwCFsjA+NpdEQw BoMQCo5hKayqDAfjK/byqtAwDdVwDddQDeXCL/tyLueSLvxiL8B6ruhaKbBCruN6Kpuiqs+QLrwS r+vyLwczKwSTrvpaKs8gDdfADeJwDlPxDhbBIA4SInniIyjSIWQCInKSsCXSJeNQDhTbsAi7JhKr ImgCsRxrIy/yDUGyDLogCoaQBsrCLGBTAjFABF7AB5pAC8hgDd3wsVyiI3z/8rEZAg4cQg4USw4W 
CydAGycdO7REqydAuyHjQA5K27A7MiN34iYnsiGAUiYXW7RWq7AbUg7noA4pwRrywR8BIhvuYSD1 EbbOMbbskLbHAbb2Mba9UbZmG7fTMQ7VoQujYCRIoiRM4iRGIAVl0AeWAAvFYA3hcA7BsR+0gbZp +7X7gbhy+7iQaxuN+x5qOx8DAiAFsrZsG7mcmxu7kSCrkRmbYRqUoRo9Qbp3Ybp60Rmqi7quaxl8 4RfNcB3ZsR3d8R1l0AZ5gAicAAvEQA3ggA7woRahUbqqe7yvm7zKS7zHa7qm0brLG72XcRZbUb3W e70t0RN98ReBMRjFUz+UUuAJZ3MMwHsOXYu96Ju+6ru+7Nu+7vu+CeETQCEURKGkSbEUvmAMz5AN hbsOoQu/ABzAAjzABFzA8GsRGKERHGEvIbEwJoES/2vAEjzBFNwQAQEAOw=="""))) WEBGOBBLER_LOGO_TRANSPARENCY = WEBGOBBLER_LOGO_TRANSPARENCY.convert("L") # Force greyscale PLEASE_WAIT_IMAGE = Image.open(StringIO.StringIO(base64.decodestring(""" iVBORw0KGgoAAAANSUhEUgAAAfIAAAAbBAMAAAB1gxAdAAAAMFBMVEUAAACCgoLX19coKChSUlK2 trb09PQPDw+dnZ1sbGw9PT3Gxsbp6ekbGxsyMjL////zNJnoAAAAAWJLR0QAiAUdSAAAAAlwSFlz AAAOxAAADsQBlSsOGwAABstJREFUWMPtmGuMG9UVx8+uvczau+M1hUQRonjbDxBCi/0hkCpIWQtS AlEVGwWUkggNQkqLVMQMIqpIRDcWEmCUSLutUKUikM1DjeoKxQSEEBvJaaOGQCLGLY9QBOmKBOUD Yrf22l57PeTPOXds78MO4aVUqnw/jMfnnvv4nde9NlG3dVu3dVu3/V82PZN53qLARPA8et6J6GKB b6mASFu2LX2u8dpE4vvc9g0vdZJ6vskiPQDKES8S59UbbxMshe8Foue0MIYXGyK+RKF/wuo8ciLb QThd7kgOV1f/OhYQclx1fnJvMt0mWGqN6eLz1jl9ngx+hSGU2fKdR5r/7SD84ImO5m0s0rTAeci3 7ERNyLXTN/EGTz85yc8XNjLCro0uh2eTNhXRpqx1k6RN5Ztifcr6Ga44I6+r1+9TeqnyFH/hWVhz XV50iU6e8W+gHRuIZIr86jRPYfH7h7j2lKtL7z4po7VPsGmSPO5M3BMZ/ZxodH2eRnF2SiS7br1J 9Yye0tdb/CRXObA+ogw7FVm9T5489CnrXZ6d19HcNSO0epPViTxORlXIfwE8TkNAMU1/BGrUY6Kg VEK4GnH271iJ+hFtillgQr32AY44f4CjJ/8csJnGzmrI9qjQH6uFK6dNJ81O9iKJouVDkF0yxmPo M+BlnhLgcPLyx4xuoKLC6CGUQwXy2ahaKe5giR+izG2knsJhiXZ/jjv8YZTFwAHchmJEfP1XYJZn L/L++KsPKzD+HrCqM7ldYHI/tufKNPTaHryh4+ieGoWcvZDg8aEIRc6r/dZpiVlwJe57U4hXPmrO iQXsyvF3zLJRtGI1P2b7VPiOoQL7V8gqctgIuuSXwzmms65j7S8mc0zut3H88AAvVXONHWajxiqP IPE0nONCvj0ZVrk9AscsC/n1qIdpGttUZAfAI4Z5Yo+JA7M8+z+a5CbGw8XHEOlA/qO9mGPyXqT7 ENWJjJKf8/d3On5N9ixrDCL4L0XeW7FihZaYBY085zEpFQaxKvtfpP8p92DuoqLIxoprzPrHmFXk Ja/sRcglz/tFN5iaI91q5HmqmI9V+Itmljx2gewZyp1t5Llm0cOwFHlwpCLkRjnvp1yJo0qRl3S7 
xBP34S5Ki++b5NU1/0bci3jnChdk8t1I7kEwsOcndk1H8U4u9yuTYXHlblg9ityHcWOmJV5AvuzH drlBPojko4j3VgaqhemqIi9QuEbhGUU+rOHQPHmv0o3htWeaFY6VBzEpHIf43YOjSR7bqHC3HDgi WcHk1iAsJldxZlaTRpXmR2QHJccWkh9iC/8yyR/t5MfuuISzLDEiJojvx31s7Kc5z33yXcw54nC3 kGvmFsRb4nnyj+C0yIekO9uPP79dT5UUeYmMOcq55AmXPOCSK91hNn0l3SA3alwsopL0WQoVJLU5 5l1yH5wGeZ0UuQaOPE1Uyoo8S6kqo+6WwHDJg34hT6j6I8od8lyWSgxVMplMxCiI4bW/jDl+vJ7J SKUdQqRfkXMPoi3xPHmqbMWa5L3YyLN4YcSPGLPz5OFF5D5kNeVz0aUdy9QWlM+rdJFkpIc3arAH 38pkXmyQsy97F5GTLZY15/gmpsjvXuJzXqdH5Rb7/LJMJi1HDZf+k2fayAdY55+RnCSYvoEGKxpH UyChSvfW5S75fq7HLbEK/81SxEJVLdck72cb70yzK8YN90xtI/fipzdyF+ZO9PNZsjO6ls/2rFrm GSvm7As5Mipcfpa9zWyeH1K4dELZPx9bTG7U0z62D+mXKfLCDnvOI+G2isY13H/CXUfIvbiKvOPy JilRaiP3wDmIdAg5u+CpXMPhm8IxW8wdMLniKvJeOcSaYpVHkFzbjytb0a7ZlQOsmcNkSp0A7eSa DYdRc6i6urF7DjbOChzq44NA7SzGpbrAEx+zv+DToa4Mc4+zmJyTDPQ+fqN8H+AazoWdg8mGOUth nt2UdYSXDBxA4tzkdB1wNN+HQqig80m5hbxhFNXN7tO//wkJIfdhhlpiEfxBkftQbkU7fQBsj1CI q6/aZzs53YytTH6dXVW6k3wBuFcV771sleUoqvuv/9VXuGTzie08Tu/ZQq7nnJ8vJtfHUCSPAWez Ir/dLlsS5TzpMH3Ks6t1FPkDNurRNvKFt97P5c6ldrF2jTxPNW8+AwuvqfPiZhu1ls7yVU07s1j3 wYVX3pOtd5uNqq2b7/K0n8jr8s0Hk8c9DV3Pi23r6Kesb/OzSPvb73OORRe0fXLpD77O1Xu+BToc 2N+5ycmx6gL/dp4G6pH/OTk9m7z0Qv9r4F/xRPqb/c1wcbT7X0u3dVu3LWhfAnLFm5OGgahmAAAA AElFTkSuQmCC=="""))) VERSION = "webGobbler 1.2.8" # License text will be used in the GUI. LICENSE='''This program is distributed under the OSI-certified zlib/libpng license. http://www.opensource.org/licenses/zlib-license.php This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software. Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions: 1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. 
   If you use this software in a product, an acknowledgment in the product
   documentation would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
   misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.'''

# Disclaimer text will be used in the GUI.
DISCLAIMER='''IMPORTANT - READ

This program downloads random images from the internet, which may include
pornography or any morally objectionable or illegal material.

Due to the random nature of this program, the author of webGobbler cannot be
held responsible for any URL this program has tried to reach, nor the images
downloaded, stored or displayed on the computer.

In consequence:
  - this program may not be safe for kids.
  - this program is definitely NSFW (not safe for work).

Use at your own risk! You are warned.

You are advised this program may use copyrighted images.
Thus the images generated by webGobbler are only suitable for private use.
If you want to use it for non-private purposes, you may have to request
permission from the original image rights owners for each image composing
the whole image.
(The URLs of the last pictures used to generate the current image can be found
in the last_used_images.html file in the image pool directory.)'''

# Default list of blacklisted images (based on their content)
# These images will be discarded, whatever the name of the file or the
# website address.
# (You can do a sha1sum on the image you want to blacklist and put the SHA1 here)
# (This list may be overwritten by saved configuration (.INI file or registry.))
BLACKLIST_IMAGESHA1 = {
    '142da07c8cfd0aa9bebb0b2f5939ad636bd474e5' : 0,  # deviantArt.com "Poetry" logo
    'd6ee67a52d8fbef935225de1363847d30a86b5de' : 0,  # FortuneCity hosting/domain names logo
    '6a92790b1c2a301c6e7ddef645dca1f53ea97ac2' : 0,  # Flickr "photo not available" GIF
    }

# Default list of blacklisted URLs
# If an image comes from one of those URLs, it will be discarded.
# You can use * in URLs. An implicit * will be added at the end (as in AdBlock)
# Examples: BLACKLIST_URL = [ 'http://*.doubleclick.net/', 'http://ads.*.*/', 'http://*.*.*/adserver/','*/banners/' ]
# (This list may be overwritten by saved configuration (.INI file or registry.))
BLACKLIST_URL = [ 'http://www.flickr.com/images/photo_unavailable.gif',
                  'http://*.deviantart.net/*/shared/poetry.jpg']

# Accepted MIME types.
# Only these types will be considered images and downloaded.
# key=MIME type (Content-Type), value=file extension (which will be used to save
# the file in the imagepool directory)
ACCEPTED_MIME_TYPES = { 'image/jpeg': '.jpg',
                        'image/gif' : '.gif',
                        'image/png' : '.png',
                        'image/bmp' : '.bmp',   # Microsoft Windows space-hog file format
                        'image/pcx' : '.pcx',   # old ZSoft/Microsoft PCX format (used in Paintbrush)
                        'image/tiff': '.tiff'
                      }

# ---------------------------------------------------------------------------------------
class applicationConfig(dict):
    ''' An object capable of storing program configuration (in the form of a dictionary).
        This class is not generic and is tailored to webGobbler.
        It behaves like a dictionary object, but it also has methods to save/load
        to/from .INI, file and Windows registry.

        Note that this dictionary only supports:
           - keys which are strings.
           - values which are strings, integers or booleans.
        Using other types will raise errors.
        Some keys are special cases (specific (de)serialization).

        Example:
            myconfig = applicationConfig()      # Create a configuration (with default values)
            myconfig["pool.nbimages"] = 100     # Change the value of an existing parameter.
            myconfig["myparameter"] = "toto"    # Add a new parameter
            myconfig.saveToFileInUserHomedir()  # Save to file.

            conf2 = applicationConfig()
            conf2.loadFromFileInUserHomedir()   # Load the previously saved file.
            print conf2["pool.nbimages"]        # This displays 100
            print conf2["myparameter"]          # This displays "toto"
            print conf2["pool.keepimages"]
    '''
    # Default configuration:
    # This dictionary contains the default configuration of the whole program.
    # Each class/thread will read an instance of this class to get its parameters.
    # The instance will be altered by the main() according to command-line parameters.
    # (key=parameter name, value=value of this parameter)
    # Here are the default values:
    DEFAULTCONFIG = {
        "network.http.proxy.enabled" : False,       # (boolean) If true, will use a proxy (--proxy)
        "network.http.proxy.address" : "",          # (string) Address of proxy (example)
        "network.http.proxy.port" : 3128,           # (integer) Port of proxy (example)
        "network.http.proxy.auth.enabled" : False,  # (boolean) Proxy requires authentication (--proxyauth)
        "network.http.proxy.auth.login" : "",       # (string) Login for proxy.
        "network.http.proxy.auth.password": "",     # (string) Password for proxy.
        "network.http.useragent" : "webGobbler/1.2.8",  # (string) User-agent passed in HTTP requests.
        "collector.maximumimagesize" : 4000000,     # (integer) Maximum image file size in bytes. If a picture is bigger than this, it will not be downloaded.
        "collector.acceptedmimetypes": ACCEPTED_MIME_TYPES,  # (dictionary) List of image types which will be downloaded.
        "collector.localonly" : False,              # (boolean) If true, will collect images from local disk instead of internet (--localonly)
        "collector.localonly.startdir" : "/",       # (string) When using local disk only, the directory to scan for images (default="/"=Whole disk.)
"collector.keywords.enabled" : False, # (boolean) Use keywords for image search. If False, random generated words will be used. "collector.keywords.keywords": "cats", # (string) Keyword(s) for keyword search. Can be a single word or several words separated with a space (eg."cats dogs") "pool.imagepooldirectory" : "imagepool", # (string) Directory where to store image pool (--pooldirectory) "pool.nbimages" : 50, # (integer) Minimum number of images to maintain in pool (--poolnbimages) "pool.sourcemark" : "--- Picture taken from ", # (string) String used to store image source in image files. # If you change this string, you will have to delete all images from your pool. "pool.keepimages" : False, # (boolean) Do not delete images from the pool after use (--keepimage) "assembler.sizex" : 1024, # (integer) Width of image to generate (--resolution). Ignored for wallpaper changer and screensaver. "assembler.sizey" : 768, # (integer) Height of image to generate (--resolution). Ignored for wallpaper changer and screensaver. "assembler.mirror" : False, # (boolean) Horizontal mirror of image (to render text unreadable) (--mirror) "assembler.invert" : False, # (boolean) Invert (negative) final picture before saving (--invert) "assembler.emboss" : False, # (boolean) Emboss the final picture before saving (--emboss) "assembler.resuperpose" : False, # (boolean) Rotates and re-superposes the final image on itself. "assembler.superpose.nbimages": 20, # (integer) Number of images to superpose on each new image (--nbimages) "assembler.superpose.randomrotation": True, # (boolean) Rotate images randomly (--norotation to disable) "assembler.superpose.variante": 0, # (integer) Variantes of the superpose assembler (this give different results) (--variante). # 0=Equalize (default, recommended), 1=Darkening+autoConstrast. "assembler.superpose.bordersmooth": 30, # (integer) Size of border smooth (0 to disable border smooth.) 
"assembler.superpose.scale": float(1.0), # (float) Scale images before superposing them (--scale) "persistencedirectory" : ".", # (string) Directory where classes save their data between program runs "program.every" : 60, # (integer) Generate a new image every n seconds (--every) "debug" : False, # (boolean) debug mode (True will display various activity on screen and log into the file webGobbler.log) (--debug) "blacklist.imagesha1" : BLACKLIST_IMAGESHA1, # (dictionnary: key=hex SHA1 (string), value=0) List of images to blacklist (based on their content) "blacklist.url" : BLACKLIST_URL, # (list of strings) List of blacklisted URLs. "blacklist.url_re" : [] # (list of regular expression objets) Same as blacklist.url, but compiled as regular expressions. # (blacklist.url_re is automatically compiled from blacklist.url) } # FIXME: Add explanations on each parameter ? (in another dictionnary with the same keys ? CONFIG_HELP ?)) # The following parameters will not be exported to INI or registry, nor imported. # (Mostly because they are code dependant.) NONEXPORTABLE_PARAMETERS = { "collector.acceptedmimetypes":0, "collector.maximumimagesize":0, "blacklist.url_re": 0, "pool.sourcemark":0, "network.http.useragent":0 } CONFIG_FILENAME =".webGobblerConf" # Name of configuration file. CONFIG_SECTIONNAME = "webGobbler" # Name of section in .INI files. CONFIG_REGPATH = "Software\\sebsauvage.net\\webGobbler" # Registry key containing configuration # Maybe I could do a better job on configuration with this: # http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303347 def __init__(self): dict.__init__(self) self.data = {} self.update( applicationConfig.DEFAULTCONFIG ) # Start with default configuration: def __setitem(self,key,value): if not isinstance(key,basestring): raise TypeError, "applicationConfig only accepts strings as keys." self.data[key] = value # Store the value # Recompile the regular expressions if the list of blacklisted URL is changed. 
# (We also bock assignment to blacklist.url_re.) if key in ('blacklist.url', 'blacklist.url_re'): self.data['blacklist.url_re'] = [] for s in value: # We escape all characters in s, except * which we replace with .+? # We also add an implicit .+? at end. # Example: 'http://*.xiti.com/' --> 'http\:\/\/.+?\.xiti\.com\/.+?' if not s.endswith('*'): s += '*' s = '.+?'.join([re.escape(ss) for ss in s.split('*')]) self.data['blacklist.url_re'].append( re.compile(s,re.IGNORECASE) ) def toINI(self): ''' Outputs the configuration as a .INI file. Output: a string containing the configuration. ''' cp = ConfigParser.SafeConfigParser() cp.add_section(applicationConfig.CONFIG_SECTIONNAME) # Export all parameters, except the non-exportable ones. for key in self: if key not in applicationConfig.NONEXPORTABLE_PARAMETERS: # Serialize some special parameters: if key == 'network.http.proxy.auth.password': cp.set(applicationConfig.CONFIG_SECTIONNAME,key,self._garble(self[key])) # Garble the password. elif key == 'blacklist.imagesha1': # Serialize the list of blacklisted images cp.set(applicationConfig.CONFIG_SECTIONNAME,key,'|'.join(self[key].keys())) elif key == 'blacklist.url': # Serialize the list of blacklisted URLs cp.set(applicationConfig.CONFIG_SECTIONNAME,key,'|'.join([url.replace('%','%%') for url in self[key]])) # (For ConfigParser, % must be escaped to %%) else: # else, simply store the parameter as is. 
                    cp.set(applicationConfig.CONFIG_SECTIONNAME,key,str(self[key]))
        # ConfigParser can only write to a file --> create a pseudo-file (inifile)
        inifile = StringIO.StringIO()
        cp.write(inifile)
        data = inifile.getvalue()
        inifile.close()
        # Beautify the file by sorting parameters:
        lines = data.split('\n')
        sectionname = [lines[0]]   # The [webGobbler] section delimiter
        parameters = lines[1:]     # The parameters below
        parameters = [p for p in parameters if len(p.strip())>0]  # remove empty lines generated by configparser
        parameters.sort()
        data = '\n'.join(sectionname+parameters)
        return data

    def fromINI(self, inidata):
        ''' Imports configuration from a .INI file.
            inidata : a string containing the .INI file.
        '''
        inifile = StringIO.StringIO(inidata)
        cp = ConfigParser.SafeConfigParser()
        cp.readfp(inifile)
        # FIXME: try/catch ConfigParser exceptions ?
        for (name, value) in cp.items(applicationConfig.CONFIG_SECTIONNAME):
            doCoerceType = True
            if name == 'blacklist.imagesha1':
                # Deserialize the list of sha1 ( 'xxx|yyy|zzz' --> {'xxx':0,'yyy':0,'zzz':0} )
                value = dict([ (v,0) for v in value.split('|')])
                doCoerceType = False  # We already manually coerced the type of this parameter.
            elif name == 'blacklist.url':
                # Deserialize the list of URLs
                value = value.split('|')
                doCoerceType = False  # We already manually coerced the type of this parameter.
            if name in applicationConfig.DEFAULTCONFIG and doCoerceType:
                # If parameter exists in default parameters, coerce its type.
                defaultvalue = applicationConfig.DEFAULTCONFIG[name]
                obj = None
                try:
                    if isinstance(defaultvalue,basestring):
                        obj = str(value).strip()
                    elif isinstance(defaultvalue,bool):
                        if str(value).lower()=='true':
                            obj = True
                        else:
                            obj = False
                    elif isinstance(defaultvalue,int):
                        obj = int(value)
                    elif isinstance(defaultvalue,float):
                        obj = float(value)
                    elif isinstance(defaultvalue,dict):
                        raise NotImplementedError,"applicationConfig.fromINI() : serialization of dictionary objects is not implemented."
                    elif isinstance(defaultvalue,list):
                        raise NotImplementedError,"applicationConfig.fromINI() : serialization of list objects is not implemented."
                    else:
                        raise ValueError, "Could not convert parameter %s. Oops. Looks like an error in the program." % name
                except ValueError:
                    raise ValueError, "Error in configuration: Parameter %s should be of type %s." % (name,type(defaultvalue) )
                if name == 'network.http.proxy.auth.password':
                    obj = self._ungarble(obj)  # Ungarbles the password.
                if obj is not None:
                    self[name] = obj
            else:  # else store this unknown parameter as string
                self[name] = value

    def loadFromRegistryCurrentUser(self):
        ''' Load configuration from Windows registry. '''
        # We manually build a .INI file in memory from the registry.
        inilines = ['[%s]' % applicationConfig.CONFIG_SECTIONNAME]
        try:
            import _winreg
        except ImportError, exc:
            raise ImportError, "applicationConfig.loadFromRegistryCurrentUser() can only be used under Windows (requires the _winreg module).\nCould not import module because: %s" % exc
        try:
            key = _winreg.OpenKey( _winreg.HKEY_CURRENT_USER, applicationConfig.CONFIG_REGPATH,0, _winreg.KEY_READ)
            # Now get all values in this key:
            i = 0
            try:
                while True:
                    # mmm..strange, Should unpack to 3 values, but seems to unpack to more. Bug of EnumValue() ?
                    valueobj = _winreg.EnumValue(key,i)
                    valuename = str(valueobj[0]).strip()
                    valuedata = str(valueobj[1]).strip()
                    valuetype = valueobj[2]
                    if valuetype != _winreg.REG_SZ:
                        raise TypeError, "The registry value %s does not have the correct type (REG_SZ). Please delete it." % valuename
                    else:
                        if valuename not in applicationConfig.NONEXPORTABLE_PARAMETERS:
                            inilines += [ '%s=%s' % (valuename,str(valuedata)) ]  # Build the .INI file.
                    i += 1
            except EnvironmentError:
                pass  # EnvironmentError means: "No more values to read". We simply exit the 'while True' loop.
            self.fromINI('\n'.join(inilines))  # Then parse the generated .INI file.
        except EnvironmentError:
            raise WindowsError, "Could not read configuration from registry !"
        _winreg.CloseKey(key)

    def saveToRegistryCurrentUser(self):
        ''' Save configuration to Windows registry. '''
        # Note: this uses the output of self.toINI()
        # This method expects the .INI file to contain a single section,
        # started on first line, and no comments.
        # eg.[webGobbler]
        #    assembler.emboss = False
        #    assembler.sizex = 1024
        try:
            import _winreg
        except ImportError, exc:
            raise ImportError, "applicationConfig.saveToRegistryCurrentUser() can only be used under Windows (requires the _winreg module).\nCould not import module because: %s" % exc
        try:
            key = _winreg.CreateKey(_winreg.HKEY_CURRENT_USER, applicationConfig.CONFIG_REGPATH)  # Create or open existing key
            for line in self.toINI().split('\n')[1:]:
                pname = line.split('=')[0]                # pname : everything before the first =
                strvalue = '='.join(line.split('=')[1:])  # strvalue : everything after the first =
                _winreg.SetValueEx(key, pname.strip(),0, _winreg.REG_SZ, strvalue.strip())
        except EnvironmentError:
            raise WindowsError, "Could not write configuration to registry !"
        _winreg.CloseKey(key)

    def saveToFileInUserHomedir(self):
        ''' Save the configuration in .webGobblerConf in user's home dir. '''
        # Mainly for Unix/Linux. Windows users will probably prefer saveToRegistryCurrentUser()
        inidata = self.toINI()
        userhomedir = os.path.expanduser('~')  # Get user home directory.
        filepath = os.path.join(userhomedir,applicationConfig.CONFIG_FILENAME)
        file = open(filepath,"w+b")
        file.write(inidata)
        file.close()

    def configFilename(self):
        ''' Returns the absolute path where the .ini file is supposed to be read/saved. '''
        return os.path.join(os.path.expanduser('~'),applicationConfig.CONFIG_FILENAME)

    def loadFromFileInUserHomedir(self):
        ''' Loads the configuration from .webGobblerConf in user's home dir. '''
        # Mainly for Unix/Linux. Windows users will probably prefer loadFromRegistryCurrentUser()
        userhomedir = os.path.expanduser('~')  # Get user home directory.
        filepath = os.path.join(userhomedir,applicationConfig.CONFIG_FILENAME)
        file = open(filepath,"rb")
        inidata = file.read(50000)
        file.close()
        self.fromINI(inidata)

    def _garble(self, text):
        ''' Returns a garbled version of a string. '''
        # This is no replacement for a good cipher !
        # text IS NOT ENCRYPTED. It is only self-garbled.
        h=hashlib.sha1(text).digest()
        hk=h*int(len(text)/20+1)
        et=''.join([chr(ord(text[i])^ord(hk[i])) for i in range(len(text))])
        return binascii.hexlify(h+et)

    def _ungarble(self, text):
        ''' Un-garbles the text garbled with _garble(). '''
        d=binascii.unhexlify(text)
        (h,t)=(d[0:20],d[20:])
        hk=h*int(len(t)/20+1)
        return ''.join([chr(ord(t[i])^ord(hk[i])) for i in range(len(t))])

# == Classes ===================================================================

class commandToken:
    ''' Command tokens used to send commands to threads. '''
    def __init__(self, shutdown=None, stopcollecting=None, collect=None, collectnonstop=None,superpose=None):
        self.shutdown = shutdown              # Collector and pool: Order to shutdown. The thread should stop working and quit (exit the run() method.)
        self.collect = collect                # Collector: Collect n images and stop. (value = the number of images to collect)
        self.collectnonstop = collectnonstop  # Collector: Collect images continuously
        self.stopcollecting = stopcollecting  # Collector: The threads should stop collecting images, but not shutdown.
        self.superpose = superpose            # Assembler_superpose: Superpose images now.

class internetImage:
    ''' An image from the internet.
        Will download the image from the internet and assign a unique name to the image.
        Maximum image size: the value of "collector.maximumimagesize" in the configuration.
        (The image is discarded if the file is bigger than this.)
        Used by: collectors.

        Example:
            i = internetImage("http://www.foo.bar/images/foo.jpg",applicationConfig())
            if i.isNotAnImage:
                print "Image discarded because "+i.discardReason
            else:
                i.saveToDisk("c:\\my pictures")  # Save the image to disk.
                i.getImage()                     # Get the PIL Image object.
    '''
    def __init__(self,imageurl,config):
        '''
        imageurl (string): url of the image to download.
        config (applicationConfig object) : the program configuration
        '''
        self.imageurl = imageurl    # URL of this image on the internet
        self.imagedata = None       # Raw binary image data (as downloaded from the internet)
        self.filename = None        # Image filename (computed from self.imagedata)
        self.isNotAnImage = True    # True if this URL is not an image.
        self.discardReason = ""     # Reason why the image was discarded (empty if it was kept)
        self.CONFIG=config

        # If the URL of the image matches any of the blacklisted URLs, we discard the image.
        for regexp in self.CONFIG["blacklist.url_re"]:
            if regexp.match(imageurl):  # FIXME : protect against maximum recursion limit exceeded exception ?
                self.discardReason = "URL is blacklisted"
                return  # Discard the image.

        #FIXME: Handle passwords required on some pages (Have to use fancy_url opener or urllib2 ?)
        #       (Those URLs have to be skipped)

        # Build and send the HTTP request:
        request_headers = { 'User-Agent': self.CONFIG["network.http.useragent"] }
        request = urllib2.Request(imageurl, None, request_headers)  # Build the HTTP request
        try:
            urlfile = urllib2.urlopen(request)
        except urllib2.HTTPError, exc:
            if exc.code == 404:
                self.discardReason = "not found"  # Display a simplified message for HTTP Error 404.
            else:
                self.discardReason = "HTTP request failed with error %d (%s)" % (exc.code, exc.msg)
            return  # Discard this image.
            # FIXME: display simplified error message for some other HTTP error codes ?
        except urllib2.URLError, exc:
            self.discardReason = str(exc.reason)  # Keep discardReason a string
            return  # Discard this image.
        except Exception, exc:
            self.discardReason = str(exc)  # Keep discardReason a string
            return  # Discard this image.
        #FIXME: catch HTTPError to catch Authentication requests ? (see urllib2 manual)
        #       (URLs requesting authentication should be discarded.)

        # If the returned Content-Type is not recognized, ignore the file.
        # ("image/jpeg", "image/gif", etc.)
        MIME_Type = urlfile.info().getheader("Content-Type","")
        if not self.CONFIG["collector.acceptedmimetypes"].has_key(MIME_Type):
            urlfile.close()
            self.discardReason = "not an image (%s)" % MIME_Type
            return

        # Get the file extension corresponding to this MIME type
        # (eg. "image/jpeg" --> ".jpg")
        file_extension = self.CONFIG["collector.acceptedmimetypes"][MIME_Type]

        # Check image size announced in HTTP response header.
        # (so that we can abort the download right now if the file is too big.)
        file_size = 0
        try:
            file_size = int( urlfile.info().getheader("Content-Length","0") )
        except ValueError:  # Content-Length does not contain an integer
            urlfile.close()
            self.discardReason = "bogus data in Content-Length HTTP headers"
            return  # Discard this image.
        # Note that Content-Length header can be missing. That's not a problem.
        if file_size > self.CONFIG["collector.maximumimagesize"]:
            urlfile.close()
            self.discardReason = "too big"
            return  # Image too big ! Discard it.

        # Then download the image:
        try:
            self.imagedata = urlfile.read(self.CONFIG["collector.maximumimagesize"])  # Read at most collector.maximumimagesize bytes
        except:
            # Discard image if there was a problem downloading it.
            self.discardReason = "error while downloading image"
            urlfile.close()
            return
        urlfile.close()

        # Check image size (can be necessary if Content-Length was not returned in HTTP headers.)
        try:
            if len(self.imagedata) >= self.CONFIG["collector.maximumimagesize"]:  # Too big, probably not an image.
                self.discardReason = "too big"
                return  # Discard the image.
        except TypeError:  # Happens sometimes on len(self.imagedata): "TypeError: len() of unsized object"
            self.discardReason = "no data"
            return  # Discard the image.

        # Make sure image is not blacklisted.
        datahash = hashlib.sha1(self.imagedata).hexdigest()
        if datahash in self.CONFIG["blacklist.imagesha1"]:
            self.discardReason = "blacklisted"
            return  # Discard the image.
        # Compute filename from file SHA1
        imagesha1 = hashlib.sha1(self.imagedata).hexdigest()
        if self.CONFIG["blacklist.imagesha1"].has_key(imagesha1):  # discard blacklisted images
            self.discardReason = "blacklisted"
            return
        self.filename = 'WG'+imagesha1+file_extension  # SHA1 in hex + image extension
        self.imagedata += self.CONFIG["pool.sourcemark"] + self.imageurl  # Add original URL in image file
        self.discardReason = ""
        self.isNotAnImage = False  # The image is ok.

    def getImage(self):
        ''' Returns the image as a PIL Image object.
            Useful for collectors to read image properties (size, etc.)
            Output: a PIL Image object. None if the image cannot be understood.
        '''
        if self.isNotAnImage:
            return None
        imageparser = ImageFile.Parser()  # from the PIL module
        image = None
        try:
            imageparser.feed(self.imagedata)
            image = imageparser.close()  # Get the Image object.
            return image
        except IOError:  # PIL cannot understand file content.
            self.isNotAnImage = True
            return None

    def saveToDisk(self, destinationDirectory='imagepool'):
        ''' Save the image to disk.
            Filename will be automatically computed from file content (SHA1).
            This eliminates duplicates in the destination directory.
            Input: destinationDirectory (string): The destination directory.
                   Do not specify a filename (Filename is automatically computed).
        '''
        if self.isNotAnImage:
            raise RuntimeError, "This is not an image. Cannot save."
            # Shame shame, the caller should have discarded this image already !
        # FIXME: Should I implement try/except on the following file write operation ?
        try:
            file = open(os.path.join(destinationDirectory,self.filename),'w+b')
            file.write(self.imagedata)
            file.close()
        except IOError:
            pass  # Ignore this image... nevermind.

class collector(threading.Thread):
    ''' Generic collector class. Implements methods common to all collectors.
        (This class implements all the thread logic and message handling.)
        Must be derived.
        Derived classes must implement:
            self.name in the constructor (String, name of the collector (eg."self.name=collector_deviantart"))
            method self._getRandomImage(self) (downloads a random image.)
        _getRandomImage() will be called continuously.
        _getRandomImage() should terminate fast (ideally get only one picture)
        Used by: imagePool
    '''
    def __init__(self,config,dictionnaryFile=None):
        ''' Download random images
            config (an applicationConfig object) : the program configuration
            dictionnaryFile (string): A filename+path to an optional word dictionary.
        '''
        threading.Thread.__init__(self)
        self.inputCommandQueue = Queue.Queue()  # Input commands (commandToken objects)
        self.numberOfImagesToGet = 0            # By default, do not start to collect images.
        self.continuousCollect = False
        self.dictionnaryFile = dictionnaryFile  # Optional word dictionary
        self.name="collector"
        self.CONFIG=config
        self.statusLock = threading.RLock()     # A lock to access collector status.
        self.status = ('Stopped','')            # Status of this collector

    def _logDebug    (self,message): logging.getLogger(self.name).debug    (message)
    def _logInfo     (self,message): logging.getLogger(self.name).info     (message)
    def _logWarning  (self,message): logging.getLogger(self.name).warning  (message)
    def _logError    (self,message): logging.getLogger(self.name).error    (message)
    def _logCritical (self,message): logging.getLogger(self.name).critical (message)
    def _logException(self,message): logging.getLogger(self.name).exception(message)

    # Thread activity methods:
    def collectAndStop(self,n):
        ''' Ask this collector to collect n images and stop. '''
        self.inputCommandQueue.put(commandToken(collect=n),True)

    def collectNonStop(self):
        ''' Ask this collector to collect images and never stop. '''
        self.inputCommandQueue.put(commandToken(collectnonstop=1),True)

    def stopcollecting(self):
        ''' Ask the thread to stop collecting images ASAP (this may not be right now).
        '''
        self.inputCommandQueue.put(commandToken(stopcollecting=1),True)

    # Thread life methods:
    def shutdown(self):
        ''' Ask this thread to die. '''
        self.inputCommandQueue.put(commandToken(shutdown=1),True)

    def run(self):
        ''' Main thread loop. '''
        while True:
            try:
                commandToken = self.inputCommandQueue.get_nowait()  # Get orders
                # Handle commands put in the command queue:
                if commandToken.shutdown:
                    self._logDebug("Shutting down.")
                    self._setCurrentStatus('Shutting down','')
                    return  # Exit the thread.
                elif commandToken.collect:  # Order to collect n images
                    if self.numberOfImagesToGet==0:
                        self._logDebug("Starting to collect %d images..."%commandToken.collect)
                    self.numberOfImagesToGet = commandToken.collect
                    self.continuousCollect = False
                elif commandToken.collectnonstop:  # Order to collect continuously
                    if not self.continuousCollect:
                        self._logDebug("Starting to collect images non-stop...")
                    self.continuousCollect = True
                    self.numberOfImagesToGet = 0
                elif commandToken.stopcollecting:  # Stop collecting images
                    if (self.numberOfImagesToGet>1) or self.continuousCollect:
                        self._logDebug("Stopped")
                    self._setCurrentStatus('Stopped','')
                    self.numberOfImagesToGet = 0
                    self.continuousCollect = False
                else:
                    self._logError("Unknown command token")
                    pass  # Unknown command, ignore.
            except Queue.Empty:  # Else (if no command is available), do some stuff.
                try:
                    if self.continuousCollect:  # collect continuously
                        self.numberOfImagesToGet = 1
                        self._getRandomImage()  # This call must decrement self.numberOfImagesToGet
                        time.sleep(0.25)
                    elif self.numberOfImagesToGet > 0:
                        self._getRandomImage()  # This call must decrement self.numberOfImagesToGet
                        time.sleep(0.25)
                    else:
                        time.sleep(0.25)
                except Exception, exc:
                    self._logException(exc)  # Log any unexpected exception

    def _setCurrentStatus(self,status,information):
        ''' Sets the current status so that it can be read by others.
        '''
        # The lock must be acquired before it is released
        # (releasing an un-acquired RLock raises an error).
        self.statusLock.acquire()
        try:
            self.status = (status,information)
        finally:
            self.statusLock.release()

    def getCurrentStatus(self):
        ''' Returns the current status of the collector.
            Output: a tuple (status, information)
                status (string): 'Querying','Downloading','Stopped','Waiting','Error'
                                 or other string specific to a collector.
                                 (Note that collectors may use different statuses.)
                information (string): 'abc','http://...','60 seconds'
                                 (information complementary to status.)
        '''
        self.statusLock.acquire()
        try:
            status,information = self.status
        finally:
            self.statusLock.release()
        return (status,information)

    def _getRandomImage(self):
        ''' Each derived class must implement this method.
            The method:
              - may perform several requests on the internet (but ideally only one)
              - should download at least one image (but has the right to fail)
              - should return as soon as possible (short execution time, ideally 1 second but can be much more )
              - must decrement self.numberOfImagesToGet by 1 if it successfully downloaded an image
                (and considers the image is to be kept.)
            This method will be automatically called again 0.25 seconds after completion,
            continuously (except when the pool decides there are enough images.)
        '''
        self._logError("collector._getRandomImage() is not implemented.")
        raise NotImplementedError,"collector._getRandomImage()"

    def _generateRandomWord(self):
        ''' Generates a random word.
            This method can be used by all derived classes.
            Useful to get random results from search engines when you do not have
            a dictionary at hand.
            The generated word can be a number (containing only digits),
            a word (containing only letters) or both mixed.
            Output: string (a random word)
            Example: word = self._generateRandomWord()
        '''
        # FIXME: To implement
        #if self.dictionnaryFile:
        #    ...get word from dictionary...
        #else:
        #    ...the old standalone method below...
        word = '1'
        if random.randint(0,100)<30:  # Sometimes use only digits
            if random.randint(0,100)<30:
                word = str(random.randint(1,999))
            else:
                word = str(random.randint(1,999999))
        else:  # Generate a word containing letters
            word = ''
            charset = 'abcdefghijklmnopqrstuvwxyz'  # Search for random word containing letters only.
            if random.randint(0,100)<60:  # Sometimes include digits with letters
                charset = 'abcdefghijklmnopqrstuvwxyz'*2 + '0123456789'  # *2 to have more letters than digits
            for i in range(random.randint(2,5)):  # Only generate short words (2 to 5 characters)
                word += random.choice(charset)
        return word

    def _parsePage(self,url,regex=None):
        ''' Downloads a specified HTML page and optionally runs a regular expression on it.
            Input:
                url (string) : The URL of the page to download.
                regex : Compiled regular expression (obtained with re.compile)
                        If regex is None, the page will be returned as is.
            Output:
                A tuple (htmlpage,results)
                htmlpage is the raw HTML response page. (None in case of error)
                results is a list containing the regular expression results
                        (as returned by regex.findall()) or None
            Examples:
                # Just return the page:
                (htmlpage,results) = self._parsePage('http://google.com')
                # Get cats image URLs:
                (htmlpage,results) = self._parsePage('http://images.google.com/images?q=cats&hl=en',re.compile('imgurl=(http://.+?)&',re.DOTALL|re.IGNORECASE))
                if not htmlpage: print "Error getting page"
                if not results: print "No results."
        '''
        htmlpage = ''
        results = None
        try:
            request_headers = { 'User-Agent': self.CONFIG["network.http.useragent"] }
            request = urllib2.Request(url, None, request_headers)  # Build the HTTP request
            htmlpage = urllib2.urlopen(request).read(2000000)  # Read at most 2 Mb.
            # FIXME: catch specific HTTP errors ?
            # FIXME: return HTTP errors ?
        except Exception, exc:
            self._logError('parsePage("'+url+'"): '+repr(exc))
            return (None,None)
        if regex:
            results = regex.findall(htmlpage)
        return (htmlpage,results)

class collector_local(collector):
    ''' This collector does not use the internet and only searches local hard disks
        to find images.
        Used by: imagePool.
    '''
    def __init__(self,**keywords):
        ''' Parameters:
                config (applicationConfig object) : the program configuration
        '''
        collector.__init__(self,**keywords)  # Call the mother class constructor.
        self.directoryToScan = self.CONFIG["collector.localonly.startdir"]
        self.name="collector_local"
        self.remainingDirectories = [self.directoryToScan]  # Directories to scan
        self.filepaths = {}  # Paths to images

    def _getRandomImage(self):
        if len(self.filepaths) < 2000:  # Stop scanning directories if we have more than 2000 image filenames
            for i in range(5):  # Read 5 directories
                if len(self.remainingDirectories)>0:
                    directory = random.choice(self.remainingDirectories)  # Get a directory to scan
                    self.remainingDirectories.remove(directory)
                    self._logDebug("Reading directory %s" % directory)
                    self._setCurrentStatus('Reading directory',directory)
                    # Scan the directory:
                    files = []
                    try:
                        files = os.listdir(directory)
                    except:
                        pass  # I probably do not have access rights to this directory. Skip it silently.
                    for filename in files:
                        filepath = os.path.join(directory,filename)
                        # FIXME: I should try/except isdir() and isfile() in case the directory/file
                        #        was removed (or I do not have access rights)
                        if os.path.isdir(filepath):
                            # Avoid /mnt /proc and /dev
                            pathsToAvoid = ('/mnt/','/proc/','/dev/')  # Paths to avoid under *nix systems.
                            if not (filepath.startswith('/mnt/') or filepath.startswith('/proc/') or filepath.startswith('/dev/')):
                                self.remainingDirectories += [filepath]  # This is a new directory to scan
                        elif os.path.isfile(filepath):
                            (name,extension) = os.path.splitext(filename)
                            if extension.lower() in ('.jpg','.jpeg','.jpe','.png','.gif','.bmp','.tif','.tiff','.pcx','.ppm','.tga'):
                                self.filepaths[filepath] = 0  # Keep file path
            # If there are no more directories to scan, restart all over:
            if len(self.remainingDirectories) == 0:
                self.remainingDirectories = [self.directoryToScan]

        # Now choose a random image from scanned directories and copy it to the pool directory
        if len(self.filepaths) > 0:
            filepath = random.choice(self.filepaths.keys())  # Choose a random file path
            del self.filepaths[filepath]  # Remove it from the list
            self._logDebug("Getting %s" % filepath)
            self._setCurrentStatus('Copying file',filepath)
            try:
                #... and copy the image to the pool directory
                file = open(filepath,'rb')
                imagedata = file.read(2000000)  # Max 2 Mb for local images
                file.close()
            except:
                imagedata = ''  # Discard image if there was a problem reading the file.
            if (len(imagedata)>0) and (len(imagedata) < 2000000):
                # Compute filename from file SHA1
                imagesha1 = hashlib.sha1(imagedata).hexdigest()
                if not self.CONFIG["blacklist.imagesha1"].has_key(imagesha1):
                    extension = filepath[filepath.rfind("."):].lower()  # Get file extension
                    outputfilename = 'WG'+imagesha1+extension  # SHA1 in hex + original image extension
                    imagedata += self.CONFIG["pool.sourcemark"] + filepath  # Add original file path in image file
                    # and save the image to disk.
                    # FIXME: try/except file creation:
                    file = open(os.path.join(self.CONFIG["pool.imagepooldirectory"],outputfilename),"w+b")
                    file.write(imagedata)
                    file.close()
        time.sleep(0.25)  # Be gentle with other threads

class collector_deviantart(collector):
    ''' This collector gets random images from http://deviantART.com, an excellent
        collaborative art website.
        Anyone can post their creations, and visitors can comment.
        Site contains photography, drawings, paintings, computer-generated images, etc.
        Used by: imagePool.
    '''
    # Regular expression used to extract the image URL from a random deviantArt page.
    RE_IMAGEURL = re.compile('',re.DOTALL|re.IGNORECASE)
    # Regular expression to extract the maximum deviationID from homepage
    RE_ALLDEVIATIONID = re.compile('href="http://www.deviantart.com/morelikethis/(\d+)"',re.DOTALL|re.IGNORECASE)

    def __init__(self,**keywords):
        collector.__init__(self,**keywords)  # Call the mother class constructor.
        self.name="collector_deviantart"
        self.max_deviationid = -1  # We do not know yet what is the maximum deviationID
        self.deviationIDs = []     # List of deviationIDs (DeviantArt picture identifier). Used only for keyword search.
        self.imageurltoget = ""    # URL of image to get.
        self.waituntil = 0         # Wait until this date.

    def _getRandomImage(self):
        if time.time() < self.waituntil:  # We were asked to wait: do nothing for now.
            return
        if self.max_deviationid < 0:  # If we do not know the maximum deviationid:
            # Get the maximum deviationID from the Homepage:
            self._logDebug("Getting maximum deviationID from homepage.")
            self._setCurrentStatus('Querying','Maximum deviationID from homepage')
            request_url = "http://browse.deviantart.com/?order=5"
            (htmlpage,results) = self._parsePage(request_url,collector_deviantart.RE_ALLDEVIATIONID)
            if not htmlpage:  # Error while getting page.
                self._logWarning("Unable to contact deviantArt.com. Waiting 60 seconds.")
                self._setCurrentStatus('Error','Unable to contact deviantArt.com. Waiting 60 seconds.')
                self.waituntil = time.time()+60
                return
            elif not results:  # Regex returned no result in page.
                # If no deviationID was found in homepage, display an error message and stop collecting.
                self._logWarning("Could not find any deviationID from homepage. Website changed ?")
                self._setCurrentStatus('Error','Could not find any deviationID from homepage. Website changed ?')
                if self.CONFIG["debug"]:
                    filename = "debug_deviantart_%s.html"%hashlib.sha1(htmlpage).hexdigest()
                    self._logDebug("(See corresponding HTML page saved in %s)" % filename)
                    open(filename,"w+b").write(htmlpage)  # Write bogus html page to debug
                self.stopcollecting()
                return
            # Get the maximum deviationID:
            for result in results:
                try:
                    self.max_deviationid = max(self.max_deviationid,int(result))
                except ValueError:  # could not convert to int
                    pass  # ignore value
            if self.max_deviationid > 0:
                self._logDebug("Max deviationid = %d"%self.max_deviationid)
            return

        if len(self.imageurltoget)==0:  # If we do not have the URL of an image, get a random DeviantArt page.
            deviationid = 0
            if self.CONFIG['collector.keywords.enabled']:
                # If we do not have enough deviations corresponding to the search word,
                # let's run a search on deviantArt search engine:
                if len(self.deviationIDs)<40:
                    # If keyword search is enabled, we use the search engine of DeviantArt
                    # to get a list of deviationID (images).
                    wordToSearch = self.CONFIG['collector.keywords.keywords']
                    self._logDebug("Querying %s" % wordToSearch)
                    self._setCurrentStatus('Querying',wordToSearch)
                    # Get the search result page:
                    request_url = "http://browse.deviantart.com/?order=5&q=%s&offset=%d" % (urllib.quote_plus(wordToSearch),random.randint(0,300)*24)
                    (htmlpage,results) = self._parsePage(request_url,collector_deviantart.RE_ALLDEVIATIONID)
                    if not htmlpage:
                        self._logInfo("Unable to contact DeviantART.com. Waiting 60 seconds.")  # Nevermind temporary failures
                        self._setCurrentStatus('Error','Unable to contact DeviantART.com. Waiting 60 seconds.')
                        self.waituntil = time.time()+60
                        return
                    if not results:
                        self._logWarning("Could not find any deviationID from homepage. Website changed ?")
                        self._setCurrentStatus('Error','Could not find any deviationID from homepage. Website changed ?')
                        if self.CONFIG["debug"]:
                            filename = "debug_deviantart_%s.html"%hashlib.sha1(htmlpage).hexdigest()
                            self._logDebug("(See corresponding HTML page saved in %s)" % filename)
                            open(filename,"w+b").write(htmlpage)  # Write bogus html page to debug
                        self.stopcollecting()
                        return
                    else:
                        # Search for deviationIDs in the result page.
                        for result in results:
                            try:
                                deviationid = int(result)
                                if random.randint(0,2)<2:  # We keep some of these images.
                                    self.deviationIDs.append(deviationid)
                            except ValueError:  # could not convert to int
                                pass  # ignore value
                # Pick a random deviationID corresponding to the search word:
                deviationid = random.choice(self.deviationIDs)
                self.deviationIDs.remove(deviationid)
            else:
                # Get a random deviation page:
                deviationid = random.randint(1,self.max_deviationid)  # choose a random deviation
            self._logDebug("Getting deviation page %d" % deviationid)
            self._setCurrentStatus('Querying','Deviation page %s' % deviationid)
            request_url = "http://www.deviantart.com/deviation/%d/" % deviationid
            (htmlpage,results) = self._parsePage(request_url,collector_deviantart.RE_IMAGEURL)
            if not htmlpage:
                self._logInfo("Unable to contact DeviantART.com. Waiting 60 seconds.")  # Nevermind temporary failures
                self._setCurrentStatus('Error','Unable to contact DeviantART.com. Waiting 60 seconds.')
                self.waituntil = time.time()+60
                return
            if not results:
                if len(htmlpage.strip())==0:
                    self._logInfo("Empty page - skipped.")
                    self._setCurrentStatus('Skipped','Empty page. Skipped.')
                    self.waituntil = time.time()+1
                elif '