Posts Tagged ‘Google’

WebP, the Google use case

Tuesday, October 26th, 2010

When Google announced the WebP image file format, the immediate reaction in the blogosphere and media focused on its chances to “unseat JPEG” rather than on an analysis of the compression approach or on possible ways to improve it. It is nothing new or surprising to see journalists behave that way. And even if WebP turns out to be the better-performing file format, there is little doubt that it faces a “chicken or egg” problem:

a) Why should I store my images as WebP if no one else does and most web browsers won’t be able to display them?

b) Why should my software support WebP if no one stores images in this format?

In order to break the circle, you either rely on enthusiasts or create smaller scenarios in which both sides of the problem are dealt with directly. Some enthusiasts have already started working, and there is a growing number of programs that can read or write WebP files. For the remaining part, it is now entirely up to Google to demonstrate its commitment to WebP, at least as a secondary lossy file format. The opportunities to deploy WebP are obvious and have already been discussed elsewhere:

1. Display WebP

Google is the maker of the Google Chrome web browser, only two years old but already among the four largest browsers (depending on which statistics you forge). According to the public bug tracking system, WebP support is scheduled for the upcoming version 9 of Google Chrome. Very shortly, the first canary builds or developer channel versions of 9.0 will be deployed to the user base, hopefully with WebP support enabled. This will be the first more or less convenient way to display the images. No more converting to PNG. Given the current pace of Chrome development (and Google’s willingness to increment the version number faster than other people can say M-S-I-E-Nine), we can expect a stable version 9 around New Year, give or take a few months. So even without any support from other browser vendors, 1 in 10 web users will be able to display WebP images without installing any further tool. That’s a good start for deployment.

WebP is meant to be small, and my early peek at its performance tells me this is a promise they can keep. The everyday scenario in which size matters most is accessing the web from your smartphone over a crappy 3G network. And again, Google has its own software platform, Google Android. It is entirely up to Google to add WebP support to Android so that at least newer smartphones could display it. We might see WebM support for Android soon, so the library would already be there.

With Google Picasa, there is also a classic application, installed locally on the machine of the user. Support for WebP (both read and write) will help promote this file format. And Picasa.

2. Serve WebP images whenever possible

If you look at a gallery of images at Picasaweb, you simply don’t care about the original file format or the method of transportation used for the thumbnails or the gallery, as long as it’s fast. Google could replace JPEG with WebP images whenever the browser indicates support for this image format. It could indicate the WebP usage with a tiny icon in some corner. There is a precedent for both: the support for “HTML5+WebM” at YouTube.
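A minimal sketch of how a server might decide which format to send, assuming the browser advertises WebP support by listing `image/webp` in its HTTP `Accept` header. The function name and the JPEG fallback are illustrative assumptions; this is not a description of how Picasaweb or YouTube actually work.

```python
def pick_image_format(accept_header: str) -> str:
    """Return "webp" if the client lists image/webp in Accept, else "jpeg"."""
    for item in accept_header.split(","):
        # Split e.g. "image/webp;q=0.8" into the media type and its parameters.
        media_type, _, params = item.strip().partition(";")
        if media_type.strip().lower() != "image/webp":
            continue
        # No q parameter means full preference; q=0 means "not acceptable".
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            return "webp"
    # Fallback (assumption): every browser can display JPEG.
    return "jpeg"
```

A client that does not mention `image/webp` at all simply keeps getting the JPEG it gets today, so nothing breaks for existing users.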

3. Lower the hurdle

Most people will play with something only if it does not involve installing cygwin, downloading new packages and compiling their own webpconv converter tool. Most people would enjoy a simple drag-and-drop file uploader that outputs a WebP file and offers a few sliders for quality adjustment. Google did the right thing in offering its comparisons in a file format other browsers could understand, but much, much more is possible.

4. Promote it the way an image file format can be promoted

I note that google.com does not yet offer content in WebP where it would be possible. Where is the effort to offer .webp files at http://www.google.com/press/images.html next to the JPEG ones? I am willing to assume that many Apple Quicktime installations only happened because someone wanted to look at a movie trailer in Full HD. Someone at Google might do the calculation and see if a web site for Boston Globe’s “the big picture” in “HD” or some excellent flickr content or any other “news in pictures” stream might be worth the money and/or effort spent. Ask Canon to offer a special firmware for the Canon EOS 1Ds or 5D that produces .webp files directly.

WebP, the Wikipedia use case

Saturday, October 23rd, 2010

Two days ago, I wrote about the WebP file format and some early impressions on its performance. I am grateful for the responses I got, including a conversation with Brion Vibber, former CTO at the Wikimedia Foundation.

As a result of this conversation, there is now one more feature request in Wikimedia’s bug tracking system: Bug 25611: Support optimized WebP thumbnails as alternative to JPEG

If you access a Wikipedia page, you are likely to see images in the article. Usually, these images are coming from Wikimedia Commons and we encourage people to upload these images in a high resolution. Thanks to the MediaWiki syntax, it is rather easy to adjust the image size via the thumbnail parameter.

The file format of a thumbnail depends on the file format of the original image. JPEG and TIFF images are thumbnailed as JPEG; SVG and PNG images are downsized to PNG files.
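The rule above can be written down as a small lookup table. This is only an illustration of the mapping described in the text, not MediaWiki code; the function name and the `jpg`/`tif` extension aliases are assumptions.

```python
# Source format -> thumbnail format, as described in the text.
# The "jpg" and "tif" aliases are assumptions for convenience.
THUMBNAIL_FORMAT = {
    "jpeg": "jpeg", "jpg": "jpeg",
    "tiff": "jpeg", "tif": "jpeg",
    "svg": "png",
    "png": "png",
}

def thumbnail_format(filename: str) -> str:
    """Return the thumbnail format for an uploaded file, based on its extension."""
    extension = filename.rsplit(".", 1)[-1].lower()
    if extension not in THUMBNAIL_FORMAT:
        raise ValueError(f"no thumbnail rule for {filename!r}")
    return THUMBNAIL_FORMAT[extension]
```

A WebP thumbnail option would amount to an additional output format for the sources that currently end up as JPEG.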

If you access a rather large Wikipedia article, the total size of the thumbnails can reach the single-digit megabyte range, for example in the current version of the German-language article on Germany.

Size of the Article "Germany" at de.wikipedia.org

If a single Wikipedia article (and not even the largest one) is 3.67 MB large and 2.7 MB of it are images, a reduction in size would make a difference to the readership where the bottleneck is bandwidth. In the case of the article on Germany, a larger part of the images are in PNG format. By the way, the screenshot is taken from Google Chrome’s developer tools, a great way to see what is going on between a web site and your browser.

For the purpose of a small experiment, I picked the article on Albrecht Dürer in the English language Wikipedia.

Total size of the article (including stylesheets and website logo and icons): 892.74 KB
Size of images: 352.51 KB.

Text compression will turn a 126.53 KB html file into 33.05 KB.

Several .js and .css files totalling slightly more than 400 KB should remain in the browser cache if you have accessed Wikipedia before. Therefore, if you are not a first-time Wikipedia visitor, the thumbnails of the article will easily be by far the largest chunk of data you have to download.

In this example, there are 18 JPEG thumbnails ranging from 5122 to 28788 Bytes. The image size ranges from 100*146 to 220*343 pixels. In the current compression, Bytes-per-pixel starts at 1.42 and ends at 5.03.

All images were available at Wikimedia in a higher resolution from slightly-above-thumbnail (311*485) to reasonably-large (2192*2831).

The original thumbnail (13.586 Bytes)

In my test, I load the original files and convert them to BMP files of the size of the thumbnails. I then convert these files to .webp, once using whatever quality webpconv picks on its own and once at a quality setting that I consider still useful.

The quality parameter used by webpconv itself was always between 81 and 85, resulting in 18 files that sum up to 350 KB – 26 KB larger than the sum of the JPEG thumbnails created by the MediaWiki software.

With quality 50, I end up at 181 KB, 44% smaller than the JPEG image set.
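The 44% figure can be checked with a few lines of arithmetic. The sketch below uses only the totals quoted above; the JPEG total of 324 KB is derived from the statement that the automatic WebP set (350 KB) is 26 KB larger, and the function name is made up for illustration.

```python
def percent_smaller(candidate_kb: float, baseline_kb: float) -> float:
    """Size reduction of candidate relative to baseline, in percent."""
    return (baseline_kb - candidate_kb) / baseline_kb * 100

webp_auto_kb = 350             # webpconv with its own quality choice
jpeg_kb = webp_auto_kb - 26    # 324 KB, derived from "26 KB larger"
webp_q50_kb = 181              # webpconv at quality 50

print(round(percent_smaller(webp_q50_kb, jpeg_kb)))  # prints 44
```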

WebP auto setting: 9.641 Bytes

With quality 20, we end up with a severe reduction in visual quality and a total size of 75.8 KB. Several images are distorted almost beyond recognition; others still serve the purpose of a thumbnail perfectly well. If the goal were a size reduction of 50%, a quality setting of 45 would fit. In any case, my comparison is not entirely fair, since I did not use any image manipulation other than size reduction and compression. To the best of my knowledge, MediaWiki’s software tools apply contrast and sharpness adjustments during the thumbnailing process.

WebP at quality 50: 4.163 Bytes (a.k.a "your hair is gone")

I am going to end this post with a series of thumbnails both in the original thumbnail version and various stages of WebP compression. Since your browser is most likely unable to display .webp files directly, I converted the images to .png in a lossless manner.

WebP at Q=20 : 1.367 Bytes (a.k.a. "the entire Dürer gone")

In order to give a recommendation on WebP, you have to keep in mind the purpose of a thumbnail: a quick impression of the content of an image. You have to see what is in the image; you don’t have to see every tiny detail of it. Anyone who wants to see more can click on the image and look at the (hopefully larger) image without further compression and resizing. WebP at q=50 manages this job quite well with an impressive reduction in size. Keeping in mind that thumbnails are usually the largest part of a newly loaded Wikipedia page (if you already have the css and js stuff in your browser cache), a size reduction here would significantly affect the total size of a page. Contrast and sharpening could also be applied here and might improve the visual quality as perceived by users.

I welcome your feedback.

Preliminary observations regarding the WebP image file format

Friday, October 22nd, 2010

Less than a month ago, Google announced an early version of a new lossy file format called WebP. If that name sounds familiar to you, you are right, it shares the technology of the WebM video file format. The central claim of WebP is

WebP offers compression that has shown 39.8% more byte-size efficiency than JPEG for the same quality in a large scale study of 900,000 images on the Web.

A description of the study can be found at Google.com. There is also a gallery with a couple of images taken from the media repository Wikimedia Commons. Since Google has released the tool to create .webp images, I was able to conduct some tests on my own.

Elke Twesten, member of the state parliament of Lower Saxony, Germany

I did not want to pollute my test with an image that had already been compressed in a lossy format. Since my camera can output both JPEG and CR2 (Canon raw) files, I could take one of the RAW images and convert it to a nice 61 MByte image, 5616 * 3744 pixels in size. The first image was a portrait of a German politician, Elke Twesten from the state parliament of Lower Saxony. A JPEG version of an image from this series can be found at Wikimedia Commons under a Creative Commons license. Since I used a flash for this image, its quality is reasonably good for comparison purposes, and some features of the image, such as the hair and the eye, can potentially reveal strengths and weaknesses of the codecs.

I also produced a 1600 pixel and a 640 pixel wide version of the image, since most use cases in the web will not require images with more than 5000*3000 pixels. So for this first run, I end up with three files:

  1. twesten-5616.bmp – 63.078.966 Bytes – md5: b5ce657dc865adc9c16cf1d05d120681
  2. twesten-1600.bmp – 5.116.854 Bytes – md5: eba526672acb7261b2b3112376ace55a
  3. twesten-0640.bmp – 817.974 Bytes – md5: a68aae497299976254142f00be483c45

The webpconv converter tool accepts a quality parameter ranging from 0 to 100. As the number increases, so does the size of the image and, hopefully, the quality.

Image size of the three twesten images depending on quality parameter

Right now, there is no software available to display .webp files directly. Google Chrome might be getting support for this file format soon, based on the changes in the SVN.

So in order to make easy comparisons, I went ahead and converted the WebP files back to PNG, not adding additional compression artifacts.

The conversion from .bmp to .jpg was done with ImageMagick 6.4.0, with just the quality setting varied from 1 to 100. The relationship between quality parameter and size is a bit different here; please note that the y-axis already uses a logarithmic scale.

JPEG size with the three twesten images depending on the quality parameter setting.
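The JPEG series can be reproduced with a loop over ImageMagick’s `convert` command and its `-quality` option. The sketch below only builds the command lines; the output naming follows the `twesten-0640-q1.jpeg` pattern used in this post, and the helper name is an assumption.

```python
from pathlib import Path

def jpeg_sweep_commands(source: str, qualities=range(1, 101)):
    """Build one ImageMagick command per quality setting (1 to 100 by default)."""
    stem = Path(source).stem  # "twesten-0640.bmp" -> "twesten-0640"
    return [
        ["convert", source, "-quality", str(q), f"{stem}-q{q}.jpeg"]
        for q in qualities
    ]

commands = jpeg_sweep_commands("twesten-0640.bmp")
print(" ".join(commands[16]))
```

Each entry can be handed to `subprocess.run(cmd, check=True)` on a machine where ImageMagick is installed.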

So we basically end up with three times 100 pairs of images, each pair consisting of one JPEG and one WebP file. Since the quality parameter cannot be compared directly between these formats, there are two approaches for comparison:

  1. Compare the size of two images with the same visual quality.
  2. Compare the visual quality of two images with the same size.

While size can be measured and reasonably expressed in a single number, quality is in the eye of the beholder. The twesten-0640 images in JPEG range from 2.303 Bytes to 188.898 Bytes; the WebP images range from 1.552 Bytes to 46.638 Bytes. For the range between 2.303 and 46.638 Bytes, files of reasonably similar size are available in both formats.
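Finding such pairs is a nearest-neighbour search over file sizes. A minimal sketch, assuming the sizes per quality setting have been collected into dictionaries; the sample values are the twesten-0640 sizes quoted for the two pairs in this post, and the helper name is made up.

```python
def closest_by_size(target_size: int, candidates: dict) -> tuple:
    """Return the (quality, size) entry whose size is nearest to target_size."""
    return min(candidates.items(), key=lambda item: abs(item[1] - target_size))

# quality setting -> file size in bytes (twesten-0640, values from this post)
jpeg_sizes = {1: 2303, 17: 5754}
webp_sizes = {13: 2303, 39: 5736}

for jpeg_quality, jpeg_size in jpeg_sizes.items():
    webp_quality, webp_size = closest_by_size(jpeg_size, webp_sizes)
    print(f"JPEG q{jpeg_quality} ({jpeg_size} B) ~ WebP q{webp_quality} ({webp_size} B)")
```

With the full set of 100 sizes per format, the same call picks the best partner for any file in the overlapping size range.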

Let’s start with the smallest one: twesten-0640-q13.webp (2303 Bytes) vs twesten-0640-q1.jpeg:

Twesten-0640 at JPEG with quality 1: 2303 Bytes

Twesten 0640 at Webp-Quality 13: 2303 Bytes

Neither image even remotely qualifies as high quality, but it is impressive to see how far apart the two are.

The next pair I picked was WebP-q39 with 5736 Bytes and JPEG-q17 with 5754 Bytes respectively.

JPEG Q:17 / Size: 5754 Bytes

WebP Q:39 / Size:5736 Bytes

I still feel confident in declaring WebP the winner of this comparison; the quality of the WebP image has now reached a point where it no longer causes eye cancer. It is my personal impression that WebP is able to maintain its lead over JPEG, a format that is now more than 20 years old.

If anyone thinks there are JPEG converters out there that can produce less terrible results, please let me know. I am more than happy to make my original files available to anyone under very permissive license terms (CC-by or CC-by-sa).

Or you can simply download the twesten-1600.bmp file here.

If you think the image itself was particularly unfair to one format, feel free to release different images (as uncompressed as possible) to allow others to compare their findings. I would love to see a web application where you can upload uncompressed images and receive a selection of JPEG, JPEG2K, JPEG XR and WebP compressed versions in various qualities and sizes.

As Brion rightfully noted, the examples I gave in this post so far are far from realistic, unless you really want to squeeze every bit. When bandwidth is measured in Megabits per second, there is little reason to settle below a certain minimum standard of visual quality, so I would like to give some examples from the upper third.

The Twesten-1600 image starts at 6.647 Bytes (q:0) when compressed with WebP and ends at 246.970 Bytes (q:100). JPEG is a different beast, ranging from 11.733 Bytes (q:1) to 1.125.070 Bytes (q:100). I am going to show parts of the images at 400%.

Uncompressed BMP at 400%

JPEG at q=70 / 88912 Bytes / 400%

WebP at q=80 / 77551 Bytes

While being 13% smaller, the WebP image still makes a better visual impression than the JPEG, where artifacts are clearly visible. WebP seems to cover up some of these artifacts with rather aggressive blurring, yet detailed structures still survive this process.

Google is having its Microsoft day today

Saturday, January 31st, 2009

For the past 24 hours, things have been going haywire at Google here and there:

  1. Google News keeps dropping out
  2. Half the Internet was classified as “may harm your computer!”.
  3. Gmail sorts emails from acquaintances into the spam folder (nothing new) with the note that the email is suspected not to come from the respective acquaintance at all (new).

These three phenomena are probably unrelated. For point 2 there is now an explanation: “/” was classified as malware.

Britannica becomes Britannica

Saturday, January 24th, 2009

A story is currently making the rounds in the media about a future extension of the software behind britannica.com, a site with content from Encyclopaedia Britannica. It is supposed to let (paying) readers give somewhat more targeted feedback on the contents of this reference work.

Four years ago, Encarta introduced something similar (not that anything came of it). Meyers Lexikon has similar methods for incorporating suggestions from its users; after MediaWiki, it now uses Confluence.

The Britannica blog makes a timid attempt to add something like substance to this media breeze. Among other things, it points to the text of the by now rather old announcement, which is said to remain valid, mainly because none of what was announced has been put into practice to this day.

The occasion for the press to pick up the topic again was apparently a talk by Britannica frontman Jorge Cauz at an Australian conference. Jorge, that odd bird, at times resembles the representatives of German attempts at reference works in that he blames the failure so far on Google, which, in an act of a) cluelessness b) ignorance c) conspiracy, prefers to link to Wikipedia rather than to his own pages:

“If I were to be the CEO of Google or the founders of Google I would be very [displeased] that the best search engine in the world continues to provide as a first link, Wikipedia,” he said. “Is this the best they can do? Is this the best that [their] algorithm can do?”

If I were to be the CEO of Encyclopaedia Britannica or its editor in chief, I would be very displeased that one of the oldest modern encyclopedias continues to provide conspiracy theories instead of insight. Is this the best they can do? Is this the best their PR department can do?

A short analysis by Hitwise suggests that most people do not land on the EB pages through specific search terms but because they searched for “Encyclopedia”. Those with real questions apparently prefer to go to sites that offer answers instead of banner ads and payment requests.

When Google kills projects

Saturday, January 17th, 2009

Last week, Google gave several smaller in-house projects a final kick. And because one does not want to be evil, it is not a death with a definite end but, for example, a release of the source code so that someone else could do something with it.

Google shows with “Lively” that it can also be harsher. The not at all old project was scrapped, the content produced is lost, and Google gave its customers the tip to simply take screenshots.

And now that Google Video, Notebook, Lively, Answers, Jaiku and Catalogs have been removed, the blogosphere and the IT news scene are passing around something like “death lists” of which Google projects are due next.

“Knol” is mentioned again and again. Before the death of Lively, I would have said that Knol will not be killed now, if only because it is supposed to get some more time to prove itself. On this point, I am no longer so sure.

In any case it would be a loss; Knol has shown a few very interesting new approaches to the collaborative creation of content.

Where is Gears for Gmail, anyway?

Thursday, January 1st, 2009

Gears (formerly “Google Gears”) has been around for some time now; it is used by Google Documents and also by WordPress, and it is built right into Google Chrome.

My experience so far: it works great and does what it is supposed to do.

But where is Gears for Gmail? Wouldn’t it be *the* best way to finally use Gmail offline or half-offline (with a shaky connection on the train)?

The rumor mill has been bubbling for 18 months; every date predicted for the introduction of Gears for Gmail has so far passed without result.

June 1, 2007: Google Gears to enable offline Gmail?
September 14, 2007: Google Developing GMail Offline Version Using Google Gears
September 14, 2007: Bericht: GMail unterstützt demnächst Google Gears (Report: GMail to support Google Gears soon)
September 24, 2007: Gmail 2.0
July 17, 2008: Calendar & Gmail ab Ende August mit Gears auch offline erreichbar? (Calendar & Gmail also reachable offline with Gears from the end of August?)

PS: Gmail is not allowed to call itself Gmail in Germany; Google markets the product here as Googlemail.

The Google conspiracy

Saturday, November 25th, 2006

Yesterday I was down south for a talk at a university. The topics were plagiarism and Web 2.0. The talk was fairly nice, and maybe slides or recordings will turn up at some point.

One gem was a little rant about Google. The background is fairly simple: a medium-sized encyclopedia publisher has put a manageable amount of text online. For months, Google has hardly come by and has indexed only parts of the encyclopedia.

From the point of view of the medium-sized encyclopedia publisher (according to the professor), there can be only one reason: Google and Wikipedia are cooperating, Wikipedia is being pushed by Google, and somehow the encyclopedia’s PageRank is being artificially pushed down.

Well. I have seen better conspiracies, and ones that covered their tracks better, too…