It's sunday and I'm kind of burnt out on hacking in C, so I thought I'd take today to do a little writeup on what went in to designing pngz, my very work-in-progress brute force PNG optimizer. This article represents the sum of my research on PNG optimization, and will hopefully give you a decent grasp on both the PNG file format and the types of optimizations that are possible.
Before I start, I'd like to acknowledge the people who's work has been indispensable in creating this article: Cédric Louvrier for css-ig.net which covers everything I'm about to talk about and more, Cosmin Truţa for his excellent collection of PNG-Tech articles, and Björn Höhrmann for his page on pngwolf.
Now let's get to it.
A (Very) Brief Introduction to PNG
When I started this project I was kind of intimidated by file formats; I'm usually doing things in higher level languages and I hadn't touched a struct in a long long time. If you're like me, don't be afraid; the PNG specification is incredibly accessible and anyone who understands bits and bytes shouldn't have any trouble picking it up.
I'm not going to restate all of the information in the specification, but briefly here are the steps to PNG encoding:
- A color type and bit depth are chosen and the raw image data is encoded to them.
- A filter is chosen for each row of encoded data and applied.
- The whole stream of filtered bytes is run through deflate.
All there is to it.
Why Optimization is Hard
When I first got interested in PNG optimization I was confused. Intuitively I felt like there should be an "optimum" encoding, and reaching it was just a matter of combining all of the disparate optimization techniques from all of the different optimizers into one.
And this would be mostly true if not for the deflate step. See, because it's impossible to predict exactly what size a stream will deflate to without actually deflating it, and because deflate is run last, you have to go through the entire encoding process before you know how your optimization strategy fared.
Deflate algorithms are competitive in their own right, and the most powerful sacrifice time for compression. That means that finding the final output size of your PNG can cost a ton of cycles.
So what can you do?
Well, while there's no perfect proxy for deflate-ability, there are some pretty good ones. Most of the competition in writing PNG optimizers is in finding good heuristics that create output which deflates well in a reasonable amount of time.
Instead of trying to compete with heuristics, pngz naïevely assumes that deflate is completely unpredictable and exhaustively passes every input encoding possible to the deflate step. Fortunately there are a finite number of encodings for every raw input. Unfortunately that number is very large. Nonetheless, pngz tries them all.
Before we get into the actual optimizations, we need to talk a little bit about what "lossless" really means. For this I'll defer to a quote from Cosmin Truta's guide to PNG optimization:
Losslessness in the strictest sense, where no information whatsoever is lost, can only be achieved by leaving the original file (any file) intact, or by transforming it (e.g. compressing it, encrypting it) in such a way that there is an inverse transformation which recovers it completely, bit by bit.
As he goes on to say, this definition is mostly useless for PNG optimization; we don't really care about maintaining all internal data, we just want the rendered image to look the same.
It's best, then, to think of lossless encoding as taking in raw image data, that is, an array of 16 bit RGBA pixels, and outputting a PNG that both adheres to the specification and represents that exact same raw image data. Indeed, expansion of the input PNG to 16 bit RGBA is the first thing that pngz does.
Viewing encoding in this way greatly simplifies the optimization process; we're not looking for the best way to convert one color type PNG to another, just the best way to encode raw image data into a PNG full stop. Instead of trying to work around the cruft left behind by a prior encoding as some optimizers will, you just expand the input PNG into its pure form and begin the optimization process from a unified starting point.
So now we're ready to dive in to actual optimizations. This collection was created by mapping out every place that the encoder makes a decision and following every path therefrom.
For each section below I'll try and give you an expression that quantifies the number of different paths that the optimization creates. I'll also try and cover non-naive techniques for optimizing each, and provide links to more information where applicable.
First up are ancillary chunks. I almost forgot to include them because the philosophy of pngz naturally makes them disappear (raw image data doesn't have a tEXt chunk!), but here they are. It's up to you whether or not to keep them around.
A listing of the standard chunks can be found here. In some cases they may be important. In most cases you probably don't care. Pngz throws them all away.
Color type is the first choice you really make in encoding and sets the stage for the types of optimizations you can do.
Because pngz tries to be exhaustive, it doesn't make any assumptions. That means that while it will rule out impossible color types, such as representing color pixels in a greyscale format, it won't rule out illogical ones like using color formats on all-grey images, or full alpha channel on images with no transparency.
I've listed below things that make lossless encoding to a certain color type impossible. Keep in mind that most smart optimizers will use the opposite of each of these statements as well when ruling out color types.
- Images with semi-transparent pixels (not completely transparent or completely opaque) can't be encoded in color types 0 or 2.
- Images with color pixels can't be encoded in color types 0 or 4.
- Images with a minimum bit depth greater than 8 can't be encoded in color type 3.
- Images with more than 256 unique RGBA colors can't be encoded in color type 3, with the caveat that all transparent pixels may be treated as the same color.
- Images with fully transparent pixels that also span the entire color space can't be encoded in color types 0 or 2, as the tRNS value can't be set to any color that's not already in the image. This is an edge case, but might be good to keep in mind.
Most optimizers will set the bit depth to the lowest that it can be while still not losing any data. This seems like a sound decision, but surprisingly it isn't always advantageous, the hypothesized reason being that deflate is optimized for byte-sized data.
Pngz will try all bit depths greater than or equal to the lowest bit depth possible.
Row Edge Bits (0,3)
Images encoded at bit depths below 8 can have rows with lengths that don't end directly on a byte boundary. The spec, however, states that rows must end on on byte boundaries. The contents of these wasted bits between where the image data stops and the row actually ends are unspecified.
Pngz exhaustively tries all values of these bits to the tune of
2^(height * (8 - ((width * bit_depth) % 8))) operations. A wise optimizer might take these bits into consideration in its filtering heuristics, though I don't know of any that currently do.
tRNS Color Choice (0,2,3)
Color types 0, 2, and 3 all partially support transparency via a separate tRNS chunk. For 0 and 2 this works by specifying a color to be the transparent color for the image. Everywhere that this color appears in the pixel data will be made fully transparent by the decoder.
The only limit on what color you choose is that it doesn't appear anywhere else in the image (else those pixels will also become transparent). This makes the number of possible values for the tRNS color equal to
2^bit_depth-total_unique_colors for color type 0 and
2^bit_depth*3-total_unique_colors for color type 2.
Color type 3 basically keeps a second alpha palette in the tRNS chunk that maps directly to the color palette. I'll get more into tRNS optimization with color type 3 in the section on PLTE arrangement.
Non-alpha Values for Transparent Pixels (4,6)
Color types 4 and 6 have a full alpha channel. Notice that, if a pixel is fully transparent, its non-alpha components are unnecessary. This means that for every fully transparent pixel in the image, we have a set of values we can freely manipulate.
Many optimizers just zero these values, the logic being that filtering will make them mostly disappear. A few others bleed the values in a way that improves filtering. Almost all will also try the original values, which can lead to huge gains if, say, the original image was a gradient.
Pngz tries all combinations. That's
2^(bit_depth * total transparent pixels) for color type 4 and
2^(bit_depth * 3 * total transparent pixels) for color type 6. Generally speaking, if an image has more than 1 transparent pixel pngz will never finish executing.
PLTE Arrangement (3)
Color type 3 images keep their color information in a PLTE chunk. Arrangement of the colors in the palette can have a large effect on final size. This post by Cédric Louvrier takes an in depth look at many different strategies for arranging the palette.
Something to take into consideration is the that paletted images with transparent or semi-transparent pixels keep a second alpha palette in the tRNS chunk. This tRNS palette doesn't need to be as long as the PLTE palette; any indexes outside of the tRNS palette are assumed to be opaque. For this reason a few optimizers opt to put all transparent and semi-transparent colors at the head of the PLTE palette in order to minimize the size of the tRNS chunk.
Pngz tries all possible palette arrangements. This is very hard (and something that I dread implementing some day) unless the palleted image has exactly
2^bit_depth colors (and then the number of arrangements is still
(2^bit_depth)!). The issue is that palette entries do not have to be unique: you could have one color in 200 different palette indexes and use them interchangeably for all pixels of that color in the original image. This makes the math for totaling the number of combinations hard. Too hard for me...
Now we get to the second stage, filtering. The specification offers some advice here, though it's been shown that the heuristic they promote doesn't work all that well. Instead I suggest you check out the page for pngwolf, which has an in depth analysis of different filtering heuristics.
Pngz luckily avoids all that and just tries them all. That's
5^height different combinations for each input to the filter stage.
And finally, deflation. Pngz does not mess with deflation.
The reasoning is that deflate is its own separate thing. From Ken Silverman's KSFlate, the implementation used by PNGOUT, to google's zopfli, entering that arena was not something I was eager to do.
That being said, pngz currently uses plain zopfli with 15 iterations, which is the state of the art when it comes to open source deflate implementations. Those looking for that extra edge should check out Deflopt, Huffmix, and maybe browse around the Encode forums for more depth on the subject.
At Long Last
PNG optimization gets far more powerful when you drop the lossless requirement, but this is unfortunately where our journey ends. Ultimately the bleeding edge of PNG optimization is just fighting over bytes; most tools wont consider half of these things and still be good enough. If you're looking for suggestions, ScriptPNG for Windows is a nice utility that combines all the best optimizers into one super-optimizer. For Linux I can't think of a single best choice, but likely some combination of optipng, pngwolf, and pngzopfli would create the smallest output.
Hopefully I've covered everything there is to know (and if I haven't please get in touch!) and you've learned, at the very least, to never try to make an optimizer yourself ;)
Thanks for reading.