Last month while testing our new S3 Tracker tool, we ran a data collection for our own production account. We received quite a shock when we saw that it reported close to half-a-terabyte of S3 “Shadow Versions”.
While we had suspected that the S3 versioning feature might create some extra versions, this unexpected amount was very disturbing, especially as AWS clearly states that regular “storage rates apply for every version stored”.
What exactly are shadow versions?
Amazon S3 has a versioning feature which, when enabled on a bucket, keeps old versions of your objects. When you add a new object with the same name, it does not replace the existing one, but rather creates a new version of it. That way, when you occasionally replace something, you can always go back and restore the old version as needed.
But there is just one little problem. If you delete an object, the default AWS Console view (like many other tools) stops showing it. But the object still exists in the bucket, and you still get charged for it. If you had multiple versions of the same object in the past, all of them continue to be stored, but the object becomes invisible (well, almost invisible, as we shall see shortly). We call these hard-to-see versions of objects “shadow versions”.
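To make the mechanism concrete, here is a toy in-memory model of a versioned bucket. It is only a sketch and not the S3 API: the class and method names (`VersionedBucket`, `put`, `delete`, `visible_keys`, `stored_bytes`) are made up for illustration. The one behavior it borrows from S3 is that a plain delete on a versioned bucket adds a delete marker on top of the existing versions instead of removing them.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Version:
    key: str
    version_id: int
    size: int = 0
    is_delete_marker: bool = False


@dataclass
class VersionedBucket:
    """Toy model of a versioning-enabled bucket (hypothetical, not the S3 API)."""
    versions: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def put(self, key, size):
        # Uploading under an existing name adds a new version; nothing is replaced.
        self.versions.append(Version(key, next(self._ids), size))

    def delete(self, key):
        # A plain delete removes nothing; it stacks a zero-byte delete marker on top.
        self.versions.append(Version(key, next(self._ids), is_delete_marker=True))

    def visible_keys(self):
        # What a default listing shows: keys whose newest entry is a real object.
        latest = {}
        for version in self.versions:  # later entries are newer
            latest[version.key] = version
        return sorted(k for k, v in latest.items() if not v.is_delete_marker)

    def stored_bytes(self):
        # What you are billed for: every stored version, visible or not.
        return sum(v.size for v in self.versions)


bucket = VersionedBucket()
bucket.put("report.csv", 100)
bucket.put("report.csv", 120)
bucket.delete("report.csv")
print(bucket.visible_keys())  # []
print(bucket.stored_bytes())  # 220
```

The listing comes back empty, yet 220 bytes are still stored. Those are the shadow versions.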
Finding a needle in the haystack
Back to our story: we were shocked to find half-a-terabyte of these shadow versions in our bucket. In the beginning we thought it impossible. We, the experts on AWS analytics and optimization, could not have made such an error …
But after taking a closer look (deep in the bucket configuration), we realized that versioning had indeed been enabled on this bucket by mistake. Here’s what we saw in the AWS console:
Below we see a normal bucket view with 8 objects (as we had expected). But incredibly, until December 2012, this is all AWS provided, and there was no way to view all your versions. Every time you created a new version, you got charged in full, but couldn’t track the cost back to its source.
When Amazon finally got around to providing users with visibility into their S3 versioning, this is what they came up with:
As you can see, shadow versions may be identified by a ‘Delete Marker’ as the first version element. It appeared that we had hundreds of these!
Even knowing that shadow versions exist does not make them easy to find. All versions are shown as one long list. When you have hundreds or thousands of objects, shadow versions are virtually impossible to spot.
So while technically Amazon does provide the ability to view all versions, from a practical and actionable perspective, it’s almost as if these versions are hidden from sight…kinda in the shadows.
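One way to pull shadow versions out of that long list is to do it programmatically. The sketch below assumes the response shape of the S3 ListObjectVersions call (`Versions` and `DeleteMarkers` lists whose entries carry `Key`, `IsLatest` and, for versions, `Size`). The function names `find_shadow_versions` and `iter_version_pages` are our own for this example, and the boto3 helper is shown for context only; it is not how S3 Tracker itself works.

```python
from collections import defaultdict


def find_shadow_versions(pages):
    """Return {key: total_bytes} for keys whose latest version is a delete marker.

    `pages` is any iterable of dicts shaped like ListObjectVersions responses.
    """
    deleted_keys = set()
    sizes = defaultdict(int)
    for page in pages:
        for marker in page.get("DeleteMarkers", []):
            if marker.get("IsLatest"):
                deleted_keys.add(marker["Key"])
        for version in page.get("Versions", []):
            sizes[version["Key"]] += version.get("Size", 0)
    # Only keys hidden behind a delete marker count as shadow versions.
    return {key: sizes[key] for key in sorted(deleted_keys)}


def iter_version_pages(bucket_name):
    """Fetch real pages with boto3 (assumes boto3 is installed and credentials are configured)."""
    import boto3  # imported lazily so the function above has no dependencies
    paginator = boto3.client("s3").get_paginator("list_object_versions")
    return paginator.paginate(Bucket=bucket_name)


# Hand-written sample data standing in for a real response page.
sample_pages = [{
    "Versions": [
        {"Key": "old/log.txt", "Size": 100, "IsLatest": False},
        {"Key": "old/log.txt", "Size": 200, "IsLatest": False},
        {"Key": "live.txt", "Size": 50, "IsLatest": True},
    ],
    "DeleteMarkers": [{"Key": "old/log.txt", "IsLatest": True}],
}]
print(find_shadow_versions(sample_pages))  # {'old/log.txt': 300}
```

Against a real bucket, the call would look like `find_shadow_versions(iter_version_pages("my-bucket"))`, where the bucket name is a placeholder.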
Search and destroy
Another fun fact about versioning is that, at the time of writing, once it is enabled you can’t apply lifecycle rules to the bucket. That means that all these shadow versions need to be deleted manually. I’ll shortly publish a separate tool that searches for and destroys shadow versions, and you’ll be able to download and run it on your versioned buckets. Stay tuned…
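In the meantime, here is a rough sketch of what the manual cleanup could look like. It is not the tool promised above. It assumes you already have a set of key names identified as shadow versions, and it relies on two S3 behaviors: deleting with an explicit `VersionId` removes that version permanently, and a single DeleteObjects request accepts at most 1,000 entries. The names `delete_batches` and `purge` are hypothetical.

```python
def delete_batches(pages, shadow_keys, batch_size=1000):
    """Yield DeleteObjects payloads covering every version and delete marker of shadow_keys.

    `pages` is an iterable of dicts shaped like ListObjectVersions responses.
    """
    batch = []
    for page in pages:
        for entry in page.get("Versions", []) + page.get("DeleteMarkers", []):
            if entry["Key"] in shadow_keys:
                batch.append({"Key": entry["Key"], "VersionId": entry["VersionId"]})
                if len(batch) == batch_size:
                    yield {"Objects": batch, "Quiet": True}
                    batch = []
    if batch:
        yield {"Objects": batch, "Quiet": True}


def purge(bucket_name, shadow_keys):
    """Permanently delete the shadow versions (assumes boto3 and valid credentials)."""
    import boto3  # imported lazily so delete_batches can be tried without it
    client = boto3.client("s3")
    pages = client.get_paginator("list_object_versions").paginate(Bucket=bucket_name)
    for payload in delete_batches(pages, shadow_keys):
        # This cannot be undone, so review shadow_keys before running it.
        client.delete_objects(Bucket=bucket_name, Delete=payload)
```

Because the deletion is irreversible, it is worth printing the payloads from `delete_batches` and reviewing them before ever calling `purge`.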