May 24, 2011
I try to avoid the black/grey/white propaganda narratives surrounding the alleged Bin Laden assassination. The whole thing is so absurd that it’s just impossible for me to take it seriously at all. However, in my routine monitoring of stories about the NSA, I came across the claim that, every six hours, the agency processes an amount of data equivalent to what’s stored in the Library of Congress.
Ok, so I tried to figure out how much data a Library-of-Congress-every-six-hours represents.
It turns out that the claim is completely meaningless! It tells us nothing about how much data the NSA is intercepting/archiving/processing because there are no digital versions of vast portions of the full Library of Congress collection.
Read this: How ‘Big’ Is the Library of Congress?
So it begs the question, just how “big” is the Library of Congress, in terms of our content, but especially if one tried to equate it to the digital realm?
I won’t go into any of the specific claims that are being made, but they’re easy to find out there in the ether, and suffice it to say that the Library would stand behind very few if any of them. There are certain things we can quantify, but far more that are purely speculative.
For instance, we can as of this moment say that the approximate amount of our collections that are digitized and freely and publicly available on the Internet is about 74 terabytes. We can also say that we have about 15.3 million digital items online.
Some may be tempted to extrapolate that those digital items represent a precise percentage of the nearly 142 million items in the Library’s physical collections, and then estimate some kind of digital corollary. But comparing digital and physical items is apples and oranges, at best. A simple example of that fallacy would be represented by a single photograph online depicting several physical objects.
Another source of digital estimates is likely based on the number of books and printed items in our collections, which is currently about 32 million. One could attempt to establish the average length of those items (pages, words, characters, etc.) and extrapolate the digital equivalent of those 32 million physical items.
Assuming one could do that with any degree of accuracy — and that’s a big assumption — it overlooks the fact that those 32 million books represent only about one-quarter of the entire physical collections. The rest are in the form of manuscripts, prints, photographs, maps, globes, moving images, sound recordings, sheet music, oral histories, etc. So how does that other three-quarters of the Library equate digitally? Can one automatically assume the digital resolution at which all maps or photographs, for instance, would be scanned? Those are major wildcards indeed.
And then there are our motion pictures, videos and sound recordings alone, around 6 million items stored at our new Packard Campus for Audio-Visual Conservation in Culpeper, Va. What is their digital equivalent? Most people who record television programs onto a computer or DVR know that a hard drive with hundreds of gigabytes or even a terabyte or more can quickly fill up.
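To see how far the book-based extrapolation can swing, here is a rough Python sketch. The item counts are the ones quoted in the Library's post; the per-book sizes are my own hypothetical assumptions, picked only to show the spread, and nothing here is the Library's method.

```python
# Figures quoted in the Library's post.
BOOKS = 32_000_000            # books and printed items
PHYSICAL_ITEMS = 142_000_000  # total items in the physical collections

# ASSUMPTION: hypothetical average sizes per book, in megabytes.
# These numbers are illustrative guesses, not Library figures.
ASSUMED_MB_PER_BOOK = {
    "plain text": 1,
    "scanned page images": 100,
}


def books_in_terabytes(mb_per_book, books=BOOKS):
    """Total size of `books` in decimal terabytes (1 TB = 1,000,000 MB)."""
    return books * mb_per_book / 1_000_000


def book_share(books=BOOKS, items=PHYSICAL_ITEMS):
    """Fraction of the physical collection made up of books and printed items."""
    return books / items


if __name__ == "__main__":
    for label, mb in ASSUMED_MB_PER_BOOK.items():
        print(f"{label}: {books_in_terabytes(mb):,.0f} TB")
    print(f"books as share of physical items: {book_share():.1%}")
```

Under those assumed sizes, the book-only estimate lands anywhere from 32 TB to 3,200 TB, a hundredfold spread, and it still ignores the roughly three-quarters of the collection that isn't books.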
So, there you have it: A nonsensical claim, effortlessly woven into the tapestry of other nonsensical claims, to go with your morning coffee.
Via: The Baltimore Sun:
Parachini of RAND said the rule of thumb has been that every six hours, NSA collects an amount of information equivalent to the store of knowledge housed at the Library of Congress.
“The volume of data they’re pulling in is huge,” he said. “One criticism we might make of our [intelligence] community is that we’re collection-obsessed — we pull in everything — and we don’t spend enough time or money to try and understand what do we have and how can we act upon it.”
This article was posted: Tuesday, May 24, 2011 at 1:17 pm