Prompt
Read the article below and discuss how breaking the traditional RAID concepts helps Big Data deal with ever-growing needs of a storage system.
Google File System Eval: Part I
GFS: Fundamental Paradigm Shift or One-off Oddity?
As regular readers know, I believe that the current model of enterprise storage is badly broken.
When I see something that blows that model away I like to learn more. In particular, I assess the
marketability of the innovation. So a cool technology, like GFS or backup compression, makes
me wonder if and how customers would buy it.
This article offers a 100,000-foot view of GFS by way of assessing its commercial viability. If you
want a technical article about GFS, I can only recommend The Google File System by
Ghemawat, Gobioff, & Leung, from which this article draws heavily. The Wikipedia article,
oddly, isn’t very good on GFS (as of 5-16-06). If you don’t have a BS/CS or better you’ll likely
find Ghemawat et al. a slog. Probably worthwhile. Good for temporary relief of insomnia.
Google From Space
GFS is one of the key technologies that enables the most powerful general purpose cluster in
history. While most IT folks lose sleep over keeping the Exchange server backed up for a couple
of thousand users, Google’s infrastructure supports both massive user populations and the
regular rollout of compute- and data-intensive applications that would leave most IT ops folks
gibbering in fear. How do they do it?
Partly they are smarter than you. Google employs hundreds of CompSci PhDs as well as many
more hundreds of really smart people. Partly it is their history: impoverished PhD candidates
can’t afford fancy hardware to build their toys, so they started cheap and got cheaper. And
finally, being really smart and really poor, they rethought the whole IT infrastructure paradigm.
Their big insight: rather than build availability into every IT element at great cost, build
availability around every element at low cost. Which totally changes the economics of IT, just
as the minicomputer in the ’70s, the PC in the ’80s and the LAN in the ’90s all did. Only more
so. When processors, bandwidth and storage are cheap you can afford to spend lots of cycles on
what IBM calls autonomic computing. With the system properly architected and cheap to build
out, it scales both operationally and economically.
All that said, Google hasn’t done anything with their platform that other people hadn’t
already done elsewhere. They just put it together and scaled it to unprecedented heights.
Note to CIOs: it isn’t going to take your users long to notice that Google can do this stuff and
you can’t. Your life isn’t going to get any easier.
GFS From Low-Earth Orbit
Despite the name, GFS (not to be confused (although how exactly I don’t know) with Sistina’s
GFS – maybe we should call it GooFS) is not just a file system. It also maintains data
redundancy, supports low-cost snapshots, and, in addition to the normal create, delete, open,
close, read, and write operations, offers a record append operation.
That record append operation reflects part of the unique nature of the Google workload: fed by
hundreds of web-crawling bots, Google’s data is constantly updated with large sequential writes.
Rather than synchronize and coordinate the overwriting of existing data, it is much cheaper to
simply append new data to existing data.
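To make that concrete, here is a minimal sketch of the record append idea; the class and method names are mine, not Google’s API. The point is that the file system, not the writer, picks the offset, so hundreds of crawlers can feed one file without ever coordinating with each other.

```python
# Hypothetical sketch of record append; all names are illustrative only.
import threading

class AppendOnlyFile:
    """Many producers append; the file, not the caller, picks the offset."""
    def __init__(self):
        self._records = []
        self._lock = threading.Lock()

    def record_append(self, data: bytes) -> int:
        # The offset is chosen atomically inside the file system, so
        # concurrent writers never negotiate with each other first.
        with self._lock:
            offset = sum(len(r) for r in self._records)
            self._records.append(data)
            return offset

crawl_log = AppendOnlyFile()
print(crawl_log.record_append(b"http://example.com -> 200 OK"))  # -> 0
```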
Another feature of the Google workload is that it mostly consists of two kinds of reads: large
streaming reads and small random reads. As large reads and writes are so common, GFS is
optimized for sustained bandwidth rather than low latency or IOPS. As multi-gigabyte files are
the common case, GFS is optimized to handle a few million of them, so, doing the math, a single
GFS cluster should be able to handle a few petabytes of active data.
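Here is that math spelled out; both inputs are my assumptions, since the paper says only “a few million” files and “multi-GB” sizes.

```python
# Back-of-the-envelope only; both numbers below are assumptions.
files = 3_000_000            # "a few million files"
avg_file_size = 10**9        # multi-gigabyte files; call it ~1 GB
active_data = files * avg_file_size
print(active_data / 10**15)  # -> 3.0, i.e. a few petabytes
```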
All this is built on very cheap components whose frequent failure, given the size of the cluster, is
expected. The system monitors itself and detects, tolerates, and recovers quickly from
component failures, including disk, network and server failures.
GFS From An SR-71 Blackbird
A GFS cluster consists of a single master and multiple chunkservers, and is accessed by multiple
clients. Each of these is typically a dirt-cheap Linux box (lately dual 2 GHz Xeons with 2 GB of
RAM and ~800 GB of disk).
Files are divided into chunks, each identified by a unique 64-bit handle, and are stored on the
local systems as Linux files. Each chunk is replicated at least once on another server, and the
default is three copies of every chunk (take that, RAID-6 fanboys!). The chunks are big, like the
files they make up: 64 MB is the standard chunk size. The chunkservers don’t cache file data
since the chunks are stored locally and the Linux buffer cache keeps frequently accessed data in
memory.
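Those numbers make chunk addressing trivially cheap. A sketch of the arithmetic (the constants are from the paper; the code is mine):

```python
CHUNK_SIZE = 64 * 2**20   # 64 MB standard chunk size
DEFAULT_REPLICAS = 3      # three copies of every chunk by default

def chunk_index(byte_offset: int) -> int:
    # Which chunk of a file holds this byte? Pure arithmetic, so a
    # client can work it out locally before ever talking to the master.
    return byte_offset // CHUNK_SIZE

# A 1 GB file spans just 16 chunks, each stored three times:
print(chunk_index(2**30 - 1) + 1)   # -> 16
```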
If, like me, you thought bottleneck/SPOF when you saw the single master, you would, like me,
have been several steps behind the architects. The master only tells clients (in tiny multibyte
messages) which chunkservers have needed chunks. Clients then interact directly with
chunkservers for most subsequent operations. Now grok one of the big advantages of a large
chunk size: clients don’t need much interaction with the master to gain access to a lot of data.
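In sketch form, the read path looks something like this; `lookup()` and `read_chunk()` are invented stand-ins for the real RPCs.

```python
# Illustrative client read path; the RPC names are my inventions.
CHUNK_SIZE = 64 * 2**20

class GFSClient:
    def __init__(self, master):
        self.master = master
        self.cache = {}   # (filename, chunk_index) -> (handle, servers)

    def read(self, filename: str, offset: int, length: int) -> bytes:
        idx = offset // CHUNK_SIZE
        if (filename, idx) not in self.cache:
            # One tiny message to the master buys access to 64 MB...
            self.cache[(filename, idx)] = self.master.lookup(filename, idx)
        handle, servers = self.cache[(filename, idx)]
        # ...then the bulk data flows straight from a chunkserver.
        return servers[0].read_chunk(handle, offset % CHUNK_SIZE, length)
```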
That covers the bottleneck problem, but what about the SPOF (single point of failure) problem?
We know the data is usually copied three times — when disk is really cheap you can afford that
— but what about the all-important metadata that keeps track of where all the chunks are?
The master stores — in memory for speed — three major types of metadata:
• File and chunk names [or namespaces in geekspeak]
• Mapping from files to chunks, i.e. the chunks that make up each file
• Locations of each chunk’s replicas
So if the master crashes, this data has to be replaced pronto. The first two — namespaces and
mapping — are kept persistent by a log stored on the master’s local disk and replicated on
remote machines. This log is checkpointed frequently for fast recovery if a new master is needed.
How fast? Reads start up almost instantly thanks to shadow masters who stay current with the
master in the background. Writes pause for about 30-60 seconds while the new master and the
chunkservers make nice. Many RAID arrays recover no faster.
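Why is recovery that quick? Because a new master only replays the short log suffix written since the last checkpoint. A sketch of the idea, with invented checkpoint and log record formats:

```python
# Illustrative master recovery; the checkpoint/log formats are invented.
def recover_master(checkpoint: dict, log: list) -> dict:
    namespace = dict(checkpoint["namespace"])        # file -> chunk handles
    for op, path, handles in log[checkpoint["log_position"]:]:
        if op == "create":                           # replay a short suffix,
            namespace[path] = handles                # not the whole history
        elif op == "delete":
            namespace.pop(path, None)
    # Replica locations are deliberately absent: chunkservers report
    # them at startup, so they never need to be made persistent.
    return namespace
```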
The last type of metadata, replica locations, is stored on each chunkserver — and copied on
nearby machines — and given to the master at startup or when a chunkserver enters a cluster.
Since the master controls the chunk placement it is able to keep itself up-to-date as new chunks
get written.
The master also keeps track of the health of the cluster through handshaking with all the
chunkservers. Data corruption is detected through checksumming. Even so, data may still get
pooched. Hence the GFS reliance on appending writes instead of overwrites: combined with
frequent checkpoints, snapshots and replicas, the chance of data loss is very low, and the worst
case is typically data unavailability, not data corruption.
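The checksumming is fine-grained: per the paper, each chunk is broken into 64 KB blocks, each with its own 32-bit checksum. A sketch of the read-time detection logic (the in-memory layout here is invented for illustration):

```python
# Sketch of read-time corruption detection; the in-memory "store" is
# a stand-in for a chunk's 64 KB blocks on disk.
import zlib

BLOCK = 64 * 1024   # the paper checksums each 64 KB block of a chunk

def write_block(store: dict, sums: dict, i: int, data: bytes) -> None:
    store[i] = data
    sums[i] = zlib.crc32(data)   # 32-bit checksum kept per block

def read_block(store: dict, sums: dict, i: int) -> bytes:
    data = store[i]
    if zlib.crc32(data) != sums[i]:
        # Never return bad bytes: report the error so the master can
        # re-replicate this chunk from a healthy replica.
        raise IOError(f"checksum mismatch in block {i}")
    return data
```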
GFS RAID
They don’t call it that, but StorageMojo.com cares about storage, and I find this part particularly
interesting. GFS doesn’t use any RAID controllers, fibre channel, iSCSI, HBAs, FC or SCSI
disks, dual-porting or any of the other costly bling we expect in a wide-awake data center. And
yet it all works and works very well.
Take replica creation, or what you and I would call mirroring. All the servers in the cluster are
connected over a full duplex switched Ethernet fabric with pipelined data transfers. This means
that as soon as a new chunk starts arriving, the chunkserver can begin making replicas at full
network bandwidth (about 12MB/sec) without reducing the incoming data rate. As soon as the
first replica chunkserver has received some data it repeats the process, so the two replicas are
completed soon after the first chunk write finishes.
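The paper models the ideal time to push B bytes through R replicas as B/T + RL, versus the R·(B/T) a store-and-forward scheme would cost. Worked out with an assumed 1 ms per-hop latency (my number):

```python
B = 64 * 2**20     # one chunk, in bytes
T = 12.5 * 10**6   # ~100 Mbps full duplex => ~12.5 MB/s
R = 3              # replicas
L = 0.001          # assumed 1 ms per-hop latency (my number)

pipelined = B / T + R * L        # pieces forwarded while still arriving
store_and_forward = R * (B / T)  # each hop waits for the full chunk
print(round(pipelined, 1), round(store_and_forward, 1))  # ~5.4 vs ~16.1
```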
In addition to creating replicas quickly, the master’s replica placement rules also spread them
across machines and across racks, to limit the chance of data unavailability due to power or
network switch loss.
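A sketch of what such a placement rule might look like; the policy details are my reading of the prose above, not Google’s actual code.

```python
# Hypothetical rack-aware placement: never land two replicas of one
# chunk in the same rack, so a switch or PDU failure can't hide them all.
def place_replicas(chunkservers: list, n: int = 3) -> list:
    chosen, racks_used = [], set()
    for server in chunkservers:          # assume servers expose a .rack
        if server.rack not in racks_used:
            chosen.append(server)
            racks_used.add(server.rack)
        if len(chosen) == n:
            break
    return chosen
```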
Pooling and Balancing
Storage virtualization may be on the downside of the hype cycle, but looking at GFS you can see
what simple virtualization looks like when built into the file system. Instead of a complex
software layer to “pool” all the blocks across RAID arrays, GFS masters place new replicas on
chunkservers with below average disk utilization. So over time disk utilization equalizes across
servers without any tricky and expensive software.
The master also rebalances replicas periodically, looking at disk space utilization and load
balancing. This process also keeps a new chunkserver from being swamped the moment it joins
the cluster. The master allocates data to it gradually. The master also moves chunks from
chunkservers with above average disk utilization to equalize usage.
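In pseudocode terms, a rebalancing pass might look like this; the thresholds and the `pick_cold_chunk` helper are invented for the sketch.

```python
# Hypothetical rebalancing step; thresholds and helpers are invented.
def rebalance_step(servers: list) -> list:
    avg = sum(s.utilization for s in servers) / len(servers)
    overfull  = [s for s in servers if s.utilization > avg + 0.05]
    underfull = [s for s in servers if s.utilization < avg - 0.05]
    moves = []
    for src in overfull:
        if underfull:
            dst = min(underfull, key=lambda s: s.utilization)
            moves.append((src.pick_cold_chunk(), src, dst))
    return moves   # applied gradually, in the background
```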
Storage capacity is reclaimed slowly. Rather than eager deletion, the master lets old chunks hang
around for a few days and reclaims storage in batches. Done in the background when the master
isn’t too busy, it minimizes the impact on the cluster. In addition, since the chunks are renamed
rather than deleted, the system provides another line of defense against accidental data loss.
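A sketch of that defense in action; the hidden-name convention below is illustrative, though the paper does describe renaming deleted files and a default three-day grace period.

```python
# Sketch of lazy reclamation: "delete" is just a rename to a hidden
# name; a background pass reclaims anything older than the grace period.
import time

GRACE = 3 * 24 * 3600   # three-day default window

def delete(namespace: dict, path: str) -> None:
    namespace[f".deleted/{path}@{int(time.time())}"] = namespace.pop(path)

def gc_pass(namespace: dict, now=None) -> None:
    now = now or time.time()
    for name in [n for n in namespace if n.startswith(".deleted/")]:
        if now - int(name.rsplit("@", 1)[1]) > GRACE:
            del namespace[name]   # storage actually reclaimed here
```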
Cap’n, The Dilithium Crystals Canna Take ‘N’More! Oh, Shut Up, Scotty.
Google ran a couple of tests on production GFS clusters. We all know this must work in the real
world since we all use Google every day. But how well does it work? In the paper they present
some statistics from a couple of Google GFS clusters.
Google File System Eval: Part II
In yesterday’s post I ran through a quick (really, it was!) overview of the Google File System’s
organization and storage-related features such as RAID and high availability. I want to offer a
little more data about the performance of GFS before offering my conclusion about the
marketability of GFS as a commercial product.
The Google File System by Ghemawat, Gobioff, & Leung includes some interesting performance
info. These examples can’t be regarded as representative since we don’t know enough about the
population of GFS clusters at Google, so any conclusions drawn from them are necessarily
tentative.
They looked at two GFS clusters configured like this:
Cluster                     A        B
Chunkservers                342      227
Available Disk Capacity     72 TB    180 TB
Used Disk Capacity          55 TB    155 TB
Number of Files             735 k    737 k
Number of Dead Files        22 k     232 k
Number of Chunks            992 k    1,550 k
Metadata at Chunkservers    13 GB    21 GB
Metadata at Master          48 MB    60 MB
So we have a couple of fair-sized storage systems, one utilizing about 80% of available space,
the other close to 90%. Respectable numbers for any data center storage manager. We also see
that chunk metadata appears to scale linearly with the number of chunks. Good. The average file
sizes work out to about 75 MB for A and 210 MB for B, so the average file on A is roughly 1/3
the size of the average file on B. Either way, much larger than the average data center file size.
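The arithmetic behind those averages, used capacity divided by file count:

```python
print(55e12 / 735e3 / 1e6)    # cluster A: ~75 MB average file
print(155e12 / 737e3 / 1e6)   # cluster B: ~210 MB average file
```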
Next we get some performance data for the two clusters:
(Source: http://labs.google.com/papers/gfs.html)

Cluster                       A          B
Read Rate – last minute       583 MB/s   380 MB/s
Read Rate – last hour         562 MB/s   384 MB/s
Read Rate – since restart     589 MB/s   49 MB/s
Write Rate – last minute      1 MB/s     101 MB/s
Write Rate – last hour        2 MB/s     117 MB/s
Write Rate – since restart    25 MB/s    13 MB/s
Master Ops – last minute      325 Op/s   533 Op/s
Master Ops – last hour        381 Op/s   518 Op/s
Master Ops – since restart    202 Op/s   347 Op/s
Just as the gentlemen said, there is excellent sequential read performance, very good sequential
write performance, and unimpressive small write performance. Looking at cluster A’s
performance, I infer that in the last minute it performed about 125 small writes per second,
averaging about 8 KB each (125 × 8 KB ≈ 1 MB/s). Clearly, not ready for the heads-down,
500-desk Oracle call center. Not the design center either. It appears to me, though, that this
performance would compete handily with an EMC Centera or even the new NetApp FAS6000
series on a large file workload. Not bad for a 3-year-old system constructed from commodity parts.
Conclusion
The GFS implementation we’ve looked at here offers many winning attributes.
These include:
• Availability. Triple redundancy (or more if users choose), pipelined chunk replication,
rapid master failovers, intelligent replica placement, automatic re-replication, and cheap
snapshot copies. All of these features deliver what Google users see every day:
datacenter-class availability in one of the world’s largest datacenters.
• Performance. Most workloads, even databases, are about 90% reads. GFS performance
on large sequential reads is exemplary. It was child’s play for Google to add video
download to their product set, and I suspect their cost-per-byte is better than YouTube or
any of the other video sharing services.
• Management. The system offers much of what IBM calls “autonomic” management. It
manages itself through multiple failure modes, offers automatic load balancing and
storage pooling, and provides features, such as the snapshots and 3 day window for dead
chunks to remain on the system, that give management an extra line of defense against
failure and mistakes. I’d love to know how many sysadmins it takes to run a system like
this.
• Cost. Storage doesn’t get any cheaper than ATA drives in a system box.
Yet as a general purpose commercial product, it suffers some serious shortcomings.
• Performance on small reads and writes, which it wasn’t designed for, isn’t good enough
for general data center workloads.
• The record append file operation and the “relaxed” consistency model, while excellent
for Google, wouldn’t fit many enterprise workloads. Email systems, where SOX
requirements are pushing retention, might be redesigned to eliminate deletes. Since
appending is key to GFS write performance in a multi-writer environment, GFS might
give up much of its performance advantage even in large serial writes in the enterprise.
• Lest we forget, GFS is NFS, not for sale. Google must see its infrastructure technology as
a critical competitive advantage, so it is highly unlikely to open source GFS any time
soon.
Looking at the whole gestalt, even assuming GFS were for sale, it is a niche product and would
not be very successful on the open market.
As a model for what can be done, however, it is invaluable. The industry has striven for the last
20 years to add availability and scalability to an increasingly untenable storage model of blocks
and volumes by building ever-costlier “bulletproof” devices.
GFS breaks that model and shows us what can be done when the entire storage paradigm is
rethought. Build the availability around the devices, not in them; treat the storage infrastructure
as a single system, not a collection of parts; extend the file system paradigm to include much of
what we now consider storage management, including virtualization, continuous data protection,
load balancing and capacity management.
GFS is not the future. But it shows us what the future can be.