Queue E-Mail Newsletter

for the Week of Jul 4, 2008


Sponsored by
ACM
17th USENIX Security Symposium



Latest Articles:

BASE: An ACID Alternative
In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability.
(scroll down to read an excerpt from this article)

ORM in Dynamic Languages
O/R mapping frameworks for dynamic languages such as Groovy provide a different flavor of ORM that can greatly simplify application code.


Join ACM

A Special Offer to Join ACM for Queue Readers http://www.acm.org/joinacm2


Latest Queuecasts:
A Conversation with Jason Hoffman, pt. 2
Queue's January/February 2008 issue features a conversation with Jason Hoffman, CTO of Joyent, a provider of scalable infrastructure for Web applications. Interviewed by Sun's Bryan Cantrill (of DTrace fame), Hoffman discusses a range of topics, from providing scalable infrastructure for Facebook apps, to virtualization, to Ruby on Rails.

A Conversation with Jason Hoffman, pt. 1
Queue's January/February 2008 issue features a conversation with Jason Hoffman, CTO of Joyent, a provider of scalable infrastructure for Web applications. Interviewed by Sun's Bryan Cantrill (of DTrace fame), Hoffman discusses a range of topics, from providing scalable infrastructure for Facebook apps, to virtualization, to Ruby on Rails.

The Ever Expanding Ecosystem for Embedded Computing
Mike Vizard from ACM Queue talks with Oracle's Mike Olson about the changing architecture of network-enabled applications. Olson explains the thinking behind the company's new focus on embedded database and middleware technology. He explores the technical, business and economic forces shaping this fast-growing market. Tune in to learn how Oracle plans to serve customers way outside the enterprise.



17th USENIX Security Symposium, July 28-August 1, 2008, in San Jose, CA

Join top security practitioners in San Jose, CA, for a 5-day program that includes in-depth tutorials; a comprehensive technical program including a keynote address by Debra Bowen, California Secretary of State; invited talks; the refereed papers track including 27 papers presenting the best new research; Work-in-Progress reports; and a poster session. Register by July 14 and save up to $250! http://www.usenix.org/sec08/acm


New article on ACM Queue:

BASE: An ACID Alternative



In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability.

by Dan Pritchett, eBay

From the Object-Relational Mappers issue, vol. 6, no. 3 - May/June 2008
article excerpt:

Web applications have grown in popularity over the past decade. Whether you are building an application for end users or application developers (i.e., services), your hope is most likely that your application will find broad adoption—and with broad adoption will come transactional growth. If your application relies upon persistence, then data storage will probably become your bottleneck.

There are two strategies for scaling any application. The first, and by far the easiest, is vertical scaling: moving the application to larger computers. Vertical scaling works reasonably well for data but has several limitations. The most obvious limitation is outgrowing the capacity of the largest system available. Vertical scaling is also expensive, as adding transactional capacity usually requires purchasing the next larger system. Vertical scaling often creates vendor lock-in, further adding to costs.

Horizontal scaling offers more flexibility but is also considerably more complex. Horizontal data scaling can be performed along two vectors. Functional scaling involves grouping data by function and spreading functional groups across databases. Splitting data within functional areas across multiple databases, or sharding,1 adds the second dimension to horizontal scaling. The diagram in figure 1 illustrates horizontal data-scaling strategies.

As figure 1 illustrates, both approaches to horizontal scaling can be applied at once. Users, products, and transactions can be in separate databases. Additionally, each functional area can be split across multiple databases for transactional capacity. As shown in the diagram, functional areas can be scaled independently of one another.
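To make the two vectors concrete, here is a minimal routing sketch. It is an illustration under stated assumptions, not the layout used at eBay: the host names, the shard counts, and the choice of a CRC32 hash are all hypothetical. Only the functional area names come from the text above.

```python
import zlib

# Hypothetical layout: each functional area owns its own list of shard hosts.
# The area names follow the article; the host names and counts are made up.
SHARDS = {
    "users": ["users-db-0", "users-db-1", "users-db-2"],
    "products": ["products-db-0", "products-db-1"],
    "transactions": ["txn-db-0", "txn-db-1", "txn-db-2", "txn-db-3"],
}


def route(area: str, key: str) -> str:
    """Return the database host that holds `key` within a functional area."""
    hosts = SHARDS[area]                      # functional scaling: pick the group
    digest = zlib.crc32(key.encode("utf-8"))  # stable hash across processes
    return hosts[digest % len(hosts)]         # sharding: pick a host in the group


# Each area can have a different number of shards, so areas scale independently.
print(route("users", "alice"), route("transactions", "order-1001"))
```

Plain modulo placement keeps the sketch short, but it remaps most keys whenever the number of hosts in a group changes, so a real deployment would likely need a directory or a consistent-hashing scheme instead.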

Functional Partitioning

Functional partitioning is important for achieving high degrees of scalability. Any good database architecture will decompose the schema into tables grouped by functionality. Users, products, transactions, and communication are examples of functional areas. Leveraging database concepts such as foreign keys is a common approach for maintaining consistency across these functional areas.

Relying on database constraints to ensure consistency across functional groups creates a coupling of the schema to a database deployment strategy. For constraints to be applied, the tables must reside on a single database server, precluding horizontal scaling as transaction rates grow. In many cases, the easiest scale-out opportunity is moving functional groups of data onto discrete database servers.

Schemas that can scale to very high transaction volumes will place functionally distinct data on different database servers. This requires moving data constraints out of the database and into the application. This also introduces several challenges that are addressed later in this article.
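As a rough sketch of what moving a constraint into the application can look like, the example below performs in code the check that a foreign key would otherwise enforce. Two in-memory dictionaries stand in for two separate database servers, and every name here (`users_db`, `transactions_db`, `record_transaction`, `MissingUserError`) is hypothetical.

```python
# Two dicts stand in for two separate database servers (an assumption made
# so that the sketch runs without a real database).
users_db = {"u1": {"name": "Ada"}}
transactions_db = {}


class MissingUserError(Exception):
    """Raised when a transaction references a user that does not exist."""


def record_transaction(txn_id: str, user_id: str, amount: float) -> None:
    # The check a foreign key would have performed, now done by the application.
    if user_id not in users_db:
        raise MissingUserError(user_id)
    transactions_db[txn_id] = {"user_id": user_id, "amount": amount}


record_transaction("t1", "u1", 25.0)       # accepted: user u1 exists
try:
    record_transaction("t2", "ghost", 5.0)  # rejected: no such user
except MissingUserError as err:
    print(f"rejected transaction for unknown user {err}")
```

Note that the lookup and the write are no longer a single atomic step: the user could be removed from the other server between the two. That gap is the kind of challenge that appears once constraints leave the database.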

CAP Theorem

Eric Brewer, a professor at the University of California, Berkeley, and cofounder and chief scientist at Inktomi, made the conjecture that Web services cannot ensure all three of the following properties at once (signified by the acronym CAP):2

Consistency. The client perceives that a set of operations has occurred all at once.

Availability. Every operation must terminate in an intended response.

Partition tolerance. Operations will complete, even if individual components are unavailable.

Specifically, a Web application can support, at most, only two of these properties with any database design. Obviously, any horizontal scaling strategy is based on data partitioning; therefore, designers are forced to decide between consistency and availability.

ACID Solutions

ACID database transactions greatly simplify the job of the application developer. As signified by the acronym, ACID transactions provide the following guarantees:

Atomicity. All of the operations in the transaction will complete, or none will.

Consistency. The database will be in a consistent state when the transaction begins and ends.

Isolation. The transaction will behave as if it is the only operation being performed upon the database.

Durability. Upon completion of the transaction, the operation will not be reversed.

Database vendors long ago recognized the need for partitioning databases and introduced a technique known as 2PC (two-phase commit) for providing ACID guarantees across multiple database instances. The protocol is broken into two phases:

  • First, the transaction coordinator asks each database involved to precommit the operation and indicate whether commit is possible. If all databases agree the commit can proceed, then phase 2 begins.
  • The transaction coordinator asks each database to commit the data.
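The two phases, together with the rollback that follows a veto, can be sketched in a few lines. This is a toy model under heavy assumptions: the `Participant` class is an in-memory stand-in for a database, and the sketch leaves out timeouts, coordinator failure, and durable logging, all of which a real 2PC implementation has to handle.

```python
class Participant:
    """In-memory stand-in for one database taking part in the transaction."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def precommit(self) -> bool:
        # Phase 1: stage the work and vote on whether commit is possible.
        self.state = "prepared" if self.can_commit else "vetoed"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: list) -> bool:
    """Return True if every participant committed, False if all rolled back."""
    votes = [p.precommit() for p in participants]  # phase 1: collect every vote
    if all(votes):
        for p in participants:                     # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                         # any veto: undo everywhere
        p.rollback()
    return False


ok = two_phase_commit([Participant("users"), Participant("transactions")])
vetoed = two_phase_commit([Participant("users"), Participant("transactions", can_commit=False)])
print(ok, vetoed)
```

Even in this toy form, the coordinator cannot finish unless every participant answers, which is where the availability cost comes from.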

If any database vetoes the commit, then all databases are asked to roll back their portions of the transaction. What is the shortcoming? We are getting consistency across partitions. If Brewer is correct, then we must be impacting availability, but how can that be?

The availability of any system is the product of the availabilities of the components required for its operation. The last part of that statement is the most important: components that may be used by the system but are not required do not reduce system availability. A transaction spanning two databases under 2PC therefore has an availability equal to the product of the two databases' availabilities. For example, if each database is 99.9 percent available, then the transaction is 99.8 percent available, which adds about 43 minutes of downtime per month.
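The arithmetic behind that example is easy to check. The short script below assumes a 30-day month and independent failures; the function names are illustrative only.

```python
def combined_availability(*availabilities: float) -> float:
    """Availability of an operation that requires every listed component."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def downtime_minutes_per_month(availability: float, days: int = 30) -> float:
    """Expected minutes of unavailability in a month of `days` days."""
    return (1.0 - availability) * days * 24 * 60


single = 0.999                                 # one database on its own
both = combined_availability(single, single)   # both required by the 2PC
extra = downtime_minutes_per_month(both) - downtime_minutes_per_month(single)
print(f"{both:.4%} available, {extra:.1f} extra minutes of downtime per month")
```

Under those assumptions the extra downtime works out to roughly 43 minutes per month, and each additional required database multiplies in another factor below 1.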

An ACID Alternative

If ACID provides the consistency choice for partitioned databases, then how do you achieve availability instead? One answer is BASE (basically available, soft state, eventually consistent).

BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.

The availability of BASE is achieved through supporting partial failures without total system failure. Here is a simple example: if users are partitioned across five database servers, BASE design encourages crafting operations in such a way that a user database failure impacts only the 20 percent of the users on that particular host. There is no magic involved, but this does lead to higher perceived availability of the system.
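A small simulation shows the effect. It assumes integer user IDs placed on shards by simple modulo, which is a simplification chosen for the sketch rather than anything the article prescribes; `shard_for` and `load_profile` are hypothetical names.

```python
SHARD_COUNT = 5  # five user database servers, as in the example above


def shard_for(user_id: int) -> int:
    """Map a user to a shard; modulo placement is an assumption of this sketch."""
    return user_id % SHARD_COUNT


def load_profile(user_id: int, down_shards: set) -> str:
    """Serve the user unless the shard holding their data is down."""
    shard = shard_for(user_id)
    if shard in down_shards:
        return "unavailable"  # only users on the failed host see this
    return f"profile from shard {shard}"


users = range(10_000)
down = {2}  # simulate losing a single host
affected = sum(load_profile(u, down) == "unavailable" for u in users)
print(f"{affected / len(users):.0%} of users affected")
```

The key design point is that `load_profile` touches only the one shard that owns the user, so a failure elsewhere never enters the request path.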

So, now that you have decomposed your data into functional groups and partitioned the busiest groups across multiple databases, how do you incorporate BASE into your application? BASE requires a more in-depth analysis of the operations within a logical transaction than is typically applied to ACID. What should you be looking for? The following sections provide some direction.



Read the rest at acmqueue.com


See all the latest articles and audio interviews with Queue's RSS Feeds


To unsubscribe from this newsletter, send an email to
queuenews-request@acmqueue.com
with the word 'unsubscribe' in the subject line.



For advertising information, contact advertising@acmqueue.com



© 2008 ACM, Inc. All rights reserved.