SQLFire is a memory-optimized distributed SQL database from VMware. SQLFire is built for applications that need higher speed and lower latency than traditional databases can offer, but also require strong support for querying and transactions.
This webinar introduces the basics of SQLFire, including a discussion of why traditional databases are not scalable enough to deal with the demands of modern applications. I also cover some of the extensions SQLFire makes to the SQL standard in order to be a truly horizontally scalable SQL database.
The demo presented with the webinar shows how SQLFire can scale transparently to process requests faster. In the demo a number of inserts are made, but each row must first pass a complex validation process, so the inserts are very slow. With SQLFire, though, you can add or remove nodes at any time, so if you anticipate a period where you need more processing power, you can add a node and process inserts faster. SQLFire is designed to be horizontally scalable in all features, so you can scale not only inserts but also queries, transactions, etc.
Full source code for the demo is available (see the slides for details).
2. Speed Matters.
Users demand fast applications and fast websites.
The database is the hardest thing to scale.
3. What Is SQLFire?
• SQLFire is a memory-optimized, distributed SQL database.
• SQLFire satisfies modern application needs by adopting a more scalable design than traditional RDBMS.
• SQLFire focuses on Speed, Scale and SQL.
4. SQLFire: Speed, Scale, SQL.
Speed
• Memory-optimized for maximum speed and minimum latency.
Scale
• Horizontally scalable.
• Add or remove nodes at any time for more capacity or availability.
SQL
• Familiar SQL interface.
• SQL-92 compliant.
• JDBC and ADO.NET interfaces.
5. SQLFire Speed Through Memory Optimization.
• Memory Optimization
– Latency is tracked to the millisecond.
– Use memory to get an advantage.
• SQLFire can use disk too.
– Data can be asynchronously persisted to disk.
– Append-only logfile format for best update performance.
– For disaster prevention and for larger databases.
6. What’s Really Great About Memory?
Disk seeks are 100,000 times slower than memory reads!
(Disk seek: 10,000,000 ns = 10 ms; main memory reference: 100 ns.)
7. Who Needs That Kind Of Latency Anyway?
• Major online travel site.
• Minimizing page load time a must.
– Did all the usual tricks, including minimizing server round-trips, etc.
• Required all data access to complete in less than 20 ms – always!
• Selected a memory-optimized database from VMware.
8. The Things People Do For Speed.
• Many architectures for speeding up data have appeared lately.
– Memcached on top of SQL databases.
– Read-only slave nodes.
– Clustering / RAC.
– NoSQL.
• All stem from a clear need to overcome traditional database limitations.
– Relational databases were not built to serve thousands (or even hundreds) of users at once.
– RDBMS were not built for the low latency users expect.
9. Don’t Mask The Problem, Solve It!
[Diagram: two stacks side by side. Left: Browser Tier → Application Tier → Memcache + SQL (a multi-data model to overcome the DB). Right: Browser Tier → Application Tier → SQLFire (a horizontally scalable SQL database).]
10. What Is Horizontal Scalability?
• Software that makes multiple computers appear as one system.
• Horizontal scale gives:
– Better performance.
– More capacity.
– Higher availability.
11. SQLFire Horizontal Scalability.
• SQLFire is a horizontally-scalable SQL Database.
• SQLFire is horizontally scalable from the ground up.
– Data inserts.
– Key lookup.
– Stored procedures / functions.
– Joins.
– Transactions.
– More.
12. SQLFire and SQL.
• SQLFire syntax is based on the SQL-92 standard.
• SQLFire's extensions are limited to the Data Definition Language (DDL), e.g. CREATE TABLE.
• SQLFire ships with JDBC and ADO.NET drivers.
– Built-in, transparent high availability.
14. SQLFire Challenges Traditional DB Design, Not SQL.
• Traditional RDBMS: Too much focus on ACID consistency.
• Traditional RDBMS: Too much contention for disk.
15. The database world is changing. A lot!
• Many new data models (NoSQL) are emerging
– Key-value
– Column family (inspired by Google BigTable)
– Document
– Graph
• Most focus on making the data model less rigid than SQL.
• Consistency model is not ACID
Consistency: STRICT – Full ACID (RDBMS) → Tunable consistency → Eventual
Scale: Low scale → High scale → Very high scale
• Different tradeoffs for different goals
17. SQLFire Versus Other SQL Databases.
Attribute | SQLFire | Other SQL DBs
DB Interface | Standard SQL. | Standard SQL.
Data Consistency | Tunable. Mix of eventual consistency and high consistency. | High consistency.
Transactions | Supported. | Very strong support.
Scaling Model | Scale out, commodity servers. | Scale up.
18. SQLFire Versus NoSQL.
Attribute | NoSQL | SQLFire
DB Interface | Idiosyncratic (i.e. each is custom). | Standard SQL.
Querying | Idiosyncratic or not present. | SQL queries.
Data Consistency | Tunable, most favor eventual consistency. | Tunable, favors high consistency.
Transactions | Weak or not present. | Linearly scalable transaction model.
Interface Design | Designed for simplicity. | Designed for compatibility.
Data Model | Wide variety of different models. | Relational model.
Schema Flexibility | Focus on extreme flexibility, dynamism. | SQL model, requires DB migrations, etc.
20. SQLFire Tables Are Replicated By Default.
CREATE TABLE sales
  (product_id int, store_id int,
   price float);
[Diagram: SQLFire Node 1 and SQLFire Node 2 each hold a full replica of sales.]
Best for small and frequently accessed data.
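To make the replicated-table idea concrete, here is a toy model in Java, the language SQLFire's functions and drivers use. It is a minimal sketch under stated assumptions. The `ReplicatedTable` class is hypothetical. Rows are reduced to a product_id → price map, and a "node" is just an in-process map. The model shows why any node can answer a read and why one crash loses nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a replicated table (not SQLFire's implementation).
// Each "node" is an in-memory map from product_id to price.
class ReplicatedTable {
    private final List<Map<Integer, Float>> nodes = new ArrayList<>();

    ReplicatedTable(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // A write is applied to every live node, so each one holds a full copy.
    void insert(int productId, float price) {
        for (Map<Integer, Float> node : nodes) {
            node.put(productId, price);
        }
    }

    // Any live node can serve the read from its local copy.
    Float readFrom(int nodeIndex, int productId) {
        return nodes.get(nodeIndex).get(productId);
    }

    // Simulate a crash: the node and its copy of the data disappear.
    void crash(int nodeIndex) {
        nodes.remove(nodeIndex);
    }

    int liveNodes() {
        return nodes.size();
    }
}
```

The price of this design is that every write touches every node. That is why replication suits small, frequently read tables.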
21. Partitioned Tables Are Split Among Members.
CREATE TABLE sales
  (product_id int, store_id int,
   price float)
  PARTITION BY COLUMN (product_id);
[Diagram: SQLFire Node 1 holds a Replica and Partition 1 of sales; SQLFire Node 2 holds a Replica and Partition 2.]
Best for large data sets.
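The routing decision behind PARTITION BY COLUMN can be sketched in a few lines of Java. This is a simplified model, not SQLFire's algorithm. `HashPartitioner` is hypothetical, and it maps the column value's hash straight onto a node with a modulo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of single-column hash partitioning, in the spirit of
// PARTITION BY COLUMN (product_id). Not SQLFire's internal algorithm.
class HashPartitioner {
    private final int nodeCount;

    HashPartitioner(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // Owning node for a partitioning-column value. Integer.hashCode(int)
    // returns the value itself; floorMod keeps the index non-negative.
    int nodeFor(int productId) {
        return Math.floorMod(Integer.hashCode(productId), nodeCount);
    }

    public static void main(String[] args) {
        HashPartitioner partitioner = new HashPartitioner(2);
        List<List<Integer>> perNode = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            perNode.add(new ArrayList<>());
        }
        for (int productId = 1; productId <= 10; productId++) {
            perNode.get(partitioner.nodeFor(productId)).add(productId);
        }
        System.out.println("Node 1 holds product_ids " + perNode.get(0));
        System.out.println("Node 2 holds product_ids " + perNode.get(1));
    }
}
```

The same value always routes to the same node, so a key lookup on product_id can go straight to one member. A plain modulo reshuffles most rows whenever a node is added. Real systems commonly hash into a fixed set of buckets instead and move whole buckets between nodes.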
22. Types Of Partitioning In SQLFire.
Type | Purpose | Example
Hash Partitioning (Default) | A built-in hashing algorithm spreads data evenly, effectively at random, across available servers. | PARTITION BY COLUMN (customer_id);
List | Manually divide data across servers based on discrete criteria. | PARTITION BY LIST (home_state) (VALUES ('CA', 'WA'), VALUES ('TX', 'OK'));
Range | Manually divide data across servers based on continuous criteria. | PARTITION BY RANGE (date) (VALUES BETWEEN '2008-01-01' AND '2008-12-31', VALUES BETWEEN '2009-01-01' AND '2009-12-30');
Expression | Fully dynamic division of data based on function execution. Can use UDFs. | PARTITION BY (MONTH(date));
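To show how the list, range and expression rules in the table decide where a row belongs, here is a hypothetical Java sketch that mirrors the table's example clauses. It only computes a partition number (list and range) or a partitioning key (expression). Mapping partitions onto servers is left out.

The sketch treats both range bounds as inclusive, as SQL's BETWEEN predicate does. That is an assumption; check the SQLFire reference for the exact bound semantics of VALUES BETWEEN.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical sketch mirroring the example clauses in the table above.
class PartitionRules {
    // LIST: PARTITION BY LIST (home_state) (VALUES ('CA', 'WA'), VALUES ('TX', 'OK'))
    static final List<List<String>> STATE_LISTS =
            List.of(List.of("CA", "WA"), List.of("TX", "OK"));

    static int listPartition(String homeState) {
        for (int i = 0; i < STATE_LISTS.size(); i++) {
            if (STATE_LISTS.get(i).contains(homeState)) {
                return i;
            }
        }
        return -1; // no VALUES list matches
    }

    // RANGE: the two date ranges from the table; bounds treated as inclusive
    // here (an assumption of this sketch).
    static final LocalDate[][] DATE_RANGES = {
        {LocalDate.of(2008, 1, 1), LocalDate.of(2008, 12, 31)},
        {LocalDate.of(2009, 1, 1), LocalDate.of(2009, 12, 30)},
    };

    static int rangePartition(LocalDate date) {
        for (int i = 0; i < DATE_RANGES.length; i++) {
            LocalDate lo = DATE_RANGES[i][0];
            LocalDate hi = DATE_RANGES[i][1];
            if (!date.isBefore(lo) && !date.isAfter(hi)) {
                return i;
            }
        }
        return -1; // outside every declared range
    }

    // EXPRESSION: PARTITION BY (MONTH(date)); the computed month (1-12) is
    // the partitioning key that would then be placed on a server.
    static int expressionKey(LocalDate date) {
        return date.getMonthValue();
    }
}
```

With the example's second range ending on 2009-12-30, a row dated 2009-12-31 falls outside every declared range in this sketch.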
23. Redundancy Increases Availability.
CREATE TABLE sales
  (product_id int, store_id int,
   price float)
  PARTITION BY COLUMN (product_id)
  REDUNDANCY 1;
[Diagram: SQLFire Node 1 holds a Replica, Partition 1 and a redundant copy of Partition 2 (Partition 2*); SQLFire Node 2 holds a Replica, Partition 2 and Partition 1*.]
All data is available if Node 1 fails.
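Here is a minimal sketch of what REDUNDANCY 1 buys. It assumes a simple placement rule that is not necessarily SQLFire's: copy k of partition p goes to node (p + k) mod n. `RedundantPlacement` is hypothetical. With two nodes it reproduces the diagram's layout: Partition 1 and Partition 2* on one node, Partition 2 and Partition 1* on the other.

```java
// Hypothetical sketch of redundant partition placement (not SQLFire's rule).
class RedundantPlacement {
    // Nodes holding a partition: index 0 is the primary, the rest are
    // redundant copies. This sketch requires nodeCount >= redundancy + 1.
    static int[] nodesFor(int partition, int redundancy, int nodeCount) {
        if (nodeCount < redundancy + 1) {
            throw new IllegalArgumentException("need at least redundancy + 1 nodes");
        }
        int[] owners = new int[redundancy + 1];
        for (int copy = 0; copy <= redundancy; copy++) {
            owners[copy] = (partition + copy) % nodeCount;
        }
        return owners;
    }

    // A partition stays available while at least one of its owners is alive.
    static boolean available(int partition, int redundancy, int nodeCount, boolean[] alive) {
        for (int node : nodesFor(partition, redundancy, nodeCount)) {
            if (alive[node]) {
                return true;
            }
        }
        return false;
    }
}
```

With redundancy 0, losing a node takes its partitions offline. With redundancy 1, every partition still has a live copy after any single failure.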
24. Collocate Data For Fast Joins.
CREATE TABLE sales
  (customerid int, product_id int, store_id int,
   price float)
  PARTITION BY COLUMN (customerid)
  COLOCATE WITH (customers);
Related data is placed on the same node.
[Diagram: SQLFire Node 1 holds a Replica, Customer 1 (C1) and Customer 1 Sales; SQLFire Node 2 holds a Replica, Customer 2 (C2) and Customer 2 Sales.]
SQLFire can join tables without network hops.
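Collocation works because both tables route rows by the same customer id with the same placement function. The hypothetical Java sketch below counts how many sales rows would sit on a different node from their customer row. Each such row means a network hop during the join.

The count is run twice: first with sales partitioned by product_id, then collocated by customerid. The modulo placement and the toy rows are assumptions.

```java
// Hypothetical sketch: why partitioning sales by customerid keeps joins local.
class Colocation {
    static final int NODES = 2;

    // Same placement function for every table (an assumption of this sketch).
    static int nodeFor(int key) {
        return Math.floorMod(Integer.hashCode(key), NODES);
    }

    // Each sale is {customerid, product_id}. Returns how many sales rows live
    // on a different node from their customer row.
    static int remoteJoinLookups(int[][] sales, boolean salesPartitionedByCustomer) {
        int remote = 0;
        for (int[] sale : sales) {
            int customerNode = nodeFor(sale[0]); // customers partitioned by id
            int saleNode = salesPartitionedByCustomer ? nodeFor(sale[0]) : nodeFor(sale[1]);
            if (saleNode != customerNode) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        int[][] sales = {{1, 10}, {2, 11}, {1, 12}, {4, 13}};
        System.out.println("Partitioned by product_id: " + remoteJoinLookups(sales, false) + " remote lookups");
        System.out.println("Collocated by customerid: " + remoteJoinLookups(sales, true) + " remote lookups");
    }
}
```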
25. Why Collocation Matters.
-- Biggest customers in CA.
SELECT customers.id, SUM(sales.price) AS total
  FROM sales, customers
 WHERE sales.customerid = customers.id
   AND customers.state = 'CA'
 GROUP BY customers.id
 ORDER BY total DESC;
Since we collocated customers and sales, this query executes independently on each node.
[Diagram: SQLFire Node 1 (Replica, Customer 1, Customer 1 Sales) and SQLFire Node 2 (Replica, Customer 2, Customer 2 Sales) each compute part of the result.]
Result: the query scales linearly!
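The per-node execution of the query above can be sketched as scatter/gather. Each node joins and sums only its own collocated rows, and a coordinator concatenates and sorts the partial results. This is a hypothetical Java model, not SQLFire's query engine; `LocalJoinQuery`, integer prices and the toy data are assumptions.

Simple concatenation is only valid because collocation keeps all of a customer's sales on that customer's node. As a result, no customer's total is split across nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scatter/gather model of a collocated join + aggregate.
class LocalJoinQuery {
    // Runs on one node: customers maps id -> state; each sale is {customerid, price}.
    static Map<Integer, Integer> runOnNode(Map<Integer, String> customers, List<int[]> sales) {
        Map<Integer, Integer> totals = new HashMap<>();
        for (int[] sale : sales) {
            // Thanks to collocation, the matching customer row is on this node.
            String state = customers.get(sale[0]);
            if ("CA".equals(state)) {
                totals.merge(sale[0], sale[1], Integer::sum); // SUM(price) per customer
            }
        }
        return totals;
    }

    // Coordinator: concatenate the disjoint partial results, then ORDER BY total DESC.
    static List<Map.Entry<Integer, Integer>> merge(List<Map<Integer, Integer>> perNode) {
        List<Map.Entry<Integer, Integer>> rows = new ArrayList<>();
        for (Map<Integer, Integer> partial : perNode) {
            rows.addAll(partial.entrySet());
        }
        rows.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return rows;
    }
}
```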
26. Scaling Functions and Stored Procedures.
CALL maxSales()
  WITH RESULT PROCESSOR maxSalesReducer
  ON TABLE sales;
SQLFire uses data-aware routing to route processing to the data.
[Diagram: maxSales runs on the local data of each node; maxSalesReducer combines the partial results.]
Result processors give map/reduce functionality.
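The map/reduce shape of CALL ... WITH RESULT PROCESSOR can be sketched in plain Java, with a thread pool standing in for SQLFire nodes. This is an analogy, not SQLFire's procedure API. `MaxSalesMapReduce` is hypothetical: `maxSales` plays the per-node procedure, and `maxSalesReducer` plays the result processor that combines the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical map/reduce analogy; threads stand in for SQLFire nodes.
class MaxSalesMapReduce {
    // "Map": runs next to one node's data and returns its local maximum.
    static int maxSales(int[] localPrices) {
        int max = Integer.MIN_VALUE;
        for (int price : localPrices) {
            max = Math.max(max, price);
        }
        return max;
    }

    // "Reduce": combines the per-node maxima into one answer.
    static int maxSalesReducer(List<Integer> partials) {
        int max = Integer.MIN_VALUE;
        for (int partial : partials) {
            max = Math.max(max, partial);
        }
        return max;
    }

    // Runs maxSales on every "node" in parallel, then reduces the results.
    static int run(int[][] pricesPerNode) {
        ExecutorService pool = Executors.newFixedThreadPool(pricesPerNode.length);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] local : pricesPerNode) {
                futures.add(pool.submit(() -> maxSales(local)));
            }
            List<Integer> partials = new ArrayList<>();
            for (Future<Integer> future : futures) {
                try {
                    partials.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return maxSalesReducer(partials);
        } finally {
            pool.shutdown();
        }
    }
}
```

Only one small value per node travels to the reducer, which is what lets this pattern scale with the number of nodes.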
27. Some SQLFire Features Not Discussed Include…
• Server Groups:
– Granular control of data placement, or restrict data to certain servers.
• Disk persistence and overflow.
– For additional availability and when using SQLFire as a primary database.
• Caching / Operational Data Store.
• Transactions.
• Indexes.
• Much more!
30. SQLFire Releases.
• SQLFire Professional 1.0:
– Release: December 13, 2011.
– Part of the vFabric Advanced suite. (Per VM)
– Also available standalone. (Per CPU)
– Limit of 2 connected nodes per database.
• SQLFire Enterprise 1.0:
– Coming 1H 2012.
– Unlimited nodes.
– Optional SQLFire Enterprise WAN Upgrade for active/active global databases.
31. SQLFire WAN.
• Same database in multiple datacenters or the cloud.
• Fully active-active with asynchronous replication.
• Coming 1H 2012 with SQLFire Enterprise.
32. Download SQLFire Today!
• Beta available now on our community site.
– http://tinyurl.com/SQLFire
• General availability December 2011.
– vFabric Suite or standalone.
– Free for development up to 3 nodes.
• Follow us on Twitter:
– http://twitter.com/vfabricsqlfire
35. Demo Details
• 2 VMs.
– ubuntu
– livecd
• Schema:
– Table called record
• threadid int, value int
• Each value is validated upon insert.
• The validate constraint check is extremely CPU intensive.
• Client: 5 threads inserting simultaneously.
• Total of 2500 records to be inserted.
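The demo's scale-out effect can be estimated with back-of-the-envelope arithmetic. The hypothetical Java sketch below rests on several assumptions, none of which the demo states:

- threadid values run from 0 to 4, with an even 500 rows per thread;
- each validation costs the same;
- threadid values are placed onto nodes with a simple modulo;
- the busiest node determines the elapsed time.

```java
// Hypothetical back-of-the-envelope model of the demo (assumptions in the lead-in).
class DemoScaleOut {
    static final int THREADS = 5;       // client threads from the demo
    static final int TOTAL_ROWS = 2500; // records inserted in the demo

    // Rows each node must validate when threadid values 0..THREADS-1 are
    // placed onto nodeCount nodes by modulo.
    static int[] rowsPerNode(int nodeCount) {
        int[] rows = new int[nodeCount];
        int rowsPerThread = TOTAL_ROWS / THREADS;
        for (int threadId = 0; threadId < THREADS; threadId++) {
            rows[Math.floorMod(Integer.hashCode(threadId), nodeCount)] += rowsPerThread;
        }
        return rows;
    }

    // The busiest node bounds the elapsed time under these assumptions.
    static int busiestNode(int nodeCount) {
        int max = 0;
        for (int rows : rowsPerNode(nodeCount)) {
            max = Math.max(max, rows);
        }
        return max;
    }

    public static void main(String[] args) {
        for (int nodes = 1; nodes <= 3; nodes++) {
            System.out.println(nodes + " node(s): busiest node validates " + busiestNode(nodes) + " rows");
        }
    }
}
```

Under these assumptions, the busiest node validates 2,500 rows with one node, 1,500 with two and 1,000 with three. The uneven split on two nodes comes from having only five distinct threadid values. A partitioning column with few distinct values limits how evenly work can spread.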
37. Create Table Code:
CREATE TABLE record
(threadid int, value int
CONSTRAINT MY_CK CHECK (validate(value) = 1))
PARTITION BY COLUMN (threadid);
Note: The validate function takes a long time to run.
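The demo's validate function is not shown. As a hypothetical stand-in, here is a Java static method of the kind a SQLFire user-defined function can point to; the talk notes that functions are written in pure Java. The primality test is only a placeholder for "CPU-intensive work". It returns 1 for valid values so it fits CHECK (validate(value) = 1). Registering it with the database (a CREATE FUNCTION declaration) is not shown.

```java
// Hypothetical stand-in for the demo's validate() function. The real demo's
// validation rule is not shown; trial-division primality is just busy work.
class DemoValidator {
    // Returns 1 if the value is "valid" (here: prime), 0 otherwise.
    public static int validate(int value) {
        if (value < 2) {
            return 0;
        }
        for (long d = 2; d * d <= value; d++) {
            if (value % d == 0) {
                return 0;
            }
        }
        return 1;
    }
}
```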
Speed matters a lot these days. The major internet companies have known this for a long time: they find that websites and apps that load quickly and respond promptly are the key to more satisfied users and, ultimately, to higher revenue. I took a couple of infographics here from a website optimization company called Strangeloop. They show that internet companies measure speed in milliseconds and track in fine detail how each millisecond contributes to their bottom line. These days everybody is trying to make their applications richer and more interactive while at the same time delivering the great performance users expect. There are a lot of pieces that go into making an app fast, but the hardest problem to solve when you've got a lot of users or a lot of data is making the database fast enough. Databases remain the hardest thing to scale and the main inhibitor for people who want to build feature-rich applications that scale to thousands of users.
Let's talk about how SQLFire helps solve this problem. SQLFire is a memory-optimized, distributed SQL database. We'll talk more about what these things mean in a minute. But SQLFire is built from the ground up to be fast enough and scalable enough for the most demanding of modern applications. Though SQLFire is a SQL database, it is more scalable because it relaxes some of the constraints used by traditional databases, which we'll also discuss further.
SQLFire is built around the principles of Speed, Scale and SQL.
- Speed through memory optimization. SQLFire's data structures and internal architecture assume that data resides mostly in memory rather than on disk. SQLFire can also use disk, but does so in a way that doesn't compromise in-memory speed.
- Scale, meaning SQLFire is built from the ground up to be horizontally scalable. If you need more capacity, whether it's a bigger database or a faster database, you can simply add servers. Later on I'll show a demo that highlights exactly this point.
- Then SQL. Love it or hate it, you probably already know SQL; it's by far the most widely used database access interface. SQLFire implements a subset of the SQL-92 standard and ships with JDBC and ADO.NET interfaces. In fact, SQLFire's drivers are designed with horizontal scale in mind: the application doesn't need to know how many servers are in the SQLFire cluster, and if a node in the cluster fails, applications are automatically re-routed to a working node without any disruption.
Let's drill a bit more into each of these. The screenshot here is from a service called webpagetest.org, which tracks every little detail down to the millisecond. Studies have shown that if an application fails to respond within two seconds, users start to get frustrated, and expectations about speed grow every year. If an application consistently takes 2 seconds to respond, most users are going to start looking for alternatives. User-facing applications need to be extremely fast. By optimizing for memory, SQLFire minimizes latency and maximizes speed. Although SQLFire is memory-optimized, it can also use disk, but in a way that keeps the disk from becoming a bottleneck: new data and even updates are written to disk in an append-only logfile format.
Key point to make: memory really shines for mixed workloads of reads, writes and updates. Why is using memory such a big deal? Jeff Dean from Google put forth a very influential set of figures called "Numbers Everyone Should Know". They are useful rules of thumb when calculating the tradeoffs of different approaches. The main thing to note is the huge difference between reading and writing memory versus disk. Memory reads and writes take a small, roughly constant amount of time, while disk reads and writes become unpredictable depending on the number of seeks required. Disk-optimized databases really run into trouble when an application needs to do a lot of simultaneous reading, writing and updating. A memory-optimized database, by contrast, can handle any mixture with very fast and very predictable performance.
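The arithmetic behind these rules of thumb is worth spelling out against the travel site's 20 ms budget. The Java sketch below uses the commonly quoted figures from that list: about 100 ns per main-memory reference and about 10 ms per disk seek. `LatencyBudget` is a hypothetical helper, and the numbers are orders of magnitude rather than SQLFire measurements.

```java
// Hypothetical helper: latency arithmetic with rule-of-thumb figures.
class LatencyBudget {
    static final long MEMORY_REF_NS = 100L;       // ~100 ns per main-memory reference
    static final long DISK_SEEK_NS = 10_000_000L; // ~10 ms per disk seek
    static final long BUDGET_NS = 20_000_000L;    // 20 ms data-access budget

    // How many back-to-back accesses of a given cost fit within a budget.
    static long accessesWithin(long budgetNs, long costPerAccessNs) {
        return budgetNs / costPerAccessNs;
    }

    public static void main(String[] args) {
        System.out.println("Disk seek vs memory reference: " + (DISK_SEEK_NS / MEMORY_REF_NS) + "x slower");
        System.out.println("Memory references in 20 ms: " + accessesWithin(BUDGET_NS, MEMORY_REF_NS));
        System.out.println("Disk seeks in 20 ms: " + accessesWithin(BUDGET_NS, DISK_SEEK_NS));
    }
}
```

Two random disk seeks already consume the whole 20 ms budget. The same budget covers on the order of 200,000 memory references.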
Does low latency really matter? When people optimize websites, for example, a lot of attention typically goes to things like minimizing server round-trips or writing more efficient JavaScript. What does this have to do with the database? VMware has a customer that found low latency to be an absolute must. This customer is a major online travel site. It found that, in order to deliver a good, pleasant, modern web experience to its users, all data access needed to complete within 20 ms, giving the rest of the application a comfortable amount of time to operate, and this was in the presence of reads, writes and updates. This webinar is about SQLFire, a product we're releasing as 1.0 right about now, so this customer is not using SQLFire. They are using another memory-optimized database called GemFire, which shares a lot of the same technology and follows the same core principles of speed and low latency. So the answer is yes: user-facing applications really do need low latency, and when you've got a lot of data or a lot of users, in-memory architectures are the way to get it.
These days people are implementing a wide variety of architectures to overcome the limitations of traditional databases. It may be sharding or read-only replication, and many people are turning to NoSQL databases hoping for better scale. The problem is that relational databases are built for querying, not for supporting apps exposed to thousands of users who each expect an extremely fast response.
So there are a lot of approaches. In this diagram we see memcache paired with a SQL database, which is a very common pattern. But as useful as these approaches are, they share a problem: they hide or defer the database problem rather than actually solve it. In the process they introduce a lot of complexity, and they only take you so far. SQLFire makes a different proposition. Instead of a multi-layered database with different data models, have one low-latency database that is fast enough and scalable enough for modern online applications. The result is a far simpler architecture.
What is horizontal scalability anyway? A horizontally scalable system makes multiple computers appear to be one computer as far as clients or users are concerned. With horizontally scalable systems you can add or remove computers at any time. Adding computers gives a number of benefits including better performance, more capacity and higher availability.
And SQLFire is built from the ground up to be horizontally scalable in every respect, including data access, transactions, function execution, etc. Some tradeoffs have to be made to achieve this; specifically, you need to give up the notion of full ACID consistency, something we'll talk more about. But SQLFire makes it possible to have a truly dynamic and scalable system without giving up the familiar SQL interface.
Speaking of SQL, SQLFire is based on the SQL-92 standard. SQLFire does extend the standard, as it must to truly realize a horizontally scalable SQL database, but these extensions are limited to the so-called data definition language (DDL), things like creating tables. Once tables are set up, the application uses them without needing to know anything about these extensions. So for the most part, applications don't need to worry about any nonstandard syntax.
Let's take a moment to compare SQLFire with other databases, starting with traditional SQL databases. The scalability limits of traditional RDBMS flow mainly from two factors. First, an insistence on strict ACID consistency makes it almost impossible to run on more than one computer. These days everybody wants to run on commodity hardware, but you quickly hit a limit with a traditional RDBMS because you can't scale it out. Second, traditional databases are disk-optimized: their data structures and optimizations all assume they must work around the relatively slow medium of disk. This is really holding databases back in this era of cheap memory, where terabyte servers can be bought for well under $50,000.
Consistency spectrum: Strict (full ACID) <---> FIFO (tunable) <---> Eventual (inspired by Amazon Dynamo).
- Strict: RDBMS is synonymous with ACID.
- Tunable: ACID transactions are a choice; by default consistency could be FIFO.
- Eventual: all bets are off. You may write, read back, and get a different answer or multiple answers (Netflix example).
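A toy model makes the "all bets are off" point concrete. In this hypothetical `EventualReplicas` sketch, an eventual write is acknowledged after updating one replica and is shipped to the second asynchronously. A read from the second replica can therefore return stale data until replication catches up. A strict write updates both replicas before returning. Real eventually consistent stores add versioning and conflict resolution, which are left out here.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical toy model contrasting eventual and strict replication.
class EventualReplicas {
    final Map<String, String> replicaA = new HashMap<>();
    final Map<String, String> replicaB = new HashMap<>();
    private final Queue<String[]> pending = new ArrayDeque<>();

    // Eventual: acknowledge after the local write; replicate later.
    void writeEventually(String key, String value) {
        replicaA.put(key, value);
        pending.add(new String[] {key, value});
    }

    // Strict: both replicas are updated before the write returns. Older
    // asynchronous writes are drained first so they cannot overwrite this one.
    void writeStrictly(String key, String value) {
        replicate();
        replicaA.put(key, value);
        replicaB.put(key, value);
    }

    // Deliver queued writes to replica B in order (a background task in a real system).
    void replicate() {
        String[] write;
        while ((write = pending.poll()) != null) {
            replicaB.put(write[0], write[1]);
        }
    }
}
```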
Let's turn to the quickly changing world of new databases and NoSQL. There are a lot of new databases out there trying to solve the database problem in their own way, offering all sorts of different data models, and most of them moving away from the notion of strict ACID consistency.
Let's turn now to a hands-on look at some SQLFire features. On the left we have the SQL code you can use in SQLFire, and on the right we'll talk about what the code actually does. For starters we'll create a very simple table, just as you would in other databases. By default, tables in SQLFire are replicated across all nodes in the SQLFire cluster. That means, for one thing, that if a server crashes, all the data in that table is still available. This approach is best for small datasets and for data that is frequently accessed or used in joins.
Partitioning data is more sophisticated and more interesting. SQLFire has a keyword, "PARTITION BY", which tells SQLFire that the data in that table should be split up across all available nodes. This approach is a must for large datasets.
There are a lot of different ways to partition data in SQLFire. By default, SQLFire tries to distribute data evenly, effectively at random, across all servers. If that's not good enough, you can exert a lot of control over how data is divided and distributed using list, range or expression-based partitioning.
Partitioning creates a challenge: by default, each row lives on only one node, and if you lose that node, that data is offline. We can solve that with the REDUNDANCY keyword, which causes SQLFire to keep multiple copies of the data on different servers so that if you lose a node, all the data is still available. Redundancy is usually a good idea, and you can even keep data on 3 or 4 different servers at once. Most typically you're going to want a redundancy of 1.
Co-location is a key feature that allows SQLFire to be a real SQL database and horizontally scalable at the same time. When I talk to people who know distributed databases they usually ask "how do you do distributed joins?" The answer is, we don't. Instead we allow related data to be grouped together on the same physical node. This is done with the COLOCATE WITH keyword, which associates tables together based on a foreign key and keeps related rows on the same server. In this example we have customer 1 and customer 2 stored on different nodes. The COLOCATE WITH keyword lets me ensure that sales records from customer 1 end up on node 1 and records from customer 2 end up on node 2.
Why does this matter? Later if I need to run a query that joins the data from these tables, the join can run independently on each server in the SQLFire cluster without having to talk to the others. In other words we have linearly scaled the join query by allowing it to run on each node independently. When you talk about a SQL database built from the ground up to be horizontally scalable, these are the sorts of features you can't live without.
We can also look at functions and stored procedures. When you execute a function, SQLFire uses data-aware routing to execute it on all the nodes that contain the relevant data. In addition, you can execute the job across all the nodes, and if you need a single result, you can combine the partial results in a RESULT PROCESSOR. In this way you get a map/reduce-like model that lets you take full advantage of your horizontally scaled system in a SQL way. Worth pointing out: functions are written in pure Java.