Thursday, August 22, 2013

NoSQL Now! talk summary and links: Polyglot Persistence in Windows Azure

Thanks to those who attended my session on Polyglot Persistence in Windows Azure today at the NoSQL Now! conference in San Jose. Here are slides and a few notes from today's session and follow-on questions.

 (download here)

How will Windows Azure provide NoSQL database support?
Today, Azure offers Table Storage as a NoSQL key/value store. Additionally, several partners have begun offering database "as a service" running in Windows Azure. For example, MongoLab and MongoHQ provide Azure-hosted MongoDB, while Cloudant provides Azure-hosted CouchDB.

Some of these databases are available directly through our partners' web portals, while others have also integrated into the Azure Store as part of the Windows Azure portal. Here's an example of MongoLab's MongoDB integrated in the store:
For self-hosting, several partners have built virtual machine images, installable via Azure's VM Depot. Here's Neo Technology's Neo4j 1.8:

What, exactly, is VM Depot?
VM Depot is a repository of community-created Linux-based virtual machine images. A few NoSQL database images are available today, including Neo4j, MongoDB, Redis, and Riak.

What does VM Depot cost?
VM Depot is free: it costs nothing to publish images or to download images to your Azure account.

What are the architectural considerations for integrating multiple NoSQL databases in my app? Are there standard practices?
As the Cloud Ninja Polyglot Persistence project demonstrates, you can either make direct database calls or put an abstraction layer in front of them, using a pattern such as repository. A repository pattern lets you swap out database engines with reduced impact on your existing code base, although you may still need to adjust your app's data access API.
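
To make the repository idea concrete, here is a minimal Python sketch. It is not taken from the Cloud Ninja project: the names (ProfileRepository, InMemoryProfileRepository, SqliteProfileRepository, rename_profile) are made up for illustration, and the dict and SQLite backends are only stand-ins for whichever engines you actually pair up (say, a key/value store and a document database).

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class ProfileRepository(ABC):
    """Hypothetical storage-agnostic contract the application codes against."""

    @abstractmethod
    def save(self, profile_id: str, profile: dict) -> None: ...

    @abstractmethod
    def get(self, profile_id: str) -> Optional[dict]: ...


class InMemoryProfileRepository(ProfileRepository):
    """Stand-in for a key/value store: data lives in a plain dict."""

    def __init__(self) -> None:
        self._items: dict = {}

    def save(self, profile_id: str, profile: dict) -> None:
        self._items[profile_id] = dict(profile)  # store a copy

    def get(self, profile_id: str) -> Optional[dict]:
        item = self._items.get(profile_id)
        return dict(item) if item is not None else None


class SqliteProfileRepository(ProfileRepository):
    """Stand-in for a second engine: each profile is stored as a JSON string."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS profiles (id TEXT PRIMARY KEY, body TEXT)"
        )

    def save(self, profile_id: str, profile: dict) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO profiles (id, body) VALUES (?, ?)",
            (profile_id, json.dumps(profile)),
        )
        self._conn.commit()

    def get(self, profile_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT body FROM profiles WHERE id = ?", (profile_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def rename_profile(repo: ProfileRepository, profile_id: str, new_name: str) -> dict:
    """Application logic: depends only on the repository contract."""
    profile = repo.get(profile_id) or {}
    profile["name"] = new_name
    repo.save(profile_id, profile)
    return profile
```

Because rename_profile only sees the ProfileRepository contract, passing it an InMemoryProfileRepository or a SqliteProfileRepository requires no change to the calling code. The trade-off is that the contract tends to expose only what every engine can do, which is where the data access API adjustments come in once you want engine-specific features such as graph traversals.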

How do I choose a specific NoSQL database implementation? Can you please recommend one?
For key/value storage, Azure Table Storage offers massive scale (200 TB per namespace) and provides very fast storage and lookup. As for third-party vendor offerings, I really cannot give specific recommendations, but I can offer some food for thought for making your decision:

  • Look at the company's longevity, financials, funding, etc.
  • Does the vendor provide Professional Services support?
  • How big / popular is the community? Consider forums, web presence, conferences, etc.
  • How robust is language support? Does the product offer direct APIs you can call from an unsupported language?
  • How active is the project? Are there frequent updates? Can you view the code (e.g. OSS)?
  • Will the database engine run on your target OS? Some databases may be Windows-only or Linux-only.

What was that super-cool zooming app you used during your demo???
I was using ZoomIt, written by Mark Russinovich.

Where can I find more information about the stuff you talked about today?
Here are some informational links from today's talk:
There are a few more resources that we didn't talk about but that should still be valuable:
  • Windows Azure Open Platform Video Series. This is a 6-part introductory video series I co-created with a coworker, looking at Windows Azure's openness around Compute, Data, and Dev. The data sessions cover a few NoSQL types (with demos).
  • Book: NoSQL Distilled
  • Book: Seven Databases in Seven Weeks
  • Neo4j data set: Cineast movie data. This is the basis for my Neo4j demo. Even though I only showed people and friendship relationships, the data also contains great movie info to navigate.
If you'd like to get in touch:

Thursday, August 15, 2013

Azure Open Platform video series - Episode 1: Open Compute Platform

Last year, my coworker Ricardo Villalobos and I visited several cities worldwide, delivering an all-day Windows Azure Open Platform Summit. This one-day event covered compute and networking, data, and the developer story from an open source perspective. It included several languages (.NET, Python, PHP, and Node.js) as well as several NoSQL databases (Azure Table Storage, MongoDB, Cassandra, and Neo4j).

A few months later, we decided to record a 6-part video series covering the highlights of these topics. Each episode runs about 15 minutes. The first two segments were just published, and I'll update this post as the rest of the series comes online.

Episode 1, Part 1 - Open Compute Platform: PaaS, IaaS, and Virtual Networks

Episode 1, Part 2 - Open Compute Platform: Connectivity; Web Sites