Archive for August, 2009

26 Aug 09

Schema-less Storage in Dream: IDocStore

Note: The interface IDocStore and its implementations are preview code. We’ve only started playing around with it for production use and will likely introduce changes before we ship any features of MindTouch depending on it. Your feedback is welcome and appreciated.

As part of Dream 1.7 (which is used by MindTouch 9.08), we’ve introduced a schema-less Xml storage inspired by Bret Taylor’s blog post “How FriendFeed uses MySQL to store schema-less data”, which describes the system FriendFeed uses to store schema-less JSON blobs. Since MindTouch is very much Xml-centric and we like the richness of XPath, we decided to adapt the idea to use Xml as the document format.

Why “schema-less” and what does that mean?

In a traditional database, data is split into columns stored in rows in a table. For data more complex than a flat set, other tables are used and referenced from the first table. That means that to add a new field to your record, you have to create a new column, an operation that can have significant performance implications for that table. In addition, if you want to optimize query performance, you may choose to index that new column, again affecting the table’s performance while the index is being created.

But what if you just serialized your record and stuffed that blob into a single column? I mean, aside from having just given your DBA an ulcer… Well, you would have the data stored, but in terms of a relational database, you’ve just removed all the benefits typically bestowed on a row. How are you going to get this data back out? Pulling out each record and examining it for a match is not a realistic option. OK, so you create an id for the object, and now you have a key/value store on top of an RDBMS… but you’ve still lost the ability to query on any other part of your record.

Chances are that, beyond the id, you only need to retrieve records by a handful of other fields. Adding more columns and populating one for each field to be queried lands you right back where you started. Instead, how about building a self-contained “index” out of key/value pair tables, one for each field, where each row holds a value taken from your record and links back to that record in the main storage table? Since each index lives in a table of its own, creating and maintaining it no longer affects the main table. You can add an index and populate it without affecting the use of any other index or the data itself.
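To make the idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not IDocStore’s actual schema: the table names docs and idx_group and the helper functions put and find_by_group are hypothetical. The main table keeps one blob per record, while the indexed field gets its own two-column table that points back at the record’s id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Main storage: one opaque serialized record per row, keyed by id.
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)")

# A self-contained "index" for one field: value -> id of the owning record.
# (Hypothetical table name; one such table would exist per indexed field.)
conn.execute("CREATE TABLE idx_group (value TEXT NOT NULL, id TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_group_value ON idx_group (value)")


def put(doc_id, body, groups):
    """Store the blob and refresh this record's rows in the index table."""
    with conn:  # one transaction for the blob and its index rows
        conn.execute("INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
                     (doc_id, body))
        conn.execute("DELETE FROM idx_group WHERE id = ?", (doc_id,))
        conn.executemany("INSERT INTO idx_group (value, id) VALUES (?, ?)",
                         [(g, doc_id) for g in groups])


def find_by_group(group):
    """Look the value up in the index table, then fetch the matching blobs."""
    rows = conn.execute(
        "SELECT d.body FROM idx_group AS i JOIN docs AS d ON d.id = i.id "
        "WHERE i.value = ? ORDER BY d.id", (group,))
    return [body for (body,) in rows]


put("123", "<user id='123'><name>bob</name></user>", ["admin", "contributor"])
put("456", "<user id='456'><name>alice</name></user>", ["contributor"])
print(len(find_by_group("contributor")))  # 2
```

Adding an index for another field would mean creating and back-filling one more idx_ table; the docs table itself never changes. Here the group values are passed in explicitly, whereas IDocStore extracts them from the document itself.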

Introducing IDocStore

This is the storage model that IDocStore follows. The blob record format is Xml, allowing you to store simple fields or complex hierarchies. Fields to index are defined using XPath expressions, which makes it possible to index not only single values in the Xml document but also lists of values. For example, given the following Xml:

<user id="123" >
  <name>bob</name>
  <groups xmlns="http://foo/groups">
    <group>admin</group>
    <group>contributor</group>
  </groups>
</user>

You could define an index XPath of g:groups/g:group (with the prefix g bound to http://foo/groups) and be able to find bob with a query for either admin or contributor.
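As an illustration of a multi-valued index key, here is a sketch using Python’s xml.etree.ElementTree, whose limited XPath subset is enough for this path. Note that g:groups/g:group only matches when groups and group actually belong to the http://foo/groups namespace, so this sample declares it as the default namespace on groups. The function name index_values is hypothetical.

```python
import xml.etree.ElementTree as ET

USER_XML = """
<user id="123">
  <name>bob</name>
  <groups xmlns="http://foo/groups">
    <group>admin</group>
    <group>contributor</group>
  </groups>
</user>
"""

# Prefix-to-URI bindings, like the namespace entries in an index definition.
NAMESPACES = {"g": "http://foo/groups"}


def index_values(xml_text, xpath, namespaces):
    """Return every text value the XPath selects; each would become one index row."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.findall(xpath, namespaces)]


print(index_values(USER_XML, "g:groups/g:group", NAMESPACES))
# ['admin', 'contributor']
```

One document therefore yields two index rows for the group index, which is why a lookup on either value finds bob.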

The interface of IDocStore is simply this:

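using System.Collections.Generic;
using MindTouch.Xml; // XDoc

public interface IDocStore {
  // Create or update; force = true overrides the revision check
  bool Put(XDoc doc, bool force);
  void Delete(string docId);
  // Lookup by primary id; returns at most one document
  XDoc Get(string docId);
  // Lookup by a named XPath index; may return several documents
  IList<XDoc> Get(string keyName, string keyValue);
}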

We don’t differentiate between Create and Update; Put covers both. We also limit Get to two kinds of lookup: by the primary document id, which returns at most a single document, or by an XPath index, which may return more than one document.

In its current incarnation, IDocStore requires the docId to already exist in the document at a configured XPath (the default is @id). Lacking a natural key, a Guid can always be used as the id. We are also considering adding automatic key generation.
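For readers who want to experiment with these semantics outside of .NET, here is a rough in-memory analogue in Python. MemoryDocStore and its method names are hypothetical, not part of Dream. Python has no method overloading, so the index lookup is a separate get_by_key method, and the id is read from an id attribute as a stand-in for the configurable @id XPath. Index XPaths are limited to ElementTree’s XPath subset.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class MemoryDocStore:
    """Hypothetical in-memory analogue of IDocStore's four operations."""

    def __init__(self, indexes, namespaces=None, id_attr="id"):
        self._indexes = indexes          # index name -> XPath (ElementTree subset)
        self._ns = namespaces or {}
        self._id_attr = id_attr          # stands in for the default @id XPath
        self._docs = {}                  # doc id -> serialized Xml
        self._index = defaultdict(set)   # (index name, value) -> doc ids

    def put(self, xml_text):
        root = ET.fromstring(xml_text)
        doc_id = root.get(self._id_attr)
        if doc_id is None:
            raise ValueError("document has no id")
        self.delete(doc_id)              # Put covers both create and update
        self._docs[doc_id] = xml_text
        for name, xpath in self._indexes.items():
            for el in root.findall(xpath, self._ns):
                self._index[(name, el.text)].add(doc_id)
        return True

    def delete(self, doc_id):
        if self._docs.pop(doc_id, None) is None:
            return
        for ids in self._index.values():
            ids.discard(doc_id)

    def get(self, doc_id):
        return self._docs.get(doc_id)    # at most one document

    def get_by_key(self, key_name, key_value):
        ids = self._index.get((key_name, key_value), set())
        return [self._docs[i] for i in sorted(ids)]


store = MemoryDocStore({"group": "g:groups/g:group"}, {"g": "http://foo/groups"})
store.put('<user id="123"><name>bob</name>'
          '<groups xmlns="http://foo/groups"><group>admin</group></groups></user>')
print(len(store.get_by_key("group", "admin")))  # 1
```

Re-putting a document first removes its old index entries, so an update that drops a value also drops it from the index.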

Dream contains two implementations of IDocStore: MysqlDocStore and SqliteDocStore.

Sqlite Implementation

Starting with Dream 1.7.0 (used by MindTouch 9.08), every service is provided with a DataCatalog backed by its own sqlite database (although this can be swapped out for any other database provider via configuration). This catalog can be used as the backend for the SqliteDocStore.

The sqlite implementation, SqliteDocStore, is the simpler of the two. It does not yet include optimistic locking, meaning that Put will overwrite existing data regardless of what force is set to. Also, since the DB file that backs the database cannot be shared by multiple processes, an instance of SqliteDocStore assumes that it is the authoritative owner of the storage path, and therefore performs index definition and maintenance at instantiation time:

IDocStore store = new SqliteDocStore(_catalog, new XDoc("config")
  .Elem("name", "user")
  .Elem("id-xpath","@id")
  .Start("namespaces")
    .Start("namespace").Attr("prefix","g").Attr("urn","http://foo/groups").End()
  .End()
  .Start("indicies")
    .Start("index").Attr("name","group").Attr("xpath","g:groups/g:group").End()
  .End());

This initialization will either use the existing store named user in the specified _catalog or create it. It will then reconcile the store’s indices, adding or removing indices so that the only index is group.
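The startup-time index maintenance can be pictured as a reconcile step: compare the configured index names with the index tables that already exist, then create or drop the difference. This Python/sqlite3 sketch is not SqliteDocStore’s code; it assumes a hypothetical idx_&lt;store&gt;_&lt;index&gt; table naming scheme and skips back-filling new indices from existing documents.

```python
import sqlite3


def reconcile_indexes(conn, store, wanted):
    """Make the set of index tables for `store` match the configured list.

    Index tables are assumed (hypothetically) to be named idx_<store>_<index>.
    Back-filling a newly created index from existing documents is omitted.
    Names come from trusted configuration, so they are interpolated directly.
    """
    prefix = f"idx_{store}_"
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    existing = {name[len(prefix):] for name in tables if name.startswith(prefix)}
    wanted = set(wanted)
    with conn:
        for index in existing - wanted:
            conn.execute(f'DROP TABLE "{prefix}{index}"')
        for index in wanted - existing:
            conn.execute(
                f'CREATE TABLE "{prefix}{index}" (value TEXT NOT NULL, id TEXT NOT NULL)')
    # Report what changed: (created, dropped)
    return sorted(wanted - existing), sorted(existing - wanted)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "idx_user_email" (value TEXT, id TEXT)')
print(reconcile_indexes(conn, "user", ["group"]))  # (['group'], ['email'])
```

Running it a second time with the same configuration changes nothing, which is what makes doing this at every instantiation safe.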

Mysql Implementation

The mysql implementation, MysqlDocStore, by virtue of using a remote resource, moves index maintenance out of the IDocStore implementation and into MysqlDocStoreManager, which handles index manipulation, including updating an index. For this reason, an instance of MysqlDocStore requires an instance of MysqlDocStoreManager (really, an instance of IMysqlDocStoreIndexer) so that it can update the indices when documents change.

MysqlDocStore also implements optimistic locking: the store attaches a revision attribute to any document it returns and uses that revision to halt an update if the revision in the database is newer, unless force is set to true on Put.
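A minimal Python sketch of the revision check described above, assuming a simple integer revision per document. VersionedStore and RevisionConflict are hypothetical names; MysqlDocStore’s actual revision format and storage are not shown here.

```python
class RevisionConflict(Exception):
    """Raised when a Put is based on a stale revision."""


class VersionedStore:
    """Hypothetical sketch of revision-based optimistic locking."""

    def __init__(self):
        self._rows = {}  # doc id -> (revision, body)

    def get(self, doc_id):
        # The returned revision plays the role of the revision attribute.
        return self._rows.get(doc_id)

    def put(self, doc_id, body, revision=None, force=False):
        current = self._rows.get(doc_id)
        if current is not None and not force:
            stored_rev = current[0]
            # Halt the update if the stored revision is newer than the caller's.
            if revision is None or stored_rev > revision:
                raise RevisionConflict(
                    f"{doc_id}: stored revision {stored_rev}, got {revision}")
        new_rev = 1 if current is None else current[0] + 1
        self._rows[doc_id] = (new_rev, body)
        return new_rev
```

Two writers that both read revision 1 can each attempt a Put, but only the first succeeds; the second must re-read the document (or pass force=True) before writing.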

Bringing IDocStore to production use

Although an RDBMS is used as the underlying storage, the schema-less store is not ACID compliant. Because the index is maintained asynchronously from the update, a query against an index may not return all the documents that match it. However, the result set is pruned so that it never returns a false positive. This is analogous to authoring a page in MindTouch and waiting for the search index to pick up the changes. For most document storage uses, this should be fine.
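The combination of an asynchronously maintained index and a pruned result set can be sketched as follows. This is an illustration, not Dream’s implementation: the hypothetical LaggingIndexStore queues index updates, and run_indexer stands in for a background indexer. Queries read the possibly stale index for candidates, then re-check each candidate’s current content, so a lagging index can miss documents but never returns one that no longer matches.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque


class LaggingIndexStore:
    """Hypothetical sketch: index updates are queued and applied later."""

    def __init__(self, xpath, namespaces=None):
        self._xpath = xpath
        self._ns = namespaces or {}
        self._docs = {}                  # doc id -> Xml text (always current)
        self._index = defaultdict(set)   # value -> doc ids (may be stale)
        self._pending = deque()          # doc ids awaiting re-indexing

    def _values(self, xml_text):
        root = ET.fromstring(xml_text)
        return {el.text for el in root.findall(self._xpath, self._ns)}

    def put(self, doc_id, xml_text):
        self._docs[doc_id] = xml_text    # the write itself is immediate
        self._pending.append(doc_id)     # the index catches up later

    def run_indexer(self):
        """Stands in for the asynchronous background indexer."""
        while self._pending:
            doc_id = self._pending.popleft()
            for ids in self._index.values():
                ids.discard(doc_id)
            if doc_id in self._docs:
                for value in self._values(self._docs[doc_id]):
                    self._index[value].add(doc_id)

    def query(self, value):
        candidates = sorted(self._index.get(value, set()))
        # Prune: re-check each candidate against its current content so a
        # stale index entry never produces a false positive.
        return [d for d in candidates
                if d in self._docs and value in self._values(self._docs[d])]
```

The price of pruning is one extra read and XPath evaluation per candidate, which buys a result set that may be incomplete but is never wrong.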

Schema-less storage is useful when the data to be stored has an unknown or frequently changing structure and simple access patterns, that is, primary or single key queries, are sufficient. Since MindTouch services often deal with this type of data, and complex DekiScript templates could also benefit from arbitrary storage, we’re experimenting with this as the backing store for future extensions, and may even expose the storage as a service via an extension so that DekiScript could read from it and JEM could write to it.

If you have use cases for this storage facility, we’d love to hear from you as we test out and mature the design for production. The code is available for preview purposes now in the trunk and 1.7 branches on our source control server, and is part of the binary distribution going out with mindtouch.dream.dll in MindTouch 9.08.

24 Aug 09

Concurrent Podcast (Episode 2): Why Async matters


In Dream and MindTouch 09, we try to follow an async pattern, using our own Coroutine framework, throughout all systems. This is necessitated by an architectural decision to decouple the components of the system so that they talk to each other using the Http request/response pattern. While we optimize calls within the system so that they never hit the wire, the system is fundamentally constructed so that any component could live across a wire boundary. This means that calls between components could leave the current process. Traditionally, that kind of call would block the calling thread. Since this is central to our pipeline, that blocking would drastically reduce the capacity of any install. By using asynchronous calls, we never block, but instead yield the current thread to do other work while we wait for a response.
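Dream’s Coroutine framework is C#, so the following is only an analogy using Python’s asyncio: call_component is a hypothetical stand-in for a request to another component, with asyncio.sleep simulating network latency. While one call waits, the single thread is free to progress the others, so three 0.2 second calls should finish in roughly 0.2 seconds rather than the 0.6 seconds three sequential blocking calls would take.

```python
import asyncio
import time


async def call_component(name, latency):
    """Hypothetical stand-in for a request to another component over Http."""
    await asyncio.sleep(latency)   # yields the thread instead of blocking it
    return f"{name}: ok"


async def handle_request():
    # Three downstream calls in flight at once on a single thread.
    return await asyncio.gather(
        call_component("auth", 0.2),
        call_component("storage", 0.2),
        call_component("search", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(handle_request())
elapsed = time.perf_counter() - start
print(results)  # ['auth: ok', 'storage: ok', 'search: ok']
print(f"elapsed: {elapsed:.2f}s")
```

The complexity cost mentioned below is visible even here: every function on the call path has to be async and awaited for the thread to be released.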

Clearly, we’re highly invested in async, but we started discussing internally why other programmers should care about this pattern. It does add complexity, so is it really a case of YAGNI?

Episode 2 is a product of this debate. We try to cover the use cases developers presently encounter and the ones that are on the horizon, becoming central to future development. We also talk about the complexities and pain involved given the present tooling and infrastructure. We hope that our discussion will give you useful information as you venture into asynchronous programming.

As a sidenote, for Episode 2, we also tried out a different recording setup which did not work out all that well. Apologies for the sound quality this week; we’re still learning the best way of doing this and hope to have significant improvements in place for Episode 3.

As usual, you can find future topics listed here, and we always welcome suggestions on what should take precedence or what other topics to cover. If you want to subscribe to just the podcasts, you can find the feed here.

21 Aug 09

Building a Dashboard 101


In this post, I will go over how I approached building a dashboard for a corporate intranet. Those who have read some of my earlier posts will know that I have been working on a system to capture and display MindTouch download statistics.

As somebody who has programmed primarily for terminals and auto-graders, I found creating a dashboard a challenge. My approach was to first break the dashboard down into smaller, more manageable pieces. I came up with these four steps:

Displaying the Data – Trying to dump massive data sets and expecting people to look through them is asking for trouble. It is usually much better to display the data through charts and pretty tables. MindTouch includes some features to help speed this process along such as Google Charts, Visifire Charts, and dynamic tables.

Finding the Dimensions – The purpose of this step is to figure out how many permutations of the data there are. Knowing this makes it easier to create a dashboard that does not overwhelm the user but still contains all the desired information. A dimension is essentially an option that, when altered, changes the returned data. The simplest example is time. Using my project as an example, if I wanted to chart the number of downloads, I would need to know when the data should start and when it should stop. So in my case, time is a dimension that can be removed at this stage. This step is especially important for more complex data sets. Take the download statistics:

3 charting choices X 3 editions X 3 os X 11 flavors X 4 platforms X 2 servers X 5 versions = 11880 different graphs!
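The count above is simply the product of the number of options in each dimension, which is easy to double-check:

```python
from math import prod

# Number of options per dimension in the download statistics.
dimensions = {
    "charting choices": 3,
    "editions": 3,
    "operating systems": 3,
    "flavors": 11,
    "platforms": 4,
    "servers": 2,
    "versions": 5,
}

total = prod(dimensions.values())
print(total)  # 11880
```

Each dimension that is fixed to a single value while designing the layout divides this total by its option count, which is why removing dimensions first makes the layout tractable.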

Laying out the Information – After removing the extra dimensions from the data, what should be left is a base set of data that can serve as a foundation for the rest of the data’s permutations. This base set is representative of the possible data, so it is a good placeholder for designing a layout to present it. When designing the layout, remember that you will likely need to leave space for an option bar (I discuss this further below) and that everything should fit on one page without too much scrolling. The goal of the dashboard is to present a simplified, user-friendly interface where the end user can efficiently and easily get the information they are looking for. If the data set calls for more than one page, because it is too long to fit on one page or simply makes more sense on separate pages, MindTouch has some useful features to help merge multiple pages into a single one. Look into things such as tabs, accordions, and web.toggle.

Tabs used for multiple pages.


Adding Functionality – In this step, the goal is to add back the dimensions that were removed earlier. Now that the layout is done, we give the end user a way to select those dimensions. There are some fancy options out there, and slicker menus are available, but for my project I used a simple post form. MindTouch offers some excellent features for handling query parameters with DekiScript (see DekiScript globals). I passed these as parameters to the Download Statistics extension, which allowed me, through a simple form, to control which dimension of the data set was returned.
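The form-driven approach boils down to mapping submitted query parameters onto dimension values, with a default for anything missing or invalid. The dashboard itself did this in DekiScript; the Python sketch below only illustrates the idea, and the dimension names and allowed values in ALLOWED are hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical dimension names and allowed values; the real dashboard's differ.
ALLOWED = {
    "edition": ["core", "standard", "enterprise"],
    "os": ["windows", "linux", "macos"],
    "chart": ["line", "bar", "pie"],
}
# The first allowed value of each dimension doubles as its default.
DEFAULTS = {name: values[0] for name, values in ALLOWED.items()}


def selected_dimensions(query_string):
    """Map submitted form fields onto dimension values, falling back to defaults."""
    params = parse_qs(query_string)
    selection = dict(DEFAULTS)
    for name, allowed in ALLOWED.items():
        value = params.get(name, [None])[0]
        if value in allowed:            # ignore unknown or missing values
            selection[name] = value
    return selection


print(selected_dimensions("os=linux&chart=bar"))
# {'edition': 'core', 'os': 'linux', 'chart': 'bar'}
```

Validating against a fixed list keeps a hand-edited URL from requesting a combination the extension does not support.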

Using these steps, I created a dashboard that harnesses the power of the Download Statistics extension without burdening the end user with memorizing method calls. The final product is clean, simple, and, most importantly, presents the multitude of data in small, digestible doses.

The final product:

Download Statistics Dashboard


21 Aug 09

Community Discussion – A feedback loop

Are you an avid MindTouch user? Do you love MindTouch?

If you have answered yes to the above questions or are just curious, stop on by our forums and answer this week’s Community Discussion question:

What is the single most compelling MindTouch feature you use?

Look for more Community Discussions soon!

18 Aug 09

MindTouch Minneopa (9.08) Preview 1

Minneopa State Park
Photo by Jon Mierow

MindTouch Minneopa (9.08) is now available as a public preview. Minneopa is a much smaller release than our previous major release, Lyons, as it mostly contains bug fixes (122 in the preview!) and the key building blocks that we utilized in our next commercial solution (announcement coming soon!).

For you open-source users of MindTouch Core, you can look forward to three new features: import/export API, video collaboration through integration with Kaltura, and the hide revisions feature. On top of this, we’ve done a lot of work with expanding support in our editor for commonly requested features, including native table sorting, definition list support, and more!

Our documentation on this release is a bit thin; we’ll be hashing out more details about these features over the coming weeks as we roll out more previews. Until then, download the source, give it a whirl, and share your thoughts!

Download MindTouch Minneopa (9.08) Preview 1

Copyright © 2011 MindTouch, Inc.