Sunday, October 11, 2009

An idea for a new Storage Engine for MySQL, Part II

So, a few days ago I blogged about an idea I had for a new storage engine for MySQL. The MySQL API is C++, so it would require me to actually learn C++. Which I tried, but it hurt my eyes. So I decided to just go ahead and write a prototype in C#, just to quickly test out the basic premise: reducing or eliminating the storing of multiple copies of the same data, using B-trees and hash values. (And gaining performance in the process.)

Since then I've spent 4 days in Istanbul, which was brilliant, but before I went I made a quick prototype, using .Net 4.0 beta 1. The preliminary results were encouraging enough to motivate me to polish it up a bit. In essence, I took 70MB of logs and imported them into the storage engine in 18 seconds, resulting in about 22MB on disk. (That is 32-bit-aligned, mind you, so if size were the only concern it could be trimmed down quite a bit.) It's fairly simplistic: for now everything is stored as byte arrays, and it doesn't even support deleting data. But with a little brain-work, I guess it could be a useful way of getting quick and simple permanent storage on .Net. (I really don't like the idea of requiring SQL Server for trivial stuff.)

Not the first project with that aim, I know, but it is an interesting learning experience nonetheless. Incidentally, memory-mapped files are supported in .Net 4.0, which is cool. In time, I hope to also make a MySQL version, which would make it possible to use the same datafiles from a .Net application and MySQL. That might be vaguely useful; for instance, writing a small .Net snippet to create a database from some moderately complex dataset, and exporting it to MySQL with a simple file-copy. Which I would file under C, for "Cool".
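
As an aside, the new API is easy to work with. Just to show the shape of it, this is a minimal sketch (not code from the prototype; the file name and size are made up):

    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;

    class MmapExample
    {
        static void Main()
        {
            // Map (or create) a 1 MB column file and read/write a value at a fixed offset.
            using (var mmf = MemoryMappedFile.CreateFromFile(
                       "column.dat", FileMode.OpenOrCreate, "column", 1 << 20))
            using (var accessor = mmf.CreateViewAccessor())
            {
                accessor.Write(0, 42L);                   // write a long at offset 0
                Console.WriteLine(accessor.ReadInt64(0)); // prints 42
            }
        }
    }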

So, this is where I am at the moment:
  1. I've got a crude XML format for defining a row format, and a parser for it.
  2. A class hierarchy that works fairly well.
  3. A B-tree implementation that seems to work (I'm sure there are a few off-by-ones, but that's hardly the point).
  4. A data loader which loads data based on the supplied data format.
  5. A cmd-line app that lets me play with it.
I can load data and retrieve it as arrays of strings based on row numbers. Each column uses two files, both memory mapped. One data file contains an entry for each row, which is simply the record number in the B-tree where the data can be found. The B-tree file contains entries for nodes and data, stored in pairs: a node entry, followed by a data block of fixed size.
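
To make that a bit more concrete, here is roughly what a lookup in such a column could look like. This is only a sketch; the names, sizes and offsets are assumptions for illustration, not the actual format the prototype uses:

    using System.IO.MemoryMappedFiles;

    // Hypothetical per-column layout: the row file holds one long per row (the record
    // number of the value in the B-tree file), and the B-tree file holds fixed-size
    // pairs of (node entry, data block).
    static class ColumnReader
    {
        const int NodeEntrySize = 32;   // assumed sizes, purely illustrative
        const int DataBlockSize = 96;
        const int PairSize = NodeEntrySize + DataBlockSize;

        public static byte[] GetValue(MemoryMappedViewAccessor rowFile,
                                      MemoryMappedViewAccessor treeFile,
                                      long rowNumber)
        {
            // The row entry is only an indirection into the B-tree file.
            long recordNumber = rowFile.ReadInt64(rowNumber * sizeof(long));

            // Skip the node entry and read the fixed-size data block that follows it.
            var data = new byte[DataBlockSize];
            treeFile.ReadArray(recordNumber * PairSize + NodeEntrySize, data, 0, DataBlockSize);
            return data;
        }
    }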

A variation of this is used for really small fields, for instance ints, in which case the column simply contains the values directly in the row entry, instead of a pointer into the B-tree. These columns do not have B-trees.
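
For such a column, the lookup collapses to a single read straight out of the row file; something like this extra method on the same hypothetical ColumnReader:

    // Small fixed-size columns: the value itself lives in the row file, no B-tree lookup.
    public static int GetIntValue(MemoryMappedViewAccessor rowFile, long rowNumber)
    {
        return rowFile.ReadInt32(rowNumber * sizeof(int));
    }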

I didn't bring a laptop to Istanbul, but I did think about it every now and then. So here is the current TO-DO list:

1. Make stress-tests. Randomly retrieve lots of rows and time it (a rough sketch of what I mean is below, after this list)
2. Add more statistics: maximum search length, number of levels in the B-tree, timing of searches, etc.
3. Single datafile (see the extent sketch after this list):
 - Extents. Need a map of extents; each column has one or two extents * n, data and row
 - Need logic to get a new extent when one is full
 - rowCount on table-object, not column
 - Extents handled by table-object
 - mmap one extent at a time
4. Support deletes
 - Use free-lists (see the free-list sketch after this list)
5. Support variable datasizes
 - Another segment for actual data, for a total of two or three:
 - Non-optimizing:
  - 1 segment for rowdata, with pointers to another segment with valuedata.
  - a single list for fixed sizes
 - Optimizing:
  - 1 segment for rowdata, 1 with b-tree, another for value-data
  - the b-tree requires two reads anyway, so should not take longer (ValEntry+Data)
6. Create general library for .Net
 - Need decent way of retrieving data, apart from row-number
 - GetRow on Table-object, return row as object[]
 - RegEx'es on dataload
7. Create MySQL StorageEngine using the same dataformat
8. Indexing?
9. Store data as native datatypes, at least int and char[]
10. Profile it, optimize
11. Verify that 32-bit alignment actually improves performance.
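
For item 1, I'm thinking of something as dumb as this: hammer the table with random row lookups and time them with a Stopwatch. The getRow delegate stands in for the prototype's actual lookup, which isn't shown here:

    using System;
    using System.Diagnostics;

    // Sketch of the stress test in item 1: retrieve lots of random rows and time it.
    class StressTest
    {
        public static void Run(Func<int, string[]> getRow, int rowCount, int lookups)
        {
            var random = new Random();
            var watch = Stopwatch.StartNew();

            for (int i = 0; i < lookups; i++)
            {
                string[] row = getRow(random.Next(rowCount));   // random row lookup
            }

            watch.Stop();
            Console.WriteLine("{0} lookups in {1} ms", lookups, watch.ElapsedMilliseconds);
        }
    }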
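
Item 3 is mostly bookkeeping: the table object keeps track of where each extent starts in the single datafile, and maps one extent at a time. Something along these lines, where the extent size and all the names are made up for illustration:

    using System.Collections.Generic;
    using System.IO.MemoryMappedFiles;

    // Sketch of the extent handling in item 3: the table owns the extent map and
    // hands out a new extent when the current one is full.
    class ExtentMap
    {
        const long ExtentSize = 16L * 1024 * 1024;   // e.g. 16 MB per extent

        private readonly MemoryMappedFile file;
        private readonly List<long> extentOffsets = new List<long>();  // start of each extent

        public ExtentMap(MemoryMappedFile file)
        {
            this.file = file;
        }

        // Map only the extent currently in use, instead of the whole file.
        public MemoryMappedViewAccessor MapExtent(int extentIndex)
        {
            return file.CreateViewAccessor(extentOffsets[extentIndex], ExtentSize);
        }

        // Called when the current extent is full: record where the next one starts.
        public int AddExtent(long fileOffset)
        {
            extentOffsets.Add(fileOffset);
            return extentOffsets.Count - 1;
        }
    }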
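
And for item 4, the obvious approach is a free-list: deleted records are chained together inside the file itself, and inserts reuse them before the file grows. A rough sketch over fixed-size records; the header layout and sizes are assumptions, not the prototype's format, and the header is assumed to start out as -1 (empty list):

    using System.IO.MemoryMappedFiles;

    // Sketch of the free-list in item 4. The first 8 bytes of the file hold the record
    // number of the first free record (-1 = none); each free record stores the number
    // of the next free record in its first 8 bytes.
    static class FreeList
    {
        const long HeadOffset = 0;          // header: head of the free-list
        const int RecordSize = 128;         // illustrative fixed record size
        const long FirstRecordOffset = 8;   // records start right after the header

        // Delete a record by pushing it onto the free-list.
        public static void Delete(MemoryMappedViewAccessor file, long recordNumber)
        {
            long oldHead = file.ReadInt64(HeadOffset);
            file.Write(FirstRecordOffset + recordNumber * RecordSize, oldHead);
            file.Write(HeadOffset, recordNumber);
        }

        // Reuse a deleted record if there is one; returns -1 if the free-list is empty
        // and the caller should grow the file instead.
        public static long Allocate(MemoryMappedViewAccessor file)
        {
            long head = file.ReadInt64(HeadOffset);
            if (head == -1)
                return -1;
            long next = file.ReadInt64(FirstRecordOffset + head * RecordSize);
            file.Write(HeadOffset, next);
            return head;
        }
    }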

1 comment:

  1. Suggest you look into how ZIP or compression formats in general handle hashing and lookup, which might be interesting. I don't know if I remember the terminology correctly, but imagine if you by brute force identify the optimal minimum block size, and that after that, you have a hash table with variable key size, where the blocks that appear most frequently have the smallest key size larger than the optimal minimum, and so on... If the table was an integral part of the filesystem - in the way that you stored actual data for the items that were below the smallest optimal block size, instead of the reference to the block - then you would also start to combat one of the major issues of small-file... or actually data-block access. Of course, this would require rethinking the whole memory and disk cache structure as we know it today.
