Sunday, October 24, 2010

It's about time

TIMESTAMP and DATETIME is how dates and times, or temporal, data is represented ion MySQL. These types are combined with a bunch of operators on these types, such as ADDATE and TIME_FORMAT and such. So is this enough to effectively manage temporal data then? Nope, in my mind, far from it.

To begin with, time isn't such a simple type of data as we might look at it. Is a date just a count of the number of seconds since 00:00:00 on Jan 1 1970 or something like that? And we then convert it to a calendar and all is fine? Is time data really linear in that sense? No, it's not. Dec 24 2010 is Christmas Eve and when you get your Christmas gifts here in Sweden. Is this significant? Also, it's a Friday. That this data is a Friday can be calculated from the date or the seconds-since-jan-1-1970 counter, but that is a different thing. Looking at that counter, you cannot see that there is no way to look at that number and figure out that this day is special, but it is.

To manage this, we have the date and time functions in MySQL, which help us figure things like the above out, but there are issues with it still. To begin woth, the constructs quickly turn difficult. Secondly, you cannot really do an indexed search on temporal data, except pretty simple cases, such as searching for a data and time, before or later than a given data and things like that. A search for "all saturdays" will sure not use an index.

There are more issues though, like that fact that, as I have shown above, a temporal value as we know them above, contains hidden data, something that is considered bad (like the day of the week, the month, the fact that a data is a leap day etc).

And it doesn't end there, there are different calendars in use, in different parts of the world and in different inductries. If you have a mortage on your house, chances are that the interest you pay is the same each month, despite the fact that the interest in calculated over the full year and different months have different number of days. The reason is that interest uses a "360 day" calendar, where each month is assumed to have 30 days. And different financial instruments use diferent calendars (for example the one called 366 day, which is our normal calendar and the 365 day calendar. which is the normal calendar with the exception that there is no leap day). If you try to match one financial instrument to another, when they use different calendars, is not a lot of fun (I've done it, so I know).

But chances are that many of you now think that this is not a big problem for you, as you don't see people asking questions on data with complex temporal relationships. And to that I say that, there is a big chance that this is because:
  • Temporal data is difficult. More so than simple integers and strings, disregarding the simple before date X or something like that. But we just don't ask the questions like "how much of the sales per quarter happen on the last day of the quarter" and things like that. Which doesn't mean the information isn't interesting, but the questions is difficult to formalize and even more so to execute in a database. Not to mention checking for overlapping dates and things like that.
  • We are used to not being able to ask these kinds of questions.
But really we should be able to ask, query and get data with a lot more temporal information. When was the last time you asked a temporal data query in Google (not that you CAN do this). Like, how has the matching or IBM and Pressrelease changed over the last 10 years, i.e. how many Press releases did IBM post in 2001, in 2002 etc. Yes, the data is there somewhere, querying it is difficult. And using temporal data combined with data you get on the web should allow you to determine more than what you can today (i.e. IBM and Press release matches X number of documents in Google right now). Adding a temporal axis to web data should allow you to determine what might happen.

One issue with all this is that databases are hardly geared up to support this, and a second is that the data to support this isn't available in large quantities yet. This is one of the things we are looking at at Recorded Future, but there is more stuff coming. The web is there with masses of data, now we need to make sense of all that data. One aspect which is there but which hasn't yet been explored much is just the temporal aspect of data.

/Karlsson

Tuesday, October 19, 2010

MyQuery 3.4.0 Released!

I have not finished MyQuery 3.4.0, after a long time of weekend and spare-time development. This version adds some features that are basically Scintilla featires that I have integrated with. This includes error indicators, margin markers and some other things. In particular the margin markers changes a few things in the interface: When running up to, or continue from a point in teh script, then use the margin marker instead of the current line! If you don't like this, then don't set the margin marker and the current line will be used just like in old releases.

Another addition this release is Online help. I get the help contents from the database that you connect to, which is installed with most MySQL installations in the tables called help_xxx in the mysql schema.

External program has a few more options, and in this release the installer installs a few of those, just to show how it works, like the MySQL schema in an editor window and direct access to the MySQL commandline.

In addition to this, and a few more minor features, there is a whole bunch of bugfixes in this release, which doesn't mean that some doesn't remain. Above all, I have reengineered how I handle multiple access to the MySQL connection.

Download it from Soureforge! I will follow up this post with some more on how to use MyQuery, how to build extensions and plugins and some things like that!

Happy SQLing
/Karlsson
PS. If you have issues with MyQuery, drop me an email, but note that I am now at anders@recordedfuture.com!

Monday, October 18, 2010

Moving on, but not far

As of today I have a new employer, and I no longer work for MySQL / Sun / Oracle, instead I am now Database Architect at Recorded Future (http://www.recordedfuture.com). This is exciting stuff, really so. RF pulls data from teh Web, but instead of just allowing searching, RF will analyze what it find to spot trends and things like that.

It is still early days, we need to collect a lot more data than today, and managing that will be one of my challenges. But an interesting and fun challenge!

This doesn't mean I will not maintain MyQuery anymore, I sure will. And Recorded Future is by the way MySQL users, so I will probably work even more with MySQL related stuff than today. I will also continue to blog about what we are doing, and about other database related things as well. In particular, I will work with and blog about maintaining very large databases!

So watch this space, and see you soon
/Karlsson

Thursday, October 14, 2010

MyQuery 3.4 soon to be released

I've been spending some cycles now and then on MyQuery 3.4. One thing I wanted to do with this release was to fix some long-standing bugs I knew about, but hadn't identified (Are you seeing ReleaseSemaphore errors? That's one of them). Also, I had a couple of issues with the new Plugins, as well as some ideas for new features, such as OnLine Help.

In addition to that, I also felt that the Scintilla editor controls tht I was using wasn't used to it's full potential.

As I have the new 3.4 in front of me now, I have done something about all those things. I have cleaned up some of the more recent code, I have added OnLine help, in this case I get that from the MySQL help tables. I am pretty happy with how a few more steps to integration with Scintilla could add so much more usability, Scintilla really is a great editor control with a lot of potential, I haven't reallly explored it fully yet, and there is more to do. Also, Scintilla is really solid.

What remains to be done is to test, package and publish MyQuery 3.4. This will be a Beta, but I will release a GA based on it real soon. Before that, I have one or two plugins I want to add (Replication monitoring for example). Also, I have a plan to add some more advanced error handling, using Scintilla to show erroneous text in a better way than today. But all in all, we are now close to GA.

Cheers and see you soon!
/Karlsson