FOSDEM 2010 and the NoSQL Devroom

As I mentioned earlier, I was fortunate enough to be able to attend FOSDEM this year. The sheer scale of FOSDEM is amazing, with literally thousands of people in attendance, dozens of projects represented, and hundreds(?) of talks. It's doubly impressive when you consider that it is entirely volunteer driven and 100% sponsored (it's no cost to attend).

The NoSQL track organized by Steven Noels on Sunday turned out quite well too I thought, and it seemed to generate a lot of interest (the room was continually filled to capacity and the doors barred). There were talks from some of the usual players (MongoDB, HBase, and of course Cassandra), along with some less heard of projects (GT.M). Mine was the last talk of the morning and seemed to be pretty well received. I got a lot of great questions both during and after the session, and ended up talking shop with several attendees until the next session was starting.

So congrats to the organizers for an awesome conference, and thanks again to Steven for letting me come talk about Cassandra.

Finally, here is the video of my talk, or you can view it here with the slides.

Christmas Lights

We put up lights each year for the holidays, and while I don't mind having the house decorated, I do not like having to put them up. Despite this, I feel mounting pressure each year to Do Better, which by default means more lights and decorations, which in turn means even more work.

The year before last I had the idea that if I worked smarter I might avoid working harder, and that one of those musically synchronized setups would be pretty sweet. Problem is I came to this conclusion in October of 2008, and that didn't leave enough time to properly procrastinate before throwing something together at the last minute, so I was forced to postpone. This past year though I was able to spend a solid 11 months procrastinating, which still left a couple short weeks to hurriedly throw something together.

So long story short, I did it: I put together a controller that sets Christmas lights to music. And, provided that you weren't privy to all of the nasty hacks and ugly short-cuts, it was actually kind of neat to watch. Obviously though, I'm not entirely happy with the results, so I'm considering this year a practice run, and hope that with this as a basis to build upon, next year it will be pretty sweet. So treat the rest of this post as more of a rough brain-dump than a recipe or step-by-step, and hopefully it will prove interesting for comparison purposes next year.

Hardware

Normally this is where I'd expound on some of my research, the options I investigated, costs, ease of use, etc. That's not going to happen because with all of the procrastination this project required, I simply didn't have the time. Instead I went straight for a pre-assembled parallel port relay board, and I took a page out of this guy's book and mounted it in a plastic tool box.

I used some cheap extension cords, fixed to the ends of the toolbox with cable connectors, and wired it all together inside. The relay board needs a power supply, so there is an outlet inside for that.

A cord that exits the rear of the toolbox gets plugged into the mains to supply power to the whole thing, but since we're combining electricity and the great outdoors, a GFI is a must.

I scored this parallel cable in an old pile of hardware. It worked great once I got the zip drive that was attached to it off and in the trash.

Finally, I made use of an old PIII notebook.

Christmas light controller

I know, it's not much to look at. Sue me.

Software

There are 8 pins on a parallel port that are (were) used to send character data to printers, and it's these 8 pins that are used for outputs. That means controlling the outputs is as simple as writing to that byte. The pyparallel library makes this even easier, so for example, I was able to use something like the following, run from a cron job, to start and stop the lights each day.

python -c 'import parallel; parallel.Parallel().setData(0)'

I wired everything up to the normally closed contacts of the relay board so the lights would fail-safe. In other words, you have to turn a relay on in order to turn the corresponding lights off. The setData(0) above switches on all of the lights by turning every relay off; killing the lights is as easy as changing that to setData(255).
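To make the inverted logic concrete, here is a minimal sketch of how a set of lit channels might map to the byte handed to setData(). The relay_byte and set_lights helpers, and the channel numbering (0 through 7, one per data pin), are my own illustration and not part of pyparallel; only parallel.Parallel().setData() comes from the library.

```python
def relay_byte(lit_channels):
    """Return the data byte that leaves only lit_channels switched on.

    Hypothetical helper: channels are numbered 0-7, one per data pin.
    The lights hang off the normally closed contacts, so a 1 bit
    energizes the relay and turns that channel's lights OFF.
    """
    value = 0xFF  # start with every relay energized (all lights off)
    for channel in lit_channels:
        if not 0 <= channel <= 7:
            raise ValueError("channel must be between 0 and 7")
        value &= ~(1 << channel) & 0xFF  # clear the bit to release the relay
    return value


def set_lights(lit_channels):
    """Write the state to the port; needs pyparallel and real hardware."""
    import parallel  # imported here so relay_byte() works without it

    parallel.Parallel().setData(relay_byte(lit_channels))
```

With this, relay_byte(range(8)) gives the 0 used in the cron job above, and relay_byte([]) gives 255.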

Initially I had the idea that I'd whip up something to analyze an audio track; that the light show would essentially be a visualization of the waveform. That, as it turns out, isn't the panacea that it would seem. Sure, the lights will flash in a way that seems vaguely in response to the music, but the results are just not as coordinated, or ordered, as the samples you see on the Internet.

So I then moved on to the idea of creating a time-series of output states that could be "played" along with the audio, but I was naive to believe that I could hand-craft this data file, so before all was said and done, I'd also written a PyGame application for keying in the outputs as the music played, and visualizing it during playback.
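For a rough idea of what such a time-series might look like, here is a small sketch. The cue format, the CUES data, and the state_at helper are all made up for illustration; this is not the actual file format or code I used.

```python
import bisect

# Hypothetical cue list: (seconds from the start of the track, output byte).
# Must be sorted by time. Bytes use the inverted relay logic described
# earlier: 0 = all lights on, 255 = all lights off.
CUES = [
    (0.0, 255),
    (1.5, 0),
    (2.0, 0b10101010),
    (2.5, 0b01010101),
    (4.0, 255),
]


def state_at(cues, t, default=255):
    """Return the output byte in effect at time t (in seconds)."""
    times = [when for when, _ in cues]
    # Index of the last cue whose timestamp is <= t.
    index = bisect.bisect_right(times, t) - 1
    if index < 0:
        return default  # before the first cue: keep the lights off
    return cues[index][1]
```

A playback loop would then repeatedly ask the audio player how far into the track it is, look up state_at(CUES, position), and write the result to the port whenever it changes.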

Finally, only after getting everything working I found out that the way "professionals" do this is actually pretty similar to what I came up with, only using MIDI, so I will definitely be looking into that before next year.

Going to FOSDEM

Due to a scheduling conflict, Jonathan won't be able to present on Cassandra in the NoSQL devroom at this year's FOSDEM, so I'll be going in his stead.

I've always wanted to go to a FOSDEM, and getting to see Brussels will be a real treat as well. I can't wait!

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

NP: Black & White, In Flames

Lowest Common Denominator

From the git-svn manpage:

For the sake of simplicity and interoperating with a less-capable system (SVN), it is recommended that all git svn users clone, fetch and dcommit directly from the SVN server, and avoid all git clone/pull/merge/push operations between git repositories and branches. The recommended method of exchanging code between git branches and users is git format-patch and git am, or just 'dcommit’ing to the SVN repository.

Running git merge or git pull is NOT recommended on a branch you plan to dcommit from. Subversion does not represent merges in any reasonable or useful fashion; so users using Subversion cannot see any merges you’ve made. Furthermore, if you merge or pull from a git branch that is a mirror of an SVN branch, dcommit may commit to the wrong branch.

Or, put another way: because Subversion can't merge for shit, neither can Git if you expect to integrate the two.

Fail.

NP: Blood Milk and Sky, White Zombie

NoSQL: What's in a name?

Depending on the circles you travel in, you might be aware of the whole NoSQL "movement". If not, I'm not going to try and explain it at this time (explaining it is sort of the problem), but you can get the general idea from Wikipedia.

I've spent the last couple of days at nosqleast and one of the hot topics here is the name "nosql". Understandably, there are a lot of people who worry that the name is Bad, that it sends an inappropriate or inaccurate message. While I make no claims to the idea, I do have to accept some blame for what it is now being called. How's that? Johan Oskarsson was organizing the first meetup and asked the question "What's a good name?" on IRC; it was one of 3 or 4 suggestions that I spouted off in the span of like 45 seconds, without thinking.

My regret however isn't about what the name says, it's about what it doesn't. When Johan originally had the idea for the first meetup, he seemed to be thinking Big Data and linearly scalable distributed systems, but the name is so vague that it opened the door to talk submissions for literally anything that stored data, and wasn't an RDBMS.

I don't have a problem with projects like Neo4J, Redis, CouchDB, MongoDB, etc, but the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for. MongoDB and Voldemort, for example, set out to solve two very different problems, and lumping them together under a single moniker isn't very meaningful. This is why people are continually interpreting nosql to be anti-RDBMS; it's the only rational conclusion when the only thing some of these projects share in common is that they are not relational databases.

The cat is out of the bag though, and the "movement" has enough momentum that I don't think it's going anywhere. And I'm not really advocating that it should; it's had the effect of bringing a lot of attention to some very interesting projects, and that's a Good Thing. Maybe Emil Eifrem has the right idea by encouraging people to overload the term with Not Only SQL.

Upcoming travel

I have several trips lined up for the next few weeks:

There is also a NoSQL meetup on November 2 as a part of ApacheCon; I've offered to present on Cassandra there. I'm also thinking of giving a session at BarcampApache, and I'm scheduled to sit on a "SQL vs. NoSQL" panel at OpenSQL, though I'll probably submit a session idea or two there as well.

There are a lot of Cassandra people in the Bay Area; it'd be great if we could set up a hack-a-thon/bug squashing party/meetup/whatever during ApacheCon. Ping me or post something to the list if you are interested! :)

NP: Calling Dr Love, Kiss

Ooops, I did it again

I rewrote my blog software again (actually, it was done months ago but I just now got around to deploying it). The last one used TurboGears, but the 1.x branch is getting long in the tooth, and 2.0 came a little too late. Besides, Django is the new hotness these days.

Somehow the rewrite resulted in about half as much code, which is always cool, and I finally got to make use of mod_wsgi, (it is everything that I had ever dreamed it would be, and more :)).

All of the old permalinks should still be valid, and with any luck I managed to avoid DoS'ing everyone's feed reader.

Git repo for Cassandra packaging

I put the repository for my Cassandra package up on GitHub. The repo browser can be found here, and the wiki has a brief writeup of the build process for those unfamiliar with git-buildpackage.

Patches welcome!

NP: Take It Out On Me, Bullet For My Valentine

Day Trip Redux

For my last day in Spain I took Hector's advice and hopped on a high-speed train from Madrid to Toledo.

Toledo is another UNESCO World Heritage Site, a city dating back to the Bronze Age with Christian, Jewish, and Moorish influences. It's a beautiful place and the six or so hours I spent there were woefully inadequate.

There are a few pictures up on flickr, but I took quite a few more that will have to wait until after I'm home.

On the road again

Debconf is over. Boo. :(

Like those I've attended in the past, Debconf9 was well organized with plenty of interesting talks, in a great venue. I had loads of fun, learned a ton, and even managed to get a bit done. Many thanks to the organization team, the local team, the speakers, and the sponsors.

This year I managed to sneak in an extra couple of days post-conference, which will be spent in the general vicinity of Madrid. I'm going to continue dumping my camera daily, so tune into my flickr stream if you're interested.

Cassandra 0.3.0 for Debian

As announced here, I put a Debian package together for Cassandra 0.3.0.

I don't have any (immediate) plans to upload a Cassandra package to the Debian archive, (this package isn't even policy compliant), so consider this unofficial and report any packaging bugs directly to me.

   deb http://people.apache.org/~eevans/debian cassandra/
   deb-src http://people.apache.org/~eevans/debian cassandra/
   

Enjoy.

Debconf9: Day trip

Yesterday was the Day Trip at Debconf, an opportunity for folks to step away from their computers (usually), and leave the venue (always) for some sort of group activity or tourism.

When the organizers first started talking about this year's Day Trip there were two candidates, Valle del Jerte and Teatro romano de Merida, or "Roman theater of Merida". I'm kind of a history junkie and generally get pretty excited at the idea of touring ruins so I was heartbroken when Merida lost out. The closer it got to the scheduled day the lower my enthusiasm sank, until eventually I just opted out entirely. The obvious by-product of skipping the Day Trip though was a need to find something else to do, and the obvious choice seemed like a trip to Merida. Michael was pretty keen on the idea too.

The entire thing was thrown together very last minute (we barely made it to the bus station), but luckily there was enough time to spot a generous offer via IRC from itais (a Merida local) for transportation from the bus stop to the theater.

As promised, itais was waiting for us at the bus station and took us on a quick tour to see, among other things, a Roman aqueduct and The Arch of Trajan. He then dropped us off in front of the theater with a recommendation for a place to eat.

After an awesome meal we walked the ruins for a couple of hours, tracked down The Temple of Diana, and then settled into the main square for a couple of beers before catching a cab back to the bus station.

Much fun.

A few of the pictures I took can be seen here, and I'll get the rest up eventually.

Cassandra 0.3.0

The Apache Cassandra team has managed the release of 0.3.0, its very first.

It took a lot longer than I had hoped to get a release in the can, (almost a month from the approval of the last release candidate). Part of this was a lack of familiarity with ASF's processes, but part of it was poor or incomplete documentation, or lack of consensus about what is required. In the end, it boiled down to a combination of carefully studying what other podlings had done, and a few iterations of trial-and-error.

I'm confident that this will all go much smoother for 0.4.0, (which is progressing nicely, and should be ready Real Soon Now).

Debconf9

Going to Debconf9

A week from today and I'll be headed to Spain for Debconf9. Can't wait!

NP: Worlds Collide, Apocalyptica

Transitioning My GPG Key

A few months ago a group of researchers announced a fairly serious attack that shattered everyone's faith in SHA-1. It has frightening implications for anyone who relies on cryptographic signatures, and while consensus is that there is little danger in the near-term, most people agree that now is the time to start a move to something stronger.

So, I've begun my transition, (document here), and submitted my new key for the Debconf9 signing party later this month. I intentionally left out any mention of a time-line in the transition doc, and I'm in no big hurry. I'll retire the old key once I have enough signatures, or once there is evidence of a real threat, whichever comes first.

NOSQL 2009

Johan Oskarsson has organized a meetup for folks interested in distributed structured data storage and is calling it NOSQL. The event, being held June 11th in San Francisco, will have subject matter experts presenting on Hypertable, HBase, Voldemort, Dynomite, and Cassandra.

There were 100 slots available to attend and they all went in a matter of hours, so if this is the first you've heard of it, it's probably too late. Fortunately I got mine and thanks to the support of my employer I'll be there. I'm looking forward to it.

Howto: Epson Perfection 3940 on Debian

Getting the Epson Perfection 3940 scanner set up on Linux requires jumping through just enough hoops that even if you have managed it before, it's easy to forget when it comes time to do it again.

I put this here in the hopes that it will make things easier for someone, (and it's entirely likely that someone will be me one day :).

  1. Make sure your user is a member of the scanner group, (adduser youruser scanner).
  2. Install xsane, (sudo aptitude install xsane).
  3. Get the firmware. It seems most people grab one of the SUSE RPMs of iscan-firmware.
  4. Get the firmware out of the RPM, (rpm2cpio iscan-firmware-1.18.0.1-10.i586.rpm | cpio -i --make-directories).
  5. And install it, (sudo install -m 644 usr/share/iscan/esfw52.bin /lib/firmware/).
  6. Edit /etc/sane.d/snapscan.conf and adjust the firmware directive to point to /lib/firmware/esfw52.bin
  7. Profit.

Thrift Packaging

My latest project at work is Cassandra, a distributed, eventually consistent, column-oriented data store. It's somewhere between Dynamo (Cassandra's original author worked on Dynamo), and Google's Bigtable. It was developed as an internal application at Facebook, later open sourced, and is now an Apache Incubator project.

The external interface to Cassandra is Thrift-based. Thrift is a framework for creating network services that communicate using a compact binary data format. It's similar to Google's Protocol Buffers, but with more of a focus on RPC, and greater language coverage, (much greater actually). The bottom line: any application that uses Cassandra for structured data storage is going to need Thrift. So, I filed an ITP (Intent To Package) and have started work on packaging it for Debian.

Thrift is an interesting project to package as it has an architecture specific application (C++), 6 architecture specific and 5 architecture independent libraries, and covers 12 different languages. That's right, 12.

I'm still somewhat undecided on a game plan; the options I've considered so far are:

  1. Convince upstream to split their source tree and distribute all of these libraries separately, allowing them to be packaged by people with the skills and/or motivation for each.
  2. Split the source myself and package the bits that are most important to me.
  3. One source package based on the official upstream release, with binary packages for each of the components that I need/am comfortable maintaining. Folks interested in the parts not packaged could step up to the plate and contribute their time.
  4. Best-effort packaging of most/all of the libraries upstream ships, with the proviso that any I'm not comfortable seeing in a release, and for which no one has stepped forward, would be removed prior to Squeeze.

I've already taken a stab at #1 and it didn't seem promising. #2 is an option I still consider on the table but I'm a little concerned that it could lead to a mess. #3 and #4 really boil down to the same thing, collaborating with others to package as much as possible while maintaining the standards everyone expects from Debian. I guess I'm currently leaning toward some variation of #3 or #4, probably through the use of collab-maint or a dedicated Alioth project.

For the time being, my efforts can be tracked in Git here, so drop me a line if you're interested in joining the fun!

NP: Sand and Mercury, The Gathering

Swine Flu

I don't have it. Do you?

Brain Rewiring

A co-worker of mine uses one of the stranger keyboards I've seen, a Kinesis Advantage.

Kinesis Advantage Keyboard

He picked his up after a bout with tendinitis and was sold on it. He was kind enough to let me borrow his spare for about a week so I could try it out. It's been an interesting week. :)

The Advantage differs from conventional keyboards in a number of ways, the ones I think most relevant are:

  • The separation of the left and right sides of the keyboard, done to keep you from pivoting your hands side-to-side at the wrist as you type. A lot of keyboards address this by creating a break in the middle and angling the two sides outward (everyone has seen the MS Naturals), but not having to turn your arms inward feels more comfortable/natural to me.
  • Keys that are arranged into a concave surface as opposed to a flat one. This might seem strange, but the curvature lines up well with the arc your finger tips travel in, and positions the keys within closer reach of one another.
  • The keys are also arranged on a vertical axis to one another, as opposed to being staggered. So for example the C key is directly below D, not below and to the right. Moving your fingers from their keys on the home row to the corresponding keys above and below is a much more natural movement.
  • Key layout is different as well. You're expected to do quite a bit more with your thumbs. The Backspace, Delete, Home, End, Control, and Alt keys are positioned within reach of your left thumb; your right works Space, Enter, Page Up and Down, in addition to another Control key, and a Windows key (which I remap to Alt). This really makes sense if you think about it: why waste two perfectly good fingers on the same key, when you could put them to use and eliminate all of that reaching?
  • The keys have outstanding tactile feedback, in addition to an audible feedback (something between a faint click and a beep emitted by a speaker somewhere inside). I find this feedback helpful in maintaining a light touch on the keys since I often catch myself banging keys pretty hard on normal keyboards.

I'm not going to lie though, it does take some getting used to. The biggest problem I had was Space vs. Backspace, which are the right-most thumb key, and left-most thumb key respectively. Prior to all of this I heavily favored my left thumb for striking the space bar, and muscle memory is a bitch when it causes you to Backspace when you meant Space.

Other points of frustration were the tilde/back-tick key (located bottom-left instead of top-left), and the left and right bracket/brace keys (located bottom-right). These keys are used a lot in a shell or when coding, which probably made the pain even more pronounced for someone like me.

I managed to force myself to use nothing else for several days, at which point I felt I was doing quite well. I still had the occasional problem here and there, but it seemed like I was well on my way to normalcy. Then I tried using the built-in keyboard on my laptop. Wow. Epic fail. It took a few more days and plenty of patience before I was able to move back and forth (and truth be told it's still a little awkward).

So was it worth it? Yeah, I think so. I've had RSI troubles of my own and a week of typing on this keyboard has felt pretty good. I've ordered one of my own to use at work, and I'll probably grab a second one for home.

Lenny Released On Time

Lenny was released yesterday. This is great news, and congratulations all around to everyone that worked their asses off making it happen.

By my calculations this comes 677 days after the initial release of Etch, (or 22 months and change). As I've said before, Debian releases When Ready, and (to the best of my observations) consensus seems to be that somewhere between 18 and 24 months is the sweet spot. Not only does this make for the second "on-time" release in a row, but there was an Etch-And-A-Half sporting new kernels and video drivers in the meantime.

With any luck the various "Debian is too old/can't release" memes will finally die.

Cat-a-log FAQ

I received a surprising number of questions related to the cat-a-log series. I've attempted to collect them all, and answer them here.

Q: Does your wife think all cats are sweet? What about dogs?

A: Any member of felis catus is by definition unconditionally sweet. She also views many dogs as sweet, though they tend to be held to a much higher standard, (for example, "humping" is unacceptable and any dog known to have committed such an act is Not Sweet).

Q: Could I interest you in a cookbook of Vietnamese recipes?

A: Sure. I always enjoy sampling new ethnic dishes.

Q: I'm moving soon and am unable to bring my pet anaconda with me, will you give it a home too?

A: Possibly. How big is it? What does it eat?

Q: These posts are fake, right? You don't really live with all those cats, do you?

A: I shit you not.

Q: Do you plan to update your blog if you get any new cats?

A: Absolutely not. If any more cats show up here I plan to kill myself.

That's all (the cats) folks

Apparently there is in fact a 15th cat (or a 14th for you purists out there). If cataloged, it would have been Unknown #3, an as-of-yet unnamed cat that comes to my porch to eat the food my wife puts out. I wasn't able to get a picture, and I'm not sure I recall ever seeing it (they all look the same to me anyway). A shame really; rumor has it that it's missing a paw, and I could have had some fun with that.

Interestingly, I received quite a few comments on these articles, almost all of which could be described as schadenfreude. There were also a number of questions; I will attempt to answer those in another post.

Cat-a-log Day 14: Pepe

Last but not least, Pepe.

Pepe

Alright, granted this one might be borderline, but to be fair, many people refer to these guys as Polecats. It comes onto my porch to eat from a dish of food, and my wife and daughter have named it; I think that counts for something.

Other interesting facts:

  • Smells awful.

Cat-a-log Day 13: Unknown 2

Unknown #2... or is it Zorro?

Unknown 2

Yet another cat that I suspect has applied for citizenship under the terms of If it eats, we must greets. I refer to it here as Unknown #2, but someone let slip the name "Zorro" when the subject came up.

Other interesting facts:

  • None.