JOSHUA HOLBROOK

SciPy2010, Day 3 (posted 30 Jun 2010)


Oh geez, I hardly slept last night. Jet lag is kicking my ass. Still, I absorbed some stuff from this morning.

Opening Stuff

  • Python 3 was a big theme this year. Numpy is weeks away from working well with 3.x, and scipy is next. Cython is also trying to make the move.

  • Matplotlib: Going gold? 1.0 is coming, and trunk has a lot of new, really cool-looking features! It looks like it hasn’t hit their front page, though. One of the features they’re really proud of is a new html5 front-end. This is kinda cool, though I think I’d use a native javascript library for that sort of thing.

  • Something I liked: Cython can rock generator expressions now.

  • A lot is going on in the IPython world. According to Fernando Perez, they spent a lot of time refactoring and componentizing the codebase. They are also looking to incorporate ZeroMQ, and seem pretty excited about it.

  • Github! Perez directly addressed the move of ipython’s source control from bzr/launchpad to git and github. They were even able to transfer their “issues” to github, which is pretty rad. There are also murmurs regarding moving numpy/scipy to github as well. There’s a meeting on switching to git tonight. I’m crossing my fingers, because I adore github.

  • EPD’s numpy version can make use of MKL, if you’re into that sort of thing.

  • ETS is starting to rock Qt4, as an alternative to wx. Cool, cool.

David Beazley

You may know him as the creator of SWIG, but what they don’t tell you is that he’s REALLY FUNNY. Here are some quotes from his talk, paraphrased:

  • Sure, I could write everything in C++
  • And I could also poke myself in the eye

“Back when I worked at Los Alamos, we had two enemies: the Soviets, and Lawrence Livermore National Laboratory.”

“Please don’t reimplement CORBA … or SOAP … or any other overengineered solution that makes you want to quit programming and join a band. …I already did that once.”

His talk was on concurrency, and in particular on the attitudes that programmers have towards it. He called it, “DIY Concurrency.” Unfortunately I don’t think I could really explain where he was coming from with any justice, but his talk reminded me of my impression of ZeroMQ, in that ZMQ is less a fully-fledged solution to message-passing and more of a framework for “rolling your own.”

Some technologies he talked about though, which made for Pretty Awesome Demos:

  • xmlrpclib
  • listening with the multiprocessing module
  • using the logging module to post to Twitter:

(flickr)

Now that’s a myaverick!
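
For the curious, here is a minimal sketch of how the logging trick could work. This is not Beazley's code: `StatusHandler` and `post_status` are made-up names, and `post_status` is a stand-in for whatever Twitter client call you would really use. Here it just appends to a list so the example runs offline.

```python
import logging


class StatusHandler(logging.Handler):
    """Hypothetical handler: forwards each formatted record to a callable."""

    def __init__(self, post_status, max_len=140):
        super().__init__()
        self.post_status = post_status  # stand-in for a real Twitter client call
        self.max_len = max_len          # tweets were capped at 140 characters

    def emit(self, record):
        try:
            # Format the record, trim it to tweet length, and hand it off.
            self.post_status(self.format(record)[: self.max_len])
        except Exception:
            self.handleError(record)


def demo():
    posted = []  # pretend timeline
    log = logging.getLogger("scipy2010.demo")
    log.setLevel(logging.INFO)
    log.propagate = False  # keep the demo quiet on the console
    handler = StatusHandler(posted.append)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    log.addHandler(handler)
    log.info("Jet lag is winning")
    log.removeHandler(handler)
    return posted
```

The nice part is that the rest of the program just calls `log.info(...)` as usual; where the message ends up is purely a matter of which handlers are attached.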

Theano

Something Brian Granger talked about the other day was the “magic button” approach with regards to speeding up code. While in many cases this is a pipe dream, I think Theano is part-way successful when it comes to its domain.

I think you can get most of the talk’s content from Theano’s website, but it was still pretty cool having it explained to me first-hand.

Robot Vision!

3-D imaging action

This technique used pictures (3 near-orthogonal ones) of an object on a table and then used an algorithm to generate a points model of the object.

The description of the algorithm made me think of vacuum-sealing. It first finds a bounding sphere, and then shifts the points inward in 3-space until the points shrink-wrap the data. Cool stuff!
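
To make the shrink-wrap picture concrete, here is a toy sketch of how I understood the idea. It is almost certainly not the speaker's actual algorithm: the function names, the fixed step size, and the "stop when you're within `tol` of a data point" rule are all my assumptions.

```python
import math


def bounding_sphere(points):
    """Centroid plus the largest centroid-to-point distance (not the minimal sphere)."""
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    radius = max(math.dist(center, p) for p in points)
    return center, radius


def shrink_wrap(points, directions, step=0.05, tol=0.1):
    """Slide one probe inward along each direction until it touches the data."""
    center, radius = bounding_sphere(points)
    wrapped = []
    for d in directions:
        norm = math.sqrt(sum(c * c for c in d))
        unit = tuple(c / norm for c in d)
        r = radius  # every probe starts on the bounding sphere
        while r > 0:
            probe = tuple(center[i] + r * unit[i] for i in range(3))
            if min(math.dist(probe, p) for p in points) <= tol:
                break  # close enough to the data: the wrap stops here
            r -= step
        # A probe that never touches anything collapses to the center.
        wrapped.append(tuple(center[i] + max(r, 0.0) * unit[i] for i in range(3)))
    return wrapped
```

With sparse data most probes miss and fall all the way to the center, so a real version would need dense points or some smoothing between neighboring probes.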

Reading lips!

Helge (the guy that suggested we team up for lunch the other day) is giving a talk on computers reading lips. Fascinating! Apparently he got the idea from watching 2001: A Space Odyssey. Unfortunately I was too busy trying to catch up to get some of the early details. :( But, as I’m sure you can imagine: really cool. This project used scipy and scikits extensively.

SHOGUN

It’s actually pronounced sho-GOON, not SHOW-gun. Which reminds me: when Stefan gave his tutorial, he pronounced GNU Octave as “Oct-AYVE” instead of “OCK-tiv.” This is only really funny because saying it like “Oct-AYVE” was an inside joke, indicating a total willful ignorance regarding music.

Anyways: My battery is about to die and I haven’t found a plugin, so more later!

Edit: Here’s a link. Shogun is actually really neat, and I’d love to find a reason to use it. It’s apparently really good if you want to use the support vector machine algorithm with complex kernels.

Lunch

A fellow redditor offered to take me to Houndstooth, and I took him up on it. These guys seriously make a hell of a cup of coffee. This was the Godly experience I’ve been looking for to cancel out the stupid hotel coffee and clear out the cobwebs. From what I understand, their espresso machine is like a $25,000 masterpiece.

Starcluster

Here’s the github repo. This was a pretty fast talk and poor Justin ran out of time, but the basic idea is that StarCluster does all the setup of a cluster on EC2 for you. Instead of spinning up custom instances one at a time, you can spin up an entire cluster, all set up and ready to rock. Definitely a cool idea! I kind of have this image of, well, a cluster in a box.

Pomsets

I don’t know if pomsets is my cuppa tea, but there was definitely a nice overview here on What You Need for a cloud computing setup. Michael Pan presented the following list:

  • VMs.
  • Dynamic provisioning. My impression is that this is what’s behind the whole “spinnin’ instances like crazy” part of cloud computing.
  • Task Partitioning. This is where you split your job up into parallelizable chunks, using a technique such as MapReduce.
  • Data Distribution. This is how you send your job to the clusters. One way is with a shared filesystem, though it’s pretty well-known that a single NFS share with 60 PCs pulling from it at once tends to choke hard-core.
  • Messaging. Think MPI, RabbitMQ or ZeroMQ.
  • Workflow Management. This is the gap Pomsets tries to fill. My impression is that this is meant as a level of abstraction over task partitioning, data distribution, etc. where you tell your cluster what you want to do. Hadoop, for example, has Pig, Hive and Cascading as options. It’s hard to tell if Pomsets is a compelling option, but Mr. Pan seems pretty optimistic, and apparently their start-up is working out pretty well. Interestingly enough, it sounded like they didn’t drink the Y Combinator kool-aid, so to speak.
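
The task partitioning item from the list is the easiest one to show in code. Here is a minimal single-machine sketch of the MapReduce idea, using word counting as the job. The function names are mine, nothing here comes from Pomsets or Hadoop, and on a real cluster the map step would be farmed out to separate workers instead of run in a plain `map`.

```python
from collections import Counter
from functools import reduce


def partition(items, n_chunks):
    """Split a list into at most n_chunks roughly equal, independent slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, never zero
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_chunk(lines):
    """Map step: count words in one chunk, with no knowledge of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def word_count(lines, n_chunks=2):
    """Reduce step: merge the per-chunk counts into one total."""
    partials = map(map_chunk, partition(lines, n_chunks))  # could run on separate workers
    return reduce(lambda a, b: a + b, partials, Counter())
```

Because each chunk is counted independently, the only coordination needed is the final merge, which is what makes the job "parallelizable" in the first place.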

Disco

Disco is a MapReduce framework written in Python and Erlang, utilizing Python for writing functions and Erlang on the back end for sweet sweet concurrency action. It looks like you can monitor your Disco server and access your results via javascript/dhtml as well! I would really like to try this out!

StarFlow

The project hasn’t been officially released, but their Bitbucket repo is available. My impression of StarFlow is that it combines the paradigm of multiple applications of things like

cat in.csv | something > out.csv #probably sans-cat

with cluster computing. This is exciting because this really is how I work a lot of the time. I think this would *cough* mesh well with my current research, actually.
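
For reference, the `something` in that pipeline is usually a tiny filter script. Here is a sketch of one such stage in Python. It has nothing to do with StarFlow's actual API, and the `name` and `score` columns are invented for the demo.

```python
import csv
import io


def filter_rows(src, dst, keep):
    """One pipeline stage: copy the header, then only rows where keep(row) is true."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if keep(row):
            writer.writerow(row)


def demo():
    # In-memory files stand in for in.csv and out.csv.
    src = io.StringIO("name,score\nnumpy,3\nscipy,1\n")
    dst = io.StringIO()
    filter_rows(src, dst, lambda row: int(row["score"]) > 2)
    return dst.getvalue()
```

Pass `sys.stdin` and `sys.stdout` instead of the `StringIO` objects and the same function drops straight into a shell pipeline.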

It also seems StarFlow and StarCluster work really well together. Awesome!

As an aside: By browsing about, I found that one of the devs made what looks like a fuckin’ sweet tabular data module for python, and saved me the effort of having to do it myself. Maybe I can combine it with my ideas involving built-in interpolation. Totally gonna fork that shit.
