Wednesday, October 14, 2009

biostuff

I've been trying in 2009 to write less throw-away code. I'm not sure how successful I've been at that, but at least I'm writing more code that I keep around.
Previously, I stuck anything of at least marginal quality and re-usability into my Google Code project bpbio. As of yesterday, I've moved a lot of stuff from there to bitbucket. "Biostuff" is where I'll put modules that are well documented and tested, in hopes that using a distributed VCS and a project name that doesn't contain my initials will encourage contributions. Currently, all the modules on bitbucket are also on PyPI.

pyfasta provides pythonic access to FASTA sequence files. Previously, it was part of genedex (which I've stopped supporting, since @hobu has done so much good work on Rtree that genedex is now pretty much obsolete), but it has been pulled out, simplified, and improved. Check out the docs on PyPI.
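To give a feel for what "pythonic access" to a FASTA file means, here is a minimal plain-Python sketch. It is not pyfasta's API: the function name read_fasta is made up for illustration, and it simply loads every record into a dict keyed by the full header line, which is only sensible for small files.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dict.

    `lines` is any iterable of strings, e.g. an open file handle.
    Hypothetical helper for illustration; not part of pyfasta.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

With the records in a dict, a sub-sequence is just a slice, e.g. records["chr1"][2:6] for a record whose header is "chr1".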

nwalign is a command-line and python interface to the Needleman-Wunsch global sequence alignment algorithm, which I've blogged about previously. Whenever I need to do stuff with cython and numpy I use nwalign.pyx for reference (though there's probably better material out there).
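For anyone who hasn't seen Needleman-Wunsch, here is a minimal pure-Python sketch of the dynamic program. It is not the nwalign implementation (which is written in cython): the function name and the simple match/mismatch/gap scores are assumptions for illustration, where a real aligner would typically use a substitution matrix and separate gap-open and gap-extend penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (aligned_a, aligned_b, score).

    Illustrative sketch with a linear gap penalty; scoring values are
    arbitrary defaults, not those used by nwalign.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

The nested loops are exactly the kind of thing that is slow in pure python and fast once the score matrix is a typed numpy array in cython.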

simpletable, as the name suggests, is a wrapper around pytables that removes some of the boilerplate in creating a table, dataset, and Description. That's it.

skidmarks is a small module to check for runs in data (get it? skidmarks, runs?). It implements the Wald-Wolfowitz, autocorrelation, serial, and gap tests. Each function (implementing one of those tests) returns a p-value, which indicates the level of support for rejecting the null hypothesis that the sequence is random, along with the chi-square or z-score value as appropriate. I've been using this together with monte-carlo simulations to see if runs in genomic data could be explained by random events.
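As a rough idea of what one of these tests computes, here is a minimal sketch of the Wald-Wolfowitz runs test using only the standard library. It is not skidmarks' interface: the function name runs_test and its return order are assumptions, and the p-value comes from the usual normal approximation, which is only reasonable for longer sequences.

```python
import math


def runs_test(seq):
    """Two-sided Wald-Wolfowitz runs test on a two-valued sequence.

    Returns (z, p). Illustrative sketch using the normal approximation;
    the name and signature are hypothetical, not skidmarks' API.
    """
    seq = list(seq)
    values = set(seq)
    if len(values) != 2:
        raise ValueError("sequence must contain exactly two distinct values")
    first = next(iter(values))
    n1 = seq.count(first)
    n2 = len(seq) - n1
    n = n1 + n2

    # A run is a maximal stretch of identical values.
    runs = 1 + sum(1 for x, y in zip(seq, seq[1:]) if x != y)

    # Expected number of runs and its variance under randomness.
    mean = 2.0 * n1 * n2 / n + 1
    var = 2.0 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if var == 0:
        raise ValueError("sequence too short for the runs test")

    z = (runs - mean) / math.sqrt(var)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

A negative z means fewer runs than expected (values are clumped), and a positive z means more runs than expected (values alternate too regularly); either way a small p is evidence against randomness.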

Any contributions, suggestions, or bug reports are welcome. The interface at bitbucket should make this easier to do: just fork, fix, and send a pull request.
Meanwhile, my less-documented, crappier code will continue to live on Google Code, at least until it matures. I've got a couple of modules in the pipeline that will be added once they're cleaned up and documented.
