With this option, a file like:
will be overwritten (flattened in-place) to:
Simple. When dealing with large files (where this is actually useful), the flattened file does not behave well when opened in an editor, because the editor will attempt to read a number of lines into its buffer and a single line may be 200 megabases. So, this is a pain if you're planning to sit down with a cup of joe and read through the genome, but otherwise the fasta file should be unaffected.
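To make the idea concrete, here is a minimal sketch of what "flattening" means: header lines stay where they are, and the newlines inside each sequence are dropped so every record's sequence sits on a single line. This is not pyfasta's actual code. The function name `flatten_fasta` and the toy records are made up for illustration, and it writes to a separate stream rather than overwriting the file in place.

```python
import io


def flatten_fasta(src, dst):
    """Copy FASTA records from src to dst, joining each sequence onto one line.

    Hypothetical helper for illustration only; not part of pyfasta.
    """
    in_sequence = False
    for line in src:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Close the previous record's single sequence line, if any.
            if in_sequence:
                dst.write("\n")
            dst.write(line + "\n")
            in_sequence = False
        elif line:
            # Sequence text: append without a newline so it stays on one line.
            dst.write(line)
            in_sequence = True
    if in_sequence:
        dst.write("\n")


# Toy input, invented for this example.
example = ">chr1\nACTG\nGGCC\n>chr2\nTTAA\nCC\n"
out = io.StringIO()
flatten_fasta(io.StringIO(example), out)
flat_text = out.getvalue()
print(flat_text)
```

With real genomes, each of those single sequence lines can be hundreds of megabases long, which is exactly why editors struggle with the result.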
This method is not currently the default (though it may be so in future versions). But it's possible to use it from the command line:
pyfasta flatten some.fasta

which will create the flattened fasta (and the index file) and a placeholder some.fasta.flat, containing the text "@flattened@" as a marker to pyfasta that it's ok to use the original (now-flattened) fasta. Once the file is flattened, there is no performance loss compared to having a separate flat file containing no headers.
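As a rough sketch of the placeholder convention described above, a script could check whether a fasta has already been flattened in place by looking for the `.flat` file and the "@flattened@" text. The helper name `is_flattened_inplace` is hypothetical and this is not how pyfasta itself necessarily performs the check. It only assumes what is stated above: the placeholder sits next to the fasta and contains the marker text.

```python
import os
import tempfile

FLAT_MARKER = "@flattened@"


def is_flattened_inplace(fasta_path):
    """Return True if fasta_path + '.flat' exists and contains the marker.

    Hypothetical helper; pyfasta's own logic may differ.
    """
    flat_path = fasta_path + ".flat"
    if not os.path.exists(flat_path):
        return False
    with open(flat_path) as handle:
        # The placeholder is tiny, so reading a small prefix is enough.
        return FLAT_MARKER in handle.read(64)


# Demonstration on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "some.fasta")
    with open(fasta, "w") as handle:
        handle.write(">chr1\nACTGGGCC\n")
    before = is_flattened_inplace(fasta)

    # Simulate the placeholder that flattening is described as creating.
    with open(fasta + ".flat", "w") as handle:
        handle.write(FLAT_MARKER)
    after = is_flattened_inplace(fasta)

print(before, after)
```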
pyfasta was a fun project for me in 2009. It's a ridiculously simple little module, but when I started it, I didn't think there was a good alternative. (Though discriminating Fasta-ers should look at the sequence module in pygr, and the Bio.Seq module in BioPython, which I think has improved quite a lot recently.) It has over 100 tests and very close to 100% test coverage for the modules in pyfasta, and much of the code is run once for each of the 4 backends.
The source is on Bitbucket.