I added a document to my github repo describing how to display a bed file in the browser. The rst source is here and is displayed inline below.
It uses the UCSC binaries to create BigWig/BigBed files, because dalliance can request a subset of the data from these formats without downloading the entire file, given the correct Apache configuration (also described below).
This will require a recent version of dalliance because there was a bug in the BigBed parsing until recently.
Dalliance Data Tutorial
dalliance is a web-based scrolling genome-browser. It can display data from
remote DAS servers or local or remote BigWig or BigBed files.
This will cover how to set up an html page that links to remote DAS services.
It will also show how to create and serve BigWig and BigBed files.
Note
This tutorial uses hg18, but it is applicable to
any assembly available from your favorite database or DAS server.
Creating A BigBed
Getting a bed file from UCSC
From the UCSC table browser choose
- genome: Human
- assembly: NCBI36/hg18
- group: Genes and Gene Prediction Tracks
- track: UCSC Genes
- table: knownGene
- output format: "selected fields from primary and related tables"
- in text box, name it "knownGene.hg18.stuff.txt"
- click "get output"
- check kgXref under 'Linked Tables'
- click 'Allow Selection From Checked Tables' at bottom of page.
- check 'geneSymbol' from hg18.kgXref fields section
- click 'get output' and a file named 'knownGene.hg18.stuff.txt' will be saved to your downloads directory. Move it to your current directory.
To get this into bed format copy and paste this onto the command-line:
grep -v '#' knownGene.hg18.stuff.txt | awk 'BEGIN { OFS = "\t"; } ;
{ n = split($9, astarts, /,/);
  split($10, aends, /,/);
  starts=""
  ends=""
  # loop by index: the order of for(i in astarts) is unspecified in awk
  for(i = 1; i <= n; i++){
    if (! astarts[i]) continue
    ends=ends""(aends[i] - astarts[i])","
    starts=starts""(astarts[i] - $4)","
  }
  print $2,$4,$5,$1","toupper($13),1,$3,$6,$5,".",$8,ends,starts
}' | sort -k1,1 -k2,2n > knownGene.hg18.bed
To create a BigBed from this, do the following (note: if you're not on a 64-bit
machine, you'll have to find the 32-bit binaries):
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/fetchChromSizes
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed
chmod +x fetchChromSizes bedToBigBed
mkdir -p data
./fetchChromSizes hg18 > data/hg18.chrom.sizes
./bedToBigBed knownGene.hg18.bed data/hg18.chrom.sizes knownGene.hg18.bb
Now knownGene.hg18.bb is a BigBed file containing both the UCSC ID and the common
gene name in the name column.
SQL
UCSC also has a public MySQL server, so the process of downloading to a bed file can be simplified to:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -P 3306 -e "select chrom,txStart,txEnd,K.name,X.geneSymbol,strand,exonStarts,exonEnds from knownGene as K,kgXref as X where X.kgId=K.name;" > tmp.notbed
grep -v txStart tmp.notbed | awk 'BEGIN { OFS = "\t"; } ;
{ n = split($7, astarts, /,/);
  split($8, aends, /,/);
  starts=""
  sizes=""
  exonCount=0
  # loop by index: the order of for(i in astarts) is unspecified in awk
  for(i = 1; i <= n; i++){
    if (! astarts[i]) continue
    sizes=sizes""(aends[i] - astarts[i])","
    starts=starts""(astarts[i] - $2)","
    exonCount=exonCount + 1
  }
  print $1,$2,$3,$4","$5,1,$6,$2,$3,".",exonCount,sizes,starts
}' | sort -k1,1 -k2,2n > knownGene.hg18.bed
Then follow the final steps above (fetchChromSizes and bedToBigBed) to create the BigBed file.
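To make the coordinate conversion concrete: UCSC stores absolute exon starts and ends, while bed wants block sizes plus block starts relative to txStart. Here is a small sketch of the same awk logic wrapped in a shell function and run on a single made-up row. The function name to_bed12 and the record (uc000aaa.1, ABC) are invented for illustration, and the loop walks the exons by index so that the blocks come out in order regardless of the awk implementation.

```shell
# to_bed12: reads rows of
#   chrom txStart txEnd name geneSymbol strand exonStarts exonEnds
# on stdin and writes 12-column bed rows on stdout.
# (Hypothetical wrapper around the awk conversion used in this tutorial.)
to_bed12() {
    awk 'BEGIN { OFS = "\t" }
    {
        n = split($7, astarts, ",")
        split($8, aends, ",")
        sizes = ""; starts = ""; exonCount = 0
        for (i = 1; i <= n; i++) {
            # the trailing comma in UCSC lists leaves an empty last element
            if (astarts[i] == "") continue
            sizes  = sizes""(aends[i] - astarts[i])","
            starts = starts""(astarts[i] - $2)","
            exonCount++
        }
        print $1, $2, $3, $4","$5, 1, $6, $2, $3, ".", exonCount, sizes, starts
    }'
}

# Made-up gene with two exons: 1000-1200 and 1500-2000.
#   printf 'chr1\t1000\t2000\tuc000aaa.1\tABC\t+\t1000,1500,\t1200,2000,\n' | to_bed12
```

For that row the block sizes come out as 200,500, and the block starts as 0,500, with an exon count of 2.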
Displaying A BigBed in Dalliance
From there, download dalliance:
$ git clone git://github.com/dasmoth/dalliance.git
$ cd dalliance
and edit test.html, adding:
{name: 'UCSC Genes',
bwgURI: '/dalliance/knownGene.hg18.bb',
},
before the line that looks like:
{name: 'Repeats',
at around line 55.
Then edit your apache.conf (e.g. /etc/apache2/sites-enabled/000-default)
and put the following
(here I assume you cloned dalliance into /usr/local/src/dalliance-git):
Alias /dalliance "/usr/local/src/dalliance-git"
<Directory "/usr/local/src/dalliance-git">
Header set Access-Control-Allow-Origin "*"
Header set Access-Control-Allow-Headers "Range"
Options Indexes MultiViews FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Then enable the mod_headers Apache module. On Ubuntu, that looks like:
sudo a2enmod headers
Then point your browser to: http://yourhost/dalliance/test.html
And you should see your 'UCSC Genes' track in full glory along
with the other niceties of the dalliance browser.
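If the track stays empty, it is worth checking that the server really answers partial requests and sends the CORS header configured above. This is a rough sketch under a few assumptions: check_range_response is a hypothetical helper name, it only looks at the first status line and the Access-Control-Allow-Origin header, and the URL in the usage comment is the placeholder host from this tutorial.

```shell
# check_range_response: reads HTTP response headers on stdin.
# Succeeds only if the status is 206 (Partial Content) and an
# Access-Control-Allow-Origin header is present.
# (Hypothetical helper for this tutorial, not part of dalliance.)
check_range_response() {
    tr -d '\r' | awk '
        NR == 1 && $2 == "206"                        { partial = 1 }
        tolower($1) == "access-control-allow-origin:" { cors = 1 }
        END {
            if (!partial) print "no 206 Partial Content: Range request not honoured"
            if (!cors)    print "no Access-Control-Allow-Origin header"
            exit !(partial && cors)
        }'
}

# Usage: ask for the first 100 bytes and inspect only the headers.
#   curl -s -o /dev/null -D - -H 'Range: bytes=0-99' \
#       http://yourhost/dalliance/knownGene.hg18.bb | check_range_response
```

A plain 200 response here would mean the whole file is being sent, which defeats the point of using BigBed.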
6 comments:
Thanks for the post. I was wondering how Dalliance compares to JBrowse? I'm new to genome browsers and I can see that both of these tools have great potential to become really useful tools for personal genomics.
@Sarkis, I'm not much of a perl user and definitely not gmod, so I don't know much about jbrowse. If you're into gmod, i think jbrowse is probably a great choice.
I think that the code in dalliance is a bit hard to extend, but it's not impenetrable (I looked into it some: I reported and fixed the bug in the BigBed parsing).
But, it does give you a nice browser for both numerical and feature data and allows you to access remote servers.
Basically all you need is a web server and some JavaScript, and you're ready to go with dalliance. With jbrowse, you'll need quite a bit more, I suspect.
Jbrowse performs a bit better for me, but you have to preprocess your data.
Dalliance needs the ability to handle click events through a callback or something.
@casbon What kind of callbacks are you after? Dalliance should do sensible things with LINK elements in DAS data. We don't currently have a way of linking from bigbed data but it's a fairly straightforward extension and something we can certainly implement quickly if there's demand.
Or do you mean a way of channeling feature-click events to a fragment of javascript provided by the page in which Dalliance is embedded? This is currently missing from our embedding API, but again wouldn't be too hard to add, and if you've got a use case I'd be very interested to discuss.
Nice post. Actually though, you don't need to configure a webserver to use dalliance if you just want to browse locally on your own machine. Just open the test.html file directly in your favourite browser. Dalliance can access indexed binary files directly from your hard disk as well as those on remote webservers and DAS sources. Tim
@Tim good point, it's definitely a simple setup.