On Thu, 6 Jul 2017, Raja Nandakumar wrote:
Hi Ivan,
Out of curiosity, how do you actually submit a job? Do you use Ganga or the DIRAC API directly?
Currently I'm using Ganga, though Simon tells me he finds it easier to use direct commands. What I'm trying to do (and have a small amount of money towards my salary for) is set up a pilot "Big Data" project for a bioinformatics group here at Brunel. Thinking back to what I've heard about such things at GridPP meetings, I dredged up the keywords "ganga" and "dirac", so that's what I'm going for at present. The aim is to make it as simple as possible for non-Grid scientists (who seem to know Python). As a (totally undebugged as yet!) example, here's the first pass I came up with for this multi-core submission just before lunch:
cat submit_spades.py
import os

# Wrapper script that will run on the DIRAC worker node
exefile = 'spades.sh'
os.system('chmod +x %s' % exefile)

j = Job()
j.backend = Dirac()
j.application = Executable()
j.application.exe = File(exefile)
# SPAdes binaries from the local disk, paired-end reads from grid storage
j.inputfiles = [
    LocalFile('bin.tgz'),
    DiracFile('LFN:gridpp/user/i/ivan.reid/embl/SRR2099924_1.fastq.gz'),
    DiracFile('LFN:gridpp/user/i/ivan.reid/embl/SRR2099924_2.fastq.gz'),
]
# Compressed results that spades.sh leaves in the working directory
j.outputfiles = [
    LocalFile('contigs.fasta.gz'),
    LocalFile('scaffolds.fasta.gz'),
]
j.submit()
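For non-Grid users, a couple of small pure-Python helpers could replace the shell-out and the hard-coded file names. This is only a sketch: `make_executable` sets the user/group/other execute bits with `os.chmod` (roughly `chmod a+x`, ignoring the umask) instead of calling `os.system`, and `paired_lfns` is a hypothetical helper that assumes every run is stored as `<run>_1.fastq.gz` / `<run>_2.fastq.gz` under a single LFN directory, as in the SRR2099924 example.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others (roughly chmod a+x).

    A pure-Python alternative to os.system('chmod +x %s' % path).
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def paired_lfns(run, base):
    """Return the two LFN strings for a paired-end run.

    Hypothetical helper: assumes the reads are named <run>_1.fastq.gz and
    <run>_2.fastq.gz under the directory `base`, as in the example above.
    """
    return ['LFN:%s/%s_%d.fastq.gz' % (base, run, i) for i in (1, 2)]


if __name__ == '__main__':
    for lfn in paired_lfns('SRR2099924', 'gridpp/user/i/ivan.reid/embl'):
        print(lfn)
```

Inside the Ganga script one might then write `j.inputfiles = [LocalFile('bin.tgz')] + [DiracFile(lfn) for lfn in paired_lfns('SRR2099924', 'gridpp/user/i/ivan.reid/embl')]`, so switching to another run only means changing the accession.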
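cat spades.sh
#!/bin/sh
# Unpack the SPAdes binaries shipped in the input sandbox
tar -xvzf bin.tgz
# Assemble the paired-end reads with 8 threads
time bin/spades.py -k 21,33,55,77 --careful --pe1-1 SRR2099924_1.fastq.gz --pe1-2 SRR2099924_2.fastq.gz -o spades_output -t 8
# Compress the results and move them to the working directory so they match j.outputfiles
gzip spades_output/contigs.fasta
gzip spades_output/scaffolds.fasta
mv spades_output/*.gz .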
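If the wrapper ever moves from shell to Python, the SPAdes invocation could be built from parameters rather than hard-coded, which keeps the thread count and k-mer list in one place. A minimal sketch: `spades_command` and its parameter names are hypothetical, and the flags are exactly those used in spades.sh; the resulting list could be passed to `subprocess.run`.

```python
import shlex


def spades_command(read1, read2, outdir='spades_output',
                   kmers=(21, 33, 55, 77), threads=8, careful=True):
    """Build the SPAdes argument list used in spades.sh.

    Hypothetical helper; returns a list suitable for subprocess.run().
    """
    cmd = ['bin/spades.py', '-k', ','.join(str(k) for k in kmers)]
    if careful:
        cmd.append('--careful')
    cmd += ['--pe1-1', read1, '--pe1-2', read2,
            '-o', outdir, '-t', str(threads)]
    return cmd


if __name__ == '__main__':
    cmd = spades_command('SRR2099924_1.fastq.gz', 'SRR2099924_2.fastq.gz')
    # Print a shell-quoted version for comparison with spades.sh
    print(' '.join(shlex.quote(arg) for arg in cmd))
```

With the defaults this reproduces the command line in spades.sh, so a different dataset or core count only needs different arguments.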
Of course, if anyone spots some obvious bug/miscomprehension in that, it could save me some debug time...
Regards, Raja.
Cheers,
ivan
--
Ivan Reid (ivan.reid@[brunel.ac.uk|cern.ch])
Engineering, Design & Physical Sciences    CMS Collaboration,
Brunel University London.                  CERN,
Room TOWD405                               Room 40-1-B12