Arctic nanopolish

This is a modified bash script that uploads fast5/fastq results into iRODS. Most sections are commented out because the script has not been tested on the GRID genome sequencing machines. It is an example based on the customer's existing script and an .odt document outlining the workflow.

Install the iRODS client on the GRID server; the icommands are used in the script.

  • https://packages.irods.org/ : set up the package repositories
  • https://github.com/irods/irods_client_icommands : do not follow this; use it for client host reference only
  • sudo apt-get install irods-icommands irods-dev irods-runtime

  • The iRODS client will require a service account for this host. The host cannot join the domain via SSSD because it is sold as an appliance and updated by the vendor under a service contract.
  • The iRODS config will require a resource for the data; a resource is, loosely, a network disk.
  • The iRODS config will require a top level collection. This is akin to a directory and can have permissions granted recursively to whoever requires access to the data.
  • Data objects (files) may be uploaded to the collection and then tagged with metadata, or they can be tagged with metadata on upload using the iput command.
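
The collection and metadata points above might translate into icommands roughly as follows. This is a sketch under assumptions, not the customer's script: the group name `sequencing_users`, the metadata string, the file name and the `DRY_RUN` helper are hypothetical, and the `--metadata` string format should be checked against `iput -h` on the installed client.

```shell
#!/usr/bin/env bash
# Sketch: prepare a top level collection and upload one tagged file.
# DRY_RUN=1 prints each icommand instead of running it (illustrative helper).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_collection() {
    local coll="$1" group="$2"
    run imkdir -p "$coll"                 # create the collection, like mkdir -p
    run ichmod -r inherit "$coll"         # new content inherits the collection's permissions
    run ichmod -r read "$group" "$coll"   # recursive read access for the group
}

upload_tagged() {
    local file="$1" coll="$2" avu="$3"
    # -K verifies the checksum after transfer.
    # --metadata expects "attribute;value;unit" triples separated by semicolons.
    run iput -K --metadata "$avu" "$file" "$coll/"
}
```

For example, `DRY_RUN=1 setup_collection /PHE/projectXYZ sequencing_users` prints the three commands without contacting the server, and `upload_tagged reads.fastq /PHE/projectXYZ "run_name;runnameXYZ;none"` would upload and tag a single file.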

A sample client config file follows. In this case the user_name is an LDAP user (Windows Active Directory) authenticated system-wide via the PAM auth stack. A service account local to iRODS would more likely be used on this host, with its user name in this file and irods_authentication_scheme set to native; iinit stores the obfuscated password separately in ~/.irods/.irodsA.

[toby.seed@phe.gov.uk@smedmaster02 ~]$ cat << EOF > ~/.irods/irods_environment.json

{
    "irods_host": "irodscol01.unix.phe.gov.uk",
    "irods_port": 1247,
    "irods_user_name": "toby.seed@phe.gov.uk",
    "irods_zone_name": "PHE",
    "irods_default_resource": "s3_compound",
    "irods_authentication_scheme": "PAM"
}
EOF

With a working config, authenticate the client against the iRODS server with the iinit command and check the settings with the ienv command. Depending on the iRODS server configuration, the token may last for up to two weeks. It may be necessary to have the ~/.bashrc login script run iinit on login or, to keep each job self-contained, to run iinit at the top of the various workload scripts.
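
Running iinit at the top of a workload script could look something like the sketch below. The function name `ensure_irods_session` is hypothetical, and using `ils` as the session probe is an assumption; stdin is redirected from /dev/null so that the probe cannot sit waiting on a password prompt.

```shell
#!/usr/bin/env bash
# Sketch: ensure a valid iRODS session before any transfers start.

ensure_irods_session() {
    # Listing the home collection only succeeds with a valid session.
    if ils </dev/null >/dev/null 2>&1; then
        return 0
    fi
    echo "No valid iRODS session, running iinit" >&2
    iinit                           # prompts for the password when run interactively
    ils </dev/null >/dev/null 2>&1  # re-check; this is the function's exit status
}
```

A workload script would then start with `ensure_irods_session || exit 1` before any iput or irsync calls.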

Rough requirements to test in a live environment:

  • a resource
  • a service account for this host
  • network connectivity to the target iRODS server on TCP port 1247
  • a top level collection with recursive permissions for the users who require access to the fast5/fastq data
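
These requirements could be checked up front with a small preflight script. The following is a sketch only; the function names are hypothetical, the port probe relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command, and it assumes `iinit` has already been run for the service account.

```shell
#!/usr/bin/env bash
# Sketch: preflight checks before attempting an upload.

have_icommands() {
    # Succeeds only if every named command is available.
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || { echo "missing: $cmd" >&2; return 1; }
    done
}

port_open() {
    # Try to open a TCP connection; give up after 5 seconds.
    local host="$1" port="$2"
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

preflight() {
    local host="$1" port="$2" resc="$3" coll="$4"
    have_icommands iinit ienv ils ilsresc iput irsync imeta || return 1
    port_open "$host" "$port" || { echo "cannot reach ${host}:${port}" >&2; return 1; }
    ilsresc "$resc" >/dev/null 2>&1 || { echo "resource not visible: ${resc}" >&2; return 1; }
    ils "$coll" >/dev/null 2>&1 || { echo "cannot list collection: ${coll}" >&2; return 1; }
    echo "preflight ok"
}
```

With the sample config above, the call would be along the lines of `preflight irodscol01.unix.phe.gov.uk 1247 s3_compound /PHE/projectXYZ`.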

Using iCommands to upload files to iRODS

  • resource ~ network disk
  • collection ~ directory
  • object ~ file

  • Generally you would create a collection on your resource to hold your files.
  • It is likely you would create a collection (runnameXYZ) within a collection that already has recursive permissions for a group of users: /PHE/projectXYZ/runnameXYZ
  • The iput command will push objects and can also push collections recursively, but irsync is much preferred for recursive uploads to ensure data integrity.
  • The imeta command will list, add and remove metadata for collections and objects that have been uploaded.
  • The ils command will list file attributes and permissions.
  • The irm command will remove files from the iRODS storage.
  • https://docs.irods.org/master/icommands/user/
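
Put together, uploading one run with the commands listed above might look like this. It is a minimal sketch: the `upload_run` function, the `run_name` metadata attribute, the local directory name and the `DRY_RUN` helper are hypothetical, and the irsync options should be confirmed with `irsync -h` on the installed client.

```shell
#!/usr/bin/env bash
# Sketch: sync a local run directory into its own collection and tag it.
# DRY_RUN=1 prints each icommand instead of running it (illustrative helper).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

upload_run() {
    local src="$1" parent="$2" run_name="$3"
    local coll="${parent}/${run_name}"
    run imkdir -p "$coll"                           # collection for this run
    run irsync -r -K "$src" "i:${coll}"             # recursive sync with checksum verification
    run imeta add -C "$coll" run_name "$run_name"   # tag the collection itself
    run ils -A -r "$coll"                           # list the objects with their permissions
}
```

Running `DRY_RUN=1 upload_run ./run_output /PHE/projectXYZ runnameXYZ` prints the four commands so they can be reviewed before anything is sent to the server.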