initial commit
commit
d1e202efb6
|
|
@ -0,0 +1,6 @@
|
|||
# What is this?
|
||||
|
||||
A collection of notes to build an iRODS testbench with an S3 compatible back end.
|
||||
Some modifications to bash scripts used on gene sequencing appliances, tag metadata to results.
|
||||
Training examples of the iRODS rules language for searching and tagging.
|
||||
Python script to find files on a SMB network share, search by MS Windows creation/modified dates, build a modifiable toml config file listing all files in share and allow adding additional metadata on a per file basis, run the batch job to upload files directly to iRODS.
|
||||
|
|
@ -0,0 +1,83 @@
|
|||
# Example client side iRODS rule to show metadata operations
|
||||
|
||||
- The rule requires the local host be connected/authenticated to iRODS server with the appropriate ~/.irods directory and contained connection profile
|
||||
- The rule does not require any input
|
||||
- The rule is written in the iRODS native rule engine language
|
||||
- The rule is intended to be run client side only, most functions and microservices will work in server side rules, some will not
|
||||
- The rule intends to assist users with common metadata manipulation tasks
|
||||
- To augment existing workflow scripts, the user could call their custom metadata rule passing an iRODS data-object as INPUT
|
||||
|
||||
## Run the rule
|
||||
|
||||
- remove the data-object generated from iRODS rule
|
||||
`irm test.txt`
|
||||
- clear the trash to ensure there is no history of the data-object
|
||||
`irmtrash`
|
||||
- run the rule, pass -t for test mode,-F for file (data-object), specify the rule engine with -r (this is only required if multiple rule engines are present on the server - such as the python rule engine)
|
||||
`irule -r irods_rule_engine_plugin-irods_rule_language-instance -F ~/test-meta.r`
|
||||
- check test.txt now exists
|
||||
`ils test.txt`
|
||||
- check metadata of test.txt, -d for file (data-object)
|
||||
`imeta ls -d test.txt`
|
||||
- the rule output should resemble
|
||||
```
|
||||
a = 10
|
||||
zebra = horse
|
||||
hula = dance
|
||||
art = crayon
|
||||
timestamp = 2021-06-30.12:09:54
|
||||
a || 10
|
||||
zebra || horse
|
||||
hula || dance
|
||||
art || crayon
|
||||
timestamp || 2021-06-30.12:09:54
|
||||
a
|
||||
zebra
|
||||
hula
|
||||
art
|
||||
timestamp
|
||||
1: a = 10
|
||||
1: zebra = horse
|
||||
1: hula = dance
|
||||
1: art = crayon
|
||||
1: timestamp = 2021-06-30.12:09:54
|
||||
2: a = 5
|
||||
2: zebra = horse
|
||||
2: no match
|
||||
2: no match
|
||||
2: timestamp = 2021-06-30.12:09:54
|
||||
3: AND a = 10
|
||||
3: zebra = horse
|
||||
3: art = crayon
|
||||
3: timestamp = 2021-06-30.12:09:54
|
||||
4: collection exists: /OCF/home/rods true
|
||||
4: file created: /OCF/home/rods/test.txt
|
||||
4: metadata to keep on /OCF/home/rods/test.txt, a=10
|
||||
4: metadata to keep on /OCF/home/rods/test.txt, art=crayon
|
||||
4: metadata to remove from /OCF/home/rods/test.txt, hula=dance
|
||||
4: metadata to remove from /OCF/home/rods/test.txt, timestamp=2021-06-30.12:09:54
|
||||
4: metadata to remove from /OCF/home/rods/test.txt, zebra=horse
|
||||
4: year: 2021
|
||||
4: file: test.txt created between 2020 - 2030
|
||||
5: writeoff: catC saab £1200
|
||||
```
|
||||
|
||||
## What does the script do?
|
||||
|
||||
- creates a metadata array like object from a delimited string of key value pairs
|
||||
- adds key value pairs to the metadata object
|
||||
- finds values from keys and keys from values
|
||||
- example of conditional behaviour based upon key value match
|
||||
- tests for existence of file in iRODS
|
||||
- add/remove metadata conditionally where file (data-object) iRODS exists
|
||||
- creates a file (data-object) in iRODS with simple string content, the file did not pre-exist on a local filesystem
|
||||
- convert metadata object to metadata structure and apply to iRODS file (data-object)
|
||||
- use inbuilt function to list iRODS path (collection) and iRODS file (data-object) from a full iRODS path (ZONE + collection + data-object)
|
||||
- perform SQL like query against the iCAT database to find a file (data-object) with specific keys or values, output specific metadata keys to metadata structure from query result
|
||||
- loop metadata structure conditionally and remove unwanted key value pair
|
||||
- remove key value pair without looping using more recent inbuilt function
|
||||
- use inbuilt function to get and format date/time, use inbuilt split function to create year/month/day key value pairs
|
||||
- associate metadata structure directly to an iRODS file (data-object)
|
||||
- perform SQL like query against the iCAT database to find a file (data-object) with a key value between a range of values
|
||||
- illustrate usage of the new microservice msiModAVUMetadata to perform all metadata operations that previously required many intermediary steps
|
||||
- car lot example, use existing metadata to decide if a car is to sell or writeoff, the *new* unit attribute of a key value pair is used for more granular metadata evaluation
|
||||
|
|
@ -0,0 +1,221 @@
|
|||
# Client-side demonstration rule: common metadata (AVU) operations in the
# iRODS native rule engine language. Prints numbered sections (1: .. 5:)
# matching the sample output in the README.
main() {
    #### metadata examples ####

    # before version 4.2.7 several microservices (functions) are required to achieve metadata manipulation
    # generally data structures of key pairs are created, applied or removed from the data-object/collection/resource
    # from version 4.2.7 the msiModAVUMetadata microservice greatly simplifies metadata transformation
    # for any version - retrieve metadata with SQL-like queries against the iRODS database, this is equivalent to the 'imeta ls' command

    ### metadata structures

    # sample metadata in the % delimited format expected by msiString2KeyValPair
    *str = "a=10%zebra=horse%hula=dance"

    # load kvp structure from string
    msiString2KeyValPair(*str, *kvp);

    # add some keys with values; *Time holds a human-readable timestamp
    msiAddKeyVal(*kvp, "art", "crayon");
    msiGetSystemTime(*Time, "human");
    msiAddKeyVal(*kvp, "timestamp", *Time);

    # print all keys and values (two equivalent ways; the second uses a custom separator)
    msiPrintKeyValPair("stdout", *kvp);
    writeKeyValPairs("stdout", *kvp, " || ")

    # print all keys (iterating a kvp directly yields the key names)
    foreach(*kvp) {
        writeLine("stdout", *kvp);
    }

    # print all keys with values
    foreach(*i in *kvp) {
        msiGetValByKey(*kvp,*i,*result);
        writeLine("stdout", "1: *i = *result");
    }

    # print key=value where key like zeb*
    # print key=value where value matches *Time (set with msiGetSystemTime above)
    # print key=value where string value matches 10, convert to int and divide
    foreach(*i in *kvp) {
        msiGetValByKey(*kvp,*i,*result);
        if (*i like "zeb*") then {
            writeLine("stdout","2: *i = *result")
        } else if (*result == *Time) then {
            writeLine("stdout","2: *i = *result")
        } else if (*result == "10") then {
            # values are strings; int() converts before arithmetic
            *number=int("*result")/2
            writeLine("stdout","2: *i = *number")
        } else {
            writeLine("stdout","2: no match")
        }
    }

    # more conditional behaviour
    foreach(*i in *kvp) {
        msiGetValByKey(*kvp,*i,*result);
        #if (*result == "dance" || *i == "art" || *i == "zebra") then { # this is a valid condition, however multiple OR with != / "not like" operators are not pre-evaluated correctly
        if (*i == "a" && *result == str(10)) then { # must precede the else statement or will not be matched
            writeLine("stdout","3: AND *i = *result")
        } else if (*result not like "dance") then {
            writeLine("stdout","3: *i = *result")
        }
    }

    ## Add/Remove metadata for files (data objects) or directories (collections) ##

    # print all session variables (rei) with msiGetSessionVarValue; when running the rule locally without script parameters the only useful variables are userNameClient= / rodsZoneClient=
    # when run from the server rules engine, there are many more useful session variables
    #msiGetSessionVarValue("all", "client");

    # access the session variables as $<variable name>
    *testpath = "/$rodsZoneClient/home/$userNameClient"
    *newfile = "/$rodsZoneClient/home/$userNameClient/test.txt"

    # test for valid path; the errorcode microservice is used to ensure the script does not exit on failure and instead returns a boolean
    *a = errorcode(msiObjStat(*testpath,*status)) >=0
    writeLine("stdout","4: collection exists: *testpath *a")

    # if path exists add/remove metadata
    if (errorcode(msiObjStat(*testpath,*status)) >=0) then {
        # remove file without sending to trash and unregister from database
        if (errorcode(msiObjStat(*newfile,*status)) >=0) then {
            msiDataObjUnlink("objPath=*newfile++++forceFlag=++++unreg=",*status)
            writeLine("stdout","4: file removed: *newfile")
        }

        # create a file; forceFlag attribute required to overwrite file, a resource can be specified here, note the field delimiter "destRescName=demoResc++++forceFlag="
        *content = "test.txt content"
        msiDataObjCreate(*newfile,"forceFlag=",*file_descriptor)
        msiDataObjWrite(*file_descriptor,*content,*write_length)
        msiDataObjClose(*file_descriptor,*status)
        writeLine("stdout","4: file created: *newfile")

        # apply metadata to the object from the kvp structure; "-d" targets a data object
        msiAssociateKeyValuePairsToObj(*kvp,*newfile,"-d")

        # get data object and collection from a full path string
        # (manual split/elem approach kept for reference; msiSplitPath does the same)
        #*filepath_element = ( size( (split(*newfile,"/")) ) )
        #*file = (elem((split(*newfile,"/")), (*filepath_element - 1) ))
        #*data_object = (elem( (split(*newfile,"/")), ( (size((split(*newfile,"/")))) - 1) ))
        msiSplitPath(*newfile,*collection,*file)

        # query the iRODS db for metadata of the file, load into a new key pair structure
        *query = SELECT META_DATA_ATTR_NAME,META_DATA_ATTR_VALUE WHERE DATA_NAME = '*file' AND COLL_NAME = '*collection'
        foreach(*row in *query) {
            #msiPrintKeyValPair("stdout",*row)
            #writeLine("stdout","next row")
            msiGetValByKey(*row,"META_DATA_ATTR_NAME",*key);
            msiGetValByKey(*row,"META_DATA_ATTR_VALUE",*value);
            msiAddKeyVal(*query_kvp, *key, *value);
        }

        # create a new 'trimmed' metadata structure containing the key pairs to be removed
        foreach(*i in *query_kvp) {
            #writeLine("stdout", "key is *i")
            if (*i == "a" || *i == "art") then {
                msiGetValByKey(*query_kvp,*i,*result)
                writeLine("stdout","4: metadata to keep on *newfile, *i=*result")
            } else {
                msiGetValByKey(*query_kvp,*i,*result)
                writeLine("stdout","4: metadata to remove from *newfile, *i=*result")
                msiAddKeyVal(*new_kvp, *i, *result)
            }
        }
        # remove key pairs listed in the new metadata structure from the data object
        msiRemoveKeyValuePairsFromObj(*new_kvp,*newfile,"-d")

        # create a new kvp structure, add key pairs
        msiAddKeyVal(*kvp2, "company", "OCF");
        msiAddKeyVal(*kvp2, "department", "Cloud");
        msiGetSystemTime(*created_epoc, "unix")
        msiAddKeyVal(*kvp2, "create_date_epoc", *created_epoc );

        # get system time, load into a list and grab elements based on position
        msiGetFormattedSystemTime(*created,"human","%d-%02d-%02d-%02d-%02d-%02d")
        writeLine("stdout", "4: year:" ++ " " ++ (elem((split(*created,"-")),0)) )
        *year = elem((split(*created,"-")),0)
        *month = elem((split(*created,"-")),1)
        *day = elem((split(*created,"-")),2)
        msiAddKeyVal(*kvp2, "create_year", *year );
        msiAddKeyVal(*kvp2, "create_month", *month );
        msiAddKeyVal(*kvp2, "create_day", *day );

        # add metadata to the data object; -d file (data object), -C directory (collection)
        msiAssociateKeyValuePairsToObj(*kvp2,*newfile,"-d");

        # find files with metadata between an epoc date range
        # supported comparison operators for the BETWEEN-style query:
        #>=
        #<=
        #=
        #<
        #>
        #'1' '100'
        #
        # 2020(1575072000) - 2030(1890691200)
        *query = SELECT DATA_NAME WHERE COLL_NAME = '*collection' AND META_DATA_ATTR_NAME = 'create_date_epoc' AND META_DATA_ATTR_VALUE BETWEEN '01575072000' '01890691200'
        foreach(*row in *query) {
            msiGetValByKey(*row,"DATA_NAME",*data_name)
            writeLine("stdout", "4: file: " ++ "*data_name" ++ " created between 2020 - 2030" )
        }

    }

    ### msiModAVUMetadata - change metadata directly on the object/collection/resource ###

    # this is a new microservice as of version 4.2.7 and easy to use
    # msiModAVUMetadata allows key, value and unit (AVU) manipulation, much like the imeta icommand
    # assigning an additional attribute 'unit' to the key pair is useful and can be treated as a secondary value or left empty ""

    # remove all key pairs directly from the data object ("rmw" = remove with wildcards, "%" matches anything)
    msiModAVUMetadata("-d","*newfile","rmw", "%", "%", "%")

    # add new key pair directly to the data object
    msiModAVUMetadata("-d","*newfile","add", "car", "ford", "string")

    # change value for key directly on the data object
    msiModAVUMetadata("-d","*newfile","set", "car", "toyoda", "string")

    # remove key pair directly on the data object
    msiModAVUMetadata("-d","*newfile","rm", "car", "toyoda", "string")

    # wildcard remove key pairs directly on the data object
    msiModAVUMetadata("-d","*newfile","add", "car", "subaru", "string")
    msiModAVUMetadata("-d","*newfile","add", "car", "suzuki", "string")
    msiModAVUMetadata("-d","*newfile","add", "car", "saab", "string")
    msiModAVUMetadata("-d","*newfile","rmw", "car", "su%", "%")
    #msiModAVUMetadata("-d","*newfile","rmw", "ca%", "%", "%")

    # add some metadata with arbitrary unit types
    msiModAVUMetadata("-d","*newfile","add", "catC", "yes", "damage")
    msiModAVUMetadata("-d","*newfile","add", "price", "1200", "sterling")

    ## searching with metadata

    # search for files in a collection where the key unit matches damage and the key value matches yes; return the filename and the key name with value yes and unit damage
    *query = SELECT DATA_NAME,META_DATA_ATTR_NAME WHERE COLL_NAME = '*collection' AND META_DATA_ATTR_UNITS = 'damage' AND META_DATA_ATTR_VALUE like 'y%'
    foreach(*row in *query) {
        msiGetValByKey(*row,"DATA_NAME",*target_file)
        msiGetValByKey(*row,"META_DATA_ATTR_NAME",*damage_type)
        # search for car key value using the file name
        *sub_query = SELECT META_DATA_ATTR_VALUE WHERE COLL_NAME = '*collection' AND DATA_NAME = '*target_file' AND META_DATA_ATTR_NAME = 'car'
        foreach(*sub_row in *sub_query) {
            msiGetValByKey(*sub_row,"META_DATA_ATTR_VALUE",*car)
            # search for the price key value under threshold (string is dynamically evaluated as numeric)
            *sub_query = SELECT META_DATA_ATTR_VALUE WHERE COLL_NAME = '*collection' AND DATA_NAME = '*target_file' AND META_DATA_ATTR_NAME = 'price' AND META_DATA_ATTR_VALUE < '1201'
            foreach(*sub_row in *sub_query) {
                msiGetValByKey(*sub_row,"META_DATA_ATTR_VALUE",*price)
                #writeLine("stdout", *price)
            }
            # if the price variable was set, its value is below 1201
            if (errorcode(*price) >=0) then {
                writeLine("stdout","5: writeoff: *damage_type *car £*price")
            }
        }
    }

}
INPUT null
OUTPUT ruleExecOut
|
||||
|
||||
|
|
@ -0,0 +1,57 @@
|
|||
# PHE python script example
|
||||
|
||||
See Run phe_upload.py.
|
||||
|
||||
- Designed to run on a windows host with Python 3
|
||||
- TOML format config file initially acts as a rudimentary ruleset for generating lists of files with metadata to upload to iRODS
|
||||
- Windows filesystem creation date is provided as an optional metadata key value pair
|
||||
- Add custom key value pairs to the ruleset
|
||||
- Generated upload_files.toml list is used for subsequent runs of the script, this may be used as a template to target any files with any key value pairs
|
||||
- Where upload_files.toml exists the config file will only be used for iRODS connectivity
|
||||
|
||||
## Python iRODS client
|
||||
|
||||
> https://github.com/irods/python-irodsclient
|
||||
|
||||
## Windows 10 virtual env setup
|
||||
|
||||
```
|
||||
# setup python venv
|
||||
cd C:\PHE_iRODS
|
||||
mkdir venv
|
||||
python -m venv C:\PHE_iRODS\venv
|
||||
|
||||
# enter virtual env
|
||||
C:\PHE_iRODS\venv\Scripts\Activate.ps1
|
||||
|
||||
# install pip packages
|
||||
pip install --upgrade pip
|
||||
pip install python-irodsclient
|
||||
pip install toml
|
||||
|
||||
# to save all pip module requirements
|
||||
pip3 freeze > venv-requirements.txt
|
||||
|
||||
# to install requirements on a new venv
|
||||
pip3 install -r venv-requirements.txt
|
||||
|
||||
# exit venv
|
||||
deactivate
|
||||
```
|
||||
|
||||
## Required PIP packages, should already be installed from venv-requirements
|
||||
|
||||
```
|
||||
pip install python-irodsclient
|
||||
pip install toml
|
||||
```
|
||||
|
||||
## Run phe_upload.py
|
||||
|
||||
* run once to create sample config file
|
||||
* modify config for your environment
|
||||
* run again to produce a sample file list based on your modified config
|
||||
* modify the file list - this may be wiped and used as a template to include files from any local/mounted path, each entry must have a unique numerical top level key, this is arbitrary and no sequence is required
|
||||
* run again to upload files
|
||||
* a log file is produced for each run of the script, containing skipped files that did not meet validation and failed/successful uploads
|
||||
* to run the script without confirmation, pass the argument --confirm
|
||||
|
|
@ -0,0 +1,382 @@
|
|||
#import os
import argparse
import pathlib
import toml
from datetime import datetime
from irods.session import iRODSSession
from irods.column import Criterion
from irods.models import DataObject, Collection

# --confirm skips the interactive prompt when file-list entries fail validation.
parser = argparse.ArgumentParser(__file__)
parser.add_argument("--confirm", help="auto continue script when a file entry fails validation", action="store_true")
args = parser.parse_args()

# Well-known file names used throughout the script.
configFile = 'config.toml'
fileList = 'upload_files.toml'
# Per-run log name: "<script stem>.<ISO timestamp>.log", with ':' replaced so
# the name is valid on Windows. Path.stem is used instead of the previous
# str.strip('.py'), which removes the *characters* '.', 'p', 'y' from both
# ends of the name and mangled e.g. "phe_upload" into "he_upload".
logFile = str(pathlib.Path(__file__).stem + '.' + datetime.now().isoformat().split(".")[0] + '.log').replace(':','.')

# Sample config written on first run; acts as documentation of the schema.
exampleConfig = """\
# required connect
[connect]
iRODS_host = '192.168.150.56'
iRODS_port = '1247'
iRODS_user = 'rods'
iRODS_password = 'password'
iRODS_zone = 'OCF'
# required files
[files]
# required at least one entry
[files.irods]
path = 'C:\PHE_iRODS'
extension = [".md", ".exe"]
recursive = true
collection = '/OCF/home/rods/test'
# required for windows_create_date
[files.irods.metadata]
windows_create_date = true
# optional additional metadata, U not required
[files.irods.metadata.company]
A = 'company'
V = 'OCF'
U = 'string'
[files.irods.metadata.department]
A = 'department'
V = 'Cloud'\
"""
|
||||
|
||||
def findConfig(exampleConfig, configFile):
    """Load and return the TOML config as a dict.

    If *configFile* does not exist, write *exampleConfig* to it and exit so
    the user can edit it. Exits with a message if the file is not valid TOML.
    """
    if not pathlib.Path(configFile).exists():
        with open(configFile,'w') as config:
            config.write(exampleConfig)
        print('config missing, wrote example to ' + configFile + ' please modify this config and re-run')
        quit()
    try:
        configDict = toml.load(configFile)
    # narrow exception: a bare except here would also hide OS errors and
    # KeyboardInterrupt behind the "invalid TOML" message
    except toml.TomlDecodeError:
        quit('invalid TOML format: ' + configFile)
    return configDict
|
||||
|
||||
def parseConfig(config):
    """Validate the structure of the loaded config dict.

    Calls quit() (raises SystemExit) with a descriptive message on the first
    problem found; returns None when the config is structurally valid.
    Explicit membership/type checks replace the previous bare try/except
    blocks, which also masked unrelated errors (e.g. TypeError on a
    malformed table) behind "missing config entry" messages.
    """
    # top-level tables must exist and be non-empty
    reqEntry = ['connect', 'files']
    for i in reqEntry:
        if i not in config:
            quit('missing config entry: ' + i)
    for i in reqEntry:
        if not len(config[i]):
            quit('empty config entry: ' + i)
    # [connect] must provide every connection parameter
    for key in ('iRODS_host', 'iRODS_port', 'iRODS_user', 'iRODS_password', 'iRODS_zone'):
        if key not in config['connect']:
            quit('missing config entry: connect')
    # each [files.<name>] table describes one search/upload rule
    for i in config['files']:
        entry = config['files'][i]
        # required search and put entries
        for key in ('path', 'extension', 'recursive', 'collection'):
            if key not in entry:
                quit('missing config entry in: files.' + i)
        # extension must be a list of suffixes
        if type(entry['extension']) is not list:
            quit('extension is not a list entry in: files.' + i + '.extension')
        # recursive must be a boolean
        if type(entry['recursive']) is not bool:
            quit('recursive is not a boolean entry in: files.' + i + '.recursive')
        # metadata table must exist, be non-empty and carry windows_create_date
        if 'metadata' not in entry:
            quit('missing config entry in: files.' + i + '.metadata')
        if not len(entry['metadata']):
            quit('empty metadata entry in: files.' + i + '.metadata')
        if 'windows_create_date' not in entry['metadata']:
            quit('missing config entry: files.' + i + '.metadata.windows_create_date')
        if type(entry['metadata']['windows_create_date']) is not bool:
            quit('windows_create_date is not a boolean entry in metadata')
        # every other metadata sub-table needs at least A and V (U is optional)
        for j in entry['metadata']:
            if j != 'windows_create_date':
                if 'A' not in entry['metadata'][j] or 'V' not in entry['metadata'][j]:
                    quit('missing config entry in: files.' + i + '.metadata.' + j)
|
||||
|
||||
def findFiles(dir, ext, recursive):
    """Return absolute paths (as strings) of files directly under *dir*
    whose suffix is in *ext*; descend into subdirectories when *recursive*
    is true."""
    matches = []
    for entry in pathlib.Path(dir).iterdir():
        if entry.is_dir():
            if recursive:
                matches.extend(findFiles(entry.absolute(), ext, recursive))
        elif entry.is_file() and entry.suffix in ext:
            matches.append(str(entry.absolute()))
    return matches
|
||||
|
||||
def createFileList(config, fileList):
    """Build *fileList* (upload_files.toml) from the config rules.

    Returns 0 when the list was just created (caller should stop so the user
    can review it) and 1 when it already exists.

    Fix: the original accumulated matches across rules
    (``files = files + findFiles(...)``) and then looped the *cumulative*
    list per rule, so every file matched by an earlier rule was re-registered
    under each later rule's collection and metadata, producing duplicate
    records. Each rule now only registers its own matches.
    """
    if not pathlib.Path(fileList).exists():
        with open(fileList, "a") as file_object:
            output = {}
            record = 0
            for i in config['files']:
                # collect required config params for this rule
                filesPath = config['files'][i]['path']
                filesExtension = config['files'][i]['extension']
                filesRecursive = config['files'][i]['recursive']
                filesCollection = config['files'][i]['collection']
                # check path exists; may want to check the collection exists too
                if not pathlib.Path(filesPath).exists():
                    quit('invalid path entry in: files.' + i + '.path')
                # find the files that match THIS rule only
                files = findFiles(filesPath, filesExtension, filesRecursive)
                # build the [A, V(, U)] metadata triples declared for this rule
                metas = []
                for j in config['files'][i]['metadata']:
                    if j != 'windows_create_date':
                        meta = []
                        meta.append(config['files'][i]['metadata'][j]['A'])
                        meta.append(config['files'][i]['metadata'][j]['V'])
                        if 'U' in config['files'][i]['metadata'][j]:
                            meta.append(config['files'][i]['metadata'][j]['U'])
                        metas.append(meta)
                # register one numbered record per matched file
                for k in files:
                    record += 1
                    if config['files'][i]['metadata']['windows_create_date']:
                        # derive creation date/time metadata from the local file
                        # NOTE: on Windows st_ctime is the creation time; on
                        # POSIX it is the inode change time
                        localFile = pathlib.Path(k)
                        win_epoc = str(localFile.stat().st_ctime).split(".")[0]
                        win_ctime = datetime.fromtimestamp(localFile.stat().st_ctime)
                        win_date = str(datetime.date(win_ctime))
                        win_time = str((datetime.time(win_ctime))).split(".")[0]
                        winMeta = [['date', win_date], ['time', win_time], ['date_epoc', win_epoc], ['year', str(format(win_ctime.year, '04'))], ['month',str(format(win_ctime.month, '02'))], ['day',str(format(win_ctime.day, '02'))]]
                        newMeta = metas + winMeta
                        output.update({ str(record): {'file': k, 'collection': filesCollection, 'metadata': newMeta}})
                    else:
                        # build file dict { 1: { 'file' : 'C:\\file.py, 'collection': '/OCF/home/rods', 'metadata': [['A','V','U'],['A','V','U']] }}
                        output.update({ str(record): {'file': k, 'collection': filesCollection, 'metadata': metas}})
            # write the file list as TOML so it can easily be edited manually
            file_object.write(toml.dumps(output))
        print('\nfile list did not exist, created ' + str(record) + ' entries: ' + fileList + '\n' + '\ncheck content, add/remove or use as a template for your own file list\n' + '\nrerun this script to continue\n')
        return 0
    else:
        print('\nfile list exists: ' + fileList + '\n')
        return 1
    # may want continue confirmation here
|
||||
|
||||
def writeLog(message, entry):
    """Append a section header and the TOML dump of *entry* to the run log.

    Opened in append mode: writeLog is called once per validation/upload
    category during a run, and the previous 'w' mode made each call
    overwrite all earlier sections, leaving only the last one in the log.
    """
    with open(logFile,'a') as log:
        log.write(message + toml.dumps(entry))
    #print(message + toml.dumps(entry))
|
||||
|
||||
def getConnect(config):
    """Create and return an iRODSSession built from the [connect] table."""
    connect = config['connect']
    sessioniRODS = iRODSSession(
        host=connect['iRODS_host'],
        port=connect['iRODS_port'],
        user=connect['iRODS_user'],
        password=connect['iRODS_password'],
        zone=connect['iRODS_zone'],
    )
    # generous timeout to allow large file transfers
    sessioniRODS.connection_timeout = 300
    return sessioniRODS
|
||||
|
||||
def uploadFiles(fileContent, config):
    """Upload every entry of *fileContent* to iRODS and tag its metadata.

    fileContent: dict keyed by record number, each value holding 'file'
    (local path), 'collection' (iRODS collection) and 'metadata'
    ([A, V(, U)] triples). Successful and failed entries are written to the
    run log in separate sections via writeLog.
    """
    failedUpload = []
    successUpload = []
    logFailedUpload = {}
    logSuccessUpload = {}
    sessioniRODS = getConnect(config)
    with sessioniRODS as session:
        for i in fileContent:
            filePath = fileContent[i]['file']
            objName = pathlib.Path(filePath).name
            objCollection = fileContent[i]['collection']
            # destination data-object path inside the zone
            objPath = objCollection + '/' + objName
            metaData = fileContent[i]['metadata']
            try:
                session.data_objects.put(filePath, objPath) # upload
                obj = session.data_objects.get(objPath)
                # apply each [A, V] or [A, V, U] triple as an AVU
                for j in metaData:
                    A = j[0]
                    V = j[1]
                    if len(j) > 2:
                        U = j[2]
                        obj.metadata.add(A, V, U)
                    else:
                        obj.metadata.add(A, V)
                successUpload.append(i)
                print('\nuploaded file to iRODS :' + objPath)
            # NOTE(review): bare except treats any failure (network, auth,
            # metadata) as a failed upload for this entry — deliberate
            # best-effort behaviour, but it also hides the error cause
            except:
                failedUpload.append(i)
                print('\nfailed to upload file to iRODS :' + objPath)
            #print(session.get_connection_refresh_time())
    # log the outcome of the run, one section per category
    if len(successUpload):
        for i in successUpload:
            logSuccessUpload.update({i:fileContent[i]})
        writeLog('\n#### iRODS successful upload / metadata tag ####\n\n', logSuccessUpload)
    if len(failedUpload):
        for i in failedUpload:
            logFailedUpload.update({i:fileContent[i]})
        writeLog('\n#### iRODS failed upload / metadata tag ####\n\n', logFailedUpload)
|
||||
|
||||
def prepUploadFiles(fileList, config):
    """Validate every entry of the file list against iRODS and the local
    filesystem, log and drop invalid entries, and return the pruned dict.

    Checks per entry: target collection exists in iRODS, target data object
    does NOT already exist, local directory exists, local file exists.
    Entries failing any combination of checks are written to the run log
    (one section per failure category) and removed from the returned dict.
    Prompts for confirmation when anything failed, unless --confirm is set.
    """
    # validation / logging function
    fileContent = toml.load(fileList)
    sessioniRODS = getConnect(config)
    with sessioniRODS as session:
        # check missing iRODS collection / existing iRODS object / missing local directory / missing local file
        colNotFound = []
        objFound = []
        directoryNotFound = []
        fileNotFound = []
        for i in fileContent:
            file_path = fileContent[i]['file']
            directory = pathlib.Path(file_path).parent
            obj_name = pathlib.Path(file_path).name
            obj_collection = fileContent[i]['collection']
            # check collection exists, check object exists
            colQuery = session.query(Collection).filter(Criterion('=', Collection.name, obj_collection))
            colExist = ""
            for j in colQuery:
                colExist = j[Collection.name]
            if not len(colExist):
                colNotFound.append(i)
            else:
                # only look for the data object when the collection exists
                objQuery = session.query(DataObject).filter(Criterion('=', Collection.name, obj_collection)).filter(Criterion('=', DataObject.name, obj_name))
                objExist = ""
                for k in objQuery:
                    objExist = k[DataObject.name]
                if len(objExist):
                    objFound.append(i)
            # check local path exists, check local file exists
            if not pathlib.Path(directory).exists():
                directoryNotFound.append(i)
            elif not pathlib.Path(file_path).exists():
                fileNotFound.append(i)
    # lists for possible combinations of rule failure (entries appearing in
    # two categories are reported once under the combined heading)
    missingColMissingDir = [value for value in colNotFound if value in directoryNotFound]
    missingColMissingFile = [value for value in colNotFound if value in fileNotFound]
    existingObjMissingDir = [value for value in objFound if value in directoryNotFound]
    existingObjMissingFile = [value for value in objFound if value in fileNotFound]
    # create lists of broken entries and log
    logMissingColMissingDir = {}
    logMissingColMissingFile = {}
    logExistingObjMissingDir = {}
    logExistingObjMissingFile = {}
    logColNotFound = {}
    logObjFound = {}
    logDirectoryNotFound = {}
    logFileNotFound = {}
    failedValidation = 0
    removeFileContent = []
    if len(missingColMissingDir):
        print('\nmissing iRODS collection AND missing local path, check log\n')
        for l in missingColMissingDir:
            # pull the entry out of the single-category lists so it is only
            # reported once, under the combined heading
            colNotFound.remove(l)
            directoryNotFound.remove(l)
            removeFileContent.append(l)
            logMissingColMissingDir.update({l:fileContent[l]})
        writeLog('\n#### Missing iRODS collection AND missing local path ####\n\n', logMissingColMissingDir)
        failedValidation += 1
    if len(missingColMissingFile):
        print('\nmissing iRODS collection AND missing file in local path, check log\n')
        for m in missingColMissingFile:
            colNotFound.remove(m)
            fileNotFound.remove(m)
            removeFileContent.append(m)
            logMissingColMissingFile.update({m:fileContent[m]})
        writeLog('\n#### Missing iRODS collection AND missing file in local path ####\n\n', logMissingColMissingFile)
        failedValidation += 1
    if len(existingObjMissingDir):
        print('\nexisting iRODS object AND missing local path, check log\n')
        for n in existingObjMissingDir:
            objFound.remove(n)
            directoryNotFound.remove(n)
            removeFileContent.append(n)
            logExistingObjMissingDir.update({n:fileContent[n]})
        writeLog('\n#### Existing iRODS object AND missing local path ####\n\n', logExistingObjMissingDir)
        failedValidation += 1
    if len(existingObjMissingFile):
        print('\nexisting iRODS object AND missing file in local path, check log\n')
        for o in existingObjMissingFile:
            objFound.remove(o)
            fileNotFound.remove(o)
            removeFileContent.append(o)
            logExistingObjMissingFile.update({o:fileContent[o]})
        writeLog('\n#### Existing iRODS object AND missing file in local path ####\n\n', logExistingObjMissingFile)
        failedValidation += 1
    if len(colNotFound):
        print('\nmissing iRODS collection, check log\n')
        for p in colNotFound:
            removeFileContent.append(p)
            logColNotFound.update({p:fileContent[p]})
        writeLog('\n#### Missing iRODS collection ####\n\n', logColNotFound)
        failedValidation += 1
    if len(objFound):
        print('\nexisting iRODS object, check log\n')
        for q in objFound:
            removeFileContent.append(q)
            logObjFound.update({q:fileContent[q]})
        writeLog('\n#### Existing iRODS object ####\n\n', logObjFound)
        failedValidation += 1
    if len(directoryNotFound):
        print('\nmissing local path, check log\n')
        for r in directoryNotFound:
            removeFileContent.append(r)
            logDirectoryNotFound.update({r:fileContent[r]})
        writeLog('\n#### Missing local path ####\n\n', logDirectoryNotFound)
        failedValidation += 1
    if len(fileNotFound):
        print('\nmissing file in local path, check log\n')
        for s in fileNotFound:
            removeFileContent.append(s)
            logFileNotFound.update({s:fileContent[s]})
        writeLog('\n#### Missing file in local path ####\n\n', logFileNotFound)
        failedValidation += 1
    # continue
    if failedValidation:
        print('\nfailed validation for entry in ' + fileList + ', the entry will be skipped, check log ' + logFile + '\n')
        if not args.confirm:
            print('\n(to avoid this confirmation run the script with the argument \'--confirm\')\n')
            #answer = raw_input("do you want to continue? y/n").lower() # python2
            answer = input("\ndo you want to continue? y/n ").lower()
            # NOTE(review): this loop runs at most once — the first iteration
            # either breaks (y/yes) or calls quit(); it behaves as an if/else
            while True:
                if answer == 'y' or answer == 'yes':
                    break
                else:
                    quit()
    # sort/unique entries and remove from fileContent
    unique = []
    if len(removeFileContent):
        for n in removeFileContent:
            if n not in unique:
                unique.append(n)
        removeFileContent = sorted(unique)
    if len(removeFileContent):
        for o in removeFileContent:
            fileContent.pop(o)
    #print(toml.dumps(fileContent))
    return fileContent
|
||||
|
||||
def main():
|
||||
configDict = findConfig(exampleConfig, configFile)
|
||||
parseConfig(configDict)
|
||||
listExist = createFileList(configDict, fileList)
|
||||
if listExist:
|
||||
validFileList = prepUploadFiles(fileList, configDict)
|
||||
uploadFiles(validFileList, configDict)
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
Binary file not shown.
|
|
@ -0,0 +1,181 @@
|
|||
#!/bin/bash
|
||||
|
||||
# refresh iRODS auth token
|
||||
#iinit
|
||||
|
||||
# parameter validation
|
||||
arguments=2
|
||||
if [ $# -ne $arguments ]; then
|
||||
echo -e "\e[1;31mUsage: `basename $0` input.csv /path/to/runname\e[0m"
|
||||
exit
|
||||
fi
|
||||
if [ ! -f $1 ]; then
|
||||
echo -e "\e[1;31m$1 file not present\e[0m"
|
||||
fi
|
||||
if [ ! -d $2 ]; then
|
||||
echo -e "\e[1;31m$2 invalid path\e[0m"
|
||||
fi
|
||||
|
||||
# read script parameters
|
||||
csv=$1
|
||||
rundir=$(echo $2 | sed 's/\/$//')
|
||||
|
||||
# populate some system variables
|
||||
host=$(uname -n)
|
||||
user=$(whoami)
|
||||
|
||||
# create output directories
|
||||
mkdir -p $rundir/teams_outputs/consensus $rundir/teams_outputs/bams $rundir/teams_outputs/vcfs $rundir/teams_outputs/pngs $rundir/teams_outputs/con_fastqs $rundir/RUNFORTRANSFER
|
||||
|
||||
# populate runname variables, $runname = PHE_PD_20210429_1
|
||||
runname=$(echo $rundir | awk -F '/' '{for(i=1; i<=NF; i++) {if($i ~ /PHE_PD_[[:digit:]]{8}_/) print $i}}')
|
||||
runorg=$(echo $runname | awk -F '_' '{print $1}')
|
||||
rundept=$(echo $runname | awk -F '_' '{print $2}')
|
||||
rundate=$(echo $runname | awk -F '_' '{print $3}')
|
||||
runnumber=$(echo $runname | awk -F '_' '{print $4}')
|
||||
|
||||
# read csv file, write out csv to rundir
|
||||
dos2unix -o $csv
|
||||
rundir_csv=$rundir/$runname.csv
|
||||
while read i
|
||||
do
|
||||
# field validation can be done here (empty fields, column headers etc)
|
||||
barcode=$(printf %02d%s $(echo $i | awk -F ',' '{print $1}')) # pad leading zeros for two digits, 1 = 01
|
||||
sampleid=$(echo $i | awk -F ',' '{print $2}')
|
||||
echo $barcode,$sampleid >> $rundir_csv
|
||||
done < $csv
|
||||
|
||||
<<COMMENT
|
||||
# start conda virtual environment: artic-ncov2019
|
||||
source ~/miniconda3/etc/profile.d/conda.sh
|
||||
conda activate artic-ncov2019
|
||||
COMMENT
|
||||
|
||||
<<COMMENT
|
||||
# arctic nanopore function, experiment with parallelization variable for speed, 0=all-jobs-concurrent 1=serial-execution 2+=parallel-execution
|
||||
parallel_jobs=4
|
||||
f_arctic_nanopore() {
|
||||
# parallel command will spawn child shells, if the python scripts wont work try including the "start conda virtual environment: arctic-ncov2019" here in the function
|
||||
#
|
||||
Barcode=$1
|
||||
Sample=$2
|
||||
# guess guppyplex outputs .fastq to current directory, guess qc.py outputs $Sample'.pass.vcf.gz' to current directory
|
||||
cd $rundir
|
||||
artic guppyplex --skip-quality-check --min-length 400 --max-length 700 --directory $rundir/fastq_pass/barcode$Barcode --prefix $runname
|
||||
# if function is already running in parallel experiment with --threads, for example it maybe more efficient with a single thread if the system has 4cpu cores and 4 parallel arctic_nanopore functions running
|
||||
artic minion --normalise 200 --threads 4 --scheme-directory ~/artic-ncov2019/primer_schemes --read-file $runname'_barcode'$Barcode'.fastq' --fast5-directory $rundir/fast5_pass --sequencing-summary $rundir/sequencing_summary.txt nCoV-2019/V3 $Sample
|
||||
python /data/scripts/qc.py --nanopore --outfile $rundir/$Sample'.qc.txt' --sample $Sample --ref /data/scripts/NC_045512.2.fa --bam $rundir/teams_outputs/bams/$Sample'.sorted.bam' --fasta $rundir/teams_outputs/consensus/$Sample'.consensus.fasta'
|
||||
# does qc.py have a --vcf switch?
|
||||
mv $Sample'.pass.vcf.gz' $rundir/teams_outputs/vcfs/
|
||||
mkdir -p $rundir/RUNFORTRANSFER/$Sample
|
||||
cp $rundir/teams_outputs/consensus/$Sample'.consensus.fasta' $rundir/RUNFORTRANSFER/$Sample/
|
||||
cp $rundir/teams_outputs/bams/$Sample'.sorted.bam' $rundir/RUNFORTRANSFER/$Sample/
|
||||
mv $runname'_barcode'$Barcode'.fastq' $Sample'.fastq'
|
||||
}
|
||||
export rundir
|
||||
export runname
|
||||
export -f f_arctic_nanopore
|
||||
# parallel may need full path depending on environment (such as /usr/bin/parallel), it may need to be installed (sudo apt install parallel)
|
||||
parallel -a $rundir_csv -j$parallel_jobs -C, f_arctic_nanopore
|
||||
COMMENT
|
||||
|
||||
<<COMMENT
|
||||
#start conda virtual environment: pangolin
|
||||
conda activate pangolin
|
||||
pangolin --update
|
||||
pangolin --version > $rundir/$runname'.pang.version'
|
||||
COMMENT
|
||||
|
||||
<<COMMENT
|
||||
# pangolin function, experiment with parallelization variable for speed, 0=all-jobs-concurrent 1=serial-execution 2+=parallel-execution
|
||||
parallel_jobs=4
|
||||
f_pangolin() {
|
||||
# parallel command will spawn child shells, if the python scripts wont work try including the "start conda virtual environment: pangolin" here in the function
|
||||
#
|
||||
target=$1
|
||||
# guess pangolin outputs .png to current directory
|
||||
cd $rundir
|
||||
#echo $target
|
||||
pangolin $target --outfile $rundir/$target'.pang.txt'
|
||||
}
|
||||
export rundir
|
||||
export -f f_pangolin
|
||||
logdir=/var/log
|
||||
#find /var/log -mindepth 1 -maxdepth 1 -type f -name *.log 2>&1 -printf '%f\n' | grep -v "Permission denied" | parallel -j$parallel_jobs f_pangolin
|
||||
find $rundir/teams_outputs/consensus/ -mindepth 1 -maxdepth 1 -type f -name *.pang.txt 2>&1 -printf '%f\n' | grep -v "Permission denied" | parallel -j$parallel_jobs f_pangolin
|
||||
cat $rundir/*.pang.txt >> $rundir/teams_outputs/$runname'.pang.txt'
|
||||
cat $rundir/*.qc.txt >> $rundir/teams_outputs/$runname'.qc.txt'
|
||||
mv $rundir/*.png $rundir/teams_outputs/pngs/
|
||||
mv $rundir/*.fastq $rundir/teams_outputs/con_fastqs
|
||||
COMMENT
|
||||
|
||||
<<COMMENT
|
||||
# create a collection with metadata tags
|
||||
imkdir /PHE/projectXYZ/$runname
|
||||
|
||||
# add metadata to the collection
|
||||
imeta add -C /PHE/projectXYZ/$runname node $host
|
||||
imeta add -C /PHE/projectXYZ/$runname user $user
|
||||
imeta add -C /PHE/projectXYZ/$runname run_name $runname
|
||||
imeta add -C /PHE/projectXYZ/$runname date_uk $rundate
|
||||
dateusa=${rundate:0:4}${rundate:6:2}${rundate:4:2}
|
||||
imeta add -C /PHE/projectXYZ/$runname date_usa $dateusa
|
||||
barcodes=""
|
||||
sampleids=""
|
||||
while read i
|
||||
do
|
||||
barcode=$(echo $i | awk -F ',' '{print $1}')
|
||||
sampleid=$(echo $i | awk -F ',' '{print $2}')
|
||||
barcodes="${barcodes}$barcode "
|
||||
sampleids="${sampleids}$sampleid "
|
||||
done < $rundir_csv
|
||||
imeta add -C /PHE/projectXYZ/$runname all_barcodes $barcodes
|
||||
imeta add -C /PHE/projectXYZ/$runname all_sampleid $sampleids
|
||||
# edit permissions
|
||||
ichmod read "aperson" /PHE/projectXYZ/$runname
|
||||
ichmod read "somegroup" /PHE/projectXYZ/$runname
|
||||
|
||||
# commands to find collections based on metadata
|
||||
#
|
||||
#qu -C run_name like PHE_PD_2021%
|
||||
# find all collections in january
|
||||
#qu -C date_usa like 2021%01
|
||||
# find all collection with specific date
|
||||
#qu -C date_uk = 20210101
|
||||
# find all collections performed by host
|
||||
#qu -C host = ubuntu.local
|
||||
# find collection by sample id
|
||||
#qu -C all_sampleid like %sampleXYZ%
|
||||
COMMENT
|
||||
|
||||
<<COMMENT
|
||||
# upload some files to the collection and tag metadata
|
||||
# find all directories under RUNFORTRANSFER, these are named by sample id
|
||||
# rsync the directories and files to the iRODS storage recursively
|
||||
# apply some tags to the directories under RUNFORTRANSFER and for the files under directories under RUNFORTRANSFER/$sampleid
|
||||
#
|
||||
for i in $(find $rundir/RUNFORTRANSFER/ -mindepth 1 -maxdepth 1 -type d 2>&1 -printf '%f\n' | grep -v "Permission denied")
|
||||
do
|
||||
sampleid=$i
|
||||
barcode=$(awk -F "," -v var=$samplestuff '{if($2 ~ var) print $1}' $rundir_csv)
|
||||
#imkdir /PHE/projectXYZ/$runname/RUNFORTRANSFER/$sampleid
|
||||
irsync -r $rundir/RUNFORTRANSFER/$sampleid i:/PHE/projectXYZ/$runname/RUNFORTRANSFER/
|
||||
imeta add -C /PHE/projectXYZ/$runname/RUNFORTRANSFER/$sampleid sampleid $sampleid
|
||||
imeta add -C /PHE/projectXYZ/$runname/RUNFORTRANSFER/$sampleid barcode $barcode
|
||||
imeta add -C /PHE/projectXYZ/$runname/RUNFORTRANSFER/$sampleid run_name $runname
|
||||
imeta add -d /PHE/projectXYZ/$runname/RUNFORTRANSFER/$sampleid/* sampleid $sampleid
|
||||
imeta add -d /PHE/projectXYZ/$runname/RUNFORTRANSFER/$sampleid/* barcode $barcode
|
||||
imeta add -d /PHE/projectXYZ/$runname/RUNFORTRANSFER/$sampleid/* run_name $runname
|
||||
done
|
||||
|
||||
# upload the sequencing_summary.txt and add metadata in one command
|
||||
iput -K -V --metadata "run_name;$runname" $rundir/sequencing_summary.txt /PHE/projectXYZ/$runname
|
||||
imeta add -d /PHE/projectXYZ/$runname/sequencing_summary.txt date_uk $rundate
|
||||
|
||||
# maybe possible to provide list for multiple tags - need access to PHE iRODS to test
|
||||
#iput -K -V --metadata "[run_name;$runname,node;$host;date_uk $rundate" $rundir/sequencing_summary.txt /PHE/projectXYZ/$runname
|
||||
|
||||
# find all objects with date starting 2021 and get locations of files matching name sequencing_summary.txt
|
||||
#qu -d date_uk = 2021% | grep -i sequencing_summary.txt
|
||||
COMMENT
|
||||
|
||||
|
|
@ -0,0 +1,101 @@
|
|||
0,a0
|
||||
1,b1
|
||||
2,c2
|
||||
3,d3
|
||||
4,e4
|
||||
5,f5
|
||||
6,g6
|
||||
7,h7
|
||||
8,i8
|
||||
9,j9
|
||||
10,k10
|
||||
11,l11
|
||||
12,m12
|
||||
13,n13
|
||||
14,o14
|
||||
15,p15
|
||||
16,q16
|
||||
17,r17
|
||||
18,s18
|
||||
19,t19
|
||||
20,u20
|
||||
21,v21
|
||||
22,w22
|
||||
23,x23
|
||||
24,y24
|
||||
25,z25
|
||||
26,a26
|
||||
27,b27
|
||||
28,c28
|
||||
29,d29
|
||||
30,e30
|
||||
31,f31
|
||||
32,g32
|
||||
33,h33
|
||||
34,i34
|
||||
35,j35
|
||||
36,k36
|
||||
37,l37
|
||||
38,m38
|
||||
39,n39
|
||||
40,o40
|
||||
41,p41
|
||||
42,q42
|
||||
43,r43
|
||||
44,s44
|
||||
45,t45
|
||||
46,u46
|
||||
47,v47
|
||||
48,w48
|
||||
49,x49
|
||||
50,y50
|
||||
51,z51
|
||||
52,a52
|
||||
53,b53
|
||||
54,c54
|
||||
55,d55
|
||||
56,e56
|
||||
57,f57
|
||||
58,g58
|
||||
59,h59
|
||||
60,i60
|
||||
61,j61
|
||||
62,k62
|
||||
63,l63
|
||||
64,m64
|
||||
65,n65
|
||||
66,o66
|
||||
67,p67
|
||||
68,q68
|
||||
69,r69
|
||||
70,s70
|
||||
71,t71
|
||||
72,u72
|
||||
73,v73
|
||||
74,w74
|
||||
75,x75
|
||||
76,y76
|
||||
77,z77
|
||||
78,a78
|
||||
79,b79
|
||||
80,c80
|
||||
81,d81
|
||||
82,e82
|
||||
83,f83
|
||||
84,g84
|
||||
85,h85
|
||||
86,i86
|
||||
87,j87
|
||||
88,k88
|
||||
89,l89
|
||||
90,m90
|
||||
91,n91
|
||||
92,o92
|
||||
93,p93
|
||||
94,q94
|
||||
95,r95
|
||||
96,s96
|
||||
97,t97
|
||||
98,u98
|
||||
99,v99
|
||||
100,w100
|
||||
|
|
|
@ -0,0 +1,42 @@
|
|||
#!/bin/bash
|
||||
#2ndApril2021V1.2
|
||||
#version 1.2 added unix ending to csv file, pangolin and qc and directory outputs
|
||||
dir=$(pwd);
|
||||
|
||||
dos2unix $1.'csv';
|
||||
source ~/miniconda3/etc/profile.d/conda.sh;
|
||||
conda activate artic-ncov2019;
|
||||
mkdir teams_outputs;
|
||||
mkdir teams_outputs/consensus;
|
||||
mkdir teams_outputs/bams;
|
||||
mkdir teams_outputs/vcfs;
|
||||
mkdir teams_outputs/pngs;
|
||||
mkdir teams_outputs/con_fastqs;
|
||||
mkdir RUNFORTRANSFER;
|
||||
|
||||
while IFS="," read -r Barcode Sample; do echo $Barcode $Sample;
|
||||
artic guppyplex --skip-quality-check --min-length 400 --max-length 700 --directory fastq_pass/barcode$Barcode --prefix $1;
|
||||
artic minion --normalise 200 --threads 4 --scheme-directory ~/artic-ncov2019/primer_schemes --read-file $1'_barcode'$Barcode'.fastq' --fast5-directory fast5_pass --sequencing-summary $2 nCoV-2019/V3 $Sample;
|
||||
python qc.py --nanopore --outfile $Sample'.qc.txt' --sample $Sample --ref NC_045512.2.fa --bam $Sample'.sorted.bam' --fasta $Sample'.consensus.fasta';
|
||||
cp $Sample'.consensus.fasta' teams_outputs/consensus/.;
|
||||
cp $Sample'.sorted.bam' teams_outputs/bams/.;
|
||||
cp $Sample'.pass.vcf.gz' teams_outputs/vcfs/.;
|
||||
mkdir RUNFORTRANSFER/$Sample;
|
||||
cp $Sample'.consensus.fasta' RUNFORTRANSFER/$Sample/.;
|
||||
cp $Sample'.sorted.bam' RUNFORTRANSFER/$Sample/.;
|
||||
mv $1'_barcode'$Barcode'.fastq' $Sample'.fastq';
|
||||
done < $1'.csv';
|
||||
|
||||
conda activate pangolin;
|
||||
pangolin --update;
|
||||
pangolin --version > $1'.pang.version';
|
||||
for i in *.consensus.fasta;
|
||||
do
|
||||
pangolin $i --outfile $i'.pang.txt';
|
||||
done;
|
||||
cat *.pang.txt >> $1'.pang.txt';
|
||||
cat *.qc.txt >> $1'.qc.txt';
|
||||
cp *.png teams_outputs/pngs/.;
|
||||
mv $1'.pang.txt' teams_outputs/.;
|
||||
mv $1'.qc.txt' teams_outputs/.;
|
||||
mv *.fastq teams_outputs/con_fastqs/.;
|
||||
|
|
@ -0,0 +1,57 @@
|
|||
# Arctic nanopolish
|
||||
|
||||
This is a modified bash script to upload fast5/fastq results into iRODS.
|
||||
Most sections are commented as it has not been tested on the GRID genome sequencing machines.
|
||||
This is an example based on the customers existing script and an .odt document outlining the workflow.
|
||||
|
||||
## Install iRODS client on GRID server, icommands are used in script
|
||||
|
||||
> https://packages.irods.org/ setup package repositories
|
||||
> https://github.com/irods/irods_client_icommands do not follow this use for client host reference
|
||||
> sudo apt-get install irods-icommands irods-dev irods-runtime
|
||||
|
||||
- The irods client will require a service account for this host, the host cannot join the domain via SSSD as it is sold as an appliance and updated by the vendor under a service contract.
|
||||
- The irods config will require a resource for the data, the resource is loosely a network disk
|
||||
- The irods config will require a top level collection, this is akin to a directory and can have permissions granted recursively for whomever requires access to the data
|
||||
- Data objects (files) maybe uploaded to the collection and then tagged with metadata or can be tagged on upload with metadata using the iput command
|
||||
|
||||
Sample client config file follows.
|
||||
In this case the user_name is an LDAP user (windows active directory) authenticated systemwide via the pam auth stack.
|
||||
A local to iRODS service account and password pair will likely reside within this file with irods_authentication_scheme set to native.
|
||||
|
||||
```
|
||||
[toby.seed@phe.gov.uk@smedmaster02 ~]$ cat << EOF > ~/.irods/irods_environment.json
|
||||
|
||||
{
|
||||
"irods_host": "irodscol01.unix.phe.gov.uk",
|
||||
"irods_port": 1247,
|
||||
"irods_user_name": "toby.seed@phe.gov.uk",
|
||||
"irods_zone_name": "PHE",
|
||||
"irods_default_resource": "s3_compound",
|
||||
"irods_authentication_scheme": "PAM"
|
||||
}
|
||||
```
|
||||
|
||||
With a working config the client will be authenticated against the iRODS server with the `iinit` command and checked with the `ienv` command.
|
||||
Depending on the irods server configuration the token may last for up to two weeks, it maybe necessary to ensure the bash_rc login script runs iinit on login or to be atomic, run iinit at the top of the various workload scripts.
|
||||
|
||||
## rough requirements to test in a live environment
|
||||
|
||||
- a resource
|
||||
- a service account for this host
|
||||
- network connectivity to the target irods server @ tcp 1247
|
||||
- a top level collection with some recursive permissions for users requiring access fast5/fastq data
|
||||
|
||||
## Using iCommands to upload files to iRODS
|
||||
|
||||
> resource ~ network disk
|
||||
> collection ~ directory
|
||||
> object ~ file
|
||||
|
||||
- generally you would create a collection on your resource to put your files
|
||||
- it is likely you would create a collection (runnameXYZ) within a collection that already has recursive permissions for a group of users; /PHE/projectXYZ/runnameXYZ
|
||||
- the iput command will push objects, it may also push collections recursively, irsync is much preferred for this task to ensure data integrity
|
||||
- the imeta command will list/add/remove metadata for collections and objects that have been uploaded
|
||||
- the ils command will list file attributes and permissions
|
||||
- the irm command will remove files from the irods storage
|
||||
- https://docs.irods.org/master/icommands/user/
|
||||
Binary file not shown.
|
|
@ -0,0 +1,669 @@
|
|||
iRODS server setup
|
||||
|
||||
# iRODS test server
|
||||
|
||||
This is a sample virtual machine with an iRODS server instance and S3 compatible storage.
|
||||
|
||||
## VM test server
|
||||
|
||||
4 cpu
|
||||
4GB RAM
|
||||
40GB disk
|
||||
hostname irods.local
|
||||
user irods
|
||||
pass password
|
||||
ip 192.168.150.56 (should be a static ip to avoid hostname/domain/ssl type issues, millage may vary, virtualbox/hyperv NAT network maybe help)
|
||||
os Ubuntu 18.04
|
||||
|
||||
## iRODS server installation
|
||||
|
||||
> https://docs.irods.org/4.2.8/getting_started/installation/
|
||||
|
||||
### Server prep
|
||||
|
||||
```
|
||||
sudo apt-get install wget ca-certificates
|
||||
```
|
||||
|
||||
### Netplan
|
||||
|
||||
Static ip required for minio s3 connectivity from irods.
|
||||
|
||||
```
|
||||
sudo nano -cw /etc/netplan/01-netcfg.yaml
|
||||
|
||||
network:
|
||||
version: 2
|
||||
renderer: networkd
|
||||
ethernets:
|
||||
eth0:
|
||||
addresses:
|
||||
- 192.168.150.56/24
|
||||
gateway4: 192.168.150.1
|
||||
nameservers:
|
||||
search: [local]
|
||||
addresses: [192.168.150.1]
|
||||
|
||||
sudo netplan apply
|
||||
```
|
||||
|
||||
### Sudoers
|
||||
|
||||
Setup /etc/sudoers to include:
|
||||
|
||||
```
|
||||
irods ALL=(ALL) NOPASSWD:ALL
|
||||
```
|
||||
|
||||
### Ubuntu software repo
|
||||
|
||||
```
|
||||
wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
|
||||
echo "deb [arch=amd64] https://packages.irods.org/apt/ $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/renci-irods.list
|
||||
sudo apt-get update
|
||||
```
|
||||
|
||||
### Setup Postgresql database, install iRODS server
|
||||
|
||||
Install postgress packages.
|
||||
|
||||
```
|
||||
sudo apt-get install postgresql-10 postgresql-client-10 postgresql-client-common postgresql-common postgresql-doc-10 postgresql-contrib
|
||||
```
|
||||
|
||||
Create the iRODS user and database.
|
||||
|
||||
```
|
||||
irods@iRODS:~$ sudo su - postgres
|
||||
postgres@iRODS:~$ psql
|
||||
psql (10.16 (Ubuntu 10.16-0ubuntu0.18.04.1))
|
||||
Type "help" for help.
|
||||
|
||||
postgres=# CREATE USER irods WITH PASSWORD 'password';
|
||||
CREATE ROLE
|
||||
postgres=# CREATE DATABASE "ICAT";
|
||||
CREATE DATABASE
|
||||
postgres=# GRANT ALL PRIVILEGES ON DATABASE "ICAT" TO irods;
|
||||
GRANT
|
||||
postgres=# \l
|
||||
List of databases
|
||||
Name | Owner | Encoding | Collate | Ctype | Access privileges
|
||||
-----------+----------+----------+-------------+-------------+-----------------------
|
||||
ICAT | postgres | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 | =Tc/postgres +
|
||||
| | | | | postgres=CTc/postgres+
|
||||
| | | | | irods=CTc/postgres
|
||||
postgres | postgres | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 |
|
||||
template0 | postgres | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres +
|
||||
| | | | | postgres=CTc/postgres
|
||||
template1 | postgres | UTF8 | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres +
|
||||
| | | | | postgres=CTc/postgres
|
||||
(4 rows)
|
||||
|
||||
postgres=# \q
|
||||
postgres@iRODS:~$ exit
|
||||
logout
|
||||
```
|
||||
|
||||
Install irods server, irods-dev is required for any msi* microservice (the built-in functions called by rules)
|
||||
|
||||
```
|
||||
irods@iRODS:~$ sudo apt-get install irods-server irods-database-plugin-postgres irods-dev
|
||||
```
|
||||
|
||||
Run the iRODS installer script, this will be a provider setup to ensure all roles and storage are provided locally.
|
||||
Most defaults are fine for this installation, enter the following at the installer prompts.
|
||||
|
||||
| | |
|
||||
| --- | --- |
|
||||
| ODBC driver | Unicode |
|
||||
| postgres user | irods |
|
||||
| postgres user password | password |
|
||||
| salt | randomsalt |
|
||||
| Zone name | OCF |
|
||||
| iRODS server's zone key | randomzonekey |
|
||||
| iRODS server's negotiation key | Is8710XeBKO26xlhCN5TcPfkE9qHJMDu |
|
||||
| Control Plane key | ohyHLaej2s9CWMTUq5AiQ37DBYdIE6wG |
|
||||
| iRODS server's administrator password | password |
|
||||
|
||||
```
|
||||
irods@iRODS:~$ sudo python /var/lib/irods/scripts/setup_irods.py
|
||||
```
|
||||
|
||||
Some of the script output that maybe useful.
|
||||
|
||||
```
|
||||
iRODS server's role: 1. provider
|
||||
Database Type: postgres
|
||||
ODBC Driver: PostgreSQL Unicode
|
||||
Database Host: localhost
|
||||
Database Port: 5432
|
||||
Database Name: ICAT
|
||||
Database User: irods
|
||||
|
||||
Zone name: OCF
|
||||
iRODS server port: 1247
|
||||
iRODS port range (begin): 20000
|
||||
iRODS port range (end): 20199
|
||||
Control plane port: 1248
|
||||
Schema validation base URI: file:///var/lib/irods/configuration_schemas
|
||||
iRODS server administrator: rods
|
||||
```
|
||||
|
||||
```
|
||||
irods@iRODS:~$ ienv
|
||||
irods_version - 4.2.8
|
||||
irods_client_server_negotiation - request_server_negotiation
|
||||
irods_server_control_plane_key - ohyHLaej2s9CWMTUq5AiQ37DBYdIE6wG
|
||||
irods_server_control_plane_port - 1248
|
||||
irods_client_server_policy - CS_NEG_REFUSE
|
||||
irods_cwd - /tempZone/home/rods
|
||||
irods_connection_pool_refresh_time_in_seconds - 300
|
||||
irods_encryption_key_size - 32
|
||||
irods_default_hash_scheme - SHA256
|
||||
irods_environment_file - /home/irods/.irods/irods_environment.json
|
||||
irods_default_number_of_transfer_threads - 4
|
||||
irods_encryption_algorithm - AES-256-CBC
|
||||
irods_encryption_salt_size - 8
|
||||
schema_version - v3
|
||||
irods_home - /tempZone/home/rods
|
||||
irods_encryption_num_hash_rounds - 16
|
||||
irods_default_resource - demoResc
|
||||
irods_match_hash_policy - compatible
|
||||
irods_maximum_size_for_single_buffer_in_megabytes - 32
|
||||
irods_session_environment_file - /home/irods/.irods/irods_environment.json.1692
|
||||
irods_port - 1247
|
||||
irods_server_control_plane_encryption_algorithm - AES-256-CBC
|
||||
schema_name - irods_environment
|
||||
irods_server_control_plane_encryption_num_hash_rounds - 16
|
||||
irods_user_name - rods
|
||||
irods_zone_name - tempZone
|
||||
irods_transfer_buffer_size_for_parallel_transfer_in_megabytes - 4
|
||||
irods_host - iRODS
|
||||
```
|
||||
|
||||
Change the iRODS administrator account password, the account 'rods' with password 'rods' with permission 'rodsadmin'.
|
||||
After the password change re-run iinit to login.
|
||||
|
||||
```
|
||||
irods@iRODS:~$ iadmin moduser rods password password
|
||||
irods@iRODS:~$ iinit
|
||||
Enter your current iRODS password:
|
||||
irods@iRODS:~$ ils
|
||||
irods@iRODS:~$ ilsresc
|
||||
```
|
||||
|
||||
### Edit irods_environment.json
|
||||
|
||||
Whilst almost all commands will function correctly with the default irods_host target being the localhost name, change this to the FQDN/IP of the host to emulate external administration/configuration to avoid weird gotchas.
|
||||
|
||||
```
|
||||
nano -cw ~/.irods/irods_environment.json
|
||||
|
||||
irods_host - 192.168.150.56
|
||||
|
||||
iinit
|
||||
```
|
||||
|
||||
### Change the Zone name
|
||||
|
||||
This doesnt work with the default zone only ones you subsequently create it woudl seem?
|
||||
|
||||
irods@iRODS:~$ iadmin lz
|
||||
tempZone
|
||||
irods@iRODS:~$ iadmin modzone tempZone name OCF
|
||||
|
||||
### Add a new user
|
||||
|
||||
Add a new user (tseed) with password (password).
|
||||
|
||||
```
|
||||
irods@iRODS:~$ iadmin mkuser tseed rodsuser
|
||||
irods@irods:~$ iadmin lu
|
||||
rods#OCF
|
||||
tseed#OCF
|
||||
irods@iRODS:~$ iadmin help moduser
|
||||
irods@iRODS:~$ iadmin moduser tseed#OCF password password
|
||||
```
|
||||
|
||||
## Minio Install
|
||||
|
||||
### Install docker
|
||||
|
||||
```
|
||||
sudo apt-get remove docker docker-engine docker.io containerd runc
|
||||
sudo apt-get update
|
||||
sudo apt-get install \
|
||||
apt-transport-https \
|
||||
ca-certificates \
|
||||
curl \
|
||||
gnupg-agent \
|
||||
software-properties-common
|
||||
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
|
||||
sudo apt-key fingerprint 0EBFCD88
|
||||
sudo add-apt-repository \
|
||||
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
|
||||
$(lsb_release -cs) \
|
||||
stable"
|
||||
sudo apt-get update
|
||||
sudo apt-get install docker-ce docker-ce-cli containerd.io
|
||||
sudo systemctl start docker
|
||||
sudo systemctl enable docker
|
||||
sudo usermod -G sudo,docker irods
|
||||
#logout/login to update access to docker socket
|
||||
docker run hello-world
|
||||
```
|
||||
|
||||
### Install docker compose
|
||||
|
||||
```
|
||||
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
|
||||
sudo chmod +x /usr/local/bin/docker-compose
|
||||
```
|
||||
|
||||
### Setup Minio S3 service
|
||||
|
||||
Create directories.
|
||||
|
||||
```
|
||||
mkdir ~/minio
|
||||
sudo mkdir /opt/minio_data
|
||||
```
|
||||
|
||||
Create minio compose file and run.
|
||||
|
||||
```
|
||||
cd ~/minio
|
||||
sudo nano -cw ~/minio/docker-compose.yml
|
||||
|
||||
version: "3"
|
||||
services:
|
||||
minio:
|
||||
image: minio/minio:latest
|
||||
container_name: irods-minio
|
||||
environment:
|
||||
- MINIO_ACCESS_KEY=minio
|
||||
- MINIO_SECRET_KEY=password
|
||||
command: server /data1
|
||||
volumes:
|
||||
- /opt/minio_data:/data1
|
||||
ports:
|
||||
- "9000:9000/tcp"
|
||||
restart: unless-stopped
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
|
||||
interval: 30s
|
||||
timeout: 20s
|
||||
retries: 3
|
||||
miniomc:
|
||||
links:
|
||||
- minio:minio-container
|
||||
image: minio/mc:latest
|
||||
container_name: irods-miniomc
|
||||
volumes:
|
||||
- ./createbucket.sh:/tmp/createbucket.sh
|
||||
entrypoint: /bin/sh /tmp/createbucket.sh minio-container irods minio password
|
||||
```
|
||||
|
||||
Create miniomc bucket create script.
|
||||
|
||||
```
|
||||
nano -cw ~/minio/createbucket.sh
|
||||
|
||||
#!/bin/bash
|
||||
|
||||
# params
|
||||
target=$1
|
||||
directory=$2
|
||||
directory_user=$2
|
||||
user=$3
|
||||
password=$4
|
||||
|
||||
# vars
|
||||
mc=/usr/bin/mc
|
||||
|
||||
# now rhel8 container with new package manager, install netcat
|
||||
microdnf install nc
|
||||
|
||||
# wait for minio container
|
||||
until nc -z $target 9000
|
||||
do
|
||||
echo "waiting for minio"
|
||||
sleep 1
|
||||
done
|
||||
sleep 2
|
||||
|
||||
# auth
|
||||
$mc config host add minioinstance http://$target:9000 $user $password
|
||||
sleep 2
|
||||
|
||||
setup () {
|
||||
# create bucket
|
||||
$mc mb minioinstance/$directory
|
||||
# allow public read
|
||||
$mc policy set download minioinstance/$directory
|
||||
# add user
|
||||
$mc admin user add minioinstance $directory_user $password
|
||||
# define policy
|
||||
cat > /tmp/bucketallaction.json << EOF
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Action": ["s3:*"],
|
||||
"Effect": "Allow",
|
||||
"Resource": ["arn:aws:s3:::$directory_user/*"]
|
||||
}
|
||||
]
|
||||
}
|
||||
EOF
|
||||
# create policy
|
||||
$mc admin policy add minioinstance bucketallaction /tmp/bucketallaction.json
|
||||
# attach policy to user
|
||||
$mc admin policy set minioinstance bucketallaction user=$directory_user
|
||||
}
|
||||
|
||||
# create bucket
|
||||
if $mc ls minioinstance/$directory 2>/dev/null
|
||||
then
|
||||
echo "minioinstance/"$directory" exists"
|
||||
else
|
||||
setup
|
||||
echo "minioinstance/"$directory" created"
|
||||
fi
|
||||
|
||||
# destroy bucket
|
||||
#/usr/bin/mc rm -r --force minioinstance/$directory
|
||||
|
||||
exit 0
|
||||
```
|
||||
|
||||
Start the minio service, once the compose file is run the minio bucket and user has been created.
|
||||
|
||||
```
|
||||
cd ~/minio
|
||||
docker-compose up
|
||||
docker-compose down
|
||||
```
|
||||
|
||||
### Create systemd unit files to start minio container on boot.
|
||||
|
||||
Find the docker run command from a running container started by compose.
|
||||
|
||||
```
|
||||
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock assaflavie/runlike irods-minio
|
||||
```
|
||||
|
||||
Take any parameters that have been specified in the compose file, effectively any parameter that isnt auto populated, this will look like the following:
|
||||
|
||||
```
|
||||
docker stop irods-minio
|
||||
docker rm irods-minio
|
||||
|
||||
# if the following command runs correctly populate the systemd unit file accordingly.
|
||||
|
||||
docker run --name=irods-minio --env=MINIO_ACCESS_KEY=minio --env=MINIO_SECRET_KEY=password --volume=/opt/OCF_s3:/data1:rw -p 9000:9000 --restart=unless-stopped --detach=true minio/minio:latest server /data1
|
||||
```
|
||||
|
||||
NOTE: if the /opt/minio_data directory is wiped the minio config will also be wiped and the compose file will need to be rerun to create the irods bucket and user.
|
||||
|
||||
To ensure systemd can start the container ensure the --detach parameter is not used, the container restarts must be controlled by systemd and not the docker daemon.
|
||||
|
||||
```
|
||||
sudo nano -cw /etc/systemd/system/minio.service
|
||||
|
||||
[Unit]
|
||||
Description=minio
|
||||
Wants=network-online.target
|
||||
After=network.target network-online.target
|
||||
[Service]
|
||||
TimeoutStartSec=30
|
||||
Restart=always
|
||||
ExecStartPre=-/usr/bin/docker kill irods-minio
|
||||
ExecStartPre=-/usr/bin/docker rm irods-minio
|
||||
ExecStartPre=/usr/bin/docker pull docker.io/minio/minio:latest
|
||||
ExecStart=/usr/bin/docker run \
|
||||
--name=irods-minio \
|
||||
--env=MINIO_ACCESS_KEY=minio \
|
||||
--env=MINIO_SECRET_KEY=password \
|
||||
--volume=/opt/minio_data:/data1:rw \
|
||||
-p 9000:9000 \
|
||||
minio/minio:latest server /data1
|
||||
ExecStop=/usr/bin/docker stop irods-minio
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
|
||||
sudo systemctl daemon-reload
|
||||
systemctl list-unit-files | grep -i minio
|
||||
sudo systemctl start minio
|
||||
systemctl status minio
|
||||
sudo systemctl enable minio
|
||||
```
|
||||
|
||||
## iRODS Resources
|
||||
|
||||
### Log location
|
||||
|
||||
> /var/lib/irods/log/
|
||||
|
||||
### Add local filesystem Resource
|
||||
|
||||
There needs to be a default resource on the service, it shouldnt be the default demo resource.
|
||||
|
||||
NOTE: dont set the host as localhost, external users will have an issue resolving a remote host.
|
||||
|
||||
```
|
||||
sudo mkdir /opt/OCFResc
|
||||
sudo chown -R irods.irods /opt/OCFResc
|
||||
iadmin mkresc OCFResc unixfilesystem $(hostname):/opt/OCFResc
|
||||
ilsresc
|
||||
#iadmin rmresc OCFResc
|
||||
nano -cw ~/.irods/irods_environment.json
|
||||
# change "irods_default_resource": "demoResc", to "irods_default_resource": "OCFResc",
|
||||
iinit
|
||||
```
|
||||
|
||||
### Add new S3 Resource, this is a cached setup where a local directory of files is synced with S3
|
||||
|
||||
> https://docs.irods.org/4.2.0/plugins/composable_resources/
|
||||
> [https://github.com/irods/irods\_resource\_plugin_s3](https://github.com/irods/irods_resource_plugin_s3)
|
||||
|
||||
```
|
||||
irods@iRODS:~$ sudo apt-get install irods-resource-plugin-s3
|
||||
```
|
||||
|
||||
This is odd, unsure why this worked and the previous attempt didn't; maybe it's because of using a locally resolved hostname instead of an IP, or maybe it is due to /irods/Vault being one level deep vs two?
|
||||
S3\_DEFAULT\_HOSTNAME=$(hostname):9000 will work for a cached compound s3 resource but not for a cacheless resource?, S3\_DEFAULT\_HOSTNAME is probably best as the ip of the s3 compatible storage.
|
||||
|
||||
```
|
||||
sudo mkdir /opt/iRODS_s3_cache /opt/iRODS_s3_cred
|
||||
sudo nano -cw /opt/iRODS_s3_cred/minio.keypair
|
||||
|
||||
irods
|
||||
password
|
||||
|
||||
iadmin mkresc compResc compound
|
||||
iadmin mkresc cacheResc unixfilesystem $(hostname):/opt/iRODS_s3_cache
|
||||
iadmin mkresc archiveResc s3 $(hostname):/irods/Vault "S3_DEFAULT_HOSTNAME=$(hostname):9000;S3_AUTH_FILE=/opt/iRODS_s3_cred/minio.keypair;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP"
|
||||
iadmin addchildtoresc compResc cacheResc cache
|
||||
iadmin addchildtoresc compResc archiveResc archive
|
||||
iput -R compResc docker-compose.yml
|
||||
```
|
||||
|
||||
To remove this config:
|
||||
|
||||
```
|
||||
irods@irods:~/minio$ iadmin rmchildfromresc compResc archiveResc
|
||||
irods@irods:~/minio$ iadmin rmresc archiveResc
|
||||
irods@irods:~/minio$ iadmin rmchildfromresc compResc cacheResc
|
||||
irods@irods:~/minio$ iadmin rmresc cacheResc
|
||||
irods@irods:~/minio$ iadmin rmresc compResc
|
||||
irods@irods:~/minio$ ilsresc
|
||||
demoResc:unixfilesystem
|
||||
```
|
||||
|
||||
If there are files on the compound resource children you need to run a combination of commands to clear it for deletion:
|
||||
|
||||
```
|
||||
iadmin rmchildfromresc compResc archiveResc
|
||||
iadmin rmresc archiveResc
|
||||
error -835000 CAT_RESOURCE_NOT_EMPTY
|
||||
iadmin rmchildfromresc compResc cacheResc
|
||||
iadmin rmresc cacheResc
|
||||
error -835000 CAT_RESOURCE_NOT_EMPTY
|
||||
# re-add both children back to compound resource and clear out
|
||||
iadmin addchildtoresc compResc cacheResc cache
|
||||
iadmin addchildtoresc compResc archiveResc archive
|
||||
nano -cw ~/.irods/irods_environment.json
|
||||
# change "irods_default_resource": "demoResc", to "irods_default_resource": "compResc",
|
||||
iinit
|
||||
icd
|
||||
ils
|
||||
irm -r -f *
|
||||
ils
|
||||
itrim -vrMN 1 -S cacheResc /OCF
|
||||
irmtrash -f
|
||||
nano -cw ~/.irods/irods_environment.json
|
||||
# change "irods_default_resource": "compResc", to "irods_default_resource": "demoResc",
|
||||
iinit
|
||||
iadmin rmchildfromresc compResc archiveResc
|
||||
iadmin rmresc archiveResc
|
||||
iadmin rmchildfromresc compResc cacheResc
|
||||
iadmin rmresc cacheResc
|
||||
iadmin rmresc compResc
|
||||
```
|
||||
|
||||
### Add new S3 Resource in cacheless mode, write direct to the bucket (great for local s3 service)
|
||||
|
||||
Create a path under the bucket to replicate a sub folder.
|
||||
|
||||
```
|
||||
docker run -it --entrypoint=/bin/sh minio/mc
|
||||
mc config host add minioinstance http://192.168.150.56:9000 irods password
|
||||
mc mb minioinstance/irods/Data
|
||||
exit
|
||||
```
|
||||
|
||||
Create a resource, this could be created as a compound resource as in the previous example, or standalone with a replica ruleset set to duplicate files to it, or even use the s3 resource as the sole resource. A cache directory is still required however this only holds transient data. An s3 resource cannot be mounted/bound at a certain location/collection in a directory structure.
|
||||
|
||||
```
|
||||
sudo mkdir /opt/iRODS_s3_cred /opt/iRODS_s3_cache
|
||||
sudo nano -cw /opt/iRODS_s3_cred/minio.keypair
|
||||
|
||||
irods
|
||||
password
|
||||
|
||||
iadmin mkresc OCFs3Resc s3 $(hostname):/irods/Data "S3_DEFAULT_HOSTNAME=192.168.150.56:9000;S3_AUTH_FILE=/opt/iRODS_s3_cred/minio.keypair;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached;S3_SIGNATURE_VERSION=4;S3_CACHE_DIR=/opt/iRODS_s3_cache"
|
||||
ilsresc -l
|
||||
ilsresc -l OCFs3Resc
|
||||
iput -R OCFs3Resc docker-compose.yml
|
||||
irm docker-compose.yml
|
||||
|
||||
# remove, might want to remove and recreate before any iput commands to ensure there is no directory structure
|
||||
#irmtrash
|
||||
#iadmin rmresc OCFs3Resc
|
||||
```
|
||||
|
||||
## Removing resources
|
||||
|
||||
#### list resources
|
||||
|
||||
irods@irods:~/minio$ ilsresc
|
||||
compResc:compound
|
||||
├── archiveResc:s3
|
||||
└── cacheResc:unixfilesystem
|
||||
demoResc:unixfilesystem
|
||||
|
||||
irods@irods:~/minio$ ilsresc compResc
|
||||
compResc:compound
|
||||
├── archiveResc:s3
|
||||
└── cacheResc:unixfilesystem
|
||||
|
||||
#### iadmin help
|
||||
|
||||
iadmin help
|
||||
|
||||
#### admin list resources
|
||||
|
||||
irods@irods:~/minio$ iadmin lr
|
||||
bundleResc
|
||||
demoResc
|
||||
compResc
|
||||
cacheResc
|
||||
archiveResc
|
||||
|
||||
irods@irods:~/minio$ iadmin lr demoResc
|
||||
resc_id: 10014
|
||||
resc_name: demoResc
|
||||
zone_name: OCF
|
||||
resc\_type\_name: unixfilesystem
|
||||
resc_net: irods.local
|
||||
resc\_def\_path: /var/lib/irods/Vault
|
||||
free_space:
|
||||
free\_space\_ts: Never
|
||||
resc_info:
|
||||
r_comment:
|
||||
resc_status:
|
||||
create_ts: 2021-05-11.13:11:07
|
||||
modify_ts: 2021-05-11.13:11:07
|
||||
resc_children:
|
||||
resc_context:
|
||||
resc_parent:
|
||||
resc_objcount: 0
|
||||
resc\_parent\_context:
|
||||
|
||||
irods@irods:~/minio$ iadmin lr compResc
|
||||
resc_id: 10019
|
||||
resc_name: compResc
|
||||
zone_name: OCF
|
||||
resc\_type\_name: compound
|
||||
resc\_net: EMPTY\_RESC_HOST
|
||||
resc\_def\_path: EMPTY\_RESC\_PATH
|
||||
free_space:
|
||||
free\_space\_ts: Never
|
||||
resc_info:
|
||||
r_comment:
|
||||
resc_status:
|
||||
create_ts: 2021-05-11.13:49:51
|
||||
modify_ts: 2021-05-11.13:49:51
|
||||
resc_children:
|
||||
resc_context:
|
||||
resc_parent:
|
||||
resc_objcount: 0
|
||||
resc\_parent\_context:
|
||||
|
||||
#### admin remove resource (from a compound)
|
||||
|
||||
Find any child resource.
|
||||
|
||||
irods@irods:~/minio$ ilsresc compResc
|
||||
compResc:compound
|
||||
├── archiveResc:s3
|
||||
└── cacheResc:unixfilesystem
|
||||
|
||||
Remove child resources.
|
||||
|
||||
iadmin rmchildfromresc compResc archiveResc
|
||||
iadmin rmchildfromresc compResc cacheResc
|
||||
|
||||
Remove resources that were used to create the compound resource.
|
||||
|
||||
iadmin rmresc archiveResc
|
||||
iadmin rmresc cacheResc # this threw an error
|
||||
iadmin rmresc compResc
|
||||
|
||||
to remove cacheResc, i had to switch the env file to use default resource cacheResc:
|
||||
|
||||
nano -cw ~/.irods/irods\_environment.json # "irods\_default\_resource": "demoResc", to "irods\_default_resource": "cacheResc",
|
||||
iinit
|
||||
|
||||
run a selection of commands to check if the resource is empty and clean it up if not, ready for deletion:
|
||||
|
||||
iquest "SELECT DATA\_NAME WHERE DATA\_RESC_NAME like 'cache%'"
|
||||
iquest "SELECT COLL\_NAME WHERE DATA\_RESC_NAME = 'cacheResc'"
|
||||
irmtrash -f
|
||||
itrim -vrMN 1 -S cacheResc /OCF
|
||||
irm -r -f /OCF
|
||||
|
||||
After this switched back to "irods\_default\_resource": "demoResc", and removed the resource:
|
||||
|
||||
iadmin rmresc cacheResc
|
||||
|
|
@ -0,0 +1,48 @@
|
|||
iRODS cyberduck
|
||||
|
||||
# iRODS cyberduck
|
||||
|
||||
This is really convenient but doesn't offer control over metadata; very useful for testing rulesets and iRODS client scripts.
|
||||
|
||||
Irods server should have an SSL certificate and all clients validate either via public CA or local CA with updated CA certs.
|
||||
To allow Cyberduck to connect without validating the iRODS cert, as on our test virtual server (or something akin to this - maybe it's even client auth?), a server side setting can be changed to accommodate an HTTP connection instead of only the default HTTPS connection:
|
||||
|
||||
```
|
||||
nano -cw /etc/irods/core.re
|
||||
|
||||
#acPreConnect(*OUT) { *OUT="CS_NEG_DONT_CARE"; }
|
||||
acPreConnect(*OUT) { *OUT="CS_NEG_REFUSE"; }
|
||||
```
|
||||
|
||||
Cyberduck needs a profile in C:\Program Files\Cyberduck\profiles with minimally the Zone/Resource configured.
|
||||
|
||||
```
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||
<plist version="1.0">
|
||||
<dict>
|
||||
<key>Protocol</key>
|
||||
<string>irods</string>
|
||||
<key>Vendor</key>
|
||||
<string>iRODS</string>
|
||||
<key>Description</key>
|
||||
<string>iRODS Cyberduck Bookmark</string>
|
||||
<key>Hostname Configurable</key>
|
||||
<true/>
|
||||
<key>Port Configurable</key>
|
||||
<true/>
|
||||
<key>Default Hostname</key>
|
||||
<string>192.168.150.56</string>
|
||||
<key>Region</key>
|
||||
<string>OCF:OCFs3Resc</string>
|
||||
<key>Default Port</key>
|
||||
<string>1247</string>
|
||||
<key>Username rods</key>
|
||||
<string>iRODS username</string>
|
||||
<key>Password password</key>
|
||||
<string>iRODS password</string>
|
||||
</dict>
|
||||
</plist>
|
||||
```
|
||||
|
||||

|
||||
|
|
@ -0,0 +1,325 @@
|
|||
Using the native rule engine
|
||||
|
||||
## Using the native rule engine
|
||||
|
||||
> https://docs.irods.org/4.2.8/plugins/pluggable_rule_engine/
|
||||
> https://docs.irods.org/4.1.5/manual/rule_language/
|
||||
> https://docs.irods.org/4.2.8/doxygen/ # list of microservices
|
||||
> https://groups.google.com/g/irod-chat # user group with many examples
|
||||
> https://groups.google.com/g/irod-chat/c/ECt5oWSA978/m/hCtoKUjrBQAJ # dynamic PEP - new way of invoking rules (not tried)
|
||||
|
||||
List rule engines installed:
|
||||
|
||||
```
|
||||
tseed@tseed-irods:~$ irule -a
|
||||
Level 0: Available rule engine plugin instances:
|
||||
irods_rule_engine_plugin-python-instance
|
||||
irods_rule_engine_plugin-irods_rule_language-instance
|
||||
irods_rule_engine_plugin-cpp_default_policy-instance
|
||||
```
|
||||
|
||||
NOTE: To test these examples it will be necessary to copy the scripts to a text editor that handles unix end of line characters, then copy/paste to the script file on the linux host.
|
||||
|
||||
## simple client side rule, run from desktop
|
||||
|
||||
NOTE: if you have installed the python rule engine as the first engine in the server.json schema you must select the rule engine to use in irule (not sure if there is a client side way to set the which rule engine to use as default?)
|
||||
|
||||
```
|
||||
tseed@tseed-irods:~$ cat test.r
|
||||
main() {
|
||||
writeLine("stdout", "Success!");
|
||||
}
|
||||
OUTPUT ruleExecOut
|
||||
|
||||
tseed@tseed-irods:~$ irule -r irods_rule_engine_plugin-irods_rule_language-instance -F test.r
|
||||
Success!
|
||||
```
|
||||
|
||||
## sample client side rule to illustrate metadata manipulation
|
||||
|
||||
This example shows most of the methods to manipulate metadata in the native rule engine, it is designed to run on the client but most of the inbuilt microservices (functions) will run server side, there are some exceptions listed in the doxygen documentation.
|
||||
|
||||
```
|
||||
tseed@tseed-irods:~$ cat test-meta.r
|
||||
|
||||
main() {
|
||||
#### metadata examples ####
|
||||
|
||||
# before version 4.2.7 several microservices(functions) are required to achieve metadata manipulation
|
||||
# generally data structures of key pairs are created, applied or removed from the data-object/collection/resource
|
||||
# from version 4.2.7 the msiModAVUMetadata microservice greatly simplifies metadata transformation
|
||||
# for any version - retrieve metadata with sql like queries against the iRODS database, this is equivalent to the 'imeta ls' command
|
||||
|
||||
### metadata structures
|
||||
|
||||
# sample metadata in the % delimited format for msiString2KeyValPair
|
||||
*str = "a=10%zebra=horse%hula=dance"
|
||||
|
||||
# load kvp structure from string
|
||||
msiString2KeyValPair(*str, *kvp);
|
||||
|
||||
# add some keys with values
|
||||
msiAddKeyVal(*kvp, "art", "crayon");
|
||||
msiGetSystemTime(*Time, "human");
|
||||
msiAddKeyVal(*kvp, "timestamp", *Time);
|
||||
|
||||
# print all keys and values
|
||||
msiPrintKeyValPair("stdout", *kvp);
|
||||
writeKeyValPairs("stdout", *kvp, " || ")
|
||||
|
||||
# print all keys
|
||||
foreach(*kvp) {
|
||||
writeLine("stdout", *kvp);
|
||||
}
|
||||
|
||||
# print all keys with values
|
||||
foreach(*i in *kvp) {
|
||||
msiGetValByKey(*kvp,*i,*result);
|
||||
writeLine("stdout", "1: *i = *result");
|
||||
}
|
||||
|
||||
# print key=value where key like zeb*
|
||||
# print key=value where value matches *Time (set with msiGetSystemTime above)
|
||||
# print key=value where string value matches 10, convert to int and divide
|
||||
foreach(*i in *kvp) {
|
||||
msiGetValByKey(*kvp,*i,*result);
|
||||
if (*i like "zeb*") then {
|
||||
writeLine("stdout","2: *i = *result")
|
||||
} else if (*result == *Time) then {
|
||||
writeLine("stdout","2: *i = *result")
|
||||
} else if (*result == "10") then {
|
||||
*number=int("*result")/2
|
||||
writeLine("stdout","2: *i = *number")
|
||||
} else {
|
||||
writeLine("stdout","2: no match")
|
||||
}
|
||||
}
|
||||
|
||||
# more conditional behaviour
|
||||
foreach(*i in *kvp) {
|
||||
msiGetValByKey(*kvp,*i,*result);
|
||||
#if (*result == "dance" || *i == "art" || *i == "zebra") then { # this is a valid condition, however multiple OR with != / "not like" operators are not pre-evaluated correctly
|
||||
if (*i == "a" && *result == str(10)) then { # must precede the else statement or will not be matched
|
||||
writeLine("stdout","3: AND *i = *result")
|
||||
} else if (*result not like "dance") then {
|
||||
writeLine("stdout","3: *i = *result")
|
||||
}
|
||||
}
|
||||
|
||||
## Add/Remove metadata for files(data objects) or directories(collections) ##
|
||||
|
||||
# print all session variables (rei) with msiGetSessionVarValue, when running the rule locally without script parameters the only useful variables are userNameClient= / rodsZoneClient=
|
||||
# when run from the server rules engine, there are many more useful session variables
|
||||
#msiGetSessionVarValue("all", "client");
|
||||
|
||||
# access the variables as $<variable name>
|
||||
*testpath = "/$rodsZoneClient/home/$userNameClient"
|
||||
*newfile = "/$rodsZoneClient/home/$userNameClient/test.txt"
|
||||
|
||||
# test for valid path, errorcode microservice is used to ensure the script does not exit on failure instead return boolean
|
||||
*a = errorcode(msiObjStat(*testpath,*status)) >=0
|
||||
writeLine("stdout","4: collection exists: *testpath *a")
|
||||
|
||||
# if path exists add/remove metadata
|
||||
if (errorcode(msiObjStat(*testpath,*status)) >=0) then {
|
||||
# remove file without sending to trash and unregister from database
|
||||
if (errorcode(msiObjStat(*newfile,*status)) >=0) then {
|
||||
msiDataObjUnlink("objPath=*newfile++++forceFlag=++++unreg=",*status)
|
||||
writeLine("stdout","4: file removed: *newfile")
|
||||
}
|
||||
|
||||
# create a file, forceFlag attribute required to overwrite file, resource can be specified here, note the field delimiter "destRescName=demoResc++++forceFlag="
|
||||
*content = "test.txt content"
|
||||
msiDataObjCreate(*newfile,"forceFlag=",*file_descriptor)
|
||||
msiDataObjWrite(*file_descriptor,*content,*write_length)
|
||||
msiDataObjClose(*file_descriptor,*status)
|
||||
writeLine("stdout","4: file created: *newfile")
|
||||
|
||||
# apply metadata to object from kvp structure
|
||||
msiAssociateKeyValuePairsToObj(*kvp,*newfile,"-d")
|
||||
|
||||
# get data object and collection from a full path string
|
||||
#*filepath_element = ( size( (split(*newfile,"/")) ) )
|
||||
#*file = (elem((split(*newfile,"/")), (*filepath_element - 1) ))
|
||||
#*data_object = (elem( (split(*newfile,"/")), ( (size((split(*newfile,"/")))) - 1) ))
|
||||
msiSplitPath(*newfile,*collection,*file)
|
||||
|
||||
# query iRODS db for metadata of file, load into a new key pair structure
|
||||
*query = SELECT META_DATA_ATTR_NAME,META_DATA_ATTR_VALUE WHERE DATA_NAME = '*file' AND COLL_NAME = '*collection'
|
||||
foreach(*row in *query) {
|
||||
#msiPrintKeyValPair("stdout",*row)
|
||||
#writeLine("stdout","next row")
|
||||
msiGetValByKey(*row,"META_DATA_ATTR_NAME",*key);
|
||||
msiGetValByKey(*row,"META_DATA_ATTR_VALUE",*value);
|
||||
msiAddKeyVal(*query_kvp, *key, *value);
|
||||
}
|
||||
|
||||
# create a new 'trimmed' metadata structure including the key pairs to be removed
|
||||
foreach(*i in *query_kvp) {
|
||||
#writeLine("stdout", "key is *i")
|
||||
if (*i == "a" || *i == "art") then {
|
||||
msiGetValByKey(*query_kvp,*i,*result)
|
||||
writeLine("stdout","4: metadata to keep on *newfile, *i=*result")
|
||||
} else {
|
||||
msiGetValByKey(*query_kvp,*i,*result)
|
||||
writeLine("stdout","4: metadata to remove from *newfile, *i=*result")
|
||||
msiAddKeyVal(*new_kvp, *i, *result)
|
||||
}
|
||||
}
|
||||
# remove key pairs listed in the new metadata structure from the data object
|
||||
msiRemoveKeyValuePairsFromObj(*new_kvp,*newfile,"-d")
|
||||
|
||||
# create a new kvp structure, add key pairs
|
||||
msiAddKeyVal(*kvp2, "company", "OCF");
|
||||
msiAddKeyVal(*kvp2, "department", "Cloud");
|
||||
msiGetSystemTime(*created_epoc, "unix")
|
||||
msiAddKeyVal(*kvp2, "create_date_epoc", *created_epoc );
|
||||
|
||||
# get system time, load into list and grab elements based on position
|
||||
msiGetFormattedSystemTime(*created,"human","%d-%02d-%02d-%02d-%02d-%02d")
|
||||
writeLine("stdout", "4: year:" ++ " " ++ (elem((split(*created,"-")),0)) )
|
||||
*year = elem((split(*created,"-")),0)
|
||||
*month = elem((split(*created,"-")),1)
|
||||
*day = elem((split(*created,"-")),2)
|
||||
msiAddKeyVal(*kvp2, "create_year", *year );
|
||||
msiAddKeyVal(*kvp2, "create_month", *month );
|
||||
msiAddKeyVal(*kvp2, "create_day", *day );
|
||||
|
||||
# add meta data to the data object; -d file(data object), -C directory(collection)
|
||||
msiAssociateKeyValuePairsToObj(*kvp2,*newfile,"-d");
|
||||
|
||||
# find files with metadata between an epoc date range
|
||||
# supported operators
|
||||
#>=
|
||||
#<=
|
||||
#=
|
||||
#<
|
||||
#>
|
||||
#'1' '100'
|
||||
#
|
||||
# 2020(1575072000) - 2030(1890691200)
|
||||
*query = SELECT DATA_NAME WHERE COLL_NAME = '*collection' AND META_DATA_ATTR_NAME = 'create_date_epoc' AND META_DATA_ATTR_VALUE BETWEEN '01575072000' '01890691200'
|
||||
foreach(*row in *query) {
|
||||
msiGetValByKey(*row,"DATA_NAME",*data_name)
|
||||
writeLine("stdout", "4: file: " ++ "*data_name" ++ " created between 2020 - 2030" )
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
### msiModAVUMetadata - change metadata directly on the object/collection/resource ###
|
||||
|
||||
# this is new microservice as of version 4.2.7 and easy to use
|
||||
# msiModAVUMetadata allows key, value and unit (AVU) manipulation, much like the imeta icommand
|
||||
# assigning an additional attribute 'unit' to the key pair is useful and can be treated as a secondary value or left empty ""
|
||||
|
||||
# remove all key pairs directly from the data object
|
||||
msiModAVUMetadata("-d","*newfile","rmw", "%", "%", "%")
|
||||
|
||||
# add new key pair directly to the data object
|
||||
msiModAVUMetadata("-d","*newfile","add", "car", "ford", "string")
|
||||
|
||||
# change value for key directly on the data object
|
||||
msiModAVUMetadata("-d","*newfile","set", "car", "toyoda", "string")
|
||||
|
||||
# remove key pair directly on the data object
|
||||
msiModAVUMetadata("-d","*newfile","rm", "car", "toyoda", "string")
|
||||
|
||||
# wildcard remove key pairs directly on the data object
|
||||
msiModAVUMetadata("-d","*newfile","add", "car", "subaru", "string")
|
||||
msiModAVUMetadata("-d","*newfile","add", "car", "suzuki", "string")
|
||||
msiModAVUMetadata("-d","*newfile","add", "car", "saab", "string")
|
||||
msiModAVUMetadata("-d","*newfile","rmw", "car", "su%", "%")
|
||||
#msiModAVUMetadata("-d","*newfile","rmw", "ca%", "%", "%")
|
||||
|
||||
# add some meta data with arbitrary unit types
|
||||
msiModAVUMetadata("-d","*newfile","add", "catC", "yes", "damage")
|
||||
msiModAVUMetadata("-d","*newfile","add", "price", "1200", "sterling")
|
||||
|
||||
## searching with metadata
|
||||
|
||||
# search for files in a collection where the key unit matches damage and the key value matches yes, return the filename key value key name with value yes and unit damage
|
||||
*query = SELECT DATA_NAME,META_DATA_ATTR_NAME WHERE COLL_NAME = '*collection' AND META_DATA_ATTR_UNITS = 'damage' AND META_DATA_ATTR_VALUE like 'y%'
|
||||
foreach(*row in *query) {
|
||||
msiGetValByKey(*row,"DATA_NAME",*target_file)
|
||||
msiGetValByKey(*row,"META_DATA_ATTR_NAME",*damage_type)
|
||||
# search for car key value using the file name
|
||||
*sub_query = SELECT META_DATA_ATTR_VALUE WHERE COLL_NAME = '*collection' AND DATA_NAME = '*target_file' AND META_DATA_ATTR_NAME = 'car'
|
||||
foreach(*sub_row in *sub_query) {
|
||||
msiGetValByKey(*sub_row,"META_DATA_ATTR_VALUE",*car)
|
||||
# search for the price key value under threshold (string is dynamically evaluated as numeric)
|
||||
*sub_query = SELECT META_DATA_ATTR_VALUE WHERE COLL_NAME = '*collection' AND DATA_NAME = '*target_file' AND META_DATA_ATTR_NAME = 'price' AND META_DATA_ATTR_VALUE < '1201'
|
||||
foreach(*sub_row in *sub_query) {
|
||||
msiGetValByKey(*sub_row,"META_DATA_ATTR_VALUE",*price)
|
||||
#writeLine("stdout", *price)
|
||||
}
|
||||
# if price variable set, its value below 1201
|
||||
if (errorcode(*price) >=0) then {
|
||||
writeLine("stdout","5: writeoff: *damage_type *car £*price")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
INPUT null
|
||||
OUTPUT ruleExecOut
|
||||
|
||||
tseed@tseed-irods:~$ irule -h # display command parameters
|
||||
tseed@tseed-irods:~$ irule -t -r irods_rule_engine_plugin-irods_rule_language-instance -F test-meta.r
|
||||
a = 10
|
||||
zebra = horse
|
||||
hula = dance
|
||||
art = crayon
|
||||
timestamp = 2021-05-24.09:46:23
|
||||
a || 10
|
||||
zebra || horse
|
||||
hula || dance
|
||||
art || crayon
|
||||
timestamp || 2021-05-24.09:46:23
|
||||
a
|
||||
zebra
|
||||
hula
|
||||
art
|
||||
timestamp
|
||||
1: a = 10
|
||||
1: zebra = horse
|
||||
1: hula = dance
|
||||
1: art = crayon
|
||||
1: timestamp = 2021-05-24.09:46:23
|
||||
2: a = 5
|
||||
2: zebra = horse
|
||||
2: no match
|
||||
2: no match
|
||||
2: timestamp = 2021-05-24.09:46:23
|
||||
3: AND a = 10
|
||||
3: zebra = horse
|
||||
3: art = crayon
|
||||
3: timestamp = 2021-05-24.09:46:23
|
||||
4: collection exists: /OCF/home/rods true
|
||||
4: file removed: /OCF/home/rods/test.txt
|
||||
4: file created: /OCF/home/rods/test.txt
|
||||
4: metadata to keep on /OCF/home/rods/test.txt, a=10
|
||||
4: metadata to keep on /OCF/home/rods/test.txt, art=crayon
|
||||
4: metadata to remove from /OCF/home/rods/test.txt, hula=dance
|
||||
4: metadata to remove from /OCF/home/rods/test.txt, timestamp=2021-05-24.09:46:23
|
||||
4: metadata to remove from /OCF/home/rods/test.txt, zebra=horse
|
||||
4: year: 2021
|
||||
4: file: test.txt created between 2020 - 2030
|
||||
5: writeoff: catC saab £1200
|
||||
```
|
||||
|
||||
## TODO developing server side rules
|
||||
|
||||
Two server side "native language" rules examples in "python server side rule engine", need one more with some replication to s3 resource for reference.
|
||||
|
||||
https://groups.google.com/g/irod-chat/c/ABED29dReBs/m/5fWo87WYCAAJ
|
||||
https://groups.google.com/g/irod-chat/c/ObjcBN7W1j0/m/LpQCzp-OAAAJ python rule instead of native - this needs to be put in the server side rul engine example.
|
||||
https://groups.google.com/g/irod-chat/c/evYIHiG0R60/m/QaaluRjpBwAJ print out
|
||||
https://groups.google.com/g/irod-chat/c/gZSB3Pzv8XM/m/t_eXf0LZAAAJ nice keyval stuff
|
||||
|
||||
https://slides.com/irods/ugm2019-administration-rule-engine-plugins#/5 might explain how the rule engine has changed with new dynamic PEPs (these will mature over time and hopefully be better documented for the transition)
|
||||
|
||||
ok what have we learned? - must have python rulesets ONLY for specific PEPS or rule engine breaks
|
||||
if the python rule engine is enabled any client irule activity must specify which rule engine to use (a bit of a pain - cant find a way to set client side defaults)
|
||||
why do they have it like this? - doesnt matter the python weighted matching engine is easier to configure in many ways - keep it but ensure the policy only overrides single PEPs - what if we want a mix of PEPS triggered from native ruleset vs python? - have to write additional python logic for the PEP to callback the r. rules engine.
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,162 @@
|
|||
Using the python server side rule engine
|
||||
|
||||
# Using the python server side rule engine
|
||||
|
||||
[https://github.com/irods/irods\_rule\_engine\_plugin\_python](https://github.com/irods/irods_rule_engine_plugin_python)
|
||||
|
||||
## enable rules engine /etc/irods/server_config.json
|
||||
|
||||
To enable the engine, ensure the following definition is in the "rule\_engines" list in /etc/irods/server\_config.json:
|
||||
|
||||
```
|
||||
"rule_engines": [
|
||||
{
|
||||
"instance_name" : "irods_rule_engine_plugin-python-instance",
|
||||
"plugin_name" : "irods_rule_engine_plugin-python",
|
||||
"plugin_specific_configuration" : {}
|
||||
},
|
||||
{
|
||||
"instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
|
||||
```
|
||||
|
||||
## add a new rulebase /etc/irods.server_config.json
|
||||
|
||||
Add the new rule definition to the rulebase list in /etc/irods.server_config.json:
|
||||
|
||||
```
|
||||
"re_rulebase_set": [
|
||||
"ocf",
|
||||
"core"
|
||||
],
|
||||
```
|
||||
|
||||
## create the new rulebase definition /etc/irods/ocf.re
|
||||
|
||||
Create the /etc/irods/ocf.re rules definition file:
|
||||
|
||||
```
|
||||
irods@irods:~$ cat /etc/irods/ocf.re
|
||||
# meta, object path, type -d
|
||||
ocfTagDataObject(*str, *objpath, *objtype) {
|
||||
msiString2KeyValPair(*str, *kvp);
|
||||
msiAssociateKeyValuePairsToObj(*kvp, *objpath, *objtype);
|
||||
writeLine("serverLog", "added metadata to *objpath");
|
||||
|
||||
#session variables also available here in native rules structure
|
||||
#
|
||||
#msiGetSessionVarValue("all", "server");
|
||||
#*get_session_var=msiGetSessionVarValue("objPath", "server");
|
||||
#*something = str(msiGetSessionVarValue("objPath", "client"));
|
||||
#writeLine("serverLog", "*something");
|
||||
}
|
||||
|
||||
getSessionVar(*name,*output) {
|
||||
*output = eval("str($"++*name++")");
|
||||
}
|
||||
```
|
||||
|
||||
## create the python rule entry script
|
||||
|
||||
The core.py entry script lists policy entry points (PEP) and contains logic to take actions typically based on the contents of session variables. When the rules engine matches an event it will call a rule.
|
||||
|
||||
> /etc/core.py calls rule named ocfTagDataObject in rules definition file /etc/irods/ocf.re
|
||||
|
||||
The purpose of the following entry script is to illustrate the PEP acPostProcForPut being triggered and custom rule logic deciding on which rule(s) to invoke, this could be done purely in the native rule engine language at the cost of flexibility afforded by Python.
|
||||
|
||||
Rule 4 is calling to the native rule engine in ocf.re.
|
||||
|
||||
```
|
||||
import os
|
||||
import session_vars
|
||||
import sys
|
||||
import inspect
|
||||
import time
|
||||
|
||||
def rule1(sv):
|
||||
print "this is rule1 unzip"
|
||||
|
||||
def rule2(sv):
|
||||
print "this is rule1 csv"
|
||||
|
||||
def rule3(sv):
|
||||
print "this is rule1 email"
|
||||
|
||||
def rule4(callback, sv):
|
||||
objpath = sv['data_object']['object_path']
|
||||
tag_ext = os.path.splitext(sv['data_object']['object_path'])[1]
|
||||
tag_upload_date = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(int(sv['data_object']['modify_time'])))
|
||||
tag_upload_year = time.strftime('%Y', time.localtime(int(sv['data_object']['modify_time'])))
|
||||
tag_upload_month = time.strftime('%B', time.localtime(int(sv['data_object']['modify_time'])))
|
||||
tag_upload_day = time.strftime('%A', time.localtime(int(sv['data_object']['modify_time'])))
|
||||
tag_user = sv['client_user']['user_name']
|
||||
tag_dept = "OCF"
|
||||
tag_project = "OCF"
|
||||
metadata = []
|
||||
for tag in dir():
|
||||
if tag.startswith('tag_'):
|
||||
metadata.append(tag.split('tag_')[1] + "=" + eval(tag))
|
||||
metadata = '%'.join(metadata)
|
||||
callback.ocfTagDataObject(metadata, objpath, '-d')
|
||||
#call to the native rule in ocf.re
|
||||
|
||||
def acPostProcForPut(rule_args, callback, rei):
|
||||
callback.writeLine('serverLog', 'acPostProcForPut execrule triggered')
|
||||
|
||||
sv = session_vars.get_map(rei)
|
||||
directory = os.path.dirname(sv['data_object']['object_path'])
|
||||
file = os.path.basename(sv['data_object']['object_path'])
|
||||
ext = os.path.splitext(sv['data_object']['object_path'])[1]
|
||||
zone = directory.split("/")[1]
|
||||
resource = sv['data_object']['resource_name']
|
||||
user = sv['client_user']['user_name']
|
||||
ruledict = {'rule1': {'directory': '/OCF/home/rods/unzip', 'ext': '.zip', 'zone': 'OCF', 'resource': 'OCFs3Resc', 'user': 'rods'}, 'rule2': {'directory': '/OCF/home/rods/csv', 'ext': '.csv'}, 'rule3': {'directory': '/OCF/home/tseed/email', 'user': 'tseed'}, 'rule4': {'directory': '/OCF/home/rods/jpg', 'ext': '.jpg', 'zone': 'OCF', 'resource': 'OCFResc', 'user': 'rods'}}
|
||||
ruleweight = {}
|
||||
|
||||
for rule in ruledict:
|
||||
weight = 0
|
||||
rulesize = len(ruledict[rule])
|
||||
for attribute in ruledict[rule]:
|
||||
if eval(attribute): # check if sv returns empty for any variables, such as files without extension
|
||||
if eval(attribute) == ruledict[rule][attribute]: # if sv returned directory matches rule directory entry
|
||||
weight += 1
|
||||
if rulesize == weight: # if rule attributes are all matched set as possible rule
|
||||
ruleweight[rule] = weight
|
||||
|
||||
weight = 0
|
||||
candidate = []
|
||||
for entry in ruleweight: # greatest rule weight
|
||||
if ruleweight[entry] > weight:
|
||||
weight = ruleweight[entry]
|
||||
for entry in ruleweight: # find rule(s) rules with greatest weight
|
||||
if ruleweight[entry] == weight:
|
||||
candidate.append(entry)
|
||||
if len(candidate) > 1:
|
||||
compete = ""
|
||||
for entry in candidate:
|
||||
compete = compete + entry + " "
|
||||
#print("competing rules: %s" % compete)
|
||||
message = ("core.py competing rules: %s" % compete)
|
||||
callback.writeLine('serverLog', message)
|
||||
return
|
||||
elif len(candidate) < 1:
|
||||
return # no matching rule
|
||||
else:
|
||||
execrule = candidate[0]
|
||||
try:
|
||||
inspect.isfunction(eval(execrule))
|
||||
except:
|
||||
#print("rule action missing: %s" % execrule)
|
||||
message = ("core.py rule action missing: %s" % execrule)
|
||||
callback.writeLine('serverLog', message)
|
||||
return
|
||||
else:
|
||||
#print("execute rule: %s" % execrule)
|
||||
message = ("core.py execute rule: %s" % execrule)
|
||||
callback.writeLine('serverLog', message)
|
||||
(eval(execrule))(callback, sv) # run function
|
||||
```
|
||||
|
||||
## TO DO
|
||||
|
||||
- example use of the native rule engine PEP triggering, this is an 'on' statement mechanism in core.re, the logic not as easy/flexible as python rule engine.
|
||||
- purely python rule without callback to the native rule engine, this is a more flexible scenario but not well documented.
|
||||
|
|
@ -0,0 +1,29 @@
|
|||
iRODS icommands
|
||||
|
||||
# icommands
|
||||
|
||||
> https://docs.irods.org/4.2.0/icommands/administrator/
|
||||
> https://docs.irods.org/4.2.0/icommands/user/
|
||||
|
||||
#### commands
|
||||
|
||||
ls /usr/bin/i*
|
||||
|
||||
#### navigation
|
||||
|
||||
irods@irods:~/minio$ ipwd
|
||||
/OCF/home/rods
|
||||
irods@irods:~/minio$ icd /OCF
|
||||
irods@irods:~/minio$ ipwd
|
||||
/OCF
|
||||
irods@irods:~/minio$ ils
|
||||
/OCF:
|
||||
C- /OCF/home
|
||||
C- /OCF/trash
|
||||
irods@irods:~/minio$ icd
|
||||
irods@irods:~/minio$ ipwd
|
||||
/OCF/home/rods
|
||||
|
||||
## TODO some scenario examples that cover all the useful icommands
|
||||
|
||||
Need to document everything covered in training — lots of iCAT SQL queries and metadata tricks.
|
||||
|
|
@ -0,0 +1,9 @@
|
|||
whats next?
|
||||
|
||||
whats next?
|
||||
|
||||
- talk to Jomon; talk to Thomas and give him the script
|
||||
- need some "basics" of commands — go through the training notes and write them up in the "iRODS commands" section
|
||||
- need native rule engine 'ON' conditions on PEPS
|
||||
- need Python-only rules with no callbacks — it's an absolute boon for quick rule development
|
||||
|
||||
Binary file not shown.
|
After Width: | Height: | Size: 16 KiB |
|
|
@ -0,0 +1,102 @@
|
|||
import os
|
||||
import session_vars
|
||||
import sys
|
||||
import inspect
|
||||
import exifread
|
||||
import time
|
||||
|
||||
def rule1(callback, sv):
    # Unzip handler: pass the uploaded data object's logical path to the
    # server-side ocfUnzipDataObject rule ('-d' marks it as a data object).
    callback.ocfUnzipDataObject(sv['data_object']['object_path'], '-d')
|
||||
|
||||
def rule2(callback, sv):
    """CSV handler stub: logs that the rule fired.

    Fixes: the dispatcher acPostProcForPut invokes every rule as
    rule(callback, sv), so the original one-argument signature raised a
    TypeError whenever this rule matched; the message wrongly said
    'rule1'; and the Python-2-only print statement is now a print() call
    (valid in both Python 2 and 3).
    """
    print("this is rule2 csv")
|
||||
|
||||
def rule3(callback, sv):
    """Email handler stub: logs that the rule fired.

    Fixes: the dispatcher acPostProcForPut invokes every rule as
    rule(callback, sv), so the original one-argument signature raised a
    TypeError whenever this rule matched; the message wrongly said
    'rule1'; and the Python-2-only print statement is now a print() call
    (valid in both Python 2 and 3).
    """
    print("this is rule3 email")
|
||||
|
||||
def rule4(callback, sv):
    """Tag a newly uploaded data object with provenance metadata.

    Builds '%'-separated key=value pairs (the separator the server-side
    msiString2KeyValPair microservice expects) and passes them to the
    ocfTagDataObject rule via the callback.

    Fix: the original scraped its own locals with dir()/eval() to collect
    tag_* variables; this version uses an explicit mapping.  Keys are
    emitted in sorted order, matching dir()'s alphabetical ordering, so
    the metadata string is unchanged.
    """
    objpath = sv['data_object']['object_path']
    # modify_time is an epoch-seconds string in the session variable map
    mtime = time.localtime(int(sv['data_object']['modify_time']))
    tags = {
        'ext': os.path.splitext(objpath)[1],
        'upload_date': time.strftime('%Y-%m-%d %H:%M:%S', mtime),
        'upload_year': time.strftime('%Y', mtime),
        'upload_month': time.strftime('%B', mtime),
        'upload_day': time.strftime('%A', mtime),
        'user': sv['client_user']['user_name'],
        'dept': "OCF",
        'project': "OCF",
    }
    metadata = '%'.join(key + "=" + tags[key] for key in sorted(tags))
    callback.ocfTagDataObject(metadata, objpath, '-d')
|
||||
|
||||
"""
|
||||
/etc/irods/core.re rule
|
||||
|
||||
ocfTagDataObject(*str, *objpath, *objtype) {
|
||||
msiString2KeyValPair(*str, *kvp);
|
||||
msiAssociateKeyValuePairsToObj(*kvp, *objpath, *objtype);
|
||||
writeLine("serverLog", "added metadata to *objpath");
|
||||
}
|
||||
"""
|
||||
|
||||
def acPostProcForPut(rule_args, callback, rei):
    """PEP fired after every successful iput.

    Scores each entry in ruledict against the uploaded object's session
    variables; a rule is a candidate only when ALL of its attributes
    match.  The candidate matching the most attributes wins and is
    executed.  Ties are logged and nothing runs; no match returns
    silently.

    Fixes vs. original: attribute values and rule functions are looked up
    through explicit dicts instead of eval(); the bare except around
    inspect.isfunction() (whose boolean result was ignored, so a
    non-callable name would still have been invoked) is replaced with a
    real callability check.
    """
    callback.writeLine('serverLog', 'acPostProcForPut execrule triggered')

    sv = session_vars.get_map(rei)
    objpath = sv['data_object']['object_path']
    # Attribute values a rule may match against.
    attrs = {
        'directory': os.path.dirname(objpath),
        'file': os.path.basename(objpath),
        'ext': os.path.splitext(objpath)[1],
        'zone': os.path.dirname(objpath).split("/")[1],
        'resource': sv['data_object']['resource_name'],
        'user': sv['client_user']['user_name'],
    }

    # rule name -> attribute values that must ALL match for the rule to fire
    ruledict = {'rule1': {'directory': '/OCF/home/rods/unzip', 'ext': '.zip', 'zone': 'OCF', 'resource': 'OCFResc', 'user': 'rods'}, 'rule2': {'directory': '/OCF/home/rods/csv', 'ext': '.csv'}, 'rule3': {'directory': '/OCF/home/tseed/email', 'user': 'tseed'}, 'rule4': {'directory': '/OCF/home/rods/jpg', 'ext': '.jpg', 'zone': 'OCF', 'resource': 'OCFResc', 'user': 'rods'}}

    # Keep only rules whose every attribute matches; empty session values
    # (e.g. files without an extension) never count as a match.
    ruleweight = {}
    for rule, wanted in ruledict.items():
        weight = sum(1 for attribute, value in wanted.items()
                     if attrs.get(attribute) and attrs[attribute] == value)
        if weight == len(wanted):
            ruleweight[rule] = weight

    if not ruleweight:
        return  # no matching rule

    # Rule(s) with the greatest number of matched attributes.
    best = max(ruleweight.values())
    candidate = [rule for rule in ruleweight if ruleweight[rule] == best]

    if len(candidate) > 1:
        compete = "".join(rule + " " for rule in candidate)
        message = ("core.py competing rules: %s" % compete)
        callback.writeLine('serverLog', message)
        return

    execrule = candidate[0]
    # Explicit dispatch table instead of eval() on the rule name.
    actions = {'rule1': rule1, 'rule2': rule2, 'rule3': rule3, 'rule4': rule4}
    action = actions.get(execrule)
    if action is None or not inspect.isfunction(action):
        message = ("core.py rule action missing: %s" % execrule)
        callback.writeLine('serverLog', message)
        return

    message = ("core.py execute rule: %s" % execrule)
    callback.writeLine('serverLog', message)
    action(callback, sv)  # run function
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,118 @@
|
|||
# ocfUnzipDataObject(*objpath, *objtype)
# Server-side rule: if the zip data object at *objpath contains a .csv with
# the same base name, extract it via msiExecCmd into a time-stamped temporary
# directory/collection, copy it into the zip's collection on OCFResc, then
# remove the temporary artefacts.  *objtype is passed by callers as '-d'
# (data object) but is not read inside this rule.
# NOTE(review): the commented "solution" blocks document failed attempts to
# delete the source zip from inside the PEP — all caused client-side errors.
ocfUnzipDataObject(*objpath, *objtype) {

    # use filename.zip to build string filename.csv
    msiSplitPath(*objpath,*collection,*file)
    *file_extension_elem = split(*file,".");
    *file_extension_elem_size = size(*file_extension_elem);
    #*file_extension = elem(*file_extension_elem,(*file_extension_elem_size -1));
    *match_file = ""
    # loop used to match filename in filename.x.y.z.zip — rejoins every
    # dot-separated part except the final extension, keeping the dots
    for (*i = 0 ; *i < (*file_extension_elem_size -1) ; *i = *i + 1) {
        *filename_part = elem(*file_extension_elem,*i)
        *newstr = "*match_file" ++ "*filename_part" ++ "."
        *match_file = *newstr
    }
    # build filename file.csv from file.zip
    *match_extension = "csv"
    *newstr = "*match_file" ++ "*match_extension"
    *match_file = *newstr

    # find the physical path of filename.zip, this should be a local unix
    # filesystem mounted as a Resource (or s3 Resource cached directory)
    *query = SELECT DATA_PATH WHERE COLL_NAME = '*collection' AND DATA_NAME = '*file'
    foreach(*row in *query) {
        msiGetValByKey(*row,"DATA_PATH",*physpath)
    }

    # get contents of filename.zip, if contains filename.csv extract and register in iRODS
    #
    # link unzip and stat to iRODS executable directory for msiExecCmd
    # ln -s /usr/bin/unzip /var/lib/irods/msiExecCmd_bin/unzip
    # ln -s /usr/bin/stat /var/lib/irods/msiExecCmd_bin/stat
    # ln -s /bin/mkdir /var/lib/irods/msiExecCmd_bin/mkdir
    # ln -s /bin/rm /var/lib/irods/msiExecCmd_bin/rm
    msiExecCmd("unzip","-Z1 *physpath","localhost","null","null",*result)
    #msiGetStderrInExecCmdOut(*result,*stderr);
    msiGetStdoutInExecCmdOut(*result,*stdout);
    # check each line of stdout (one archive member per line) for a
    # matching filename.csv
    *stdout_elem = split(*stdout,"\n")
    *stdout_elem_size = size(*stdout_elem)
    for (*i = 0 ; *i < *stdout_elem_size ; *i = *i + 1) {
        *zip_content_match = elem(*stdout_elem,*i)
        # if filename.csv found in filename.zip
        # where metadata is to be extracted from the file name (like date) this would be performed here
        if (*zip_content_match == *match_file) then {
            #writeLine("serverLog", "*check_match = *match_file")
            # create temporary directory (named after the current system
            # time, both in the iCAT and on disk) and unzip the file there
            msiSplitPath(*physpath,*phys_dir,*phys_file)
            msiGetSystemTime(*time, "")
            *phys_tmp_dir = "*phys_dir" ++ "/" ++ "*time"
            *tmp_dir = "*collection" ++ "/" ++ "*time"
            *phys_tmp_path = *phys_tmp_dir ++ "/" ++ *match_file
            *extract_objpath = *collection ++ "/" ++ *match_file
            msiCollCreate(*tmp_dir,"",*result)
            msiExecCmd("mkdir","*phys_tmp_dir","localhost","null","null",*result)
            msiExecCmd("unzip","-j *physpath *match_file -d *phys_tmp_dir","localhost","null","null",*result)

            # check file extracted correctly (stat succeeds), move from temp
            # directory to unzip directory, otherwise tidy up and error
            if (errorcode(msiExecCmd("stat",*phys_tmp_path,"localhost","null","null",*result)) >=0) {
                writeLine("serverLog", "*phys_tmp_path successfully extracted")
                # register the extracted file in iCAT to allow it to be copied to unzip folder
                # extracting the file directly to the unzip folder and registering (without copy) has odd behaviour on irm delete
                msiPhyPathReg("*tmp_dir/*match_file","OCFResc",*phys_tmp_path,"null",*reg_stat)
                msiDataObjCopy("*tmp_dir/*match_file",*extract_objpath,"destRescName=OCFResc++++forceFlag=",*result)

                # The zip cannot be removed from the iCAT in the PEP activation without a client side error

                # solution 1 - unlink/unregister file - works but client errors
                #msiDataObjUnlink("objPath=*objpath++++forceFlag=++++unreg=",*status)
                #msiDataObjUnlink("objPath=*objpath",*status)

                # solution 2 - rename file to temp dir then remove temp dir - works but client errors
                #*rename = *tmp_dir ++ "/" ++ *file
                #msiDataObjRename(*objpath,*rename,"",*result)

                # solution 3 - shell out and run irm - works but client errors
                # ln -s /usr/bin/irm /var/lib/irods/msiExecCmd_bin/irm
                #msiExecCmd("irm","-f *objpath","localhost","null","null",*result)

                # solution 4 - use put microservice to move extracted file to the unzip folder
                # this fails, the microservice only works as a client command not a server command
                #msiDataObjPut(*extract_objpath,"OCFResc","",*put_status)

                # solution 4 - run a bash script that deletes the file
                #msiExecCmd("remove_obj.sh","*objpath","localhost","null","null",*result)

                # solution 5 - try to trim all replicas
                #msiDataObjTrim(*objpath,"OCFResc","0","null","null",*result)

                # remove temp directory from iCAT and disk
                msiRmColl(*tmp_dir,"forceFlag=",*result)
                msiExecCmd("rm","-Rf *phys_tmp_dir","localhost","null","null",*result)

            } else {
                # extraction failed: tidy up the temporary collection and
                # physical directory
                msiRmColl(*tmp_dir,"forceFlag=",*result)
                msiExecCmd("rm","-Rf *phys_tmp_dir","localhost","null","null",*result)

            }
        }
    }
}
|
||||
|
||||
# meta, object path, type -d
|
||||
# meta, object path, type -d
# ocfTagDataObject(*str, *objpath, *objtype)
# Attach metadata to an iRODS object: *str holds '%'-separated key=value
# pairs, *objpath is the logical path, and *objtype selects the object
# type ('-d' for a data object, as used by the callers in this repo).
ocfTagDataObject(*str, *objpath, *objtype) {
    # parse "a=1%b=2" into a key-value structure
    msiString2KeyValPair(*str, *kvp);
    # apply every pair as AVU metadata on the target object
    msiAssociateKeyValuePairsToObj(*kvp, *objpath, *objtype);
    writeLine("serverLog", "added metadata to *objpath");

    #session variables also available here in native rules structure
    #
    #msiGetSessionVarValue("all", "server");
    #*get_session_var=msiGetSessionVarValue("objPath", "server");
    #*something = str(msiGetSessionVarValue("objPath", "client"));
    #writeLine("serverLog", "*something");
}
|
||||
|
||||
# getSessionVar(*name, *output)
# Resolve a session variable by name: e.g. getSessionVar("objPath", *out)
# builds and eval()s the expression str($objPath), returning the string
# value in *output.
getSessionVar(*name,*output) {
    *output = eval("str($"++*name++")");
}
|
||||
Loading…
Reference in New Issue