==================================================
XSEDE Gateway Attributes Reporting
==================================================
Oct. 15, 2019


AUTHOR
======
Tony Chen
t9chen@ucsd.edu


INTRODUCTION
============
As of Oct. 2019, all science gateways that use XSEDE resources are required
to report the number of unique users who have executed jobs on XSEDE
resources. XSEDE calls this information "Gateway Job Attributes", and it is
reported through new, simple REST API endpoints. The attributes must include
the gateway unique user ID, the local job ID (obtained from the local
resource manager), the submission time (also obtained from the local resource
manager), and the submission host (configured by the local service provider).
The REST API can also be used to submit optional software, usage, and VM ID
attributes (if applicable).

More information can be found at
https://xsede-xdcdb-api.xsede.org/api/gateways.


REQUIREMENTS
============
***** NOTE *****
All information from this point onward is specific to CIPRES and may or may
not be applicable to other gateways.

The following are the mappings between XSEDE attributes and CIPRES data
elements:

XSEDE             : CIPRES
---------------------------
apikey            : Generated by XSEDE specific to the gateway
gatewayuser       : UUID of user
jobid             : REMOTE_JOB_ID (assigned by Slurm when a job is submitted)
xsederesourcename : 'comet.sdsc.xsede' | 'comet-gpu.sdsc.xsede'
submittime        : DATE_SUBMITTED
software          : CIPRES does not include this information.


IMPLEMENTATION
==============
The CIPRES development team decided to implement the reporting function in
the web portal (in Java) rather than elsewhere, such as in submit.py (in
Python). The team also decided to save each reporting record in the database,
both as an internal record and for statistics. XSEDE requires that attributes
can be resent if the XSEDE REST API endpoint is offline; saving the records
in a database table makes it easy to track the status of each report and to
retry when necessary.
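
As an illustration of the attribute mapping listed under REQUIREMENTS, the
required attributes could be carried in a small value class like the one
below. This is only a sketch, not the CIPRES implementation: the class name
GatewayJobAttributes and its methods are hypothetical, the submission time is
passed through as an opaque string because its format is not specified here,
and the apikey is left out because this document does not say how it is
transmitted.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the required XSEDE attributes of one job. */
final class GatewayJobAttributes {
    // The two resource names listed in the mapping above.
    private static final List<String> RESOURCES =
            Arrays.asList("comet.sdsc.xsede", "comet-gpu.sdsc.xsede");

    private final String gatewayUser;   // UUID of the CIPRES user
    private final String jobId;         // REMOTE_JOB_ID assigned by Slurm
    private final String resourceName;  // one of RESOURCES
    private final String submitTime;    // DATE_SUBMITTED, format not assumed

    GatewayJobAttributes(String userUuid, String remoteJobId,
                         String resourceName, String dateSubmitted) {
        if (!RESOURCES.contains(resourceName)) {
            throw new IllegalArgumentException("Unknown resource: " + resourceName);
        }
        this.gatewayUser = userUuid;
        this.jobId = remoteJobId;
        this.resourceName = resourceName;
        this.submitTime = dateSubmitted;
    }

    /** Returns the attributes keyed by the XSEDE attribute names. */
    Map<String, String> toParameters() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("gatewayuser", gatewayUser);
        params.put("jobid", jobId);
        params.put("xsederesourcename", resourceName);
        params.put("submittime", submitTime);
        return params;
    }
}
```

Rejecting unknown resource names in the constructor keeps a misconfigured
submission host from being saved and retried later.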

Database
--------
A new table, 'xsede_attr_report_hx', is created to save reporting records.
Please see the 2019-10-10.sql file for more details on other database
changes.

XSEDE SDK
---------
A separate stand-alone SDK library, XSEDE SDK, is also implemented. The
classes in this utility library are responsible for communicating with the
XSEDE REST API endpoints and can be imported into other Java projects.

org.ngbw.sdk.xsede.attribute.*
org.ngbw.sdk.xsede.attribute.resource.ListSubmittedGatweayAttributes
org.ngbw.sdk.xsede.attribute.resource.SubmitGatewayAttribute

ListSubmittedGatweayAttributes
------------------------------
This class retrieves the gateway attributes already submitted to XSEDE by
calling the corresponding XSEDE REST API endpoint.

SubmitGatewayAttribute
----------------------
This class submits gateway job attributes to XSEDE.

Gateway Portal
--------------
The following three Java classes are implemented in the portal, and all
reporting is carried out in the background by these classes.

org.ngbw.sdk.database.XsedeAttributeReportingScheduler
org.ngbw.sdk.xsede.XsedeAttributeReportingTaskManager
org.ngbw.sdk.xsede.SubmitXsedeGatewayAttributeTask

XsedeAttributeReportingScheduler
--------------------------------
This class is a singleton and is responsible for creating and modifying
records in the 'xsede_attr_report_hx' table. No other class should have
direct access to the 'xsede_attr_report_hx' table; all insert/update
operations MUST go through this class.

XsedeAttributeReportingTaskManager
----------------------------------
This is also a singleton class. It is instantiated when the ServletContext is
initialized and terminated when the ServletContext is destroyed. It
periodically checks the 'xsede_attr_report_hx' table for new job attributes
that need to be reported. When it finds any, it instantiates a
SubmitXsedeGatewayAttributeTask to carry out the reporting task.
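
The periodic check described above can be sketched with the standard
java.util.concurrent executors. This is a minimal sketch under stated
assumptions, not the portal's actual code: the class name ReportingPoller,
the Supplier that stands in for the database query, and the Consumer that
stands in for SubmitXsedeGatewayAttributeTask are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical background poller; one reporting task per pending record. */
final class ReportingPoller {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final Supplier<List<String>> pendingJobIds; // stands in for the table query
    private final Consumer<String> reporter;            // stands in for the reporting task

    ReportingPoller(Supplier<List<String>> pendingJobIds, Consumer<String> reporter) {
        this.pendingJobIds = pendingJobIds;
        this.reporter = reporter;
    }

    /** Called when the servlet context is initialized. */
    void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(this::pollQuietly, 0, periodSeconds, TimeUnit.SECONDS);
    }

    /** Runs one check and hands each pending record to a worker thread. */
    int pollOnce() {
        List<String> ids = pendingJobIds.get();
        for (String id : ids) {
            workers.submit(() -> reporter.accept(id));
        }
        return ids.size();
    }

    // An uncaught exception would cancel all later scheduled runs, so swallow it here.
    private void pollQuietly() {
        try {
            pollOnce();
        } catch (RuntimeException e) {
            System.err.println("Polling failed: " + e.getMessage());
        }
    }

    /** Called when the servlet context is destroyed. */
    void stop() {
        scheduler.shutdownNow();
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a portal, start() and stop() would be called from whatever hook runs at
ServletContext initialization and destruction.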

SubmitXsedeGatewayAttributeTask
-------------------------------
This class implements Runnable, so it can be executed in the background. It
reads information from the 'xsede_attr_report_hx' table, instantiates a new
SubmitGatewayAttribute object, and reports the attributes to the XSEDE REST
endpoint. It then saves the reporting status in the same table and determines
whether the report needs to be rescheduled.

Properties
----------
Two new properties are required and must be added to the build.properties
file.

xsede.attribute.reporting.api.key = {gateway-specific-api-key}
xsede.attribute.reporting.fail.max.retries = {n | n > 0}

xsede.attribute.reporting.api.key
---------------------------------
This is the API key generated by XSEDE for a specific gateway.

xsede.attribute.reporting.fail.max.retries
------------------------------------------
The maximum number of times to resend a report when CIPRES detects that the
XSEDE endpoints are offline. This value must be greater than zero or an error
will be raised.

The portal should be configured to instantiate a new
XsedeAttributeReportingTaskManager when the ServletContext is initialized.
The instance then runs in the background. When the ServletContext is
destroyed, the instance is terminated and cleaned up accordingly.


WORKFLOW
========
When a user creates a job and runs it, or runs a saved job, a record is
created in the 'xsede_attr_report_hx' table. The XSEDE Job ID is not
available at this point, so the record cannot be reported to XSEDE yet. When
the job run is done (either succeeded or failed), a different process of the
portal updates the 'job_stats' table with the relevant information, including
the XSEDE Job ID. The XsedeAttributeReportingTaskManager, running in the
background, periodically checks the 'job_stats' table, populates the XSEDE
Job ID in the 'xsede_attr_report_hx' table, and instantiates a
SubmitXsedeGatewayAttributeTask. This task retrieves the relevant information
from the 'xsede_attr_report_hx' table and reports it to XSEDE.
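
The two-step nature of this workflow (a record is created first, and it only
becomes reportable once the XSEDE Job ID is known) can be modeled in memory as
follows. This is an illustrative sketch only: the classes ReportRecord and
WorkflowSketch, their fields, and the in-memory map standing in for the
'xsede_attr_report_hx' table are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for one reporting record. */
final class ReportRecord {
    final long portalJobId;
    final String gatewayUser;
    String remoteJobId; // stays null until the job statistics are available

    ReportRecord(long portalJobId, String gatewayUser) {
        this.portalJobId = portalJobId;
        this.gatewayUser = gatewayUser;
    }

    boolean isReportable() {
        return remoteJobId != null;
    }
}

/** Hypothetical model of the workflow steps. */
final class WorkflowSketch {
    private final Map<Long, ReportRecord> reportTable = new LinkedHashMap<>();

    // Step 1: the user runs a job; the record is created without a job ID.
    void onJobSubmitted(long portalJobId, String gatewayUser) {
        reportTable.put(portalJobId, new ReportRecord(portalJobId, gatewayUser));
    }

    // Step 2: the job has finished and its job ID is copied into the record.
    void onJobStatsUpdated(long portalJobId, String remoteJobId) {
        ReportRecord record = reportTable.get(portalJobId);
        if (record != null) {
            record.remoteJobId = remoteJobId;
        }
    }

    // Step 3: only records with a job ID are handed to a reporting task.
    List<ReportRecord> reportable() {
        List<ReportRecord> ready = new ArrayList<>();
        for (ReportRecord record : reportTable.values()) {
            if (record.isReportable()) {
                ready.add(record);
            }
        }
        return ready;
    }
}
```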


HANDLING XSEDE OFFLINE
======================
SubmitXsedeGatewayAttributeTask will reschedule another report (if the
REPORT_NUMBER is smaller than the value assigned to
xsede.attribute.reporting.fail.max.retries) under the following conditions:

1. When the reporting process is interrupted.
2. When a SocketException, SocketTimeoutException, or
   ServiceUnavailableException is raised, indicating that there might be an
   issue with the network or with the XSEDE instance.

If it receives any other error, it records the error message but does not
reschedule the report.
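
The reschedule rule above can be expressed as a small decision helper. This
is a sketch under stated assumptions, not the actual task code: the class
ReportRetryPolicy is hypothetical, an interrupted report is assumed to
surface as an InterruptedException, and the local ServiceUnavailableException
is a stand-in, since this document does not say which library provides the
real one.

```java
import java.net.SocketException;
import java.net.SocketTimeoutException;

/** Stand-in for the ServiceUnavailableException used by the portal (assumption). */
class ServiceUnavailableException extends RuntimeException {
    ServiceUnavailableException(String message) {
        super(message);
    }
}

/** Hypothetical helper that decides whether a failed report is rescheduled. */
final class ReportRetryPolicy {
    private final int maxRetries; // xsede.attribute.reporting.fail.max.retries

    ReportRetryPolicy(int maxRetries) {
        // The property must be greater than zero or an error is raised.
        if (maxRetries <= 0) {
            throw new IllegalArgumentException("max retries must be greater than zero");
        }
        this.maxRetries = maxRetries;
    }

    /** True for the failures that suggest a network or XSEDE outage. */
    boolean isRetryable(Throwable error) {
        // SocketTimeoutException is not a subclass of SocketException,
        // so it needs its own check.
        return error instanceof InterruptedException
                || error instanceof SocketException
                || error instanceof SocketTimeoutException
                || error instanceof ServiceUnavailableException;
    }

    /** Reschedule only retryable failures while REPORT_NUMBER is below the limit. */
    boolean shouldReschedule(Throwable error, int reportNumber) {
        return isRetryable(error) && reportNumber < maxRetries;
    }
}
```

With a limit of 3, a SocketTimeoutException on report number 1 is
rescheduled, while the same failure on report number 3 is only recorded.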