The US Department of Energy's (DOE) advanced computing and data infrastructure (CDI), which includes supercomputers, edge systems at experimental facilities, massive data storage, and high-speed networks, is used to help solve the nation's most pressing scientific problems.
These problems include supporting research in astrophysics, delivering new materials, designing new drugs, creating more efficient engines and turbines, and making more accurate and timely weather forecasts and climate change predictions.
Increasingly, computational science campaigns are exploiting distributed and heterogeneous scientific infrastructures that span multiple locations connected by high-performance networks, moving scientific data from instruments to computing, storage, and visualization facilities.
However, because these federated service infrastructures tend to be complex and are managed by different organizations, domains, and communities, both the operators of the infrastructures and the scientists who use them have limited overall visibility, resulting in an incomplete understanding of the behavior of the complete set of resources that science workflows span.
While science workflow systems greatly increase the productivity of scientists by managing and orchestrating computational campaigns, the complex nature of CDIs, including resource heterogeneity and the deployment of complex system software stacks, poses several challenges in predicting the behavior of scientific workflows and in steering them past system and application anomalies.
Our new project will provide an integrated platform of algorithms, methods, tools, and services that will help operators and scientists at DOE facilities address these challenges and improve the overall end-to-end scientific workflow.
– Research professor of computer science and research director at the University of Southern California
As part of a new grant from DOE, the project aims to advance knowledge of how simulation and machine learning (ML) methodologies can be harnessed and scaled up to improve DOE computational and data science.
The project will add three important capabilities to current scientific workflow systems: (1) predict the performance of complex workflows; (2) detect and classify infrastructure and workflow anomalies and “explain” the sources of these anomalies; and (3) suggest performance optimizations. To accomplish these tasks, the project will explore the use of novel simulation, machine learning and hybrid methods to predict, understand and optimize the behavior of complex DOE science workflows on DOE CDIs.
The Deputy Director for Research and Network Infrastructure at RENCI said that, in addition to creating a more efficient schedule for researchers, the team would like to provide CDI operators with the tools to effectively detect, locate, and deal with anomalies as they occur in the complex landscape of DOE facilities.
To detect anomalies, the project will explore real-time ML models that detect and classify anomalies by leveraging underlying spatial and temporal correlations and expert knowledge, combining heterogeneous information sources, and generating predictions in real time.
The selected solutions will be integrated into a prototype system with a dashboard that will be used for evaluation by DOE scientists and CDI operators. The project will enable scientists working at the frontier of DOE science to run complex workflows efficiently and reliably across a wide range of DOE resources and to shorten the time to discovery.
In addition, the project will develop ML methods that can self-learn corrective behaviors and optimize workflow performance, with an emphasis on the explainability of their optimization methods. Working together, the researchers behind Poseidon will break down barriers between complex CDIs, accelerate the timeline of scientific discovery, and transform the way computational and data science is done.
As reported by OpenGov Asia, the U.S. Department of Energy’s (DOE) Argonne National Laboratory is leading efforts to couple artificial intelligence (AI) and advanced simulation workflows to better understand biological observations and accelerate drug discovery.
Argonne has collaborated with academic and commercial research partners to obtain near real-time feedback between simulation and AI approaches to understand how two proteins encoded by the SARS-CoV-2 viral genome interact to help the virus replicate and evade the host’s immune system.