First Visualization of the UVic Architecture

Our goal is to understand the composition of climate and ocean models to support their modularization and future development. Recently, we applied runtime monitoring to the MITgcm model. Based on our experience there, we applied the technique to the Earth System Climate Model (ESCM) of the University of Victoria, Canada. Please be aware that these are very early results and may be erroneous.

The UVic model can be compiled with GNU Fortran (gfortran), but the setup we used only produces a running executable with the Intel Fortran compiler (ifort). Fortunately, ifort supports the same interface for runtime instrumentation as gfortran. Thus, we could apply the same probes in this context.

Based on this setup, we recorded 79 GB of binary monitoring data from a partial model run. We aim to have a complete run in the future, but for the proof of concept, a partial run is sufficient. For our analysis, we initially intended to use the standard Kieker trace-analysis tool.

However, the Kieker trace-analysis tool uses call traces to reconstruct the deployed architecture. It is designed that way based on experience with web-based and service-oriented systems, which usually have a small set of calls in a trace, triggered by an incoming event, message or request. Scientific models are quite different: they are started once and run for a long time, which essentially results in one big trace, in our case a 79 GB trace. This would not fit into memory, and even if it did, it would be very slow to process. Thus, we created a new architecture reconstruction tool based on another set of Kieker analysis stages. Using this tool, we generated our first component and operation graphs. The first component graph can be seen below.

UVic architecture based on Kieker monitoring data. Files are considered to be components.
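
To illustrate why aggregating on the fly avoids holding one huge trace in memory, here is a minimal Python sketch. It is not the Kieker-based tool itself: the CallEvent type, its field names, and both function names are assumptions made for this illustration. Each monitored call is consumed once and only a counter per component pair is kept, with files treated as components.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CallEvent(NamedTuple):
    """One monitored call; field names are hypothetical, not Kieker's record format."""
    caller_file: str
    caller_op: str
    callee_file: str
    callee_op: str


def aggregate_component_graph(events: Iterable[CallEvent]) -> Counter:
    """Count calls between components, treating each source file as a component.

    Events are consumed one at a time, so memory grows with the number of
    distinct edges rather than with the length of the trace.
    """
    edges = Counter()
    for event in events:
        if event.caller_file != event.callee_file:  # skip calls inside one component
            edges[(event.caller_file, event.callee_file)] += 1
    return edges


def to_dot(edges: Counter) -> str:
    """Render the aggregated edges as a Graphviz dot digraph."""
    lines = ["digraph components {"]
    for (source, target), calls in sorted(edges.items()):
        lines.append(f'  "{source}" -> "{target}" [label="{calls}"];')
    lines.append("}")
    return "\n".join(lines)
```

Because the input is any iterable, the events could come from a generator that reads the monitoring log record by record, so the full 79 GB never has to be loaded at once.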

We will continue our analysis to provide more readable graphs.

Thematic Analysis Tool

Thematic Analysis is a method to analyze text, audio, video and other material qualitatively. In OceanDSL, we use this method to analyze interview transcripts. The goal of the thesis is to develop suitable tooling, or parts of it, and to evaluate the tool based on existing transcripts.

Type Bachelor / Master

Task Create a coding editor in Eclipse supporting coding of text, tagging/categorizing, code and category editing.

Task Provide an interactive visualization based on ELK/Kieler.

Task Develop a web-based visualization tool

Features

  • coding, categorizing/themes
  • regrouping of codes
  • re-coding
  • splitting codes
  • merging codes

Develop and Evaluate a Diagnostics DSL

Type Bachelor (evaluation with other ocean and climate models)

Type Master (evaluation with ocean and climate models and in conjunction with domain experts)

Task We have already developed a basic Configuration and Parameterization DSL (CP-DSL) and created generators to support the two Earth System Climate Models (ESCM) MITgcm and UVic. During an evaluation with experts, the DSL was considered useful, but multiple potential improvements were identified. In this thesis, your task would be to address one or two of these improvements:

  • Improve the support for diagnostics so that it covers diagnostics in MITgcm, UVic, and NEMO with XIOS and CDI-pio. The latter two are frameworks and APIs to realize logging in ESCMs.
  • Provide tooling to read existing configuration from source files and generate CP-DSL configurations from them.
  • Adapt the grammar's include and import mechanism to allow sharing and modifying existing configurations.

Note This work can be based on an already existing DSL which will be extended.

Resources

  • XML Input Output Server (XIOS) http://www.ifremer.fr/docmars/html/doc.coupling.xios.html
  • CDI-pio https://code.mpimet.mpg.de/projects/cdi/wiki and https://code.mpimet.mpg.de/projects/cdi/wiki/Cdi-pio
  • The Sprat Approach (see also below), specifically the simulation configuration DSL
    • Sprat http://eprints.uni-kiel.de/32070/
      • Chapter 7, especially 7-7.2.2
      • Metamodel in 7.2.3 to understand the general relationship of the terminology. The property definition in 7.2.3 is done in Object-Z, but essentially the name at the top refers to the class from the metamodel, and the pairs of names in the inner frame are the properties of the class and additional constraints. Inheritance in Object-Z is shown in the definition of Internal_DSL and External_DSL (the internal DSL also has a property host language).
    • 7.3 Applying the Sprat Approach: to understand how and where the separation between domains happens, you need to define roles. This is described in 7.3.1.
    • Chapter 8
      • 8.3 The Sprat Ecosystems DSL
    • Source Code
    • https://github.com/cau-se/sprat-ecosystem-dsl-xtext
  • Introductions to how to write a thesis https://www.se.informatik.uni-kiel.de/en/student-theses/useful-hints
  • Case Studies
  • External DSL notation for syntax and semantics

Design or adapt a Deployment DSL for Ocean Models

Type Bachelor (evaluation with MITgcm or UVic scenarios)

Type Master (evaluation with MITgcm and UVic scenarios and in conjunction with domain experts)

Tasks Based on a given set of domain knowledge, concepts and sample deployment scripts, design a textual, external DSL to control the deployment of models based on the two case studies UVic and MITgcm, addressing all three deployment scenarios (local, dedicated host/node, and Kubernetes). The DSL includes a code generator, interpreter or Jupyter kernel which performs the deployment.

Deployment is handled differently in the scientific modeling domain than it is in enterprise and embedded systems. Configured and compiled models are started with scheduling systems (schedulers), e.g., Slurm [5] and OpenPBS [2]. On top of such schedulers, tools like CYLC are proposed as an overall workflow engine [3].

CYLC could already cover all our needs or parts of them. Therefore, it is your task to analyze the domain to understand what scientists do to configure and set up scientific models and then run them. Show whether this can be achieved with CYLC (if possible), what is missing, and how to integrate this with our other DSLs.

Resources and Notes

  • Introductions to how to write a thesis https://www.se.informatik.uni-kiel.de/en/student-theses/useful-hints
  • Ansible is a deployment and configuration language https://www.ansible.com/overview/how-ansible-works
  • The deployment process for ocean models can be quite different from the processes established in enterprise software, for which Ansible is designed. The typical process is (so far): configure code, compile, configure program, set up, run. All these steps can be done locally or remotely. However, once the process goes remote, it stays remote from that point on. That means when code configuration happens locally but compiling is done remotely, the rest is also done remotely.
  • Access to remote machines happens via ssh (legacy) or via Jupyter (latest).
  • Version management and file synchronization is done via git.
  • The DSL must be able to describe all phases of the deployment. This could be done similarly to a build pipeline from
  • Sprat Deployment based on Ansible
  • The Sprat Approach (see also below), specifically the simulation configuration DSL
    • Sprat http://eprints.uni-kiel.de/32070/
      • Chapter 7, especially 7-7.2.2
      • Metamodel in 7.2.3 to understand the general relationship of the terminology. The property definition in 7.2.3 is done in Object-Z, but essentially the name at the top refers to the class from the metamodel, and the pairs of names in the inner frame are the properties of the class and additional constraints. Inheritance in Object-Z is shown in the definition of Internal_DSL and External_DSL (the internal DSL also has a property host language).
      • 7.3 Applying the Sprat Approach: to understand how and where the separation between domains happens, you need to define roles. This is described in 7.3.1.
    • Source Code
    • https://github.com/cau-se/sprat-ecosystem-dsl-xtext
  • Case Studies
  • External DSL notation for syntax and semantics Syntax EBNF
  • Operational semantics for programming languages
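
The rule from the notes above, that a deployment stays remote once any phase has run remotely, is the kind of constraint a deployment DSL could check before executing anything. The following Python sketch shows one way to express it. The phase names follow the list in the notes; the function name validate_plan and the "local"/"remote" labels are assumptions for illustration and not part of any existing OceanDSL tooling.

```python
PHASES = ["configure code", "compile", "configure program", "setup", "run"]


def validate_plan(locations):
    """Check a mapping of phase -> "local" or "remote" against the remote-stays-remote rule.

    Returns a list of error messages; an empty list means the plan is valid.
    """
    errors = []
    gone_remote = False
    for phase in PHASES:
        location = locations.get(phase)
        if location not in ("local", "remote"):
            errors.append(f"{phase}: missing or unknown location {location!r}")
            continue
        if location == "remote":
            gone_remote = True
        elif gone_remote:
            # A local phase after a remote one violates the rule.
            errors.append(f"{phase}: must be remote because an earlier phase is remote")
    return errors
```

A plan that configures the code locally and does everything else remotely passes the check, while the same plan with a local run phase is rejected.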
  [1] Reiner Jung. 2016. Generator-Composition for Aspect-Oriented Domain-Specific Languages. Kiel University, Kiel. Retrieved from https://oceanrep.geomar.de/33602/
  [2] Linuxfoundation. OpenPBS. OpenPBS Open Source Project. Retrieved January 28, 2021 from https://www.openpbs.org/
  [3] Hilary Oliver, Matthew Shin, David Matthews, Oliver Sanders, Sadie Bartholomew, Andrew Clark, Ben Fitzpatrick, Ronald van Haren, Rolf Hut, and Niels Drost. 2019. Workflow Automation for Cycling Systems. Computing in Science & Engineering, 7–21. DOI: https://doi.org/10.1109/mcse.2019.2906593
  [4] Benjamin C. Pierce. 2002. Types and Programming Languages. The MIT Press.
  [5] Slurm Commercial Support and Development. Slurm Workload Manager. Slurm Workload Manager. Retrieved January 28, 2022 from https://slurm.schedmd.com/documentation.html

Identify, adapt and develop Code Analysis Tooling for Fortran, C and Python

Type Bachelor (one language, one analysis, limited survey)

Type Master (full survey including preprocessing, one analysis)

Task Create a survey of code analysis tools, including style checkers and components, for Fortran, C and typical preprocessors. Components for such technology are parsers, lexers, and ASTs.

Sources & Notes

  • Starting point: Existing Fortran grammars and tooling
    • ROSE project
    • Open Fortran Grammar

Complexity Analysis of Ocean Models

Type Master

Task Analyze existing climate and ocean models to identify internal dependencies and the architecture. Subsequently, identify code which is not used (dead code).

Potential models to analyze:

  • MITgcm, UVic (available)
  • NEMO, ICON, ECHAM5/6 (future candidates)

Key questions

  • How to identify internal dependencies/discover the architecture?
  • How to identify dead code in a model?
  • How to identify patterns in code (bad smells, clones)?
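
One possible starting point for the dead code question is to compare what static analysis finds with what runtime monitoring observes. The Python sketch below assumes two inputs that would have to be produced elsewhere: a collection of operation names declared in the source and a collection of operation names seen in monitoring data. The function name is hypothetical.

```python
def dead_code_candidates(declared_operations, observed_operations):
    """Operations found in the source but never seen in the monitoring data."""
    return sorted(set(declared_operations) - set(observed_operations))
```

The result is only a list of candidates. Operations may be missing from the monitoring data because the chosen experiment does not exercise them or because they are disabled by preprocessor options, so each candidate still needs to be checked against other experiments and configurations.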

Tools

  • Static code analysis utilizing grammars and preprocessor tooling
  • Kieker 4 C runtime architecture
  • Joint visualization with new Kieker filters via dot and graphml

Unified access to scheduling systems

Type Bachelor

Task Analyze different schedulers, such as Slurm and NQSV, regarding their features.

  • Research which batch systems / schedulers are in use in specific HPC installations
  • Identify their features in a feature matrix
  • Identify common concepts and/or an abstraction of the functionality
  • Ask modellers regarding the use of the features (questionnaire)
  • Sketch a basic API, internal (Python)/external DSL usable within Jupyter
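
As a rough idea of what such a basic API could look like, the following Python sketch maps one scheduler-independent job description onto command lines for two schedulers. The JobRequest type and both function names are hypothetical. The sketch relies only on the common sbatch options --job-name, --nodes and --time and on the PBS qsub options -N and -l with select and walltime; an actual thesis would derive the abstraction from the feature matrix instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobRequest:
    """Scheduler-independent description of a batch job (hypothetical type)."""
    name: str
    nodes: int
    walltime: str  # formatted as HH:MM:SS
    script: str


def slurm_command(job: JobRequest) -> List[str]:
    """Translate the request into an sbatch command line."""
    return ["sbatch", f"--job-name={job.name}", f"--nodes={job.nodes}",
            f"--time={job.walltime}", job.script]


def pbs_command(job: JobRequest) -> List[str]:
    """Translate the request into a qsub command line (PBS select syntax)."""
    return ["qsub", "-N", job.name, "-l", f"select={job.nodes}",
            "-l", f"walltime={job.walltime}", job.script]
```

Returning the command as a list of arguments keeps the sketch free of shell quoting issues and lets a Jupyter-based front end decide whether to run it locally or over ssh.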

Sources & Notes

Architecture Analysis of Climate Models based on Kieker Runtime Data

We are analyzing the architecture of climate and ocean models utilizing runtime monitoring data reflecting call traces. Currently, we are testing our technical approach based on MITgcm and its set of prepared verification experiments, including all tutorial setups.

Today we analyzed two experiments, namely tutorial_barotropic_gyre and tutorial_global_oce_biogeo. For the former, we derived components based on files. For the latter, we derived components based on the package and source code directory structure, namely, each MITgcm package represents one component and the main code block is represented by the BASE component.
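
The two ways of deriving components can be sketched as two small mapping functions in Python. This is an illustration under assumptions, not our analysis code: the function names are made up, and the package rule assumes that package sources live in a directory layout of the form pkg/<name>/<file>, with everything else counted as BASE.

```python
from pathlib import PurePosixPath


def component_by_file(source_path: str) -> str:
    """File-based view: every source file is its own component."""
    return PurePosixPath(source_path).name


def component_by_package(source_path: str) -> str:
    """Package-based view: files under pkg/<name>/ belong to <name>, the rest to BASE."""
    parts = PurePosixPath(source_path).parts
    if "pkg" in parts:
        index = parts.index("pkg")
        if index + 2 < len(parts):  # need at least pkg/<name>/<file>
            return parts[index + 1]
    return "BASE"
```

With the file-based mapping, the number of components equals the number of source files that appear in the monitoring data, while the package-based mapping collapses them to one component per package plus BASE.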

Preliminary results can be seen in the following two figures:

Components based on files

As you can see, the file-based components are quite numerous and result in a vast graph. In contrast, the package-based component graph looks as follows.

Components based on packages and main source tree

In the latter graphic, you can see cyclic dependencies between BASE and most components. This might be because BASE comprises all non-package code files. Thus, a better separation in this area might be helpful. Still, the figure is much more concise and understandable.
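
Such two-way dependencies can also be listed directly from the edge list of a component graph instead of reading them off the figure. The following Python sketch assumes the graph is available as (caller, callee) pairs of component names; the function name is hypothetical, and it only finds cycles of length two, which is the pattern visible between BASE and the packages.

```python
def mutual_dependencies(edges):
    """Return sorted pairs of components that call each other (two-node cycles)."""
    edge_set = set(edges)
    return sorted(
        (source, target)
        for source, target in edge_set
        if source < target and (target, source) in edge_set
    )
```

Each pair is reported once, with the two names in alphabetical order, so the length of the result is the number of mutually dependent component pairs.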

Due to technical issues, we cannot provide operation-based images for either case. There might be an error in the dot renderer.

Configuration and Parameterization DSL

Over the summer, we developed our first language prototype of a Configuration and Parameterization DSL. The DSL allows users to specify parameters for the MIT General Circulation Model (MITgcm). The current prototype is limited to supporting configuration for the following tutorial examples:

Currently, we are extending the DSL to support all tutorial examples and features of MITgcm, and we are preparing support for the UVic model.

Project Resources

Experimenting with MITgcm

In our OceanDSL research project, we aim to produce Domain-Specific Languages (DSLs) for scientists and technicians to support and ease their work. We learned through a set of interviews with them that every model uses special self-made tools and methods to configure, build and run models. This complicates our effort. Therefore, we need a model which is fairly simple to set up, but covers all typical steps in model setup. We chose MITgcm based on a recommendation of a collaborating scientist.

MITgcm is an earth model including ocean and atmosphere. Before we can use it as a case study in our model user and model developer scenarios, we need to familiarize ourselves with the model.

As a first step, we aimed to play with the examples of the model. Therefore, we created one setup which can run directly on a Linux machine and one to be run in a Docker container. We tested both setups with the tutorial examples of MITgcm. Based on the Barotropic Ocean Gyre, we created an image of the result of the model run and a video showing intermediate results of the model in an animation.

The single images were generated using Octave (as a free Matlab replacement) and transformed into a video with Mencoder.

Barotropic tutorial model result for a rectangular ocean.
# Adjust the following path
addpath ADD-HERE-FULL-PATH-TO-THE-MITGCM-MATLAB-SCRIPTS

XC=rdmds('XC'); YC=rdmds('YC');
# Select a proper value. The start value is 0
Eta=rdmds('Eta',0);

contourf(XC/1000,YC/1000,Eta,[-.04:.01:.04]);
colorbar;
colormap((flipud(hot)));
set(gca,'XLim',[0 1200]);
set(gca,'YLim',[0 1200])

The script for the animation looks like this:

# Adjust the following path
addpath ADD-HERE-FULL-PATH-TO-THE-MITGCM-MATLAB-SCRIPTS
XC=rdmds('XC'); YC=rdmds('YC');

# The number of frames depend on your setup
for m = 0:60
  out=["out" num2str(m, "%04d") ".png"];
  # The multiplier depends on your setup
  Eta=rdmds('Eta',1296*m);

  hf = figure ("visible", "off");
  contourf(XC/1000,YC/1000,Eta,[-.04:.01:.04]);
  colorbar;
  colormap((flipud(hot)));
  set(gca,'XLim',[0 1200]);
  set(gca,'YLim',[0 1200]);
  print(hf, out, "-dpng");
  # Close the hidden figure so that figures do not pile up across frames
  close(hf);
endfor

Finally, the mencoder call looks like this:

mencoder mf://out????.png -mf fps=2:type=png -ovc lavc -lavcopts vcodec=mpeg4:mbd=2:trell -oac copy -o output.avi
Animation of the example model.

The setup is available at https://git.se.informatik.uni-kiel.de/oceandsl/case-study-mitgcm