Thematic Analysis Tool

Thematic Analysis is a method for qualitatively analyzing text, audio, video and other material. In OceanDSL, we use this method to analyze interview transcripts. The goal of the thesis is to develop suitable tooling, or parts of it, and to evaluate the tool based on existing transcripts.

Type Bachelor / Master

Task Create a coding editor in Eclipse supporting coding of text, tagging/categorizing, code and category editing.

Task Provide an interactive visualization based on ELK/Kieler.

Task Develop a web-based visualization tool

Features

  • coding, categorizing/themes
  • regrouping of codes
  • re-coding
  • splitting codes
  • merging codes
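
The features above can be sketched as operations on a small in-memory codebook. This is only an illustration under assumptions: the class name Codebook, its method names, and the representation of a coded passage as a (start, end) character offset pair are hypothetical and not part of any existing OceanDSL tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Codebook:
    """Hypothetical codebook: code name -> coded spans, category -> code names."""
    segments: dict = field(default_factory=dict)    # code -> set of (start, end)
    categories: dict = field(default_factory=dict)  # category/theme -> set of codes

    def code(self, name, start, end):
        """Coding: attach a code to a span of the transcript."""
        self.segments.setdefault(name, set()).add((start, end))

    def categorize(self, category, name):
        """Categorizing: place a code in a category (theme)."""
        self.categories.setdefault(category, set()).add(name)

    def regroup(self, name, old_category, new_category):
        """Regrouping: move a code from one category to another."""
        self.categories[old_category].discard(name)
        self.categorize(new_category, name)

    def recode(self, old_name, new_name, span):
        """Re-coding: assign a different code to an already coded span."""
        self.segments[old_name].discard(span)
        self.code(new_name, *span)

    def split(self, name, new_name, spans):
        """Splitting: move the given spans of a code to a new code."""
        moved = self.segments[name] & set(spans)
        self.segments[name] -= moved
        self.segments.setdefault(new_name, set()).update(moved)

    def merge(self, names, merged_name):
        """Merging: combine several codes into one, keeping all spans."""
        merged = set()
        for name in names:
            merged |= self.segments.pop(name, set())
        self.segments[merged_name] = self.segments.get(merged_name, set()) | merged
        # Categories that held any merged code now hold the merged code.
        for members in self.categories.values():
            if members & set(names):
                members -= set(names)
                members.add(merged_name)
```

An editor or visualization would sit on top of such a model; persisting it and linking spans back to the transcript text are left open here.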

Extend a Configuration and Parametrization DSL

Type Bachelor (evaluation with MITgcm or UVic scenarios)

Type Master (evaluation with MITgcm and UVic scenarios and in conjunction with domain experts)

Task We have already developed a basic Configuration and Parameterization DSL (CP-DSL) and created generators to support the two Earth System Climate Models (ESCM) MITgcm and UVic. During an evaluation with experts, the DSL was considered useful, but multiple potential improvements were identified. In this thesis, your task would be to address one or two of these improvements:

  • Improve the support for diagnostics so that it covers diagnostics in MITgcm, UVic, and NEMO with XIOS and CDI-pio. The latter two are frameworks and APIs to realize logging in ESCMs.
  • Provide tooling to read existing configurations from source files and generate CP-DSL configurations from them.
  • Adapt the grammar's include and import mechanism to allow sharing and modifying existing configurations.

Note This work will be based on an already existing DSL which will be extended.

Resources

  • XML Input Output Server (XIOS) http://www.ifremer.fr/docmars/html/doc.coupling.xios.html
  • CDI-pio https://code.mpimet.mpg.de/projects/cdi/wiki and https://code.mpimet.mpg.de/projects/cdi/wiki/Cdi-pio
  • The Sprat Approach (see also below), specifically the simulation configuration DSL
    • Sprat http://eprints.uni-kiel.de/32070/
      • Chapter 7, especially 7-7.2.2
      • Metamodel in 7.2.3 to understand the general relationship of the terminology. The property definition in 7.2.3 is done in Object-Z; essentially, the name at the top refers to the class from the metamodel, and the pairs of names in the inner frame are the properties of the class and additional constraints. Inheritance in Object-Z is shown in the definition of Internal_DSL and External_DSL (the internal DSL also has a property host language).
      • 7.3 Applying the Sprat Approach: to understand how and where the separation between domains happens, you need to define roles. This is described in 7.3.1.
    • Chapter 8
      • 8.3 The Sprat Ecosystems DSL
    • Source Code
    • https://github.com/cau-se/sprat-ecosystem-dsl-xtext
  • Introduction on how to write a thesis https://www.se.informatik.uni-kiel.de/en/student-theses/useful-hints
  • Case Studies
  • External DSL notation for syntax and semantics

Design and Evaluate a Deployment DSL for Ocean Models

Type Bachelor (evaluation with MITgcm or UVic scenarios)

Type Master (evaluation with MITgcm and UVic scenarios and in conjunction with domain experts)

Tasks Based on a given set of domain knowledge, concepts and sample deployment scripts, design a textual, external DSL to control the deployment of models. The design is based on the two case studies UVic and MITgcm and must address all three deployment scenarios (local, dedicated host/node, and Kubernetes). The DSL includes a code generator, interpreter or Jupyter kernel which performs the deployment.

Resources and Notes

  • Ansible is a deployment and configuration language https://www.ansible.com/overview/how-ansible-works
  • The deployment process for ocean models can be quite different from the processes established in enterprise software, for which Ansible is designed. The typical process is (so far): configure code, compile, configure program, setup, run. All these steps can be done locally or remotely. However, once the process goes remote, it stays remote from that point on. For example, when code configuration happens locally but compiling is done remotely, all remaining steps are also done remotely.
  • Access to remote machines happens via SSH (legacy) or via Jupyter (latest).
  • Version management and file synchronization is done via git.
  • The DSL must be able to describe all phases of the deployment. This could be done similarly to a build pipeline from
  • Sprat Deployment based on Ansible
  • The Sprat Approach (see also below), specifically the simulation configuration DSL
    • Sprat http://eprints.uni-kiel.de/32070/
      • Chapter 7, especially 7-7.2.2
      • Metamodel in 7.2.3 to understand the general relationship of the terminology. The property definition in 7.2.3 is done in Object-Z; essentially, the name at the top refers to the class from the metamodel, and the pairs of names in the inner frame are the properties of the class and additional constraints. Inheritance in Object-Z is shown in the definition of Internal_DSL and External_DSL (the internal DSL also has a property host language).
      • 7.3 Applying the Sprat Approach: to understand how and where the separation between domains happens, you need to define roles. This is described in 7.3.1.
    • Source Code
    • https://github.com/cau-se/sprat-ecosystem-dsl-xtext
  • Introduction on how to write a thesis https://www.se.informatik.uni-kiel.de/en/student-theses/useful-hints
  • Case Studies
  • External DSL notation for syntax and semantics (syntax: EBNF)
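
The rule from the notes above, that a deployment stays remote once one phase has gone remote, is a constraint the DSL would have to check. A minimal sketch of such a check follows. The phase names are taken from the notes; the function name validate_plan and the representation of a plan as a mapping from phase to "local" or "remote" are assumptions made for illustration.

```python
# Phases in the order given in the notes above.
PHASES = ["configure code", "compile", "configure program", "setup", "run"]


def validate_plan(plan):
    """Check a hypothetical deployment plan (phase -> "local" or "remote").

    Returns a list of error messages; an empty list means the plan is valid.
    """
    errors = []
    gone_remote = False
    for phase in PHASES:
        location = plan.get(phase)
        if location not in ("local", "remote"):
            errors.append(f"{phase}: missing or unknown location {location!r}")
            continue
        if location == "remote":
            gone_remote = True
        elif gone_remote:
            # A local phase after a remote one violates the rule.
            errors.append(f"{phase}: cannot run locally after an earlier remote phase")
    return errors
```

For example, a plan that configures the code locally and compiles remotely is only valid if configure program, setup and run are remote as well. In a real DSL this check would probably be part of the validator rather than a separate function.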

Identify, adapt and develop Code Analysis Tooling for Fortran, C and Python

Type Bachelor (one language, one analysis, limited survey)

Type Master (full survey including preprocessing, one analysis)

Task Create a survey of code analysis tools, including style checkers and their components, for Fortran, C and typical preprocessors. Components of such tools are parsers, lexers, and ASTs.
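
To illustrate how a parser and an AST serve as components of an analysis, the following sketch uses the ast module from the Python standard library to implement a trivial style check: listing functions that lack a docstring. Python is used here only because it ships with a parser; for Fortran and C, a grammar and parser such as those listed below would take the place of ast. The function name functions_without_docstring is hypothetical.

```python
import ast


def functions_without_docstring(source):
    """Return the sorted names of all functions in `source` lacking a docstring."""
    tree = ast.parse(source)  # lexing and parsing produce the AST
    return sorted(
        node.name
        for node in ast.walk(tree)  # visit every node of the AST
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and ast.get_docstring(node) is None
    )
```

The check itself is only a few lines; most of the work is done by the parser, which is why the availability and quality of grammars matter for the survey.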

Sources & Notes

  • Starting point: Existing Fortran grammars and tooling
    • ROSE project
    • Open Fortran Grammar

Complexity Analysis of Ocean Models

Type Master

Task Analyze existing climate and ocean models to identify internal dependencies and the architecture. Subsequently, identify code which is not used (dead code).

Potential models to analyze:

  • MITgcm, UVic (available)
  • NEMO, ICON, ECHAM5/6 (future candidates)

Key questions

  • How to identify internal dependencies/discover the architecture?
  • How to identify dead code in a model?
  • Determine complexity
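
One common way to approach the dead code question is a reachability analysis on a call graph: every routine that cannot be reached from an entry point is a candidate for dead code. The sketch below assumes the call graph has already been extracted (statically or from runtime monitoring) into a mapping from caller to callees; the function name and the routine names in the usage note are made up.

```python
def unreachable_routines(call_graph, entry_points):
    """Return all routines that cannot be reached from the entry points.

    call_graph maps each routine name to the routines it calls.
    """
    reachable = set()
    stack = list(entry_points)
    while stack:  # depth-first traversal over call edges
        routine = stack.pop()
        if routine in reachable:
            continue
        reachable.add(routine)
        stack.extend(call_graph.get(routine, ()))
    # Routines may appear only as callees, so collect both sides.
    all_routines = set(call_graph)
    for callees in call_graph.values():
        all_routines.update(callees)
    return all_routines - reachable
```

With the graph {"main": ["init", "step"], "step": ["diag"], "old_io": ["diag"]} and the entry point "main", only "old_io" is reported. Such a result is a candidate list only: code selected by preprocessor options or called indirectly may be missing from a statically extracted graph.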

Tools

  • Static code analysis utilizing grammars and preprocessor tooling (maybe Sergej and Ralf)
  • Kieker 4 C runtime architecture
  • Joint visualization with new Kieker filters via dot and graphml

Unified access to scheduling systems

Type Bachelor

Task Analyze different schedulers, such as Slurm and NQSV, regarding their features.

  • Research which batch systems / schedulers are in use in specific HPC installations
  • Identify their features in a feature matrix
  • Identify common concepts and/or an abstraction of the functionality
  • Ask modellers regarding the use of the features (questionnaire)
  • Sketch a basic API or an internal (Python)/external DSL usable within Jupyter
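
The feature matrix and the search for common concepts can be sketched as follows. The scheduler and feature names in the usage note are placeholders, not findings about Slurm, NQSV or any other system; the actual entries would come from the research steps above.

```python
def common_features(matrix):
    """Return the features supported by every scheduler in the matrix.

    matrix maps scheduler name -> {feature name: supported (bool)}.
    """
    feature_sets = [
        {feature for feature, supported in features.items() if supported}
        for features in matrix.values()
    ]
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)
```

For a placeholder matrix such as {"scheduler_a": {"job arrays": True, "dependencies": True}, "scheduler_b": {"job arrays": False, "dependencies": True}}, the function returns {"dependencies"}. The common subset is a natural starting point for a unified API, while features supported by only some schedulers would need optional or backend-specific handling.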

Sources & Notes