
    PhD Position F/M Enabling Scientific Workflow Composition in Large-Scale Distributed Infrastructures - Rennes, France - INRIA

    INRIA
    INRIA Rennes, France


    Fixed-term contract (CDD)
    Description

    Context and assets of the position

    Supervisory Team

  • Silvina Caino-Lores, PhD (Inria, France)
  • Gabriel Antoniu, PhD, HDR (Inria, France)
    Location and Mobility

    The thesis will be hosted by the KerData team at the Inria research center of Rennes. Rennes is the capital city of Brittany, in the western part of France. It is easy to reach thanks to the high-speed train line to Paris. Rennes is a dynamic, lively city and a major center for higher education and research: 25% of its population are students.

    This thesis will likely include collaborations with international partners from Europe or the USA, so research visits to and from the collaborators' teams are expected.

    The KerData team in a nutshell for candidates

  • KerData is a human-scale team, currently comprising 5 permanent researchers, 2 contract researchers, 1 engineer and 5 PhD students. You will work in a caring environment that offers a good work-life balance.
  • KerData leads multiple projects in top-level national and international collaborative environments, such as the Joint Laboratory on Extreme-Scale Computing. Our team has active collaborations with high-profile academic institutions around the world (including in the USA, Spain, Germany and Japan) and with industry.
  • Our team strongly favors experimental research, validated by the implementation and experimentation of software prototypes with real-world applications on real-world platforms, including some of the most powerful supercomputers worldwide.
  • The KerData team is committed to personalized advising and coaching, to help PhD candidates train and grow in all directions that are critical in the process of becoming successful researchers.
  • Check our website for more about the KerData team.

    Assignment

    Context and Overview

    As witnessed in industry and science, and highlighted in strategic documents such as the European ETP4HPC Strategic Research Agenda [MCS], there is a clear trend to combine numerical computations, large-scale data analytics and AI techniques to improve the results and efficiency of traditional HPC applications, and to advance new applications in strategic scientific domains (e.g., high-energy physics, materials science, biophysics, AI) and industrial sectors (e.g., finance, pharmaceuticals, automotive, urbanism). A typical scenario consists of Edge devices creating streams of input data, which are processed by data analytics and machine learning applications in the Cloud; alternatively (or in parallel), they can feed simulations on large, specialised HPC systems to provide insights and help predict some future system state [BSAPLM17, dSFP]. Such emerging applications typically need to be implemented as complex workflows and require the coordinated use of supercomputers, Cloud data centres and Edge-processing devices. This assembly is called the Computing Continuum (CC).

    In the current state, a multitude of software development stacks are tailored to specific use cases, with no guarantee of interoperability between them. This greatly impedes application software development for integrated CC use cases. Moreover, existing software stacks have been developed specifically for HPC, data analytics and AI, with very different requirements for their initial execution infrastructures, and cannot be integrated efficiently to support CC workflows. Programming the workflow at the highest level requires the ability to consistently combine all these components ad hoc. In this scenario, there is a need to efficiently integrate simulations, data analytics and learning, which first requires interoperable solutions for data processing in the CC [MCS].

    Existing works on workflow composition and deployment in the CC focus on task-flow control and are disconnected from data patterns and structures beyond domain-specific applications [BTRZ, AVHK21]. Moreover, general approaches for representing knowledge and provenance in the form of metadata are also lacking for these converged workflows, and common interfaces for data management in the CC are necessary [RSS, GWWa]. Unified data abstractions can enable the interoperability of data storage and processing across the continuum and facilitate data analytics at all levels [BGBSC22], alleviating the disconnect between application- and storage-oriented approaches to interoperability. However, no unified data modeling approaches exist for how to structure and represent data at a logical level across the CC.

    Research Objectives

    This project has the overarching goal of researching new data-centric approaches to scientific workflow composition across the full spectrum of the existing computing continuum, combining large-scale and distributed computing paradigms (e.g., HPC, edge-to-cloud computing) and methods in scientific computing, data science and ML/AI. The project is structured into three primary objectives:

  • Objective 1: gain a deep understanding of the role of data in modern workflows and of how data influences our ability to effectively and efficiently interoperate computing environments. Breakthroughs in data characterization are needed to understand the next steps towards interoperability in the CC, since existing works tend to focus on task characterization and placement. Workflow data and metadata characterization and profiling will be conducted to deliver data patterns for converged workflows and benchmarks.
  • Objective 2: model data to enable the interoperability of existing programming models across the CC, so as to leverage the diversity of resources efficiently. The data patterns and workflow characterization resulting from the first objective will be the basis for researching the essential attributes needed to represent data and metadata (e.g., ML models, simulation data, annotations resulting from analysis) under uniform data abstractions that can be specialized for the different programming models coexisting in the CC. The outcome of this objective is the formal definition of unified data abstractions and their implementation, to facilitate the integration of heterogeneous data, tasks and compute resources.
  • Objective 3: enable modular composition across the continuum. This supports the vision of the workflows community towards a modular approach to workflow composition and management, in which specialized building blocks (e.g., task scheduling, task control flow, data staging, provenance) can work together and can be configured to support the needs of different computing sites, users and applications. With a focus on data interoperability, we will contribute to this vision by providing a data exchange layer connecting established data staging and transport layers, alleviating the disconnect between raw data management and knowledge-based workflow management in the CC (e.g., in anomaly detection, steering, resource balancing, and provenance). This will be a key software deliverable of the overall project and will allow the composition and deployment of workflows across the full spectrum of the CC.
    Main activities

    Envisioned Approach

    The key novelty of this project is the perspective that places data in a central role for building and managing scientific workflows in the CC. We will build upon previous work that already leveraged this idea in the scope of high-performance computing and cloud computing, enrich and extend its reach, and increase its impact by establishing new collaborations.

    For Objective 1, we will first survey the literature to determine how existing workflows currently leverage the resources of the CC in terms of infrastructure (e.g., HPC with cloud support, edge-to-cloud, and federated infrastructures) and application structure (e.g., ML-powered workflows, workflows including HPC simulations, use cases with large-scale science instruments, and applications of in situ analysis and visualization). A taxonomical analysis of these workflows will be complemented with exhaustive profiling of data volume, production rate, transfer volume and frequency of communication. We will conduct a descriptive statistical analysis of these metrics to characterize the data needs of these workflows and how they impact performance and scalability. In addition, we will qualitatively classify the characteristic features of the most common data models in the CC. A further key aspect to consider is how different data access and transfer patterns at the application level affect the performance of the underlying computing hardware. For example, GPU accelerators are now a common resource in HPC, and much work has been conducted to improve the arrangement of data in memory to facilitate the work of the GPU in common application scenarios. Similarly, we will extract common workflow motifs and data patterns that must be represented by unified data abstractions and their interfaces. We have established initial connections with teams from the Barcelona Supercomputing Centre (Spain) and Oak Ridge National Laboratory (USA) that are currently investigating these aspects [BBEdSL24, GWWb, GGX].
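    As an illustration of the kind of descriptive profiling envisioned here, the short sketch below summarizes per-stage data metrics from a hypothetical workflow trace; the file name, column names and metric set are assumptions made for the example, not an existing benchmark or trace format.

```python
# Minimal profiling sketch (illustrative only): per-stage descriptive statistics
# of data-related metrics from a hypothetical workflow trace in CSV form.
# Column names (stage, data_volume_mb, ...) are assumptions, not a real schema.
import pandas as pd

METRICS = ["data_volume_mb", "production_rate_mb_s",
           "transfer_volume_mb", "messages_per_s"]

def profile_workflow_trace(path: str) -> pd.DataFrame:
    """Return mean, spread and quartiles of each data metric per workflow stage."""
    trace = pd.read_csv(path)
    return trace.groupby("stage")[METRICS].describe()

if __name__ == "__main__":
    # Usage example with a hypothetical trace file.
    print(profile_workflow_trace("workflow_trace.csv"))
```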

    Objective 2 will borrow inspiration from our own previous work that enabled the interoperation of data models tailored for supercomputing applications with Big Data-oriented paradigms like streaming and key-value storage [CLCN]. We achieved the interoperability of process- and data-centric programming models by representing the core characteristics of their respective data abstractions. We proposed a unified distributed data abstraction inspired by the data awareness and task-based parallelism of data-centric abstractions, but with the possibility to preserve state as required by HPC applications. This abstraction represents a distributed collection of data organized in chunks, which can be locally accessible by both process- and data-centric computing units. We will leverage this foundation and the knowledge derived from the characterization of data patterns in Objective 1 to consolidate the characteristic features of additional computing models into unified data abstractions suitable for the programming models coexisting in the CC. Practically, in Objective 2 we will specialize the data abstractions by defining implementations tuned for different parts of the CC, and provide translation methods to interoperate these concrete implementations. These translation methods will take the form of data transformation interfaces and decorators for the specialization of data abstractions to specific infrastructures, thus hiding from the user of the data abstraction the details of the different implementations coexisting in the CC. The implementation will expose a Python interface, which is ubiquitous across most elements of the CC and allows for a simple but powerful approach to workflow composition through Jupyter notebooks. This approach is increasingly accepted for building workflows, as it can encapsulate configuration, composition, deployment, and post hoc analysis in a usable and reproducible manner [BTK, CAC]. Furthermore, Python programming is a common skill even for domain scientists, which can increase the adoption and impact of the resulting software.
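    To make the notion of a unified, chunk-based data abstraction more concrete, here is a minimal Python sketch of what such an interface and one backend-specific specialization might look like; the class and method names are illustrative assumptions, not the API of [CLCN] or of the software to be developed in this thesis.

```python
# Sketch of a unified, chunk-based data abstraction (hypothetical API).
# The same put/get/chunk_ids interface is exposed regardless of where chunks
# physically live; subclasses specialize the storage backend.
from typing import Any, Dict, Iterator


class DistributedCollection:
    """Logical collection of data organized in chunks."""

    def __init__(self, name: str):
        self.name = name
        self._chunks: Dict[int, Any] = {}

    def put(self, chunk_id: int, data: Any) -> None:
        """Store one chunk; in this sketch chunks simply live in local memory."""
        self._chunks[chunk_id] = data

    def get(self, chunk_id: int) -> Any:
        return self._chunks[chunk_id]

    def chunk_ids(self) -> Iterator[int]:
        return iter(self._chunks)


class KeyValueBackedCollection(DistributedCollection):
    """Specialization sketch: the same interface backed by a key-value store
    (faked here with a dict; a real backend could be any key-value service)."""

    def __init__(self, name: str, store: Dict[str, Any]):
        super().__init__(name)
        self._store = store

    def put(self, chunk_id: int, data: Any) -> None:
        self._store[f"{self.name}/{chunk_id}"] = data

    def get(self, chunk_id: int) -> Any:
        return self._store[f"{self.name}/{chunk_id}"]

    def chunk_ids(self) -> Iterator[int]:
        prefix = f"{self.name}/"
        return (int(k[len(prefix):]) for k in self._store if k.startswith(prefix))
```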

    Objective 3 requires, first, identifying key enabling technologies for data staging and data management, including domain-specific and generalist solutions. We will study which technologies would give the best support to the unified data abstractions implemented in Objective 2, prioritizing solutions produced in our team such as Damaris [DAC], a middleware for I/O management and real-time processing of data from large-scale MPI-based HPC simulations. We will also study related solutions from our network of collaborators (e.g., Oak Ridge National Laboratory's ADIOS-2 [GPW], the University of Utah's DataSpaces [DPK10], Argonne National Laboratory's BraidDB [WLV]) to secure proper support in the adoption of these technologies. ADIOS-2 is a particularly promising enabling technology, as it allows applications to express what data is produced, when that data is ready for output, and what data an application needs to read and when. Ultimately, we will design a data exchange layer that connects with the data staging and transport layers, alleviating the disconnect between raw data management and knowledge-based workflow management in the CC. This will take the form of an API that hides the complexity of re-implementing data models for different target infrastructures and of connecting with the underlying data staging, transfer, and storage technologies. We have already discussed aspects of the design of this data exchange layer with colleagues from ORNL as part of a symposium at the SIAM PP 2024 conference. Finally, we will build demonstrator workflows and end-to-end real-world applications on top of this system, specifically targeting our ongoing efforts towards supporting workflow composition for large-scale scientific projects like the Square-Kilometer Array Observatory (SKAO) in the context of
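    As a rough illustration of what the API of such a data exchange layer could look like, the sketch below implements a small publish/subscribe-style facade over pluggable staging backends; every name here is hypothetical, and this is not the interface of Damaris, ADIOS-2, DataSpaces, BraidDB or the deliverable described above.

```python
# Illustrative sketch of a data exchange layer facade (hypothetical API).
# Producers announce when a named variable is ready; consumers declare what
# they need; the layer forwards data through a pluggable staging backend.
from typing import Any, Callable, Dict, List, Optional


class DataExchangeLayer:
    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str, Any], None]] = {}
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = {}

    def register_backend(self, name: str, writer: Callable[[str, Any], None]) -> None:
        """Plug in a staging/transport backend (e.g., a file stager or a stream)."""
        self._backends[name] = writer

    def subscribe(self, variable: str, callback: Callable[[Any], None]) -> None:
        """A consumer declares which variable it needs and how to handle it."""
        self._subscribers.setdefault(variable, []).append(callback)

    def publish(self, variable: str, data: Any, backend: Optional[str] = None) -> None:
        """A producer signals that a variable is ready; the layer routes it."""
        if backend is not None:
            self._backends[backend](variable, data)
        for callback in self._subscribers.get(variable, []):
            callback(data)


if __name__ == "__main__":
    # Usage sketch: a simulation step publishes a field, an analysis task consumes it.
    exchange = DataExchangeLayer()
    exchange.register_backend("stage_to_disk", lambda var, d: print(f"staged {var}"))
    exchange.subscribe("temperature", lambda d: print(f"analysis got {len(d)} values"))
    exchange.publish("temperature", [300.0, 301.5, 299.8], backend="stage_to_disk")
```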

    Skills

    Required:

  • An excellent academic record in computer science courses
  • Knowledge on distributed systems and data management systems
  • Strong programming skills (Python, C/C++)
  • Ability and motivation to conduct high-quality research, including publishing the results in relevant venues
  • Very good communication skills in oral and written English
  • Open-mindedness, strong integration skills and team spirit
    Appreciated:

  • Knowledge on scientific computing and data analysis methods
  • Professional experience in the areas of HPC and Big Data management
    Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Possibility of teleworking (90 days per year) and flexible organization of working hours
  • Partial payment of insurance costs
    Remuneration

    Monthly gross salary amounting to 2100 euros for the first and second years and 2190 euros for the third year


