MotionWise Global Scheduler: Tackling the Challenges of Modern SDVs

Challenge

The software-defined vehicle (SDV) is based on an integrated, high-performance, multi-SoC/multi-core computing platform that executes complex software functions for applications like advanced driver assistance systems (ADAS) and automated driving (AD). The increasing amount of software requires the fusion of small electronic control units (ECUs) and some domain controller units (DCUs) into a more centralized vehicle architecture. This is necessary to support the increasing demand for computing power, to keep headroom for future software updates and new functionality over the lifetime of the car, and to lower wiring costs and vehicle weight. As this trend continues, more and more software-defined vehicle functions will be integrated into the high-performance compute platform within the vehicle.

E/E Architecture

The main challenge is to ensure safety and reliability in light of the enormous increase in complexity. Fast execution of safety-related software is desirable but not sufficient: what matters more is that it always completes within a proven upper time bound.

Vehicle functions are realized through multiple interdependent application threads and processes that form dependency chains. Their relationship is described by data dependencies, timing, shared resources etc. Figure 1 below shows an example of such interdependent software chain latencies within an ADAS/AD domain controller. The maximum latency for the single execution of such a chain ranges from 20 to 1200 ms, depending on its functionality. Although they run on the same hardware and overlap in time, all of them must finish within their own defined time intervals.


Figure 1: Execution chain latencies in ADAS/AD domain controller (© TTTech Auto) 

 

Fulfilling such end-to-end latency requirements on today’s hardware platforms is even more challenging due to several factors:

  • The continuously increasing number of CPU cores and semiconductors, which expands the configuration space.
  • The presence of heterogeneous computing clusters, including safety and performance cores.
  • The complexity of heterogeneous networking, encompassing memory, Ethernet, and PCIe.

These challenges raise several critical questions: 

  • How to allocate software tasks to CPU cores properly? 
  • How to ensure end-to-end timing, from sensors to actuators?
  • How to ensure that tasks finish on time under load scenarios?
  • How can we prevent the communication network from becoming overloaded?
  • How to achieve a valid configuration given the enormous configuration space (see Figure 2 below), where testing of all relevant states becomes impossible?

Figure 2: Size of an ADAS system in terms of the number of functions and the potential search and solution spaces for configurations (© TTTech Auto) 
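The first question above, allocating software tasks to CPU cores, can be illustrated with a deliberately simple sketch. It uses a first-fit-decreasing heuristic on CPU utilization, which is a generic textbook technique and not the algorithm used by MotionWise Global Scheduler. The task names, execution times, and periods are hypothetical. Keeping the load of each core at or below 100% is only a necessary condition; it says nothing yet about end-to-end timing or network load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    wcet_ms: int    # assumed worst-case execution time
    period_ms: int  # activation period

    @property
    def utilization(self) -> float:
        return self.wcet_ms / self.period_ms


def first_fit_decreasing(tasks, num_cores, cap=1.0):
    """Place each task on the first core whose load stays within cap.

    Returns {task name: core index}, or None if some task does not fit.
    """
    load = [0.0] * num_cores
    mapping = {}
    # Placing the heaviest tasks first tends to leave fewer unusable gaps.
    for task in sorted(tasks, key=lambda t: t.utilization, reverse=True):
        for core in range(num_cores):
            if load[core] + task.utilization <= cap + 1e-9:
                load[core] += task.utilization
                mapping[task.name] = core
                break
        else:
            return None  # no core has room for this task
    return mapping


# Hypothetical task set with utilizations 0.6, 0.5, 0.4 and 0.3.
tasks = [
    Task("object_fusion", 12, 20),
    Task("trajectory_planning", 20, 40),
    Task("perception", 40, 100),
    Task("actuator_control", 3, 10),
]

print(first_fit_decreasing(tasks, num_cores=2))
print(first_fit_decreasing(tasks, num_cores=1))  # total load 1.8 does not fit
```

Even this toy version shows why the configuration space grows quickly: every additional task multiplies the number of possible placements by the number of cores, before any timing constraint has been considered.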

The limitation of standard automotive Real-Time Operating Systems

Commercial off-the-shelf real-time operating systems (RTOS) are used to implement deterministic systems. However, although they have deterministic execution properties, standard RTOSes do not address how to satisfy all kinds of complex real-time constraints (such as the execution chains described in the Challenge section, which compete for the same resources). Traditional, manual integration methods on those RTOSes converge slowly to a solution through costly iteration cycles. These manual approaches are not only slow but also unpredictable. They carry an inherent risk of delays during the development process, as configuration changes are likely to impact the rest of the system. Thus, previously reliable vehicle functions might become unreliable due to software updates. In the worst case, this can lead to expensive production delays.

This traditional approach is known as correct-by-testing. Correct-by-testing is not based on mathematical guarantees at design-time. It typically involves manual integration steps, manual tuning of priorities for software component execution, manual allocation to CPU cores, and manual setting of communication priorities. Extensive testing is necessary to ensure that all real-time and safety requirements are met under all corner cases. However, given the size of the systems and the possible system states, this may no longer be feasible.

Safety-critical functions are usually executed using a real-time scheduling algorithm such as Fixed-Priority (FP) or Earliest-Deadline-First (EDF). However, these traditional scheduling methods can generate an exponential number of system states at runtime. Variances in the runtime of a software function have a direct impact on subsequent software functions, causing data to become available at unpredictable times. This has an impact on system stability and testability, making it difficult or even impossible to reproduce scenarios.
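To make the coupling between functions under fixed-priority scheduling concrete, the following sketch applies the classic response-time recurrence for preemptive fixed-priority scheduling on one core, assuming periodic tasks whose deadline equals their period. The task parameters are hypothetical. It shows how a small increase in the execution time of a high-priority task can push an unrelated low-priority task past its deadline.

```python
def response_time(tasks, index):
    """Worst-case response time of tasks[index] under preemptive fixed priority.

    tasks: list of (wcet, period) tuples, highest priority first, integer units.
    Returns None if the bound exceeds the period (deadline assumed = period).
    """
    wcet, period = tasks[index]
    r = wcet
    while True:
        # Each higher-priority task preempts once per activation within r.
        interference = sum(-(-r // t) * c for c, t in tasks[:index])
        nxt = wcet + interference
        if nxt > period:
            return None  # deadline missed in the worst case
        if nxt == r:
            return r     # fixed point reached
        r = nxt


# Hypothetical task set: (wcet, period) in ms, highest priority first.
before_update = [(1, 4), (2, 6), (3, 12)]
# After a software update the highest-priority task needs 2 ms instead of 1 ms.
after_update = [(2, 4), (2, 6), (3, 12)]

print([response_time(before_update, i) for i in range(3)])
print([response_time(after_update, i) for i in range(3)])
```

In this example only the first task changed, yet the third task is the one that loses its guarantee, which matches the observation that previously reliable functions can become unreliable after an update.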

Solution

Finding a scalable solution with minimal human intervention is becoming business critical for the automotive industry, so that we can eventually achieve what SDVs promise. Such a solution should include (among others) the following components:

  • Shift-left methodologies to enable identification of embedded configuration issues early in the development process.
  • Minimizing changes upon software updates to reduce testing efforts.

The ANSI/UL 4600 Standard for the safety evaluation of autonomous products states that “design approaches not based on mathematically proven real-time scheduling properties are prone to missing deadlines during unusual operational conditions” [2] and recommends the use of time-triggered design techniques for safety-related functions. The time-triggered architecture provides strict correct-by-design determinism in the time domain, strong compositionality of independently built and tested components, and minimal interference between critical functions at runtime. As the number of software components grows and correct-by-testing approaches become intractable, correct-by-design solutions are becoming the more efficient option.

MotionWise Global Scheduler follows the correct-by-design approach. It implements scheduling algorithms to automate the integration of complex software functions on high-performance automotive compute platforms. It is designed to create a global deployment solution for the entire set of requirements defined by architects, developers and integrators, who are working on the same vehicle software. Examples are deadlines, jitter, time budgets, data dependencies, end-to-end latency of computation graphs and chains, precedence relations, safety levels and hardware resource needs, among many others. 
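As a rough illustration of what checking such requirements against a candidate deployment can look like, the sketch below validates a small static schedule against three of the constraint types listed above: deadlines, precedence relations, and end-to-end chain latency, plus a check that slots on one core do not overlap. The data model, task names, and numbers are hypothetical and are not the MotionWise input format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Slot:
    task: str
    core: int
    start_ms: int
    end_ms: int


def violations(slots, deadlines, precedences, chains):
    """Return a list of human-readable constraint violations (empty if valid)."""
    by_task = {s.task: s for s in slots}
    problems = []
    # Two slots on the same core must not overlap in time.
    ordered = sorted(slots, key=lambda s: (s.core, s.start_ms))
    for a, b in zip(ordered, ordered[1:]):
        if a.core == b.core and b.start_ms < a.end_ms:
            problems.append(f"overlap: {a.task}/{b.task}")
    # Each task must finish by its deadline.
    for task, deadline in deadlines.items():
        if by_task[task].end_ms > deadline:
            problems.append(f"deadline: {task}")
    # A successor may only start once its predecessor has finished.
    for before, after in precedences:
        if by_task[after].start_ms < by_task[before].end_ms:
            problems.append(f"precedence: {before}->{after}")
    # End-to-end latency: start of the first task to end of the last task.
    for (first, last), max_latency in chains.items():
        if by_task[last].end_ms - by_task[first].start_ms > max_latency:
            problems.append(f"latency: {first}..{last}")
    return problems


schedule = [
    Slot("sense", 0, 0, 4),
    Slot("fuse", 1, 4, 10),
    Slot("plan", 1, 10, 18),
    Slot("act", 0, 18, 20),
]
deadlines = {"act": 20}
precedences = [("sense", "fuse"), ("fuse", "plan"), ("plan", "act")]
chains = {("sense", "act"): 20}

print(violations(schedule, deadlines, precedences, chains))

# Moving "plan" two milliseconds earlier breaks two constraints at once.
broken = [replace(s, start_ms=8, end_ms=16) if s.task == "plan" else s
          for s in schedule]
print(violations(broken, deadlines, precedences, chains))
```

Checking a given schedule is cheap; the hard part, which the scheduler automates, is finding a schedule for which this list is empty.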

MotionWise Global Scheduler is powered by smart, highly parallelized heuristics. Based on the example above, where the number of possible configurations is in the range of 10^5000 and the number of correct configurations is in the range of 10^5, a good heuristic [1] can solve over 80% of the cases within 200 seconds. Even in systems 2 or 3 times as large, the heuristic can still solve over 78% and 60% of the test cases, respectively.
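The orders of magnitude quoted above can be put into perspective with a short calculation. The two sizes come from the text; the checking rate of one billion configurations per second is purely an assumption for illustration.

```python
from fractions import Fraction

search_space = 10 ** 5000  # possible configurations (order of magnitude from the text)
valid = 10 ** 5            # correct configurations (order of magnitude from the text)

# Chance that one configuration picked uniformly at random is correct.
hit_probability = Fraction(valid, search_space)
print(hit_probability == Fraction(1, 10 ** 4995))

# Assumption: an exhaustive search that checks 10^9 configurations per second.
checks_per_second = 10 ** 9
seconds_per_year = 365 * 24 * 3600
years = search_space // (checks_per_second * seconds_per_year)
print(f"exhaustive search: a number of years with {len(str(years))} digits")
```

Random sampling and exhaustive enumeration are therefore both out of the question, which is why guided heuristics are needed to find one of the comparatively few correct configurations.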


Figure 3: The scalability of smart heuristics for ADAS test-cases (© TTTech Auto) 

Time-Triggered Approach

MotionWise Global Scheduler applies a Time-Triggered Architecture that shifts the complexity from an explosion of possible runtime states to finding the solution of the configuration problem (see [3] for further in-depth information). Compute resources are provisioned a priori at design time based on the provided user constraints, and the execution is synchronized to a system-wide global time across all compute resources. This global synchronization is implemented in MotionWise.
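The core idea of time-triggered execution can be sketched as a table lookup: the schedule is fixed at design time, and at runtime each core derives what to run from the global time alone. The table contents, the hyperperiod, and the function name below are hypothetical and only illustrate the principle, not the MotionWise runtime.

```python
from bisect import bisect_right

HYPERPERIOD_US = 10_000  # assumed length of one schedule cycle

# Per core: (start offset within the cycle in µs, task), sorted by offset.
# Every table starts at offset 0 so that each instant is covered.
SCHEDULE = {
    0: [(0, "camera_preprocess"), (3_000, "object_fusion"), (7_000, "idle")],
    1: [(0, "idle"), (3_500, "trajectory_planning"), (8_000, "actuator_control")],
}


def task_at(core, global_time_us):
    """Return the task that owns the given core at the given global time."""
    offset = global_time_us % HYPERPERIOD_US
    entries = SCHEDULE[core]
    starts = [start for start, _ in entries]
    # Last entry whose start offset is not after the current offset.
    return entries[bisect_right(starts, offset) - 1][1]


print(task_at(0, 0))
print(task_at(0, 13_000))  # second cycle, offset 3000 µs
print(task_at(1, 28_500))  # third cycle, offset 8500 µs
```

Because the answer depends only on the synchronized time and the precomputed table, the same instant in every cycle yields the same activity on every core, regardless of how long other functions happened to run.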

Time-triggered solutions provide better scalability, reduce testing efforts, and make systems more understandable and analyzable. Nevertheless, for sporadic (event-triggered) workloads, classical scheduling approaches like Fixed Priority (FP) are very suitable and can be integrated with the time-triggered approach.

MotionWise Global Scheduler is a vital part of TTTech Auto’s 4SDV approach, emphasizing the related roles of Systems, Safety and Security in the software-defined vehicle. The correct-by-design approach of MotionWise Global Scheduler is essential for providing sufficient reserves of processing, memory, and communications bandwidth for continuous upgrades.


Table 1: Scheduling capabilities with and without MotionWise Global Scheduler 

The MotionWise Global Scheduler Workflow

The MotionWise Global Scheduler generates: 

  • Task Scheduler configuration for various operating systems (QNX, Linux, Classic AUTOSAR-based OSs (OSEK), and other POSIX OSs).
  • Network Schedule for Time Sensitive Networking (TSN) devices, enabling the reliable interconnection of endpoints (SoCs) within a network.
  • Configuration for several MotionWise services (e.g. inter-SoC communication and task monitoring).

Figure 4: Global Scheduling within MotionWise Schedule 

Our cloud-native tool suite utilizes the power of cloud computing to quickly find feasible schedule configurations. The generated global schedule is executed with the safe runtime of MotionWise. This combination replaces the highly labor-intensive manual deployment steps with tool automation, significantly reducing testing needs.

Scheduling of Event- and Data-driven Workloads

The Time-Triggered Architecture applied by MotionWise Global Scheduler serves as the basis for resource allocation for workloads. Using this approach, deadlines can be ensured for workloads with real-time requirements, even if their exact timing is unknown at design time. For example, MotionWise Global Scheduler can pre-allocate resources for event-driven software functions and fulfill their deadline requirements, if a minimum time difference between two consecutive events can be specified.
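One simple way to reason about such a pre-allocation is a periodic reserved slot that serves at most one pending event. This is a minimal sketch under stated assumptions, not the MotionWise algorithm: an event is served in the first slot that starts at or after its arrival, the slot length equals the handler's worst-case execution time, and all numbers are hypothetical. The slot period must not exceed the minimum inter-arrival time (so that at most one event is pending per slot) and must leave enough room to finish before the deadline.

```python
def slot_period_for(wcet_us, deadline_us, min_interarrival_us):
    """Largest slot period that still meets the deadline, or None if none exists."""
    period = min(min_interarrival_us, deadline_us - wcet_us)
    # Slots of length wcet_us cannot repeat faster than every wcet_us.
    return period if period >= wcet_us else None


def response_us(arrival_us, period_us, wcet_us):
    """Delay from arrival to completion when served in the next slot."""
    next_slot_start = -(-arrival_us // period_us) * period_us  # ceiling to slot grid
    return next_slot_start + wcet_us - arrival_us


# Hypothetical event handler: 2 ms WCET, 10 ms deadline, events >= 15 ms apart.
WCET_US, DEADLINE_US, MIN_GAP_US = 2_000, 10_000, 15_000

period_us = slot_period_for(WCET_US, DEADLINE_US, MIN_GAP_US)
worst = max(response_us(t, period_us, WCET_US) for t in range(2 * period_us))

print(period_us, worst, worst <= DEADLINE_US)
print(f"reserved CPU share: {WCET_US / period_us:.0%}")
```

The worst case occurs when an event arrives just after a slot has started, so it waits almost a full period before being served. The price for the guarantee is the reserved CPU share, which is spent whether or not an event actually arrives.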

Furthermore, MotionWise Global Scheduler allocates CPU time on various levels (e.g., task, thread, process, group of processes). This is beneficial for use cases where a precise worst-case execution time analysis is not possible and sporadic timing violations are acceptable. In ADAS/AD algorithms, the following often applies:

  • Data Determinism is important inside the data graph, and the data flow must not break. 
  • Time Determinism is important for a group of workloads (e.g. 10 ms < t_total < a few hundred ms).

Hence, MotionWise Schedule and MotionWise Safety Middleware are designed for multiple levels of scheduling: Global Scheduling is applied first to address scheduling requirements at system level, while second-level scheduler policies populate the CPU slots with event-driven and data-driven (data-flow-driven) workloads.
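The division of labor between the two levels can be sketched as follows. The first level is assumed to have reserved a CPU slot with a fixed time budget for a group of workloads; the second level then decides at runtime which ready workloads to run inside that slot. The FIFO policy, workload names, and costs are hypothetical choices made for illustration only.

```python
from collections import deque


def run_slot(ready, budget_us):
    """Second level: run queued workloads in FIFO order while they fit the slot.

    ready: deque of (name, cost_us) for workloads whose input data has arrived.
    Returns the executed workload names and the unused part of the budget.
    """
    executed = []
    remaining = budget_us
    while ready and ready[0][1] <= remaining:
        name, cost_us = ready.popleft()
        remaining -= cost_us
        executed.append(name)
    return executed, remaining


# First level (fixed at design time): a 3 ms slot reserved for the perception group.
PERCEPTION_SLOT_US = 3_000

# Second level (decided at runtime): workloads that became ready as data arrived.
ready = deque([
    ("lidar_cluster", 1_200),
    ("camera_detect", 1_500),
    ("track_update", 800),
])

executed, unused = run_slot(ready, PERCEPTION_SLOT_US)
print(executed, unused)
print(list(ready))  # workloads left for the group's next slot
```

The group as a whole can never exceed its slot, so other groups keep their guarantees, while the order and timing of the individual workloads inside the slot remain flexible.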


Figure 5: Multiple levels of scheduling with MotionWise 

Conclusion

For a complex SDV, it is no longer feasible to define correct OS schedule configurations using manual trial-and-test approaches. Next-generation SDVs require automated deployment of functions to high-performance compute platforms with reduced testing needs.

MotionWise Global Scheduler automates this process. It uses advanced methodologies to automatically search for an allocation of software components to hardware resources that satisfies all constraints under all corner cases. In its search, the algorithm simultaneously considers task execution, event-driven and data-driven workloads, and communication across multiple SoCs, multiple CPU cores, and multiple communication capabilities.

 

References 

[1] S. D. McLean, S. S. Craciunas, E. Alexander Juul Hansen and P. Pop, "Mapping and Scheduling Automotive Applications on ADAS Platforms using Metaheuristics," 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vienna, Austria, 2020, pp. 329-336, doi: 10.1109/ETFA46521.2020.9212029. 

[2] Task Group for UL 4600 (ed.): ANSI/UL 4600 Standard for Safety for the Evaluation of Autonomous Products, 2023. Online: https://www.shopulstandards.com/ProductDetail.aspx?productid=UL4600, access: January 17, 2024 

[3] Craciunas, S., Poledna, S. Time-triggered Approach for Software-defined Vehicles. ATZ Electron Worldw 19, 20–24 (2024). https://doi.org/10.1007/s38314-024-1850-8