Modern IT systems have increasingly complex and dynamic architectures composed of loosely-coupled services and distributed components that operate and evolve independently. Managing system resources in such environments to ensure acceptable end-to-end application Quality-of-Service (QoS, e.g., availability, performance and reliability) while at the same time optimizing resource utilization and energy efficiency is a challenge. The adoption of virtualization and cloud computing technologies, such as Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS), comes at the cost of increased system complexity and dynamicity.
The increased complexity is caused by the introduction of virtual resources and the resulting gap between logical and physical resource allocations. The increased dynamicity is caused by the complex interactions between the applications and workloads sharing the physical infrastructure. The inability to predict such interactions and adapt the system accordingly makes it hard to provide QoS guarantees in terms of availability and responsiveness, as well as resilience to attacks and operational failures. Moreover, the consolidation of workloads translates into higher utilization of physical resources, which makes systems much more vulnerable to threats resulting from unforeseen load fluctuations, hardware failures and network attacks.
A major part of our research is focused on the development of novel methods, techniques and tools for the engineering of so-called Self-Aware IT Systems and Services, designed with built-in online QoS prediction and self-adaptation capabilities that address the challenges described above. Our current emphasis is on performance and availability (focusing on capacity, responsiveness, and resource/energy efficiency aspects), on the one hand, and security (focusing on intrusion detection and prevention), on the other hand. In the medium and long term, we plan to consider further QoS properties such as reliability and fault tolerance.
This vision is the major topic of our research group named after the French philosopher and mathematician René Descartes. Self-awareness, in this context, is defined by the combination of three properties that IT systems and services should possess:
- Self-Reflective: aware of their software architecture, execution environment and the hardware infrastructure on which they are running, as well as of their operational goals, e.g., QoS requirements, cost- and energy-efficiency targets (“the mind controls the body, but the body can also influence the mind” – René Descartes),
- Self-Predictive: able to predict the effect of dynamic changes, e.g., changing service workloads, as well as predict the effect of possible adaptation actions, e.g., changing service deployment and/or resource allocations (“thought is what happens in me such that I am immediately conscious of it” – René Descartes),
- Self-Adaptive: proactively adapting as the environment evolves in order to ensure that their operational goals are continuously met (“for it is not enough to have a good mind: one must use it well” – René Descartes).
Our approach to realizing the above vision is based on online system architecture models that are integrated into the system components and capture all system aspects relevant to managing their QoS and resource efficiency during operation. In contrast to black-box models, the modeling techniques we are working on are designed to explicitly capture all relevant aspects (both static and dynamic) of the underlying software architecture, execution environment, hardware infrastructure, and service usage profiles. In parallel, we are working on novel service platforms designed to automatically maintain the online models during operation so that they reflect the evolving system environment. The online models are intended to serve as a “mind” for the running system, controlling its behavior at run-time, i.e., deployment configurations, resource allocations and scheduling decisions. To facilitate the initial model construction and continuous maintenance during operation, we are working on techniques for automatic model extraction based on monitoring data collected at run-time.
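To make the idea of a model acting as the system's “mind” more concrete, the following minimal sketch shows a predict-then-adapt step. It is an illustration under strong assumptions, not the modeling technique described here: every name (`ServiceModel`, `plan_instances`, the parameters) is hypothetical, and the architecture-level model is replaced by a simple M/M/1 queueing approximation in which load is split evenly across identical instances.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceModel:
    """Hypothetical stand-in for an online model of one service."""
    service_demand_s: float  # mean processing time per request, in seconds

    def predict_response_time(self, arrival_rate_rps: float, instances: int) -> float:
        """Predict mean response time for a given load and allocation.

        Assumption of this sketch: each instance behaves like an M/M/1
        queue and receives an equal share of the incoming requests.
        """
        utilization = arrival_rate_rps * self.service_demand_s / instances
        if utilization >= 1.0:
            return float("inf")  # saturated: the queue grows without bound
        return self.service_demand_s / (1.0 - utilization)


def plan_instances(model: ServiceModel,
                   forecast_rate_rps: float,
                   target_response_s: float,
                   max_instances: int = 64) -> Optional[int]:
    """Return the smallest allocation predicted to meet the target, or None."""
    for n in range(1, max_instances + 1):
        if model.predict_response_time(forecast_rate_rps, n) <= target_response_s:
            return n
    return None


# Example: 50 ms per request, a forecast of 100 requests/s, 200 ms target.
model = ServiceModel(service_demand_s=0.05)
print(plan_instances(model, forecast_rate_rps=100.0, target_response_s=0.2))
```

The point of the sketch is the division of labor: the model answers "what would happen if" questions for a forecast workload and a candidate allocation, and the planner uses those predictions to choose an adaptation action before the QoS target is violated.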
Self-Aware Software and Systems Engineering is a newly emerging research area at the intersection of several computer science disciplines including Software and Systems Engineering, Computer Systems Modeling, Autonomic Computing, Distributed Systems, Cluster and Grid Computing, and more recently, Cloud Computing and Green IT. The realization of the described vision calls for an interdisciplinary approach considering not only technical but also business and economic challenges. The resolution of these challenges promises to reduce the costs of ICT and its environmental footprint while sustaining the high growth rate of IT services.
Another important area of our research related to the described vision is the development of standard metrics, tools and benchmarks for quantitative system evaluation and analysis, focusing on QoS and efficiency-related aspects. In line with this direction, we co-founded a new group within SPEC (Standard Performance Evaluation Corporation) called the SPEC Research Group (SPEC RG), with the mission to serve as a platform for collaborative research efforts in the area of quantitative system evaluation and analysis, fostering the interaction between industry and academia in the field. The scope of the group includes computer benchmarking, performance evaluation, and experimental system analysis, considering both classical performance metrics such as response time, throughput, scalability and efficiency, and other non-functional system properties included under the term dependability, e.g., availability, reliability, and security.
The SPEC RG currently has over 35 member organizations from academia and industry. Its research efforts span the design of metrics for system evaluation as well as the development of methodologies, techniques and tools for measurement, load testing, profiling, workload characterization, and dependability and efficiency evaluation of computing systems. As part of these efforts, our research group is actively involved in the development of: i) metrics and benchmarks for intrusion detection and prevention systems in virtualized environments, ii) metrics and benchmarks for quantifying the elasticity of IaaS/PaaS environments, and iii) metrics and benchmarks for quantifying performance isolation in multi-tenant SaaS environments.