Performance is key in I.T. systems (part 2)
Aug 13, 2014
In the first part of this article, we looked at what performance is and at which scale of the system it should be translated into technical-level indicators.
Now we’ll see how the model can help us keep performance in check as early, and for as long, as possible, and then how we can go further by building our measurements into the system design itself for some very specific and demanding cases.
Testing and modeling
With the model come the testing points, and with the scenario the activation pattern. The key point here is to share the same values between the scenario and the model (the inputs), and the same values between the model and the tests (the targeted outputs). That is enough to run the tests and keep the system in check during its construction, but also to monitor it once live. As we all know, life is sometimes different, but the model and the scenario remain; only the inputs fluctuate with the business (the outputs can also fluctuate when the impact is witnessed more closely, in both directions).
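As a minimal sketch of what sharing those values can look like in practice, here is one hypothetical definition of a model’s inputs and targeted outputs, written so that the load-test scenario, the test assertions, and the production monitoring alerts all reference the same constants (the class name and numbers are illustrative, not from any real project):

```java
// Hypothetical shared model definition: the single source of the
// values used by the scenario (inputs) and by the tests and the
// live monitoring (targeted outputs).
public final class OrderFlowModel {

    // Inputs: shared between the business scenario and the model.
    public static final double PEAK_REQUESTS_PER_SECOND = 120.0;
    public static final int AVERAGE_PAYLOAD_BYTES = 4_096;

    // Targeted outputs: asserted by the load tests during construction,
    // then reused as alert thresholds once the system is live.
    public static final long P95_LATENCY_MILLIS = 250;
    public static final double MAX_ERROR_RATE = 0.001;
}
```

Because the test assertion and the monitoring alert both read, say, `OrderFlowModel.P95_LATENCY_MILLIS`, the model, the tests, and the live checks cannot silently drift apart.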
One interesting approach with such a modeling effort is to keep checking various points, not only the end-to-end behavior. Changes in subsystems or in the underlying components can then be identified more easily when they impact the system. Those “inner points” are also very useful for communicating with other actors.
In software we tend to overestimate the role of the code itself in the global performance of the system (bad code impacts performance, but it is usually found quickly when good testing is in place, especially when it is done regularly). For example, simply setting up an appliance for security reasons can easily become the main contributor to the system’s latency, even though it is supposedly “transparent” for performance (latency is usually underestimated as a key contributor to user experience, and sometimes to resource overuse).
Modeling a system is multi-layer work: modeling the behavior of all the components that form each subsystem. Testing is multi-layer work too, as the software in a bubble is not the only thing that will constitute the running system.
On a project some years ago, we modeled the performance from the inception phase to help a client select the right solution. This model was used and improved over a two-year journey and proved very helpful in two key areas:
- Detecting variations in behavior (regressions in product releases, changes in the database configuration)
- When approaching production, detecting variations in the volumes of data and requests (load levels were estimated from business indicators; most were very close to the actual values, but some were changed by the implementation of systems upstream in the chain, sometimes producing 2 to 5 times more messages than anticipated)
New perspective: what if performance is built into the software?
In the design of typical heavy-load systems, a technical architect often works “around” the software components: defining how many instances to set up, controlling the load on each active component by adjusting threads and queues, and adding some monitoring to detect out-of-range behavior and anticipate system overload.
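As a minimal sketch of that “around the software” style of regulation, here is a bounded worker pool in front of a bounded queue; the sizes are illustrative assumptions that would normally come from the model:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Classic external regulation knobs: a fixed number of workers and a
// bounded waiting room. Both sizes are hypothetical, model-driven values.
public final class RegulatedEntryPoint {

    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            8, 8,                                   // fixed worker threads
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(64),   // bounded request queue
            new ThreadPoolExecutor.CallerRunsPolicy()); // overflow pushes back on the caller

    public void submit(Runnable request) {
        // Beyond 8 running and 64 queued requests, the caller itself
        // runs the task and therefore naturally slows down.
        pool.execute(request);
    }
}
```

The `CallerRunsPolicy` is one simple form of back-pressure: instead of dropping overflow, it makes the upstream caller absorb it, which is exactly the kind of regulated interface described below.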
That’s load regulation, and good subsystems have clear, regulated interfaces that reduce the potentially chaotic behavior induced by overloading a subsystem. But the more regulation points there are, the more latency rises: pure performance is traded for performance under load.
Once the model is in place and the indicators are defined, that “art” of load regulation is not that complex, and it is generally expressed in simple formulas.
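One classic example of such a formula (offered here as an illustration, not as the project’s actual model) is Little’s Law, which ties together the quantities that regulation juggles:

```latex
% Little's Law: the average number of requests in the system (L)
% equals the arrival rate (lambda) times the average residence time (W).
L = \lambda \times W
```

With the hypothetical figures used in the sketches here, 120 requests per second and a 250 ms target residence time, roughly 120 × 0.25 = 30 requests are in flight at any moment, which is where the 30 permits in the sketch below come from.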
What if we could build this into the software itself? Those formulas could be implemented as an aspect that measures live performance and dynamically adjusts the load it accepts, or slows down requests before the system chokes. It’s basically what some load balancers do when set up with preventive health checks, but with a global (or per-service-line) approach.
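Here is a hypothetical sketch of what such an aspect could look like: a latency-aware gate that refuses work early once the measured moving average drifts past the model’s target. This only illustrates the principle (the products mentioned below work differently); the target, permit count, and smoothing constants are illustrative assumptions:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Regulation built into the software itself: the gate measures the
// live latency of the requests it lets through and sheds load before
// the system chokes. All numbers are hypothetical, model-driven values.
public final class AdaptiveGate {

    private static final long TARGET_MILLIS = 250;        // targeted output from the model
    private final Semaphore permits = new Semaphore(30);  // ~ the Little's Law estimate above
    private final AtomicLong ewmaMillis = new AtomicLong(0);

    /** Runs the request if the system is healthy; returns false if it was shed. */
    public boolean tryHandle(Runnable request) {
        // When the moving average drifts past the target, refuse early:
        // a fast "no" is cheaper than a slow collapse. Decay slightly on
        // each refusal so the gate can reopen once pressure drops.
        if (ewmaMillis.get() > TARGET_MILLIS) {
            ewmaMillis.updateAndGet(prev -> prev * 99 / 100);
            return false;
        }
        if (!permits.tryAcquire()) {
            return false; // concurrency cap reached
        }
        long start = System.nanoTime();
        try {
            request.run();
            return true;
        } finally {
            final long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            // Exponentially weighted moving average of observed latency.
            ewmaMillis.updateAndGet(prev -> (9 * prev + elapsedMillis) / 10);
            permits.release();
        }
    }
}
```

A caller that receives `false` can fail fast, retry later, or degrade gracefully; the important design choice is that the refusal happens at the software’s own edge, driven by its own live measurements, rather than in an anonymous appliance in front of it.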
Such solutions are beginning to emerge, and one of the most elegant is autoletics (formerly jInspired) with Sentris (and Signals). William Louth, in a 2012 article, describes an impressive view of what reflexivity can bring to software performance and QoS.
This could reduce the need to push all the constraints through heavy, anonymous components like proxies and appliances: build the aspects into the software itself and keep only the edge control in appliances. Lower cost, lower latency. DevOps at work?
This is yet another reason to model early and continuously, with the added opportunity to be more granular and even improve the resolution of the scale at which measurements and actions take place.