Business Transaction Performance: Proactive Prevention and Visibility

Reacting in a timely manner to time-dependent behavior of applications and business transactions is key to beginning a proactive strategy.

David Mavashev

Sept. 4, 2009


With the recent worldwide recession, layoffs and cost-cutting efforts have turned the spotlight on IT infrastructure and how utilizing advancements in enterprise technology can bring greater efficiency to business transactions. The concurrent dawn of Service Oriented Architecture (SOA), while promising great benefit, has actually added new challenges to the already demanding set of requirements for effective application management. In response to these changing conditions, businesses are starting to understand that proactive problem prevention can replace traditional strategies focused on fire-fighting.

In today's economy, IT is burdened with three commandments: reduce cost, improve service and manage risk. IT personnel need to learn how to do more with less by finding ways to squeeze hidden waste out of business processes. They must also remember that competition still exists, so they have to keep improving the quality of service their applications provide. And finally, they need to manage the risk of breaching a Service Level Agreement (SLA) or, worse, the risk of customer attrition.

Every day, billions of business transactions flow back and forth from web-based applications through web servers, application servers, messaging technologies and back-end mainframe systems. While some emerging technologies are beginning to recognize the importance of proactive problem solving, most are stuck in reactive mode. Many don't provide the requisite visibility businesses need to effectively measure the impact of problems.

Now, businesses are beginning to recognize the power of applying a Business Transaction Performance (BTP) approach to these issues. But how can monitoring business transaction performance help businesses prevent problems? How can this methodology provide the full visibility across operational performance, transactional performance and compliance with SLAs that is necessary to reduce cost, improve service and manage risk?

Challenges in Operational Efficiency

With the emergence of new applications based on SOA and Enterprise Service Buses (ESBs), IT infrastructure has grown so complex that it places added strain on operations and application support groups. Workforce cuts due to the economic environment and increasing demand for highly trained personnel have placed even greater strain on executives who recognize the need for cutting-edge IT functionality.

Operations groups usually have well-defined tools for dealing with network complexities and monitoring availability and performance from distributed servers' perspectives. Technology groups have their own diagnostic products, usually specific to the area of their domain expertise. However, many businesses have yet to see the value in providing end-to-end visibility into applications from transactional and business service perspectives to pinpoint potential performance bottlenecks.

New vendors are attempting to address this problem by providing transactional views of specific business processes spanning multiple technology tiers. Unfortunately, most of these companies are only providing a narrow perspective. Some may attempt to give a horizontal view, while others are focused on particular verticals.

In either case, adding a transactional perspective into application performance helps reduce the mean time to problem resolution and provides the requisite insight to reduce transaction latency. However, the transactional angle alone is not enough to quickly determine the root cause of a performance bottleneck. Businesses must be able to consider a combined transactional and operational view of the performance of business processes on the same pane of glass. Adding a third axis, business key performance indicators, enables the correlation of root cause to business impact.

Therein lies the main challenge faced by enterprises today: changing the focus from IT-driven operational performance to driving business transaction performance in a proactive manner.

The Need for Speed

IT executives understand that no single ISV can provide the entire tier stack for business transaction management. They recognize that enterprise management solutions are not suited for monitoring of business transactions.

For large firms, many of which are driven by low latency requirements, the speed of Business Transaction Performance is essential. Billions of trades, millions of claim forms and constantly changing data on the availability of source materials flow daily through their infrastructures. With the ascendance of Six Sigma, manufacturers must be able to rapidly synchronize their lines in order to ensure that process variation is minimal. IT managers are looking for solutions that can be extended and scaled easily and quickly, with high-speed correlation engines. Proactively identifying potential root causes of problems and determining best execution are essential to them.

Proactive Problem Prevention via Complex Event Processing

Traditional reactive measures are based on detecting problems when they occur and addressing them, rather than anticipating and preventing them. Because of business dynamics, applications and transaction flows change so quickly that proper management instrumentation during the development stage is not feasible, and problem detection that depends on event monitoring alone is not sufficient.

Reactive problem discovery is inadequate and very costly. By the time the problem is detected, business service is either severely degraded, unavailable, or disabled, causing noncompliance with operational level agreements (OLAs) and SLAs, increased costs, and productivity loss.

Often, applications are rolled into production without proper management, compromising performance of supported business transactions and processes. As a result, far more time is spent on fault detection than on fault prevention.

Businesses can execute a proactive strategy by implementing tools that measure both transactional and operational Key Performance Indicators (KPIs) in real time and on demand, and that recognize patterns in those measurements using Complex Event Processing (CEP). CEP can be used to describe what is normal business behavior and what is abnormal, and then to test for each. Reacting in a timely manner to time-dependent behavior of applications and business transactions is key to beginning a proactive strategy. And that approach can significantly reduce cost.
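To make the idea of describing "normal" and testing for "abnormal" concrete, here is a minimal sketch in Python. It is not a CEP engine and is not taken from any product; it shows a single rule that compares each new transaction latency against a rolling baseline. The class name, window size and three-sigma threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class LatencyBaseline:
    """Tracks recent transaction latencies and flags abnormal ones.

    Hypothetical illustration: the window size, minimum sample count and
    3-sigma threshold are assumptions, not values from any real tool.
    """

    def __init__(self, window=50, sigmas=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # rolling window of normal latencies
        self.sigmas = sigmas
        self.min_samples = min_samples

    def observe(self, latency_ms):
        """Return True if latency_ms is abnormal relative to the window."""
        abnormal = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            # Guard against a perfectly flat baseline (sigma == 0).
            threshold = mu + self.sigmas * max(sigma, 1e-9)
            abnormal = latency_ms > threshold
        # Only normal observations update the baseline, so a run of slow
        # transactions does not quietly become the new "normal".
        if not abnormal:
            self.samples.append(latency_ms)
        return abnormal


if __name__ == "__main__":
    baseline = LatencyBaseline(window=20, min_samples=5)
    for latency in [100, 102, 98, 101, 99, 100, 250, 101]:
        if baseline.observe(latency):
            print(f"abnormal latency: {latency} ms")
```

A real deployment would correlate many such signals across tiers and over time; the point here is only that "business normal" can be written down as a rule and tested continuously.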

End-to-End Visibility into Business Transactions and Applications

According to industry experts, eighty percent of the time spent during the problem-solving process goes to identifying the problem.

Corporate managers, line of business owners and application support groups need end-to-end, relevant visibility into their applications, from a business transaction perspective and with a granular alerting mechanism for potential problems. Greater visibility allows them to see the impact of performance problems from their perspectives, to be aware of issues before their customers become unhappy, and to be able to either manually or automatically correct the problem.

In dynamic transactional application environments, personnel must know the magnitude of a particular component failure or transaction performance bottleneck, and how important it is to fix relative to other problems.

Example: Monitoring Manufacturing Process Control Systems to Avoid Costly Factory Floor Shutdowns

For large manufacturers, running factories at peak efficiency 24 hours a day, seven days a week is a huge priority. Even the shortest production stoppage can lead to losses of several million dollars when product orders are not completed on time.

Situation

A large, well-known global electronics manufacturer has built its reputation -- as well as its multi-billion dollar industry standing -- on its ability to deliver products quickly and efficiently to keep up with its high sales volume. For the company, running factories at peak efficiency to meet sales-driven production quotas is crucial to maintaining its position in the market.

The electronics manufacturer's production floors are fully automated, helping to maximize worker productivity, streamline production, and ensure high-volume throughput to meet critical business demands. Automation in the main factory was designed to guarantee four complete systems every minute, with each system's market value hovering around $10,000. Every minute, the factory manufactures $40,000 in product revenue -- about $2.4 million per hour. With so many automated systems depending on each other for timely completion of goods, systems that slow or shut down entirely can seriously damage the company's bottom line, and can even cause a ripple effect that extends to the rest of the factory floor.

Problem

One day, the worst happened: an automated system on the factory floor failed, halting manufacturing operations entirely. The IT teams had to search for hours for the root cause of the problem, finally determining that a server had crashed around 2:30 AM. The crash caused messages to the main production application to back up until its capacity was exceeded and the application shut down.

Without end-to-end, real-time monitoring and managing of factory floor systems, the potentially disastrous failure wasn't discovered for several hours, and no one was alerted. The factory floor remained out of operation for three hours while the IT staff scoured the large, complex, integrated application infrastructure, eventually finding and fixing the problem -- but at what cost?

The net impact to the company's bottom line, in lost product revenue alone, was over $7 million in that one day.
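The figure follows directly from the numbers given above. As a rough check, the short Python sketch below recomputes it; the function and constant names are hypothetical, and the calculation assumes the factory would otherwise have run at its designed rate for the full three hours.

```python
SYSTEMS_PER_MINUTE = 4          # designed output of the main factory
VALUE_PER_SYSTEM_USD = 10_000   # approximate market value per system
OUTAGE_HOURS = 3                # time the factory floor was out of operation


def lost_revenue_usd(outage_hours,
                     systems_per_minute=SYSTEMS_PER_MINUTE,
                     value_per_system=VALUE_PER_SYSTEM_USD):
    """Product revenue not manufactured during an outage, in US dollars."""
    revenue_per_hour = systems_per_minute * value_per_system * 60
    return revenue_per_hour * outage_hours


if __name__ == "__main__":
    print(f"${lost_revenue_usd(OUTAGE_HOURS):,}")  # prints "$7,200,000"
```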

Solution

The electronics manufacturer implemented an industry-leading suite of business transaction performance (BTP) management software to monitor the operation of its automated factory floor systems and automatically remediate its failures and degradations. Under a consolidated dashboard, IT teams were able to bring all key manufacturing automation applications and the infrastructure systems that support them under centralized control for simple monitoring and management.

The transaction management software helps the company's IT staff identify all of the servers, the real-time state of all the transactions that traverse them, and possible failure points along the critical path of their manufacturing process control systems. By applying a set of policies and business rules defining "business normal" and using the BTP software's embedded Complex Event Processing engine, they can detect immediately when any of these components exhibits conditions that could indicate a pending performance degradation or failure. The software then automatically issues alerts to the key IT staff so they can take corrective action to fix the problem before manufacturing operations are affected.
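The article does not describe the rules the manufacturer used. As one illustration of what a rule for a "pending failure" might look like, the Python sketch below takes the failure mode from this case (messages backing up until an application's capacity is exceeded) and estimates how long a growing backlog has before it reaches capacity. All names, the linear-growth assumption and the 30-minute lead time are hypothetical.

```python
def minutes_until_full(depths, capacity, interval_minutes=1.0):
    """Estimate minutes until a message queue reaches capacity.

    depths: queue-depth readings taken every interval_minutes, oldest first.
    Assumes the backlog keeps growing at its average rate over the readings.
    Returns None if there are fewer than two readings or no growth.
    """
    if len(depths) < 2:
        return None
    elapsed = (len(depths) - 1) * interval_minutes
    growth_per_minute = (depths[-1] - depths[0]) / elapsed
    if growth_per_minute <= 0:
        return None
    remaining = max(capacity - depths[-1], 0)
    return remaining / growth_per_minute


def should_alert(depths, capacity, lead_time_minutes=30.0):
    """Alert when the queue is projected to fill within the lead time."""
    eta = minutes_until_full(depths, capacity)
    return eta is not None and eta <= lead_time_minutes


if __name__ == "__main__":
    readings = [100, 200, 300, 400]  # one reading per minute
    print(minutes_until_full(readings, capacity=1000))  # prints 6.0
    print(should_alert(readings, capacity=1000))        # prints True
```

A rule of this kind fires while the application is still running, which is the difference between the three-hour outage described above and an alert that reaches staff before operations are affected.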

Benefits

Since implementing the software solution, the company's factory floor has not experienced any more extended periods of downtime due to problems with its automated manufacturing systems. Nighttime operations are run more efficiently, and the possibility of system stoppage has been nearly eliminated, guaranteeing a timely production schedule to support sales.

David Mavashev is CEO and Founder of Nastel Technologies, a provider of application performance and transaction management solutions for mission-critical applications. http://www.nastel.com