Challenges to DevOps and How to Resolve Them
The collaborative foundation of DevOps demands positive, well-intentioned communication. Defining rules of engagement that satisfy this expectation equips each team member for success. Knowing that communication underlies and perpetuates all aspects of DevOps encourages team members toward effective communication.
Each entity—DBA or DevOps team member—must take personal responsibility for the success of this talent merger. Years of resentment and uncooperativeness have brought team division to new heights. The cultural position of DevOps brings opportunities to bridge the divide for true team partnerships.
Because DevOps continues to gain momentum, DBAs might need to come around a bit further than the already engaged DevOps players.
As with any movement or incipient technology framework, new nomenclature develops that takes time to learn and understand. Existing DevOps team members need to educate DBAs on terminology as much as practical DevOps techniques and tools.
After becoming familiar with the DevOps approach and pertinent processes and tools, DBAs introduce database practice experience to expand perceptions of data protection, schema management, data transformation, and database build best practices.
DBAs who meld database management approaches into DevOps practices that are aligned with shared goals are successful only if the DevOps team members understand DBA methods and can see the value brought to the overall DevOps model.
Rules of Engagement
Guidelines are important for communicating and working effectively because differing collaborative terms pop up every few years with different names and different bullet points. They all have the same purpose: to respect each person and the value he or she offers.
As a United States Army veteran, a term such as rules of engagement resonates. Aligned with the DevOps principles, here is an easily understandable set of guardrails to keep us all communicating and operating efficiently:
• Goal alignment: Have a collaborative approach among team members who agree on common goals and incentives: strive to harvest excellent software products hosted on sustainable and stable infrastructures while continuously improving processes, automation, and cycle time.
• Deliverable co-responsibility: No single actor should be allowed to dominate or distort the principles, direction, or team accountability and actions, thus safeguarding DevOps guidelines and the Agile self-forming team concept.
• Speak to the outcomes: Require constant and consistent verbal communications for expeditious task coordination and execution, matched by effective and timely decisions to drive expected outcomes.
• Change adaptation: Accepting business and customer fluidity as product requirement drivers—while slaying traditional project management strategies—better guarantees project success.
• Give the benefit of the doubt: Grant people grace, and trust that their intentions are good and intended for the team’s benefit. Embrace the possibility that you may be the person causing team tension and then stop doing so.
Continuous may be the most frequently heard word in DevOps conversations. Here’s why:
• flow: Work is always progressing and driven by automation, having value deliberately built in at a sustainable pace. Several Agile methods specifically limit the amount of work that can be in process at the same time.
Limiting work-in-progress grants focus and time to properly construct the product and product testing, which lead to better outcomes without overburdening the staff.
• build: Build tests and code, preferably in that order. With QA shifting to the development stage, code with fewer defects can be created at a lower total cost of ownership.
• integration: Combine new or changed application code with the core product through extensive functional and performance testing, and correct defects immediately to produce the next product version.
• delivery: Ensure that the software product is positioned at all times for production release or deployment, encapsulating the building, testing, and integration processes. Successful integration produces the deliverable, making continuous delivery a product state, not a process.
• deployment: Where applicable, production deployments should occur as soon as the product is ready for release after integration (this is less likely for legacy environments).
• feedback: There should be persistent communications concerning the product quality, performance, and functionality intending to find, report, and fix bugs faster or to correct performance earlier in the pipeline. Commit to the “shift-left” concept.
• improvement: Apply lean principles to eliminate waste, build in value, reduce defects, and shorten cycle time to improve product quality. Team members should take time to reflect on completed projects or sprints to increase productivity by staking claim to value-adding tasks and shedding inefficiencies and unproductive steps.
Depending on the tools used, product themes abound. There are chefs with recipes, cookbooks, kitchens, and supermarkets; a butler, puppets, blowfish, broccoli, maven, ant, and many other strange yet fun product names. Check out XebiaLabs’ Periodic Table of DevOps Tools.
Automation and Orchestration
Automation focuses on executing tasks quickly. Building a script to run a set of database change commands is automation.
Orchestration focuses on process or workflow. Building a series of steps to execute tasks in a defined order to produce an outcome is orchestration. Spinning up a virtual database host combines automation (the set of commands for each task) and orchestration to run the tasks logically.
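The distinction can be sketched in a few lines of Python. This is a minimal illustration, not taken from any real tool: each function is an automated task, and the orchestration is the defined order in which they run.

```python
# Automation: each function is a single scripted task.
def provision_storage(host):
    return f"{host}: storage provisioned"

def install_database(host):
    return f"{host}: database software installed"

def create_database(host):
    return f"{host}: database created"

# Orchestration: run the automated tasks in a defined, logical order
# to produce an outcome (here, a ready database host).
def orchestrate_db_host_build(host):
    steps = [provision_storage, install_database, create_database]
    return [step(host) for step in steps]

log = orchestrate_db_host_build("db01")
```

Each task is useful on its own; the value of orchestration is that the sequence, not a person, enforces the order.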
Languages vary among DBAs. For example, application DBAs talk code execution efficiency, logical DBAs (aka data architects) talk about normal forms, and operational DBAs talk about performance.
Yet DBAs manage to keep databases humming along—most of the time. Although there are differences in DBA roles and responsibilities, the end game is database stability, performance, availability, security, and recoverability (to name just a handful). DevOps team members must understand the DBAs’ database protectiveness and self-preservation tendencies.
To DBAs who have spent long nights and weekends recovering from code deployments that took months to build and test, reducing the time spent building and testing the software makes little sense on the surface.
DevOps team members are challenged to shine a light on the new paradigm and emphasize that the speed is offset by fewer code changes, which improves the odds of a successful deployment.
Also let the DBAs know that as a DevOps team, failures cause all team members—including developers—to be all hands on deck. Now that it is in everyone’s best interest to implement change correctly, DBAs are no longer the only people pursuing self-preservation.
Language and Culture: More than the Spoken Tongue and Traditions
The IT world is diverse on many levels, which is great! I have learned much from working with people in the United States, but also in South Korea, West Germany (I still make the distinction because I was serving in West Germany when the Berlin Wall fell), and for about a week in Brazil. I have also learned from people in other states, because diversity delivers value.
As DBAs and DevOps team members come together, the differences add value. Think about it: if everyone on the team knows the same things, all but a single person are redundant. People speaking different languages figure out how to communicate effectively, so DBAs and DevOps team members can do the same.
The difference is often perspective, which I have mentioned before: repetition reinforces ideas. DevOps is more a cultural shift for IT than a process shift. Sure, the tools and schedules are different, but those elements are easy to learn or adapt to; a culture shift requires time to digest the idea and bring everyone along.
Let’s take a look at the world of IT from different perspectives to begin to understand where DevOps is taking us all.
Resiliency versus Complexity
Resiliency describes the ability to sustain operations or to quickly restore operations when a failure occurs. For application systems with data stores, database clustering provides resiliency—the failure of one node does not reduce transactional throughput.
That happens when the cluster is built to withstand a single node failure, with the remaining nodes sized to maintain 100% capacity at mandated response times. A pool of web or application servers distributes the workload while improving resiliency because surviving nodes maintain operations when a node fails.
Resiliency can be scaled to meet financial considerations. Under the plan using the clustered database example, a single node loss could result in a 30% decrease in load capacity; mitigation must be preplanned to stop or deprioritize enough load to not impact critical operations. For example, batch processing or reporting can be suspended until the system is at full capacity.
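The degraded-capacity arithmetic above can be sketched as a quick calculation; the node counts and capacities here are illustrative, not from any specific plan:

```python
def surviving_capacity(nodes, per_node_capacity, failed=1):
    """Fraction of full load a cluster can still carry after node failures."""
    total = nodes * per_node_capacity
    remaining = (nodes - failed) * per_node_capacity
    return remaining / total

# A 3-node cluster with equally sized nodes: losing one node leaves
# ~67% capacity, so roughly a third of the load (batch processing,
# reporting) must be shed or deferred until the node is restored.
fraction = surviving_capacity(3, 100)
```

Preplanning which workloads to suspend is what turns this arithmetic into a usable mitigation plan.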
DevOps provides an answer to the capacity problem if the database clustering can benefit from the host build template scripts. The loss of one node can be quickly offset by an automated build of a new node that can be introduced into the cluster. Furthermore, additional capacity can be activated when demand exceeds capacity.
Resiliency from clustering and other high-availability solutions does have a drawback: complexity. Be sure to not increase complexity to an unsustainable level when designing critical systems.
Overly complex systems with tricky interdependencies that create situations in which maintenance and upgrades are postponed defeat the purpose of resiliency. Being resilient requires keeping pace with database upgrades and security patching to increase stability and prevent breaches or data theft.
Rolling upgrades and patches signal resiliency by demonstrating the capability to maintain continuous operations while simultaneously improving the platform. Extending this capability to be able to completely replace the database environment with an upgraded or different database altogether, and with a fallback plan in place to return to the previous platform, exemplifies resiliency.
DevOps brings about the opportunity to maintain resiliency with less complexity because web, app, or database servers can be built in minutes or hours instead of the weeks or months it used to take to acquire and build servers. Virtualization is a major enabler of DevOps.
Simplifying architecture and application code runs counter to real-life IT solutions design, yet it is still a smart move for the long run. True solutions design not only leads to the best possible product but also refrains from adding anything distracting to the product.
As DBAs and DevOps team members unite, they resolve to fight complexity with design elegance and minimalist tendencies and prevent complexity from entangling DBA processes that may harm pipeline efficiency. Excitement builds as expectations for simple, precise, and demonstrably improved business systems are realized from this joining of forces.
Packaging and Propagation
Thoughtful and well-planned database software build packaging and propagation can be used to maintain resiliency, as described previously, but it can also be used for on-demand capacity, multisite distributive processing, and maintenance of pipeline database consistency.
Packaging versioned releases for upgrade simplification must include database owner and other account privileges needed for distribution. Database installs in which an OS hook must be executed by an administrator account need to be scripted to pull needed credentials during execution. The scripting must also ensure that password information does not get written to the installation or audit logs.
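A minimal sketch of the credential-handling requirement follows. It assumes credentials are injected into environment variables by a secrets manager at run time (a real install would call the vault's own API), and it shows the redaction step that keeps passwords out of installation and audit logs:

```python
import os

def fetch_credential(name):
    # Assumption: a secrets manager has injected the credential into the
    # environment just before execution; nothing is hard-coded in the script.
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"credential {name} not available")
    return value

def log_step(logfile, message, secrets=()):
    # Redact every known secret before anything reaches the install log.
    for secret in secrets:
        message = message.replace(secret, "********")
    logfile.append(message)

audit_log = []
os.environ["DB_ADMIN_PW"] = "s3cret"   # stand-in for a vault injection
pw = fetch_credential("DB_ADMIN_PW")
log_step(audit_log, f"install started with password {pw}", secrets=[pw])
```

The key design point is that the secret exists only in memory during execution and never survives into any written artifact.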
The shift goes from lengthy and tedious manual installs or lightly automated installs to a completely automated build that can be done fast enough that IT has the agility to immediately respond to demand, not after weeks of struggling to keep a system running in overload mode.
Structured and Unstructured
For decades, the relational database has been the database of choice, and large companies have invested millions in licensing and platforms. Without viable options, project data storage requirements landed in a relational database management system (RDBMS), regardless of the data structure or even the content.
More recently, many newer, viable database options are becoming mainstream, but it is still a hard sell to convince the upper echelon that additional investment is needed for another database ecosystem.
Even open-source databases come with staff support and hardware costs, or monthly DBaaS payments. Forcing data models into unsuitable databases deoptimizes solutions. From the start, performance is less than it would be if a better-fitting database engine managed the data.
Maturing DevOps organizations lean toward optimized solutions, making force-feeding data into a database unthinkable. Relational databases remain “top dogs” as databases of record for transactional data. As applications shift toward multiple database backends, services or APIs provide data call abstraction to maintain flow.
Unicorn companies start with very little cash flow, limiting the affordable scope of databases. Open-source databases enable individuals and small teams to build application software with a data store.
As these companies grew, the databases scaled to the point at which other companies took notice. When CIOs drive down IT costs, looking at alternative databases becomes a viable (and street-proven) option. DevOps leverages this learning, making it possible to store data in the database best suited for the content, pulling along cost-cutting options.
Audit reviews are a necessity when build automation replaces human control. DBAs who install software pay attention to the screen messages, responding to configuration questions and noting errors that need attention. The risk is that the same person might do a second install that is not exactly like the first.
Vendors have included automation scripts for years, but platform differences still happen. DevOps automation is meant to build the complete platform without a person making decisions because the decisions are built into the automation or gathered before automation execution.
A developer requesting a new web server should need to provide only primitive inputs up front—OS, web server brand, and a few sizing parameters—before the automation kicks off.
There are legitimate reasons to pause automation, but asking for more information should not be one of them. As mentioned, automation is task-based, so stopping the orchestration is more likely. The automation and orchestration need to generate audit trails.
True to DevOps, audit log checkout should be automated because no DBA or DevOps team member wants to review pages and pages of audit information. Learning which error codes or other failures to search for tightens the noose around inconsistency. More importantly, governing bodies require documentation for application changes, which makes the audit log that evidence.
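Automated audit-log checkout can be as simple as scanning for known failure signatures. The sketch below uses hypothetical patterns (the `ORA-` prefix borrows Oracle's error-code style); real patterns depend on the database vendor and the errors your team has learned to watch for:

```python
import re

# Assumed failure signatures; tune these as new error codes are learned.
FAILURE_PATTERNS = [re.compile(p) for p in (r"ORA-\d+", r"ERROR", r"FATAL")]

def scan_audit_log(lines):
    """Return only the lines a human actually needs to review."""
    return [ln for ln in lines if any(p.search(ln) for p in FAILURE_PATTERNS)]

log = [
    "step 1: schema created",
    "step 2: ORA-00942 table or view does not exist",
    "step 3: grants applied",
]
findings = scan_audit_log(log)
```

The full log remains intact as compliance evidence; the scan merely narrows what people look at.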
Repeatability of tests or builds improves the efficiency of code, and infrastructure as code, along with the full continuous delivery pipeline. Being able to build servers quickly allows developers to experiment with different code techniques or operations to build capacity on demand.
DBAs are used to being responsible for database builds, so it may take a little time for them to get used to the idea of developers building and destroying databases at will.
DBAs can instead create templates for the way databases are built, which seems like a better deal. Limiting the number of unique database software installs and database builds has advantages. The code should execute exactly the same within a version. Troubleshooting narrows from having fewer possible variables.
Once a problem is found, knowing where to apply the fix is easy. When testing a change, the way the database executes the change should be consistent on like architectures.
As much as possible, the nonproduction environment should mirror production, decreasing the chance of change failure caused by underlying configuration differences. Build repeatability is a win for developers, DBAs, and DevOps team members.
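A DBA-owned build template might look like the following sketch; the parameter names and values are illustrative. Developers supply only the primitive inputs, while the template fixes everything else, so every build is identical within a version:

```python
# The DBA-controlled template: these values cannot drift between builds.
BASE_TEMPLATE = {
    "version": "12.2",       # single blessed software version
    "charset": "UTF8",
    "archive_logging": True,
}

def render_build(env_name, size_gb):
    """Combine the fixed template with the developer's primitive inputs."""
    build = dict(BASE_TEMPLATE)
    build.update({"name": env_name, "size_gb": size_gb})
    return build

dev = render_build("dev01", 50)
prod = render_build("prod01", 500)
```

Because only `name` and `size_gb` vary, nonproduction genuinely mirrors production in every configuration that affects behavior.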
Nothing causes a puckering posture more than a potential data breach. On the scale of security threat mitigation, preventing data breaches sits at or near the top. Partnering with the information security team, DBAs play an inherent role in data protection. DBAs, as custodians of the corporate data assets, consider security a key deliverable.
Although database access comes in many forms, in all cases access should be granted only after authentication, and each access needs to meet audit requirements. Authentication can be granted by the database, application, or single sign-on protocol. Each authentication must be logged for auditing.
Each access, whether as a user request, job initiation, or integration interface, should be uniquely identifiable for auditing. How the auditing is performed is less important than the auditing being done.
The auditing may be controlled within the database by using a built-in feature or with application code that writes the audit information to a table or file. Importantly, DBAs should not be able to alter the data once the audit record is created, which protects the information from less-scrupulous DBAs.
Data encryption protects data at rest, including data stored in the database or stored as files. Many database products offer encryption, though it may be easier to use storage-based encryption, which covers the database and file data.
At a minimum, Social Security numbers (SSNs), credit card numbers, personal health information, and other sensitive data elements must be protected, which should already be done where compliance with governance requirements such as SOX, HIPAA, PCI-DSS, and more are enforced and audited.
SSL/TLS protects data in transit between the database and the application tier or end-user device. Preventing “on the wire” breaches entirely is nearly impossible, but at least it should be challenging for intercepted data to be interpreted.
Developers do consider security and at times write code to implement data protection or data hiding; for example, not allowing application users to see full SSNs (just the last four digits) when the user’s role does not require knowing the full SSN.
Developers may also code in calls to encryption functions or packages to obfuscate data elements. Storage encryption solutions are usually easier to manage and provide full data coverage, but not all organizations scale to the level at which the cost can be justified.
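The SSN-masking example above can be sketched in a few lines; the function name and role flag are illustrative, not from any particular application:

```python
def mask_ssn(ssn, can_view_full=False):
    """Return the full SSN only when the user's role requires it;
    otherwise expose just the last four digits."""
    digits = ssn.replace("-", "")
    if can_view_full:
        return ssn
    return "***-**-" + digits[-4:]

masked = mask_ssn("123-45-6789")
full = mask_ssn("123-45-6789", can_view_full=True)
```

Role checks like `can_view_full` would normally come from the application's authorization layer, not be passed by the caller directly.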
DevOps automation and orchestration should include security implementations. Configuring SSL and installing certificates should be automated.
Creating service accounts needed for application access to the database should be automated. Disabling FTP and Telnet on the host should be automated. Each of these automation pieces is collected for orchestration.
Computers continue to increase in processing power (more importantly, in transactional throughput), which allows more work to be done in less time. No matter how fast computers become, overhead work always reduces the optimal ceiling.
Work minimization improves optimization. Lean methodologies drive out unnecessary work to improve process times and reduce waste and cost. IT shops are learning from lean methodologies, DevOps being one representative model.
Execution plans define how the database engine decides to retrieve, sort, and present the requested information or how to modify the data per instruction.
Optimizers do a terrific job building execution plans, although misses still occur. If a query is performing poorly, the execution plan should be an early check in the troubleshooting process.
DBAs must interrogate the execution plan to determine appropriateness, which requires experience. Developers make great partners when checking execution plans; they are capable of interpreting the plan in light of what the code was built to do.
Code consistency matters for some database engine implementations. During the process of building execution plans, these databases interpret uppercase and lowercase letters as different, making a simple one-character difference appear to be a completely different statement. Keeping code consistent increases the reusability of plans already stored in the cache.
Using replaceable variables may also help optimize cached statement use. As DBAs integrate into DevOps teams, ensuring that solid code practices are in place to ease the database load is a step in the right direction.
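The cached-statement point can be demonstrated with SQLite (used here only because it ships with Python; the same principle applies to any engine that caches plans by statement text). Embedding literal values makes every call a textually distinct statement, while a bind variable keeps one reusable statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO emp (id, name) VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])

# Literal values: each lookup is a different statement the engine
# must parse and plan separately.
literal_statements = {f"SELECT name FROM emp WHERE id = {i}" for i in (1, 2)}

# Bind variable: one statement text, reusable for every value.
bound_statement = "SELECT name FROM emp WHERE id = ?"
rows = [conn.execute(bound_statement, (i,)).fetchone()[0] for i in (1, 2)]
```

Two lookups produce two distinct literal statements but only one parameterized statement, which is exactly what keeps the plan cache effective.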
“Hidden” predicates can make evaluating code and execution plans more challenging; just consider the possibility when the execution plan seems reasonable while performance lags. Security implementations may be the culprit, and one might expect the “secrets” to not be revealed.
An easy test to determine whether hidden predicates used by Oracle’s Virtual Private Database (VPD) are in play is simply to run the statement using an account with more authority.
Improved performance indicates the need to check for additional predicates. You may have to use a tool from a performance products vendor to find the predicates. Once discovered, improving performance may be as easy as elevating account privileges or executing with an account with more authority.
Sometimes reworking the code does not lead to enough performance improvement, making the privileges decision the fix. Also, if you know that something like VPD is implemented, and jobs and reports suddenly take a two-, three-, or fourfold (or more) dive in performance even though the database was not changed, check account security: it is not beyond the realm of possibility that a security job was run to correct perceived audit discrepancies.
Optimized code sheds unneeded work and data touches (the latter is critical to result-set size), which matters especially for batch reporting and ETL processes.
Selective predicates—the where clause statements—reduce execution effort and time while also lessening the burden on the database as a whole. DBAs understand, and developers and DevOps team members need to learn, that each segment of work contributes to the overall database load. Therefore, anything that can be done to reduce work at the statement level benefits all database transactions.
Leverage indexes for improved performance. Performance drags when large data scans are performed unnecessarily, making index selection critical. Whether an index was not considered as the code was built and implemented, or the statement was written so the optimizer decided that no existing index met the execution needs, performance suffers.
Today’s computing power and high-performing database engines contribute to response times in the low milliseconds for simple transactional reads and writes, meaning that DBAs should seriously question response times that take a second or longer.
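The index effect is easy to see with SQLite's `EXPLAIN QUERY PLAN` (again chosen only because it ships with Python; every major engine has an equivalent plan display). Before the index, the plan shows a full scan; after, an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, f"cust{i % 100}") for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in column 3.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT * FROM orders WHERE customer = 'cust7'")  # full scan
conn.execute("CREATE INDEX idx_cust ON orders(customer)")
after = plan("SELECT * FROM orders WHERE customer = 'cust7'")   # index search
```

Checking the plan this way is exactly the "early check in the troubleshooting process" the text describes: one statement reveals whether the optimizer found a usable index.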
Kernel configuration undergirds databases and applications, ensuring resource availability. DBAs who lack kernel-tuning experience miss an opportunity to take full advantage of the underlying hardware and OS. Because DBaaS solutions come preconfigured, kernel configuration and tuning remain in the hands of the provider.
Otherwise, DBAs should work in tandem with SAs to monitor and tweak the kernel for better performance or go with a PaaS solution for more control over the database configuration, at the cost of increased maintenance overhead.
Network configuration is usually not a high-priority performance differentiator; it becomes a concern only when huge data sets have to be transferred over the network.
Even then, the primary focus is outside of the databases, requiring OS and/or network configuration tuning. Common modifications include increasing the maximum transmission unit (MTU) to pack more data into each network packet, or (when available) using “jumbo” packets that are dependent on platform options. Either way, the change needs to be done at both endpoints.
Data movement impacts performance based on volume. Remember that networks cannot get faster, only bigger. They are capable of moving more data at the same speed, but the amount of data that needs to be moved directly impacts the time needed for the move.
The larger data sets tend to be between the database and the application servers. Latency increases as the distance between point A and B increases, extending the time needed to move the data.
An easy test: place one application server in the data center that hosts the database and another application server in a location geographically distanced from the database location.
Test data pulls of increasing size until the data move duration becomes apparent. Then consider that impact spread across thousands of customers. Even if the distance is not a concern, it remains a wise decision to limit the data volume because client machines possess varying network–traffic processing capabilities.
Virtualization has improved server resource usage and facilitated data center consolidation from increased compute density per floor tile. DBAs need to ensure that the assigned virtual resources are “locked” so other guests cannot “steal” resources.
Resource reallocation generally helps to balance loads, and it produces excellent results in most cases. Databases are one exception because they do not play well with other kids in the same sandbox.
Just for fun, test the scenario in which a guest steals memory from the database guest. Nothing says “horrendous performance” faster than the database cache being swapped in and out of memory!
Being able to transfer information in your head to someone else’s head should be a required skill for all team members because one Agile precept (extended to DevOps) states that each team member should be able to perform all the team’s functions.
For DBAs, that implies that you are unlikely to be the only person creating automated database change scripts. Instead, you could be reviewing code and looking at audit files to improve automation execution.
Your DevOps teammates have the responsibility of making you a full-fledged team member. On the flipside, DBAs must teach DevOps team members how to manage database changes to support the development pipeline. This knowledge sharing is a great thing, especially if you want to ever take an uninterrupted vacation.
Teaching can and should be done formally and informally. Formal teaching requires planning, topic definition, and preparation to ensure that the information is conveyed successfully.
Informal teaching can be done by sitting next to a team member (similar to extreme programming paired programming) and working through a database change or writing an automation script. Informal teaching includes talking with teammates while having lunch together or when gathered at the team’s favorite after-work watering hole.
Sharing knowledge within teams is step 1. Self-forming teams are a key Agile and DevOps element, but self-forming teams do not imply forever teams. As products and demands change, teams eventually disperse and reform differently, ready to complete new work. Team redistribution leads to knowledge distribution.
Something you taught to one team can now be shared within other teams, extending your impact and making teams more effective, while expanding the organizational knowledge base.
Personal development should not be replaced by team training; instead, personal development should inject fresh ideas and skills. Too often, people attend training only for their primary technology skill:
Java programmers take an advanced Java class, or DBAs take a backup and recovery class for the database platform they support. My approach has always considered three perspectives that I believe fit the DevOps model of shared work.
• Core technology or skill: deepening your core skill, intending to become an “expert.”
• Aligned technology or skill: expanding your sphere of impact by adding complementary skills such as surrounding technologies.
• General or soft skills: communications, leadership, time management, and business understanding.
After the code has been implemented, the final step in the pre-DevOps model is usually Operations team members figuring out how to implement backups, monitoring, batch processing, reporting, and more.
DevOps makes it feasible to gather the operational information earlier in the process, which allows automation to handle much of the operationalization. For example:
• Backups: Backup software or agents can be installed and configured during the server build, including setting the schedule.
• Monitoring: Like backups, software or agents can be installed and configured, and registered to the administrative or master console, including baseline performance settings.
• Scheduling batch and report jobs: Load management pertains to distributing background work across the day interval to not impact transactional systems while completing batch and report work. Scheduling can be automated, even with load protections to delay execution for a prescribed time if the system load is high.
• Capacity management: Not the annual growth predictions, but real-time activity monitoring provides opportunities to take proactive steps to add capacity on demand, or at least to plan for capacity to be added soon. Adding a fifth application server in real time to a four-node app server farm quickly provides 25% more capacity, provided you have the needed server build automation in place.
Once the server is built and ready for traffic, a quick update to the load balancer can be made to start directing traffic to the expansion server. Imagine being able to upgrade the entire farm by building replacement servers with higher transaction throughput and then making changes at the load balancer to insert the new servers and delete the old servers.
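The expand-then-retire flow can be sketched as pure list operations; a real load balancer would be updated through its own API, so the pool here is just an illustrative stand-in:

```python
# A pure-Python sketch of managing a load-balancer pool.

def add_servers(pool, new_servers):
    """Expansion: start directing traffic to newly built servers."""
    return pool + [s for s in new_servers if s not in pool]

def retire_servers(pool, old_servers):
    """Rolling replacement: stop directing traffic to old servers."""
    return [s for s in pool if s not in old_servers]

farm = ["app1", "app2", "app3", "app4"]
farm = add_servers(farm, ["app5"])        # fifth node: ~25% more capacity

# Upgrade the whole farm: insert the replacements, then delete the originals.
farm = add_servers(farm, ["new1", "new2", "new3", "new4", "new5"])
farm = retire_servers(farm, ["app1", "app2", "app3", "app4", "app5"])
```

Because the balancer is the single point of traffic control, both expansion and full-farm replacement reduce to these two pool edits.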
DevOps automation opens new windows to improve operational performance and resiliency with real-time capacity management. Capacity management needs to consider more than database growth; instead, it should encompass the full IT supply chain, up- and downstream.
Resiliency is the capability of a system to continue to function or to recover quickly from failure. Resiliency designed and baked into an application architecture—resulting in a high-availability infrastructure able to tolerate single device failures—affords strong continuous business capability.
A three-node server cluster built with enough horsepower that a single node failure can be absorbed by the two remaining servers without performance degradation demonstrates resiliency.
Failover is a methodology for moving a failed or significantly impaired production environment onto another similar system, usually located in close proximity to the primary system. One caution comes in the form of the statement “failover to DR,” which may not mean exactly what is stated.
Cost and complexity decisions weighed against business needs may lead to investments in like systems or smaller investments to provide a portion of the transactional capability of the primary systems as a stopgap until the main production platform can be operationally restored.
The transition involves redirecting all computers communicating with the failed system. That may be as simple as updating a few entries in a load balancer, or it may be a complicated and tedious effort to manually point each interfacing system to the temporary production environment—and the same effort must be repeated to fall back to normal operations.
Recovery dictates backup requirements. Many DBAs ask, “How should I back up an X TB database?” The question should actually be this: “How should I back up an X TB database when the business demands a 2-hour recovery window?” Understand that the recovery requirements drive the backup solution.
With DBaaS, the recovery requirement needs to be settled as a deliverable in the SLA. An in-house backup may be to disk or to a virtual tape solution (disk) that is capable of recovering an X TB database in under 2 hours.
Because the business wants the recovery to take no longer than 2 hours, the recovery must allow time to start the database and reconnect dependent systems before access is granted.
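The arithmetic behind that statement is worth making explicit: the restore itself only gets whatever is left of the 2-hour window after database startup and reconnection time are subtracted. The durations below are invented for illustration.

```python
# Hypothetical recovery-time budget. The 2-hour business window must cover
# restore, database startup, and reconnecting dependent systems, so the
# backup solution only gets what remains. All durations are examples.

from datetime import timedelta

rto = timedelta(hours=2)              # business recovery window
db_startup = timedelta(minutes=15)    # crash recovery and database open
reconnect = timedelta(minutes=20)     # dependent systems and smoke tests

restore_budget = rto - db_startup - reconnect
print(restore_budget)  # 1:25:00 left for the restore itself
```

Sizing the backup solution against the full 2 hours instead of the remaining 85 minutes is how recovery windows get missed.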
Server failures without the aforementioned resiliency models in place to maintain operations are more complicated to recover. The whole system may have to be recovered from backup, or, with DevOps automation, the host environment could be rebuilt fresh from a template package, followed by the database restore.
Disaster recovery is a program designed to protect the business from a catastrophic failure, most likely the destruction of a data center.
This form of recovery must be specifically planned and exercised, with predetermined executives authorized to declare the event and open the checkbook to cover the costs of people, vendors, and computing resources needed to recover automated business operations in a geographically distanced data center. DRaaS options are relatively new, albeit gaining respect and maturing quickly.
Business continuity is the business-side recovery process when disaster strikes, including an event requiring the disaster recovery program to be activated. Business continuity is more likely to be activated due to a natural disaster or civil unrest than the failure of the company’s data center. Planning and exercising options lead to success.
Knowing how to operate the business during the crisis—civil, natural disaster, or technology unavailability—covers a much broader scope than the disaster recovery program.
DevOps automation, mentioned briefly here, brings a new and exciting option to the world of recovery. The capability to generate new virtual hosts or full application host environments on demand presents an opportunity to improve recovery times. Database hosts can be rebuilt, but the database must still be recovered.
More to the point, web and application servers built from predefined templates and install packages should be considered and tested for recovery-time comparison. Having the ability to build systems quickly as a recovery process frees traditional resources for other work.
If the current disaster recovery program includes replication between the primary and DR sites, consider stopping the replication of web and app servers and instead building these servers on demand, potentially saving bandwidth costs. Ensure that the DR location has resources available for the automated builds.
DevOps is an opportunity. Bringing together talented professionals to complete new missions by using new methods and tools facilitates business agility and growth while improving customer experience and developing IT team members.
Two obstacles—language and culture—can be easily overcome with frequent communication, the willingness to share experiences, and selfless knowledge sharing. The end game is to build great DevOps teams that are capable of delivering software and infrastructure better and faster than ever.
Adding DBAs to DevOps teams amps up team capabilities while making it possible to reduce risk by incorporating database builds, configurations, and changes into the Agile pipeline. This addition also turns a long-sidelined process outlier into just another automation to be included in the orchestration.
The Roadmap to Transformation
Many challenges in your IT delivery process are caused by bottlenecks, which lengthen the time before you get meaningful feedback that you can, in turn, use to improve your process and products.
Making work visible is one of the most powerful ways to identify bottlenecks, yet IT is mostly dealing with invisible work: there is no visible stock showing how much product a team or facility has created, there is no warehouse that indicates how much product is available but not in use, and there is no physical process that you can follow to see how an output is being created from the inputs.
This leads to an interesting situation: while most people working in manufacturing have a rough idea of how their product is being created, in IT, the actual process is a lot less known. And I mean the real process, not the one that might be documented on some company web page or in some methodology.
Yet without that visibility, it is difficult to improve the process. So, one absolutely crucial task for any IT executive is to make the process visible, including status and measures like quality and speed. In this blog section, we leverage value stream maps to make work visible and jump-start the transformation with an initial roadmap and governance approach.
Making the IT Process Visible
I like to start any DevOps consulting activity with a value stream mapping exercise. The reason is quite simple: it is the most reliable exercise to align everyone in the organization and my team to what the IT process looks like.
You could look at the methodology page or some large Visio diagrams for the IT delivery process, but more often than not, reality has evolved away from those documented processes.
I have outlined the process to run such an exercise at the end of the blog section so that you can try this too. In short, you bring representatives from all parts of the organization together in a room to map out your current IT delivery process as it is actually experienced by the people on the ground; perhaps more importantly, the exercise reveals areas within the system that can be improved.
I suggest that you engage an experienced facilitator or at least find someone unbiased to run the meeting for you.
Ideally, we want to be able to objectively measure the IT process in regard to throughput, cycle time, and quality. Unfortunately, this is often a work-intensive exercise.
Running a value stream mapping exercise every three to six months (depending on how quickly you change and improve things) will give you a good way to keep progress on the radar while investing just a few hours each month.
It will highlight your current process, the cycle time, and any quality concerns. You want to make the result of the exercise visible somewhere in your office, as that will help focus people on improving this process. It will act as a visible reminder that improving this process is important to the organization.
Once you have a good understanding of the high-level IT process and the areas that require improvement, you can then create the first roadmap for the transformation.
Creating the First Roadmap
Roadmaps are partly science and partly art. Many roadmaps look similar at the high level, yet on the more detailed level, no two people create the exact same roadmap.
The good news is that there is no one right answer for roadmaps anyway. In true Agile fashion, it is most important to understand the direction and to have some milestones for evaluating progress and making IT visible.
Many things will change over time, and you will need to manage this. There are a few guidelines on how to create a good roadmap for this transformation.
Based on the value stream map of your IT delivery process, you will be able to identify bottlenecks in the process. As systems thinking, the theory of constraints, and queuing theory teach us, unless we improve one of the bottlenecks in the process, no other improvement will lead to a faster outcome overall.
This is important, as sometimes we spend our change energy on “shiny objects” rather than focusing on things that will make a real difference. One good way to identify bottlenecks is to use the value stream mapping exercise and let all stakeholders in the room vote on the problems that, if addressed, will make a real difference to overall IT delivery. The wisdom of the crowd in most cases does identify a set of bottlenecks that are worth addressing.
There are two other considerations for your roadmap to be a success: a focus on flow, and evaluating delivery on speed rather than cost and quality. A focus on flow is the ultimate systems thinking device to break down silos in your organization.
In the past, the “owner” of a function, like the testing center of excellence or the development factory, ran improvement initiatives to make its area of influence and control more effective.
Over time, this created highly optimized functions for larger batch sizes to the detriment of the overall flow of delivery. Flow improves with small batch sizes.
There are usually three ways to evaluate IT delivery: speed, cost, and quality. Traditionally, we focused our improvements on cost or quality, which, in turn, often reduced the speed of delivery.
If you evaluate your IT delivery by just looking to improve quality, you often introduce additional quality gates, which cost you more and take longer to adhere to.
If you evaluate your IT function based on reduced cost, the most common approaches are to push more work to less experienced people or to skip steps in the process, which often leads to lower quality and lower speed due to rework. Focusing on cost or quality without considering the impact on flow is, therefore, an antipattern for successful IT in my experience.
In contrast, focusing on speed, specifically on bottlenecks that prevent fast delivery of really small batches, will bring the focus back to the overall flow of delivery and hence improve speed of delivery in general (even for larger batches), leading to improvements in quality and cost over time.
It is impossible to achieve higher speed if the quality is bad, as the required rework will ultimately slow you down. The only way to really improve speed is to automate and remove unnecessary steps in the process.
Just typing faster is unlikely to do much for the overall speed. So, speed is the ultimate forcing function for IT. I have been in transformations with clients where cost was reduced but the overall delivery experience continued to be bad for business stakeholders.
I have also seen a lot of quality improvement initiatives that stifled IT delivery and nearly ground it to a halt. I have yet to see the same problem with improvement initiatives that evaluate based on speed.
Two words of caution when it comes to speed. The first is really not that bad a problem: you can obviously “game” the speed evaluation criteria by breaking work down further and delivering smaller batches, which can be delivered faster.
While this does not result in a like-for-like comparison of the speed between batches, it is still a win for the organization, as smaller batches are less risky. The second warning is that people might look for shortcuts that increase risk or reduce quality.
To prevent this, you need to continue to look for quality measures on top of speed to make sure that quality is not dropping as speed increases. To evaluate for speed, you will look at work coming through your delivery lifecycle, and the process of measuring it will make it more visible to you.
Good measures for speed are cycle time for work items (cycle time = time from work item approved to work item completed and available in production) or volume of work delivered per time period.
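The cycle-time measure just defined is straightforward to compute once approval and completion timestamps are captured. A small sketch with made-up work items:

```python
# Cycle time as defined above: time from work-item approval to the item
# being completed and available in production. Work items are invented.

from datetime import datetime
from statistics import mean

work_items = [
    {"approved": datetime(2018, 3, 1), "in_production": datetime(2018, 3, 15)},
    {"approved": datetime(2018, 3, 5), "in_production": datetime(2018, 3, 12)},
    {"approved": datetime(2018, 3, 8), "in_production": datetime(2018, 3, 29)},
]

cycle_times = [(w["in_production"] - w["approved"]).days for w in work_items]
print("cycle times (days):", cycle_times)        # [14, 7, 21]
print("average cycle time:", mean(cycle_times))  # 14
```

Volume per time period falls out of the same data: count the items whose `in_production` date lands inside the period.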
Your overall transformation roadmap will likely have milestones focused on different functions and capabilities (e.g., automated regression testing available, lightweight business case introduced), which makes sense. However, there is another dimension, which is the coverage of applications and technologies.
In the next blog section, I will explain how to do an application portfolio analysis that allows you to identify sets of applications that you uplift as part of the transformation.
Your roadmap should include prioritized sets (often called waves) of applications, as any organization at scale will not be able to uplift absolutely everything. You shouldn’t anyway.
One last comment on the transformation roadmap: many capabilities and changes require a significant amount of time to implement. Unfortunately, organizations are not very patient with change programs, so you need to make sure that you build in some early and visible wins. For these early and visible wins, the other prioritization rules do not apply.
They can be with applications that are not critical for the business or in areas that are not part of a bottleneck. The goal of those wins is to keep the momentum and allow the organization to see progress.
You should see these as being part of the change-management activities of the transformation. Of course, ideally, the early and visible wins are also in one of the priority areas identified earlier.
Transforming your IT organization will take time.
As you adopt DevOps and become faster, you start to realize the organizational boundaries and speed bumps that are embedded in the operating model, which require some real organizational muscle and executive support to address. Don’t be discouraged if things don’t change overnight.
Governing the Transformation
As mentioned earlier, the roadmap is important, but without appropriate transformation governance, it is not going to get you much success. All too often, transformations get stuck.
It is not possible to foresee all the challenges that will hinder progress, and without appropriate governance that finds the right balance between discipline and flexibility, the transformation will stall. Transformation governance makes the progress of the transformation visible and allows you to steer it.
It’s different from the normal IT delivery governance that you run for your delivery initiatives (e.g., change review boards). In a meeting with a number of transformation change agents and consultants at the 2015 DevOps Enterprise Summit, we tried to identify what it takes to be successful when adopting DevOps.
We all had different ideas and were working in different organizations, but we could agree on one thing that we believed was the characteristic of a successful organization: the ability to continuously improve and manage the continuous improvement process.
This continuous improvement and the adaptation of the roadmap are the largest contributors to success in transforming your IT organization. DevOps and Agile are not goals; hence, there is no target state as such.
What does successful transformation governance look like? Governance covers a lot of areas, so it is important that you know what you are comparing against as you make progress with your transformation. This means you need to establish a baseline for the measures of success that you decide on before you start the transformation.
Too many transformations I have seen spent six months improving the situation but then could not provide evidence of what had changed beyond anecdotes such as “but we have continuous integration with Jenkins now.”
Unfortunately, this does not necessarily convince a business or other IT stakeholders to continue to invest in the transformation. In one case, even though the CIO was supportive, the transformation lost funding due to a lack of evidence of the improvements.
If you can, however, prove that by introducing continuous integration you were able to reduce the instances of build-related environment outages by 30%, now you have a great story to tell.
As a result, I strongly recommend running a baselining exercise in the beginning of the transformation. Think about all the things you care about and want to measure along the way, and identify the right way to baseline them.
The other important aspect of transformation governance is creating flexibility and accountability. For each improvement initiative, as part of the roadmap, you want to leverage the scientific method:
Formulate a hypothesis, including a measure of success.
Baseline the measure.
Once the implementation is complete, evaluate the result against the hypothesis.
Some things will work, some won’t; and during governance, you want to learn from both. Don’t blame the project team for a failed hypothesis (after all, it should have been we, as leaders, who originally approved the investment—so, who is really to blame?).
You should only provide negative feedback where the process has not been followed (e.g., measures were not in place or results were “massaged”), which prevents you from learning.
As you learn, the next set of viable improvement initiatives will change. Your evaluation criteria for the initiatives you want to start next should be guided by previous learnings, the size of the initiative following a weighted shortest job first (WSJF) approach, and how well the team can explain the justification for the initiative.
Don’t allow yourself to be tempted by large business cases that require a lot of up-front investments; rather, ask for smaller initial steps to validate the idea before investing heavily.
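For reference, WSJF in its common form ranks initiatives by cost of delay divided by job duration, so small, high-value steps float to the top. The initiatives and scores below are invented for illustration:

```python
# WSJF prioritization sketch: priority = cost of delay / job duration.
# Initiative names and relative scores are made up.

initiatives = [
    {"name": "automate regression tests", "cost_of_delay": 8, "duration": 5},
    {"name": "lightweight business case", "cost_of_delay": 5, "duration": 2},
    {"name": "self-service environments", "cost_of_delay": 9, "duration": 8},
]

for item in initiatives:
    item["wsjf"] = item["cost_of_delay"] / item["duration"]

# Highest WSJF score first: do the shortest, most valuable work next.
ranked = sorted(initiatives, key=lambda i: i["wsjf"], reverse=True)
for item in ranked:
    print(f'{item["name"]}: {item["wsjf"]:.2f}')
```

Note how the cheapest initiative wins here despite having the lowest absolute cost of delay, which is exactly the bias toward small validating steps the text recommends.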
You should keep an eye on the overall roadmap over time to see that the milestones are achievable. If they are not anymore, you can either change the number of improvement initiatives or, when unavoidable, update the roadmap.
In the transformation governance process, you want a representation of all parts of the organization to make sure the change is not biased to a specific function (e.g., test, development, operations). Governance meetings should be at least once a month and should require as little documentation as possible.
Having the transformation team spend a lot of time on elaborate PowerPoint presentations for each meeting is not going to help your transformation. Ideally, you will look at real-time data, your value stream map, and lightweight business cases for the improvement ideas.
Making IT Delivery Visible
Talking about making things visible and using real data, it should be clear that some of the DevOps capabilities can be extremely useful for this. One of the best visual aids in your toolkit is the deployment pipeline.
A deployment pipeline is a visual representation of the process that software follows from the developer to production, with all the stages in between. This visual representation shows what is happening to the software as well as any positive or negative results of it.
This deployment pipeline provides direct insights into the quality of your software in real time. You might choose to provide additional information in a dashboard as an aggregate or to enrich the core data with additional information, but the deployment pipeline provides the core backbone.
It also creates a forcing function, as all the steps are represented and enforced, and the results can be seen directly from the dashboard, which reduces the chance of people doing things that are not visible.
Any improvements and process changes will be visible in the deployment pipeline as long as it remains the only allowed way to deliver changes. Where you don’t have easy access to metrics you can also add steps to each stage to log out metrics for later consumption in your analytics solution.
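Logging metrics out of a pipeline stage can be as simple as emitting one structured line for the log shipper to pick up. The field names below are an assumption, not a standard schema:

```python
# Sketch of a pipeline step that emits a metric record for later
# ingestion by an analytics tool. Field names are illustrative.

import json
import time

def log_stage_metric(stage, status, duration_seconds):
    """Emit one structured metric line and return the record."""
    record = {
        "timestamp": time.time(),
        "stage": stage,
        "status": status,
        "duration_seconds": duration_seconds,
    }
    print(json.dumps(record))  # stdout is collected by the log shipper
    return record

rec = log_stage_metric("integration-tests", "passed", 312.4)
```

Because each stage emits the same shape of record, the analytics solution can aggregate durations and pass rates across the whole pipeline without bespoke parsers per stage.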
Having an analytics solution in your company to create real-time dashboards is important. Most companies these days either use a commercial visualization or analytics solution or build something based on the many open source options (like Graphite).
The key here is to use the data that is being created all through the SDLC to create meaningful dashboards that can then be leveraged not only during the transformation governance but at any other point in time.
High-performing teams have connected their DevOps toolchains with analytics dashboards, which allows them to see important information in real time. For example, they can see how good the quality of a release is, how the quality of the release package relates to post-deployment issues, and how much test automation has improved the defect rate in later phases of the SDLC.
Governing IT Delivery
IT governance is, in my view, one of the undervalued elements in the transformation journey. Truth be told, most governance approaches are pretty poor and achieve very little of the outcome they are intended to achieve.
Most governance meetings I have observed or been part of are based on red/amber/green status reports, which are subjective in nature and are not a good way of representing status.
Furthermore, while the criteria for the color scheme might be defined somewhere, it often comes down to the leadership looking the project manager in the eyes and asking what she really thinks.
Project managers from a Project Management Institute (PMI) background use the cost performance index (CPI) and schedule performance index (SPI), which are slightly better but rely on having a detailed and appropriate project plan to report against. I argue that most projects evolve over time, which means that if you prepare a precise plan for the whole project, you plan to be precisely wrong.
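For reference, the earned-value formulas behind CPI and SPI are standard: CPI is earned value divided by actual cost, and SPI is earned value divided by planned value, with values above 1 meaning under budget or ahead of schedule. The figures below are illustrative:

```python
# Standard earned-value calculations behind CPI and SPI.
#   CPI = earned value / actual cost    (> 1 means under budget)
#   SPI = earned value / planned value  (> 1 means ahead of schedule)
# The dollar figures are invented for illustration.

earned_value = 90_000    # budgeted cost of the work actually performed
actual_cost = 100_000    # what that work really cost
planned_value = 120_000  # budgeted cost of the work scheduled by now

cpi = earned_value / actual_cost
spi = earned_value / planned_value
print(f"CPI = {cpi:.2f}, SPI = {spi:.2f}")  # CPI = 0.90, SPI = 0.75
```

Both indices inherit the weakness described above: they are only as good as the plan that defines the planned and earned values.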
Additionally, by the time the status report is presented at the meeting, it is—at best—a few hours old. At worst, it’s an unconscious misrepresentation, because so many different messages needed to be aggregated and the project manager had to work with poor inputs.
Too often, a status report remains green over many weeks just to turn red all of a sudden when the bad news cannot be avoided anymore.
Or the status that moves up the chain of command becomes more and more green the higher you get, because everyone wants to demonstrate that they are in control of the situation.
Remember, one of our goals with status reports is to make IT work visible, and we’re not doing that in a meaningful way if the information we’re presenting isn’t a factual representation of our processes and progress.
The Lean Treatment of IT Delivery
In transformations, our focus is often on technologies and technical practices, yet a lot can be improved by applying Lean to IT delivery governance. By IT delivery governance, I mean any step of the overall IT delivery process where someone has to approve something before it can proceed.
This can be project-funding checkpoints, deployment approvals for test environments, change control boards, and so on. During the SDLC there are usually many such governance steps for approvals or reviews, which all consume time and effort.
And governance processes often grow over time. After a problem has occurred, we do a post-implementation review and add another governance step to prevent the same problem from happening again.
After all, it can’t hurt to be extra sure by checking twice. Over time, this creates a bloated governance process with steps that do not add value and diffuse accountability.
I have seen deployment approval processes that required significantly more time than the actual deployment without adding value or improving quality. I find that some approval steps are purely administrative and have, over time, evolved to lose their meaning as the information is not really evaluated as it was intended. The following analysis will help you unbloat the process.
I want you to take a good, hard look at each step in your governance process to understand (a) how often a step actually makes an impact (e.g., an approval is rejected), (b) what the risk is of not doing it, and (c) what the cost is of performing this step.
Let’s look at each of the three aspects in more detail:
1. When you look at approvals and review steps during the SDLC, how often are approvals not given or how often did reviews find issues that had to be addressed?
(And I mean serious issues, not just rejections due to formalities such as using the wrong format of the review form.) The less often the process actually yields meaningful outcomes, the more likely it is that the process is not adding a lot of value. The same is true if the approval rate is in the high nineties.
Perhaps a notification is sufficient rather than waiting for the approval, which is extremely likely to come anyway. Or perhaps you can cut this step completely.
I worked with one client whose deployment team had to chase approvals for pretty much every deployment after all the preparation steps were complete, adding hours or sometimes days to the deployment lead time.
The approver was not actually doing a meaningful review, which we could see from the little time it took to approve once the team followed up with the approver directly. It was clearly just a rubber-stamping exercise.
I recommended removing this approval and changing the process to send information to the approver before and after the deployment, including the test results. Lead time was significantly reduced, the approver had less work, and because a manual step was removed, we could automate the deployment process end to end.
2. If we went ahead without the approval or review step and something went wrong, how large is the risk? How long would it take us to find out we have a problem and correct it by either fixing it or withdrawing the change? If the risk is low, then, again, the governance step might best be skipped or changed to a notification only.
3. What is the actual cost of the governance step in both effort and time? How long does it take to create the documentation for this step? How much time does each stakeholder involved spend on it? How much of the cycle time is consumed while waiting for approvals to proceed?
With this information, you can decide whether the governance step should continue to be used or whether you are better off abandoning or changing it. From my experience, about half the review and approval steps can either be automated (as the human stakeholder is following simple rules) or changed to a notification only, which does not block the process from progressing.
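The three questions above can be folded into a rough scoring sketch. The thresholds, rates, and costs below are arbitrary illustrations, not a formula from the text:

```python
# Crude verdict for a governance step, combining the three questions:
# how often it changes the outcome, the risk it mitigates, and what it
# costs in effort and waiting time. All numbers are illustrative.

def step_verdict(rejection_rate, risk_cost, effort_hours, wait_hours,
                 hourly_rate=100):
    """Compare the expected loss a step prevents against what it costs."""
    cost = (effort_hours + wait_hours) * hourly_rate
    protected = rejection_rate * risk_cost   # expected loss it prevents
    if protected >= cost:
        return "keep"
    if rejection_rate < 0.05:                # almost always rubber-stamped
        return "drop or change to notification"
    return "automate or simplify"

# Approval granted ~99% of the time, modest risk, many hours of waiting:
print(step_verdict(rejection_rate=0.01, risk_cost=20_000,
                   effort_hours=2, wait_hours=16))
```

The rubber-stamped deployment approval from the client story above would land squarely in the "drop or change to notification" bucket.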
I challenge you to try this in your organization and see how many things you can remove or automate, getting as close as possible to the minimum viable governance process. I have added an exercise for this at the end of the blog section.
First Steps for Your Organization
There are three exercises that I find immensely powerful because they achieve a significant amount of benefit for very little cost: (1) value stream mapping of your IT delivery process, (2) baselining your metrics, and (3) reviewing your IT governance. With very little effort, you can get a much better insight into your IT process and start making improvements.
Value Stream Mapping of Your IT Delivery Process
While there is a formal process for how to do value stream mapping, I will provide you with a smaller-scale version that, in my experience, works reasonably well for the purpose we are after: making the process visible and improving some of the bottlenecks.
Here is my shortcut version of value stream mapping:
1. Get stakeholders from all key parts of the IT delivery supply chain into a room (e.g., business stakeholders, development, testing, project management office (PMO), operations, business analysis).
2. Prepare a whiteboard with a high-level process for delivery. Perhaps write “business idea,” “business case,” “project kickoff,” “development,” “testing/QA,” “deployment/release,” and “value creation” on the board to provide some guidance.
3. Ask everyone in the room to write steps of the IT process on index cards for fifteen minutes. Next, ask them to post these cards on the whiteboard and work as a group to represent a complete picture of the IT delivery process on the whiteboard. Warning: you might have to encourage people to stand up and work together, or you may need to step in when/if discussions get out of hand.
4. Once the process is mapped, ask one or more people to walk the group through the overall process, and ask everyone to call out if anything is missing.
5. Now that you have a reasonable representation of the process, you can do some deep dives to understand cycle times of the process, hot spots of concerns for stakeholders due to quality or other aspects, and tooling that supports the process.
6. Get people to vote on the most important bottleneck (e.g., give each person three votes to put on the board by putting a dot next to the process step).
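Step 6 amounts to a simple tally of the dots on the board; sketched here with made-up votes:

```python
# Tally dot votes per process step and surface the top bottlenecks.
# Step names and votes are invented for illustration.

from collections import Counter

votes = ["testing/QA", "deployment/release", "testing/QA",
         "business case", "testing/QA", "deployment/release"]

tally = Counter(votes)
for step, count in tally.most_common(2):
    print(step, count)
# testing/QA 3
# deployment/release 2
```

The top one or two vote-getters become the checkpoints on your initial transformation roadmap.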
In my experience, this exercise is the best way to make your IT delivery process visible. You can redo this process every three to six months to evaluate whether you addressed the key bottleneck and to see how the process has evolved.
You can make the outcome of this process visible somewhere in your office to show the improvement priorities for each person/team involved. The highlighted bottlenecks will provide you with the checkpoints for your initial roadmap, as those are the things that your initiatives should address.
Accepting the Multispeed Reality (for Now)
Clients I work with often have a thousand or more applications in their portfolio. Clearly, we cannot make changes to all of them at the same time. This blog section looks at how to navigate the desire for innovative new systems and the existing web of legacy applications. We will identify minimum viable clusters of applications to start your transformation and perform an application portfolio analysis to support this.
One of the trends in the industry that has caused the increase in interest in Agile and DevOps practices was the arrival of internet natives, as I mentioned in the introduction. Those companies have the advantage that their applications are newer than most applications are in a large-enterprise context.
“Legacy” is often used as a derogatory term in the industry, but the reality is that any code in production is really legacy already. And any new code we are writing today will be legacy tomorrow. Trying to differentiate between legacy and nonlegacy is a nearly impossible task over time.
In the past, organizations tried to deal with legacy through transformation projects that took many years and tried to replace older legacy systems with new systems. Yet very often, many old systems survived for one reason or another, and the overall application architecture became more complicated.
These big-bang transformations are not the way things are done anymore, as the speed of evolution requires organizations to be adaptable while they are changing their IT architecture.
I think we all can agree that what we want is really fast, flexible, and reliable IT delivery. So, should we throw away our “legacy” applications and build a new set of “fast applications”? I think the reality is more nuanced. I have worked with dozens of organizations that are struggling with the tension between fast digital applications and slow enterprise applications.
Some of these organizations just came off a large transformation that was trying to solve this problem, but at the end of the multiyear transformation, the new applications were already slow-legacy again. A new approach is required that is more practical and more maintainable, and still achieves the outcome.
Analyzing Your Application Portfolio
Large organizations often have hundreds if not thousands of applications, so it would be unrealistic to assume that we can uplift all applications at the same time. Some applications probably don’t need to be uplifted, as they don’t change often or are not of strategic importance. In the exercise section of this blog section, I provide details so that you can run your own analysis.
With this analysis, we can do a couple of things: we can prioritize applications into clusters (I will talk about that a little bit more later) and gather the applications into three different groupings that will determine how we will deal with each application as we are transforming IT delivery. The groupings will determine how you will invest and how you will work with the software vendors and your delivery partners.
The first group is for applications that we want to divest from or keep steady at a low volume of change. Let’s call this true legacy to differentiate it from the word “legacy,” which is often used just for older systems.
In the true legacy category, you will sort applications that are hardly ever changing, that are not supporting business-critical processes, and in which you are not investing.
I think it is pretty obvious that you don’t want to spend much money automating the delivery life cycle for these applications.
For these applications, you will likely not spend much time with the software vendor of the application, and you will choose a low-cost delivery partner that “keeps the lights on” if you don’t want to deal with them in-house. And you really shouldn’t invest your IT skills in these applications.
The second group is for applications that are supporting your business but are a little bit removed from your customers. Think of ERP or HCM systems—these are the “workhorses” of your application portfolio. You spend a bulk of your money on running and updating these systems, and they are likely the ones that determine your overall speed of delivery for larger projects.
Improving workhorses will allow you to deliver projects faster and more reliably, but the technologies of many of these workhorses are not as easily adaptable to DevOps and Agile practices.
It is crucial to these systems that you work closely with the software vendor to make the technology more DevOps suitable. If you choose to get help maintaining and evolving these systems, make sure the partner you work with understands your need to evolve the way of working as well as the system itself.
The third group is your “innovation engines” applications. These are the customer-facing applications that you can use to drive innovation or, on the flip side, that can cause you a lot of grief if customers don’t like what you are presenting to them. The challenge here is that most of these will rely on the workhorses to deliver the right experience.
My favorite example is the banking mobile app, which you can experiment with, but only insofar as it continues to show accurate information about your bank accounts; otherwise, you will get very upset as a customer.
Here, you will likely use custom technologies. You should work very closely with your software vendor if you choose a commercial-off-the-shelf (COTS) product, and the delivery partner should act as a co-creator, not just an implementer.
Now, this grouping of applications is not static. As your application architecture evolves, certain applications will move between groups; that means your vendor and delivery-partner strategy evolves with it.
Active application portfolio management is becoming increasingly important as the speed of evolution increases and application architectures become more modular.
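The grouping logic described above can be sketched as a simple heuristic. The thresholds, field names, and sample applications here are illustrative assumptions, not prescriptions from the text:

```python
# Sketch: sorting an application portfolio into the three groups discussed
# above. Thresholds and sample data are illustrative assumptions.

def classify(app):
    """Return 'true legacy', 'workhorse', or 'innovation engine'."""
    if app["changes_per_year"] <= 1 and not app["business_critical"]:
        return "true legacy"
    if app["customer_facing"]:
        return "innovation engine"
    return "workhorse"

portfolio = [
    {"name": "FaxGateway", "changes_per_year": 0, "business_critical": False, "customer_facing": False},
    {"name": "ERP", "changes_per_year": 4, "business_critical": True, "customer_facing": False},
    {"name": "MobileApp", "changes_per_year": 50, "business_critical": True, "customer_facing": True},
]

groups = {app["name"]: classify(app) for app in portfolio}
```

In practice, you would feed this from your configuration management database or change history rather than a hand-written list, and revisit the grouping regularly as the portfolio evolves.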
Finding a Minimum Viable Cluster
The Agile principle of small batch sizes applies to transformations as well. We can use the information from the application portfolio analysis above to guide us.
It is very likely that the categories of workhorses and innovation engines contain too many applications to work on at the same time. Rather than just picking the first x applications, you need to do a bit more analysis to find what I call a minimum viable cluster.
Applications don’t exist in isolation from each other. This means that most functional changes to your application landscape will require you to update more than one application.
This, in turn, means that even if you are able to speed up one application, you might not be able to actually speed up delivery, as you will continue to wait on the other applications to deliver their changes.
The analogy of the weakest link comes to mind; in this case, it is the slowest link that determines your overall delivery speed. What you need to determine is the minimum viable cluster of applications. The best way of doing this is to rank your applications based on several factors, such as customer centricity and volume of change.
The idea of the minimum viable cluster is that you incrementally review your highest-priority application and analyze the dependencies of that application. You look for a small subset of those applications in which you can see a significant improvement of delivery speed when you improve the delivery speed of this subset.
(Sometimes you might still have to deal with further dependencies, but in most cases, the subset should allow you to make significant changes independently with a little bit of creativity.)
You can continue the analysis for further clusters so that you have some visibility of the next applications you will start to address. Don’t spend too much time clustering all applications. As you make progress, you can do rolling-wave identification of the clusters.
I want to mention a few other considerations when thinking about the prioritization of applications. First, I think it is important that you start to work on meaningful applications as early as possible. Many organizations experiment with new automation techniques on isolated applications with no serious business impact.
Many techniques that work for those applications might not scale to the rest of the IT landscape, and the rest of the organization might not identify with the change for that application. (“This does not work for our real systems” is a comment you might hear in this context.)
Because the uplift of your minimum viable cluster can take a while, it might make sense to find “easier” pilots to (a) provide some early wins and (b) allow you to learn techniques that are more advanced before you need to adapt them for your first minimum viable cluster.
The key to this is making sure that considerations from the minimum viable cluster are being proven with the simpler application so that the relevance is clear to the organization. Collaboration across the different application stakeholders is critical to achieving this.
How to Deal with True Legacy
We have spoken about the strategy that you should employ for the applications that continue to be part of your portfolio, but what should you do with the true legacy applications?
Obviously, the best thing to do would be to get rid of them completely. Ask yourself whether the functionality is still truly required. Too often, we hang on to systems for small pieces of functionality that cannot be replicated somewhere else, because the hidden cost of maintaining the application is not visible; not enough effort is being put into decommissioning the system.
Assuming this is not an option, we should apply to our architecture what software engineers have been using in their code for a long time: the strangler pattern. The strangler pattern, in this case, means we try to erode the legacy application by moving functions to our newer applications bit by bit.
Over time, less and less functionality will remain in the legacy application until my earlier point comes true: the cost of maintaining the application just for the leftover functionality will become too high, and this will serve as the forcing function to finally decommission it.
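The strangler pattern can be illustrated with a minimal routing facade; in real systems this role is usually played by an API gateway or reverse proxy. The function and system names below are hypothetical:

```python
# Sketch of the strangler pattern: a facade routes each function to the new
# application once it has been migrated, and to the legacy system otherwise.
# Function names and routing strings are hypothetical.

MIGRATED = set()  # functions already moved out of the legacy application

def handle(function, request):
    """Route a request to whichever system currently owns the function."""
    target = "new" if function in MIGRATED else "legacy"
    return f"{target}:{function}({request})"

# Bit by bit, functions move across; less and less remains in the legacy app.
MIGRATED.add("billing")
```

Once the set of migrated functions covers everything the business still needs, the facade makes the final decommissioning a routing change rather than a big-bang cutover.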
The last trick in your “dealing with legacy” box is to make the real cost of the legacy application visible. The factors that should play into this cost are as follows:
- the delay other applications encounter due to the legacy application,
- the defects caused by the legacy application,
- the amount of money spent maintaining and running the legacy application, and
- the opportunity cost of things you cannot do because the legacy application is in place.
The more you are able to put a monetary number on this, the better your chances are to overcome the legacy complication over time by convincing the organization to do something about it.
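Putting a monetary number on legacy can start as a simple sum of the four factors above. All figures in this sketch are placeholders to show the shape of the calculation, not real estimates:

```python
# Sketch: annual "real cost" of a legacy application, combining the four
# factors listed above. All figures are illustrative placeholders.

def real_legacy_cost(delay_cost, defect_cost, run_cost, opportunity_cost):
    """Sum the visible and hidden annual costs of keeping a legacy app."""
    return delay_cost + defect_cost + run_cost + opportunity_cost

total = real_legacy_cost(
    delay_cost=120_000,        # delay imposed on other applications
    defect_cost=80_000,        # defects caused by the legacy application
    run_cost=300_000,          # maintaining and running the application
    opportunity_cost=150_000,  # initiatives blocked while it is in place
)
```

Even rough numbers like these make the decommissioning conversation concrete: the total can be compared directly against the cost of migrating the leftover functionality.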
I said before that every application you build now will be the legacy of tomorrow. At the increasing speed of IT development, this statement should make us nervous, as we are creating more and more legacy ever faster. This means that, ultimately, the best way to deal with legacy is to build our new legacy with the right mindset.
There is no end-state architecture anymore (well, there never was, as we now know—in spite of what enterprise architects kept telling us). As a result of this new architecture mindset, each application should be built so that it can easily be decommissioned and to minimize its dependency on other applications.
Governing the Portfolio and Checkpoints
Your application portfolio is always evolving, and the only way to be successful in such a moving environment is to have the right governance in place. Governance was hard in the past; in the new world, it has become even more difficult.
There are more things to govern, the overall speed of the delivery of changes has increased, and without a change in governance, governance will either slow down delivery or become overly expensive.
There are four main points of governance for any change:
Checkpoint 1 (CP1): this answers the question of whether or not the idea we have for the change is good enough to deserve some funding to explore the idea further and come up with possible solutions.
Checkpoint 2 (CP2): this answers the question of whether we have found a possible solution that is good enough to attempt as a first experiment or first release to validate our idea.
Checkpoint 3 (CP3): this answers the question of whether or not the implemented solution has reached the right quality to be released to at least a small sub-audience in production.
Checkpoint 4 (CP4): this answers the question of whether or not the released solution achieved the outcome we intended, validating both the idea and the solution approach.
Checkpoint 1 (CP1)
At CP1, we are mostly talking about our business stakeholders. Somewhere in the organization, a good idea has come up or a problem has been found that requires fixing.
Before we start spending money, our first checkpoint is to validate that we are exploring the right problems and opportunities: those that have a business impact, are of strategic importance, or are our “exploratory ideas” to find new areas of business.
This checkpoint is a gatekeeper to make sure we are not starting too many new things at the same time and to focus our energy on the most promising ideas.
Between CP1 and CP2, the organization explores the idea, and both business and IT come together to run a discovery workshop that can take a couple of hours or multiple weeks depending on the scale of the problem. You can run this for a whole business transformation or for a small change.
The goal of discovery really falls into three important areas: (1) everyone understands the problem and idea, (2) we explore what can be done with the support of IT, and (3) we explore what the implementation could look like in regard to schedule and teams. This discovery session is crucial to enable your people to achieve the best outcome.
Checkpoint 2 (CP2)
After discovery, the next checkpoint is validation that we now have discovered something that is worth implementing. At this stage, we should check that we have the capacity to support the implementation with all parties: IT, business stakeholders, the operations team, security, and anyone else impacted.
This is a crucial checkpoint at which to embed architectural requirements, as it becomes more difficult to add them later on. Too often, business initiatives are implemented without due consideration of architectural aspects, which leads to increased technical debt over time.
It is my view that every initiative that is being supported by the organization with scarce resources such as money and people should leave the organization in a better place in two ways: it better supports the business, and it leaves the IT landscape better than it was before. This is the only reasonable way to reduce technical debt over time and deal with legacy.
CP2 is the perfect time to make sure that the improvement of the IT landscape/down payment of technical debt is part of the project before it continues on to implementation. This has to be something that is not optional; otherwise, the slippery slope will lead back to the original state.
It is quite easy to let the necessary rigor be lost when “just this once” we only need to quickly put this one temporary solution in place. I learned over the years that there is nothing more permanent than a temporary solution.
Between CP2 and CP3 is the bulk of the usual software delivery that includes design, development, and testing work being done in an Agile fashion.
I am confident that Agile is the only methodology we will need going forward but that we will have different levels of rigor and speed as part of our day-to-day Agile delivery. Once the solution has matured over several iterations to being a release candidate, we will have CP3.
Checkpoint 3 (CP3)
At CP3, we will confirm that the release candidate has reached the right quality for us to release it to production. We will validate that the architecture considerations have been adhered to and technical debt has been paid down as agreed, and we will not introduce new technical debt unknowingly.
(Sometimes we might consciously choose to accrue a little more debt to test something early but commit to fixing it in the next release. This should be a rare occasion, though.)
This checkpoint is often associated with the change control board, which has to review and approve any changes to production. Of course, we are looking for the minimum viable governance here, and you can refer to the previous blog section for more details on general governance principles to follow at CP3.
Between CP3 and CP4 the product is in production and is being used. If we follow a proper Agile process, the team will already be working on the implementation of the next release in tandem with supporting the version that has just gone live.
Internal or external stakeholders are using the product, and we gather feedback directly from the systems (through monitoring, analytics, and other means) or directly from the stakeholders by leveraging surveys, feedback forms, or any other communication channel.
Checkpoint 4 (CP4)
Checkpoint 4 is the checkpoint that is extremely underutilized in my experience. It’s one of those processes that everyone agrees is important, yet very few have the rigor and discipline to really leverage it to meet its full potential.
This checkpoint serves to validate that our idea and the solution approach are valid. Because projects are temporary by definition, the project team has often stood down already and team members have been allocated to other projects.
CP4 then becomes a pro forma exercise that people don’t appreciate fully. If we have persistent, long-lasting product teams, the idea of learning from the previous release and understanding the reaction of stakeholders is a lot more important.
Those product teams are the real audience of CP4, though, of course, the organizational stakeholders are the other audience that needs to understand whether the money was well invested and whether further investment should be made.
CP4 should be an event for learning and a possibility for celebrating success; it should never be a negative experience. If the idea did not work out, we learned something useful about the product that we have to do differently next time.
You can combine CP4 with a post-implementation review to look at the way the release was delivered and to improve the process as well as the product. It is my personal preference to run the post-implementation review separately to keep improving the product and the delivery process as two distinct activities.
With this governance model and the four checkpoints in place, you can manage delivery in several speeds and deal with the faster pace. Each checkpoint allows you to assess progress and viability of the initiative, and where required, you can move an initiative into a different delivery model with a different (slower or faster) speed.
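The four checkpoints can be modeled as an ordered gate sequence that every initiative passes through. This is a minimal sketch of that idea; the class and method names are my own, not from the text:

```python
# Sketch: the four governance checkpoints as an ordered sequence of gates.
# An initiative may only pass checkpoints in order (CP1 -> CP2 -> CP3 -> CP4).

CHECKPOINTS = ["CP1", "CP2", "CP3", "CP4"]

class Initiative:
    def __init__(self, name):
        self.name = name
        self.passed = []

    def pass_checkpoint(self, cp):
        """Record passing a checkpoint; reject out-of-order attempts."""
        expected = CHECKPOINTS[len(self.passed)]
        if cp != expected:
            raise ValueError(f"{self.name}: expected {expected}, got {cp}")
        self.passed.append(cp)

idea = Initiative("new-portal")
idea.pass_checkpoint("CP1")  # idea worth funding discovery
idea.pass_checkpoint("CP2")  # discovered solution worth implementing
```

Between the gates, the work itself runs at whatever speed the delivery model for that initiative allows; the gates only assess progress and viability, in line with minimum viable governance.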
First Steps for Your Organization
I will provide two exercises for you to run in your organization. This time, both of them are highly related: the first is an analysis of your application portfolio, and the second is the identification of a minimum viable cluster of applications for which a capability uplift will provide real value.
If you are like most of my clients, you will have hundreds or thousands of applications in your IT portfolio. If you spread your change energy across all of those, you will likely see very little progress, and you might ask yourself whether the money is actually spent well for some of those applications.
So, while we spoke about the IT delivery process in the blog section 1 exercises as one dimension, the application dimension is the second dimension that is important. Let’s look at how to categorize your applications in a meaningful way.
Each organization will have different information available about its applications, but in general, an analysis across the following four dimensions can be done:
Criticality of application: How important is the application for running our business? How impactful would an issue be on the user experience for our customers or employees? How much does this application contribute to regulatory compliance?
Level of investment in the application: How much money will we spend on this application over the next 12–36 months? How much have we spent on this application in the past? How many priority projects will this application be involved in over the next few years?
The preferred frequency of change: If the business could choose a frequency of change for this application, how often would that be (hourly, weekly, monthly, annually)? How often have we deployed a change to this application in the last 12 months?
Technology stack: The technology stack is important, as some technologies are easier to uplift than others. Additionally, once you have a capability to deliver, for example, Siebel-based applications more quickly, any other Siebel-based application will be much easier to uplift too, as tools, practices, and methods can be reused. Consider all aspects of the application in this technology stack: database, data itself, program code, application servers, and middleware.
For each of the first three dimensions, you can either use absolute values (if you have them) or relative numbers representing a nominal scale to rank applications. For the technology stack, you can group them into priority order based on your technical experience with DevOps practices in those technologies.
On the basis of this information, you can create a ranking of importance, either by formally creating a heuristic across the dimensions or by doing a manual sorting. It is not important for this ranking to be precise; we are aiming for accuracy, not precision.
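A formal heuristic across the first three dimensions can be as simple as a weighted sum of relative scores. The weights and scores below are illustrative assumptions; you would calibrate them against your own portfolio data:

```python
# Sketch: ranking applications with a weighted heuristic across the first
# three dimensions (scored 1 = low to 5 = high). Weights and sample scores
# are illustrative assumptions.

WEIGHTS = {"criticality": 0.4, "investment": 0.35, "change_frequency": 0.25}

apps = {
    "CRM":        {"criticality": 5, "investment": 4, "change_frequency": 4},
    "Payroll":    {"criticality": 4, "investment": 2, "change_frequency": 1},
    "FaxGateway": {"criticality": 1, "investment": 1, "change_frequency": 1},
}

def score(attrs):
    """Weighted sum across the scored dimensions."""
    return sum(WEIGHTS[dim] * value for dim, value in attrs.items())

ranking = sorted(apps, key=lambda name: score(apps[name]), reverse=True)
```

Since we are aiming for accuracy rather than precision, a manual sort that produces roughly the same order is just as useful as the formula.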
It’s clear that we wouldn’t spend much time, energy, and money on applications that are infrequently changed—applications that are not critical for our business and on which we don’t intend to spend much money in the future.
Unfortunately, just creating a ranking of applications is usually not sufficient, as the IT landscape of organizations is very complex and requires an additional level of analysis to resolve dependencies in the application architecture.
Identifying a Minimum Viable Cluster
As discussed above, the minimum viable cluster is the subset of applications that you should focus on, as an uplift to these will speed up the delivery of the whole cluster. Follow the steps below to identify a minimum viable cluster:
1. Pick one of the highest-priority applications (ideally based on the portfolio analysis from the previous exercise) as your initial application set (consisting of just one application).
2. Understand which other applications need to be changed in order to make a change to the chosen application set.
3. Determine a reasonable cutoff for those applications (e.g., only those covering 80% of the usual or planned changes of the chosen application).
4. You now have a new, larger set of applications and can continue with steps 2 and 3 until the application set stabilizes to a minimum viable cluster.
5. If the cluster has become too large, pick a different starting application or be more aggressive in step 3.
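The steps above can be sketched as an iterative expansion over a dependency map. The map below is illustrative (a real one would come from your change history, e.g., which applications were touched together in past releases), and for brevity this sketch omits the 80% cutoff from step 3:

```python
# Sketch: expanding a seed application into a minimum viable cluster by
# following change dependencies until the set stabilizes (steps 1-4 above).
# The dependency map is an illustrative assumption.

DEPENDS_ON = {
    "MobileApp": {"CRM", "Payments"},
    "CRM": {"Payments"},
    "Payments": set(),
    "Reporting": {"CRM"},
}

def minimum_viable_cluster(seed):
    cluster = {seed}  # step 1: start with one high-priority application
    while True:
        # step 2: which other applications change with the current set?
        needed = set().union(*(DEPENDS_ON.get(a, set()) for a in cluster))
        # step 4: stop once the application set stabilizes
        if needed <= cluster:
            return cluster
        cluster |= needed

cluster = minimum_viable_cluster("MobileApp")
```

If the resulting cluster is too large to uplift at once, step 5 applies: restart from a different seed or prune low-value dependencies more aggressively.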
Once you have successfully identified your minimum viable cluster, you are ready to begin the uplift process by implementing DevOps practices such as test automation and the adoption of cloud-based environments, or by moving to an Agile team delivering changes for this cluster.
Dealing with Software Packages and Software Vendors
In many organizations, the language that describes the software package in use—usually a COTS product that has been chosen for its features—is less than favorable.
Clearly this software, often considered legacy, did not magically appear in the organization; someone made the decision to purchase a software package as a response to a business problem. There are good reasons to not reinvent the wheel but leverage packaged software instead.
Unfortunately, the state of many software packages these days is such that they don’t behave like modern applications should. In this blog section, I will discuss the criteria that you should consider when choosing a software package and how you can work to improve the software package you already have, and I will provide some exercises at the end to adapt these guidelines for yourself. Let’s talk first about the idea behind software packages.
The original purpose of software packages was to support commodity processes in your organization. These processes are very similar to those of other organizations and do not differentiate you from your competitors.
And even though many of these software packages are now delivered as software as a service (SaaS), your organization likely still has legacy package solutions that you have to maintain.
The problem is that many organizations that adopted software packages ended up customizing the product so much that the upgrade path has become expensive.
For example, I have seen multiple Siebel upgrades that cost many millions of dollars. When the upgrade path is expensive, it means that newer, better, safer functionality that comes with newer versions of the package is often not available to the organization for years.
Besides this downside, heavy customization over time to make the software support all the requirements from business also means that each further change becomes more expensive and that technical debt increases over time.
How to Choose the Right Product for Your Organization
Warlike arguments have been waged over which IT product to choose for a project or business function. Should you use Salesforce, Siebel, or Microsoft as your CRM system, for example?
Just looking at the functionality is not sufficient anymore because, as much as we might prefer it, it is very unlikely that an organization will use the product as is.
The application architecture the product will be part of will continue to evolve, which often requires changes to the software product.
Architecture and engineering principles play a much larger role than in the past due to the continuous evolution of the architecture. This puts a very different view on product choice.
Of course, the choice is always contextual for each company and each area of business. Your decision framework should guide the decision to make sure all factors are considered.
And while the decision might be different for each of you, what I can do is provide a technology decision framework (TDF) that helps you think more broadly about technology choices before you make them.
My TDF is based on three dimensions for you to evaluate: (1) functionality, (2) architecture maturity, and (3) engineering principles.
Very often the functionality provided by the software package is the key decision factor. The closer the functionality aligns with the process that you want to support, the better a choice it will be.
Determining whether a software package is suitable or whether you should build a custom system (which hopefully leverages open-source libraries and modules, so you aren’t starting from scratch) requires that you take a good, hard look at your organization.
Two factors will be important in this decision: your flexibility in the process you are trying to support and your engineering capabilities. If you are not very flexible and you have a custom-made process, then leveraging a software product will likely require a lot of expensive customizations.
If you don’t have a strong engineering capability either in-house or through one of your strategic partners, then leveraging a software package is perhaps the better choice.
You need to understand where you stand on the continuum from a flexible process and low engineering capability (= package) to a custom-made process and high engineering capability (= custom solution).
If you land on the side of a software package, then create an inventory of the required functionality as requirements or user stories, and evaluate the candidate packages.
Ideally, you want real business users to be involved in the evaluation to make sure it is functionally appropriate for the business. The idea is that a package is giving you a lot right out of the box, and it shouldn’t be too much hassle to get a demo installed in your environment for this purpose. If it is a hassle, then that’s a warning sign for you.
Application architecture maturity is important to the ongoing support of your application, as a well-architected application will make it easier for you to manage and maintain the application.
If you build an application yourself, you will have to deal with architecture considerations, such as scaling and monitoring, and the better your IT capability is, the more you are able to build these architecture aspects yourself. Otherwise, you can choose a package solution to do this for you.
Four aspects that you can use to start the assessment of architecture maturity are as follows:
1. Auto-scaling: When your application becomes successful and is being used more, then you need to scale the functions that are under stress. The architecture should intelligently support the flexible scaling of different parts of the application (e.g., not just scale the whole application but rather the functions that require additional scale).
2. Self-healing: When something goes wrong, the application architecture should be able to identify this and run countermeasures. This might mean the traditional restarting of servers/applications, cleaning out message queues, or spinning up a new version of the application/server.
3. Monitoring: You want to understand what is going on with your application. Which elements are being used? Which parts are creating value for your business? To do this, the application architecture should allow you to monitor as many aspects as possible and make that data available externally for your monitoring solution.
4. Capability for change: You want to understand what it takes to make customizations. How modular is the architecture? If there are a lot of common components, this will hinder you from making independent changes and will likely increase your batch size due to dependencies on those common modules.
The application architecture should be modular in nature so that you can change and upgrade components without having to replace the whole system. Backward and forward compatibility is also important to provide flexibility for upgrades.
Engineering principles increase in importance the more you believe that the application will have to evolve in the future; and this, in turn, is often driven by the strategic importance of the application for your customer interactions.
Good engineering principles in an application allow you to quickly change things and scale up delivery to support increasing volumes of change.
The better skilled your IT department is, the more it will be able to leverage these principles and patterns. If you don’t have strong IT capabilities, then you will focus more on the built-in architecture features. Here are a few things to look out for:
Source code management: All code and configuration should be extractable. You want to be able to use enterprise-wide configuration management to manage dependencies between systems. To do that, the exact configuration of an application must be extractable and quickly restorable.
Inbuilt or proprietary solutions don’t usually allow you to integrate with other applications, hence breaking the ability to have a defined state across your enterprise systems.
If necessary, you should be able to re-create the application in its exact state from the external source control system, which is only possible if you can extract the configuration from the application to store it in a software configuration management (SCM) system. This means no configuration should be exclusive to the COTS product.
The ease with which the extract and import can be done will give you an indication of how well this can be integrated into your delivery lifecycle. The extracts should be text based so that SCM systems can compare different versions, analyze differences, and support merge activities as required.
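Text-based extracts matter precisely because standard tooling can then compare versions. As a sketch, once configuration can be exported as plain text, even the standard library can produce a meaningful diff; the key=value format here is a hypothetical example, not any vendor's actual export format:

```python
# Sketch: text-based configuration extracts let an SCM (or any diff tool)
# compare versions and support merges. The key=value format is hypothetical.

import difflib

# Two hypothetical configuration extracts from a COTS application.
v1 = "queue.size=100\nretry.max=3\n".splitlines()
v2 = "queue.size=250\nretry.max=3\n".splitlines()

diff = list(difflib.unified_diff(v1, v2, "config@v1", "config@v2", lineterm=""))
changes = [line for line in diff if line.startswith(("+queue", "-queue"))]
```

A binary or GUI-only configuration store offers no equivalent of this; that is exactly the integration gap that breaks enterprise-wide configuration management.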
Automation through APIs: The application should be built with automation in mind and provide hooks (e.g., application programming interfaces [APIs]) to fully automate the life cycle. This includes code quality checks, unit testing, compilation, and packaging. None of these activities should have to rely on using a graphical user interface.
The same is true for the deployment and configuration of the application in the target environments; there should be no need for a person to log in to the environment for deployment and configuration purposes. As a result of automation, build and deployment times are short (e.g., definitely less than hours, and ideally, less than minutes).
Modularity of the application: This reduces the build and deployment times, allowing for smaller-scale production deployments and overall smaller batch sizes by reducing the transaction cost of changes. It minimizes the chance of concurrent development and developers having to work on the same code. This, in turn, reduces the risk of complicated merge activities.
Cloud enablement: First of all, the application is not monolithic, so required components can be scaled up and down as needed without engaging the whole application. Licensing is flexible and supports cloud-use cases. Mechanisms are built into the system so that application monitoring is possible at a granular level.
To help you better select the most appropriate product for your needs, score each of the proposed products on the four areas we’ve discussed: functionality, architecture, engineering capability, and in-house IT capability.
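A minimal scoring sheet for this comparison might look as follows. The candidate names, scores, and equal weighting are illustrative assumptions; in practice you would weight the areas according to your context:

```python
# Sketch: scoring candidate products across the four areas discussed above.
# Candidate names and scores (1-5) are illustrative assumptions; the areas
# are weighted equally here for simplicity.

AREAS = ("functionality", "architecture", "engineering", "it_capability_fit")

candidates = {
    "PackageA": {"functionality": 5, "architecture": 3,
                 "engineering": 2, "it_capability_fit": 4},
    "PackageB": {"functionality": 4, "architecture": 4,
                 "engineering": 4, "it_capability_fit": 3},
}

def total(scores):
    """Unweighted total across the four assessment areas."""
    return sum(scores[a] for a in AREAS)

best = max(candidates, key=lambda name: total(candidates[name]))
```

Note how a package that wins on functionality alone can still lose overall once architecture maturity and engineering principles are scored, which is the point of the TDF.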
What Do We Do with Our Existing Legacy Applications?
You are probably already working with a list of applications, and some of them are supported by software vendors who created the software or made specific changes for you, or both. Yet when you look at the application, it is quite possible that it is not following the modern architecture and engineering principles as I described earlier.
You will likely not want to invest in completely replacing these systems immediately, so you have to find other ways to deal with them. There are four principal options that I recommend exploring with such software vendors: (1) leverage your system integrators, (2) leverage user groups, (3) strangle the application, and/or (4) incentivize the vendor to improve their product.
Of course, not all vendors and applications are created equal. Some vendors are multimillion-dollar organizations that you have to deal with differently than a small-scale vendor with only one application to support. I think the four basic patterns apply across the spectrum, but you should tailor them to your context.
Leverage a System Integrator
Having spent my entire career either working for a software vendor or system integrator, I find it surprising that so much is left on the table when it comes to using the relationship effectively beyond the immediate task at hand.
If you work with a large system integrator (SI) to maintain and develop an application, it is likely that the SI is working with the same application in many other places. While you have some leverage with the software vendor, the SI can use the leverage it has across several clients to influence the application vendor.
The better and more aligned your organization is with your SI on how applications should be developed, the easier it will be to successfully influence the software vendor. Better leverage with software application vendors to change their architecture is only one of many benefits that you can derive from changing your relationship with your SI.
Leverage User Groups
For most popular applications, there are user groups, which can be another powerful channel to provide feedback to the vendor. Sometimes these are organized by the vendor itself; sometimes they are independent.
In either case, it is worthwhile to find allies who also want to improve the application architecture in line with modern practices. Having a group of clients approach the vendor with the same request can be very powerful.
A few years back, I was working with an Agile planning tool that was unable to provide reporting based on story points, relying instead on hour tracking. The product vendor always told my client, my colleagues, and me that our request was unique and hence not a high priority for them.
We could only get traction once we had reached out to some other organizations that, unsurprisingly, had issued the same request and had received the same response. The vendor was clearly not transparent with us. Once we had found an alliance of customers, the vendor took our feedback more seriously and fixed the problem.
I encourage you to look for these user groups as a way to find potential allies as well as workarounds and “hacks” that you can leverage in the meantime. By now, people worldwide have worked out how to apply DevOps practices to applications that, on paper, are not well suited to them. And the good news is that DevOps enthusiasts are usually very happy to share that information.
Fence In Those Applications and Reinvest
As discussed in the previous blog section, when the application is not changing and you have to divest from it, you can use an analogy to the strangler pattern in software development to slowly move away from the application. Reduce the investment in the application and reinvest to build new functionality somewhere else that is more aligned with the architecture you have in mind.
Be transparent that you are doing this because the software vendor is not providing the capabilities that you are looking for, but you would reconsider if and when those capabilities were available.
This will incentivize the software vendor to consider investing in a better architecture (perhaps the reason that the capabilities don’t exist is simply that no one ever asked for them before).
Make sure to explain why locked-down architecture and tools are not appropriate going forward and that your requirements for modern architecture require changes in the application architecture.
If the software vendor decides that those capabilities are just not the right ones for their application, then not investing any further into the application and spending your money somewhere else is the right thing to do anyway to enable the next evolution of your architecture.
Incentivize the Vendor
I always prefer the carrot over the stick; you, too, should look for win-win situations. Improvements in the architecture and engineering of an application will lead to benefits on your side, which you can use to incentivize the vendor.
What is usually even more effective is to show how changes will make the application more attractive for your organization and how more and more licenses will be required over time.
This is the ultimate incentive for vendors to improve their architecture. And of course, you can present publicly how great the application is and hence create new customers—a win for both parties.
As I said at the beginning of this blog section, the opinion of software packages in many organizations is not great. It is therefore surprising how rarely organizations actively manage the software they use and how little effort goes into engaging their software vendors to improve the situation.
I truly believe that vendors would be happy to come to the party more effectively if more organizations would ask the right questions.
After all, why would a software vendor invest in DevOps- and Agile-aligned architectures when all their customers ask for is more functionality and no one is paying for architecture improvements?
If companies engaged vendors to discuss the way they want to manage the software package and how important DevOps practices are for them in that process, vendors would invest more to improve those capabilities.
If all else fails and you feel courageous and curious, then you can ignore the guidance from your vendors and attempt to apply DevOps techniques yourself, even to software products that don’t easily lend themselves to them. This is how you start:
Find and manage the source code, which sounds easier than it often is. You might have to extract the configuration out of a database or from the file system to get to a text-based version of the code.
Find ways to manage this code with common configuration and code-merge tools rather than the custom-made systems the vendor might recommend.
You should also investigate the syntax of the code to see whether parts of it are non-relevant metadata that you can ignore during the merge process. In Siebel, for example, this has saved my team hundreds of hours.
Try to find APIs or programming hooks in the applications that you can leverage to automate process steps that otherwise would require manual intervention, even if those were meant for other purposes.
In my team, we have used these techniques for applications like Siebel, SAP, Salesforce, and Pega.
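As a hedged illustration of the code-management steps above: the sketch below normalizes a configuration export by stripping non-functional metadata before diffing or merging. The attribute names are hypothetical stand-ins, not an actual Siebel export format; identify the equivalent bookkeeping fields in your own tool’s exports.

```python
import re

# Fields that are vendor bookkeeping, not functional configuration.
# These names are illustrative; find the equivalents in your tool's export.
NOISE_PATTERNS = [
    re.compile(r'\s*UPDATED(?:BY)?="[^"]*"'),  # last-modified metadata
    re.compile(r'\s*CREATED(?:BY)?="[^"]*"'),  # creation metadata
    re.compile(r'\s*ROW_ID="[^"]*"'),          # internal record IDs
]

def normalize(export_text: str) -> str:
    """Strip non-functional metadata so diffs show only real changes."""
    lines = []
    for line in export_text.splitlines():
        for pattern in NOISE_PATTERNS:
            line = pattern.sub("", line)
        lines.append(line)
    return "\n".join(lines)

# Two exports of the same object, touched at different times:
a = '<APPLET NAME="Contact" ROW_ID="1-2AB" UPDATED="2018-01-05"/>'
b = '<APPLET NAME="Contact" ROW_ID="1-9XZ" UPDATED="2018-03-17"/>'
# After normalization they are identical: no functional change to merge.
print(normalize(a) == normalize(b))
```

Running the normalized exports through an ordinary text-merge tool then surfaces only genuine configuration changes, which is where the time savings come from.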
The above techniques will, I hope, help you to better drive your own destiny and be part of a thriving ecosystem where IT is a real enabler. The last piece of the ecosystem that I want to explore is the role of the system integrator, a topic obviously close to my heart, which I’ll address in the next blog section.
First Steps for Your Organization
Strengthen Your Architecture by Creating an Empowering Ecosystem
So, you already have software packages in your organization like so many others. In the previous blog section, we did an analysis of your application portfolio, which you can leverage now to determine which software packages are strategic for your organization.
1. Based on the previous application portfolio analysis (or another means), determine a small subset of strategic applications (such as the first minimum viable cluster) to devise a strategy for creating an empowered ecosystem around them.
2. Now pick these strategic packages and run the scorecard from this blog section. You can largely ignore the functional aspects, as they are used more for the choice between package and custom software.
You could, however, use the full scorecard in case you are willing to reconsider whether your current choice is the right one. Given that you are doing this after the fact, you will already know how suitable the package was by the number of customizations that your organization has already made.
3. Where you identify weaknesses in your software package, determine your strategy for them. How will you work with the software vendor to improve the capabilities? Will you work with them directly? Will you leverage a system integrator or engage with a user group?
4. Results take time. Determine a realistic review frequency to see whether or not your empowered ecosystem is helping you improve the applications you are working with.
You can leverage the principles for measuring technical debt from the previous blog section as a starting point if you don’t have any other means to measure the improvements in your packaged applications.
Finding the Right Partner
The reality in pretty much every large organization is that you are not working alone. Somewhere in your organization, smaller or larger parts of your IT are either outsourced, or you at least have an SI helping you deliver the IT required to run your business.
This must be managed correctly to make sure you retain sufficient IP while getting the benefits from working with an experienced partner.
There is a lot of talk in the industry about how important culture is and that Agile and DevOps are mostly cultural movements. You will hear a lot of stories and examples of how to improve the culture within your organization when you attend conferences or read blog sections.
I completely agree that your organizational culture is crucial to being successful, but I wonder why there is not more discussion on how to align the cultures of SIs and the organizations they work with.
To date, most company–SI relationships are very transactional and driven through vendor management. Words like partner, partnership, and collaboration are often used, yet the result on the ground is too often a misaligned culture, for many reasons.
In this blog section, I want to help improve the situation based on my experience of being on both sides as an SI and as a client of SIs. There are ways to improve the relationship and make it more meaningful. And then there are certain pitfalls that you need to avoid. At the end of the day, both sides want the relationship to be successful—at least that is my experience.
It is often a lack of context and limited experience that are preventing us from extending organizational culture beyond the traditional organizational boundaries.
How to Create Beneficial Strategic Partnerships with a System Integrator
Many organizations going down the path of Agile and DevOps determine that the best way to be successful is to rely initially on in-house capabilities, because you have a higher level of control over your own people and the environment they work in (salaries, goals, incentives, policies) than over the people of your SI.
Unless you are really willing to take everything back in-house, you will at some stage start working with your SI partners. Fortunately, there are plenty of benefits to working with a partner.
The right partner will be able to bring you experience from all the companies they are working with, they have relationships with your product vendors that are deeper than yours, and they can provide an environment that entices talent to join them that you might not be able to provide.
IT is at the core of every business nowadays, but not every company can be an IT company. Strategic partnerships allow you to be a bit of both—to have enough intellectual property and insight into the way your system is built and run while permitting your strategic partner to deal with much of the core IT work.
Be open and willing to delegate IT when needed in order to maintain balance—and success—overall.
The world of technology is moving very fast, which means we have to learn new technologies all the time. If you have a good relationship with your partner, you might be able to co-invest in new technologies and support the training of your partner’s resources; and in return, you might get reciprocal benefits in exchange for a credential that the partner can use to showcase their ability with the new technology.
My heart warms every time I see a conversation like that take place—where two companies sit together truly as partners to look for win-win situations. Taking an active interest in the world of your partners is important.
In some of my projects, I was part of a blended team in which my people’s experience in technology worked together with the client’s employees’ intimate knowledge of the business. Those client teams could maintain and improve the solution long after we left, which is what real success looks like.
We not only built a better system but left the organization better off by having upskilled the people in new ways of working. As discussed in the application portfolio blog section, there might be applications where you don’t want to build in-house capability and for which this approach does not apply.
For your innovation and workhorse applications, you want to leverage the technology and project experience on the SI side with the business knowledge and continued intellectual property around the IT landscape from your organization.
You should avoid vendors who do not align with your intended ways of working and vendors into whose processes and culture you have no visibility; otherwise, knowledge of your systems sits with individuals from these vendors/contractors, and most changes happen in what appears to be black box mode.
This makes it very difficult for you to understand when things go wrong, and when they do, you don’t see it coming. One way to avoid this proliferation of vendors and cultures is to have a small number of strategic partners so that you can spend the effort to make the partnerships successful.
The fewer the partners, the fewer the variables you must deal with to align cultures. Cultural alignment in ways of working, incentives, values, as well as the required expertise should really be the main criteria for choosing your SI besides costs.
Importance of In-House IP
Your organization needs to understand how IT works and needs to have enough capacity, skill, and intellectual property to determine your own destiny. As we said before, IT is at the core of every business now; a minimum understanding of how this works is important so that you can influence how IT supports your business today, tomorrow, and the day after.
But what does it mean to have control of your own destiny in IT? While there are some trends that take “headaches” away from your IT department (think cloud, SaaS, or COTS), there is really no way of completely outsourcing the accountability and risk that comes with IT.
You will also have to think about the tools and processes that your partners bring to the table. It is great that your vendor brings additional tools, methods, and so on, but unless you are able to continue to use those tools and methods after you change vendors, they can become a hindrance later if those tools are critical for your IT delivery.
If they are not transparent to you and you don’t fully understand how they work, you have to take this into account in your partnering strategy, as you will be bound to them tighter than you might like.
Fortunately, there is a trend toward open methods and standards, which makes it a lot easier to communicate across company barriers. Agile methodologies like the Scaled Agile Framework (SAFe) and Large-Scale Scrum (LeSS) are good examples. It is likely that you will tailor your own method based on influences from many frameworks.
When you make using your method a condition for working with your organization, it helps you keep control. You do, however, need to make sure your methods are appropriate and be open to feedback from your partners. Your partners should absolutely bring their experience to the table and can help you improve your methods.
Standards are also important on the engineering side. Too many organizations have either no influence over or no visibility into how their partners develop solutions. Practices like automatic unit testing, static code analysis, and automated deployments are staples.
Yet many organizations don’t know whether and to what degree they are being used by their partner. Having the right structure and incentives in place makes it easier for your partner to use those practices, but it is up to you to get visibility into the engineering practices being used for your projects.
One practical way to address this is to have engineering standards for your organizations that every team has to follow no matter what, whether it’s in-house, single vendor, or multivendor.
These standards will also provide a common language that you can use with your partners to describe your vision for IT delivery (for example, what your definition of continuous integration is).
Changing the “Develop-Operate-Transition” Paradigm
In the past, contracts with system integrators often had something mildly Machiavellian to them, creating a work environment in which nobody wins. One of the models that suffers from unintended consequences over time is the develop-operate-transition (DOT) contract.
I am not sure how familiar you are with this contract term, so let me quickly explain what I mean. DOT contracts work on the basis that there are three distinct phases to a project: a delivery phase, where the product is created; an operating phase, where the product is maintained by another party; and a transition phase, where the product is brought back in-house.
Many organizations use two different vendors for development and operations or at least threaten to give the operational phase to someone else while working with a delivery partner.
There are a few things wrong with this model. First of all, if you have a partner who is only accountable for delivery, it is only natural that considerations for the operating phase of the project will be less important to them.
After all, the operating activities will be done by someone else. The operate party will try to protect their phase of the project on their side, and you will likely see an increasing number of escalations toward hand-over. There is no ill intent here; it is just a function of different focuses based on the scope of the contracts.
The second problem is that many DOT projects are run as more or less black box projects, where the client organization is only involved as the stakeholder and, until it gets to the transition phase, has not built internal knowledge on how to run and maintain the system.
This causes problems not only during the transition but also when navigating misalignments between delivery and operate parties. With just a little tweaking, we can bring this model up to date.
Choose a partner that is accountable for both the delivery and operation. You can change the incentive model between the two phases to reflect the different characteristics. Make sure that there is team continuity between phases with your partner, so that people who will operate the solution later are already involved during delivery.
Across the whole project life cycle, embed some of your own people into the team so that you can grow your understanding of the solution and what it took to create and support it.
Ideally, have tandem roles where both your partner and your own organization put people in (e.g., project manager, delivery team lead, system architects) to share responsibilities.
In this model, the downsides of the old DOT model are addressed, and you can still leverage the overall construct of DOT projects and combine it with DevOps principles. My best projects have used this model, and the results have been long lasting, beyond my involvement.
Cultural Alignment in the Partnership
As mentioned earlier in the book, I have been on both sides of a partnership as a system integrator (SI) providing services to a client and in staff augmentation roles, where I had to work with SIs.
It is quite easy to blame the SIs for not doing the right thing—for not leveraging all the DevOps and Agile practices and for not experimenting with how to do things better.
The reality is that every person and every organization does what they think is the right thing to do in their context. No one is trying to be bad in software development.
Unfortunately, sometimes relationships have been built on distrust: because I don’t trust you, I will have a person looking after what you are doing. The vendor then creates a role for someone to deal with that person, and both sides add more process and more documents on each side to cover their backside.
More and more processes, roles, and so on get introduced until there are several levels of separation between the real work and the people from both organizations talking to each other. To make things worse, all of this is non-value-added activity, the price of the distrust between partners.
But imagine you trusted your SI like you trust the best person on your team. What processes and documents would not be required, and what would that do to the cost and speed of delivery for you?
Despite these potential advantages, there is way too little discussion on how to make the relationship work. How could we create a joint culture that incentivizes all partners to move toward a more Agile and DevOps way of working, and how do we do this when we have long-lasting relationships with contracts already in place?
First of all, I think it is important to understand your partner; as in any good marriage, you want to know what works and what doesn’t work for your partner. And when I say partner, I mean partner. If you do the odd project with a vendor and it is purely transactional, then you don’t have to worry about this.
But if you work with the same company for many years and for some of your core systems, then it does not make sense to handle them transactionally. You want to build a partnership and have a joint DevOps-aligned culture.
In a real partnership, you understand how the SI defines success, and both sides are open about what they want from the relationship. Career progression is one such example: I have been lucky, as most of my clients understood when I discussed the career progression of my people with them and why I needed to move people out of their current roles.
From the client’s perspective, they would have preferred to keep my people in the same roles for many years; but for me, that would not have been good, as my people would have looked for opportunities somewhere else.
Of course, all of this goes both ways, so you should not accept if the SI wants to behave like a black box—you want the relationship to be as transparent as you feel is right.
You have the choice to buy a “service” that can be a black box with only the interface defined. In this case, you don’t care how many people work on it or what they are doing; you just pay for the service.
This gives the SI the freedom to run independently, a model that works well for SaaS, and you might have some aspects of your IT that can work with an XaaS mindset.
For other projects that include working with your core IT systems and with people from your organization or other third parties, you want transparency. A vendor that brings in their own tools and methods is basically setting you up for a higher transition cost when you want to change.
You should have your own methods and tools, and each SI can help you improve this from their experience. You don’t want any black box behavior. Fortunately, common industry frameworks such as SAFe or Scrum do help get to a common approach across organizations with little ramp-up time.
Thinking about partnerships, you should remember that you cannot outsource risk. I have often seen that the client is just saying “well, that’s your problem” when an SI brings up a possible situation. The reality is that if the project fails, the client will be impacted. Just closing your eyes and ears and making it the SI’s problem will not make it go away.
Think of the disaster with the Australian census, where the delivery partner took a lot of negative publicity, or the Health Insurance Marketplace in the United States, where vendors blamed each other for the problems.
Even if the vendors were at fault, the organizations took a huge hit in reputation in both cases; and given that they were public services, it created a lot of negative press.
In Agile, we want flexibility and transparency. But have you structured your contracts in a way that allows for this? You can’t just use the same fixed-price, fixed-outcome contract where every change has to go through a rigorous change control process. Contracts are often structured with certain assumptions, and moving away from them means trouble.
Time-and-materials contracts for Agile can cause problems because they don’t encourage adherence to outcomes; that is only okay if you have a partner experienced with Agile and a level of maturity and trust in the relationship.
Agile contracts require you to take more accountability and be more actively involved in the scope management. In my experience, the best Agile contracts are the ones that are built on the idea of fixed capacity aligned to some outcome and flexible scope (the delivery of a number of features for which some details are defined as the project progresses).
There are ways to create Agile contracts that work for all parties, so let’s explore some basics of a typical Agile project. While Agile encourages teams to deliver production-ready code with each sprint, the reality often means that the delivery process is broken down into four phases:
1. scope exploration upfront and ongoing refinement (definition of ready)
2. sprint/iteration delivery of user stories (definition of done)
3. release readiness preparation/hardening and transition to production (definition of done-done)
4. post-go-live support/warranty (definition of done-done-done)
With that in mind, a contract should reflect these four phases. As a departure from the common deliverable- or phase-based pricing, where your partner is being paid based on deliverable (such as design documents) or completion of a project phase (such as design or development), these contracts reflect user stories as units of work.
Each story goes through the four phases described above, and payments should be associated with that; a certain percentage should be paid as a story achieves the definition of ready and the different levels of done. Here is a sample breakdown that works well:
We have three hundred story points to be delivered in three iterations and one release to production: $1,000 total price.
A payment schedule of 10%/40%/30%/20% (first payment at kickoff, the second one as stories are done in iterations, third one once stories are released to production, last payment after a short period of warranty).
Signing contract: 10% = $100.
Iteration 1 (50 pts. done): 50/300 × 0.4 × $1,000 ≈ $67.
Iteration 2 (100 pts. done): 100/300 × 0.4 × $1,000 ≈ $133.
Iteration 3 (150 pts. done): 150/300 × 0.4 × $1,000 = $200.
Hardening and go-live: 30% = $300.
Warranty complete: 20% = $200.
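The sample breakdown above can be sketched as a small calculation. The numbers mirror the example, and the percentage split is the assumed 10/40/30/20 schedule:

```python
TOTAL_PRICE = 1000    # fixed price for the release, in dollars
TOTAL_POINTS = 300    # total story points across the three iterations
SPLIT = {"kickoff": 0.10, "iterations": 0.40, "go_live": 0.30, "warranty": 0.20}

def iteration_payment(points_done: int) -> float:
    """Payment for an iteration: its share of the 40% iteration tranche."""
    return points_done / TOTAL_POINTS * SPLIT["iterations"] * TOTAL_PRICE

payments = [
    ("Signing contract", SPLIT["kickoff"] * TOTAL_PRICE),
    ("Iteration 1 (50 pts)", iteration_payment(50)),
    ("Iteration 2 (100 pts)", iteration_payment(100)),
    ("Iteration 3 (150 pts)", iteration_payment(150)),
    ("Hardening and go-live", SPLIT["go_live"] * TOTAL_PRICE),
    ("Warranty complete", SPLIT["warranty"] * TOTAL_PRICE),
]

for label, amount in payments:
    print(f"{label}: ${amount:,.2f}")
# The six payments sum back to the fixed price.
print(f"Total: ${sum(amount for _, amount in payments):,.2f}")
```

The useful property to verify in any such schedule is that the iteration payments exhaust exactly the iteration tranche, so the total always reconciles to the contract price.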
This contract model ties the delivery of scope to the payments to the vendor. In my experience, it is a good intermediate point: you keep flexibility while paying only for working scope.
There are things that you want to provide as part of the contract too: an empowered product owner who can make timely decisions, a definition of the necessary governance, and a work environment that supports Agile delivery (physical workspace, IT, infrastructure, etc.).
Very mature organizations can utilize time and material contracts as they operate with their own mature methodology to govern the quality and quantity of outcome; less mature organizations benefit from the phased contract outlined above.
Another aspect of contracts is aligned incentives. Let’s start with a thought experiment: You have a really good working relationship with an SI over many years, but somehow, with all the legacy applications you are supporting together, you didn’t invest in adopting DevOps practices.
You now want to change this. You agree on a co-investment scheme and quickly agree on a roadmap for your applications. A few months in, you see the first positive results with demos of continuous integration and test automation at your regular showcases.
Your SI approaches you at your regular governance meeting and says he wants to discuss the contract with you, as the average daily rate of his overall team has changed. What do you expect to see? That the average daily rate of a worker has gone down thanks to all the automation? I mean, it should be cheaper now, shouldn’t it?
Well, let’s look at it together. The average daily rate is the average rate calculated on the basis that less-skilled work is cheaper and work that requires more skills or experience is paid higher. The proportion of those two to each other determines the average daily rate. When we automate, what do we automate first?
Of course: the easier tasks that require fewer skills. The automation itself usually requires a higher level of skill. Both of these mean that the proportion of higher-skilled work goes up and, with it, the average daily rate. Wait … does that mean things become more expensive? No.
Since we replaced some work with automation, in the long run, it will be cheaper overall. If you evaluate your SIs based on the average daily rate, you have to change your thought process. It is the overall cost, not daily rates, that matters.
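A quick illustration of this effect, using made-up day counts and rates rather than real benchmarks: automation removes mostly low-rate routine work, so the blended rate rises even as the total bill falls.

```python
def blended(junior_days: int, junior_rate: int,
            senior_days: int, senior_rate: int):
    """Return (average daily rate, total cost) for a mixed-skill team."""
    total_cost = junior_days * junior_rate + senior_days * senior_rate
    total_days = junior_days + senior_days
    return total_cost / total_days, total_cost

# Before automation: 80 days of routine work at $400/day,
# 20 days of skilled work at $900/day. (Illustrative numbers.)
rate_before, cost_before = blended(80, 400, 20, 900)

# After automation: routine work largely automated away; a few extra
# senior days are spent building and maintaining the automation.
rate_after, cost_after = blended(20, 400, 25, 900)

print(f"Average daily rate: ${rate_before:.0f} -> ${rate_after:.0f}")  # goes UP
print(f"Total cost:         ${cost_before:,} -> ${cost_after:,}")      # goes DOWN
```

With these assumed figures, the blended rate climbs from $500 to roughly $678 while the total cost drops from $50,000 to $30,500, which is exactly why daily rates alone mislead.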
Partnerships from the SI Side
I also want to look at the company–service provider relationship from the other side—the side of the system integrator. This perspective is not spoken about much, but given that it is usually my side, I want to show you how we think and what our challenges are.
The influence of DevOps culture has started to transform relationships to be more open. Even in the request for proposal processes, I can see an increased openness to discuss the scope and approach to delivery.
I have worked with government clients for whom, during the process, I was able to help them shift the request to something more aligned with what they were after by adopting an Agile contract like the one I mentioned earlier.
Originally, they wanted to pay per deliverable, something unsuitable for Agile delivery. Together, we can usually come up with something that works for all parties and makes sure you get what you are after.
As system integrators, we are still relegated too often to talking to procurement departments that are not familiar with modern software delivery. Contracts are set up so tightly that there is no room for experimentation, and where some experimentation is accepted, only positive results are “allowed.”
If you think about your relationship with SIs, I am sure you can think of ways to improve the relationship and culture to become more open and aligned with your goals. I have added a little test to diagnose your culture alignment in the exercises of this blog section.
I want to spend the last part of this blog section on partner evaluation. Clearly, you don’t want your partner to just take your money and provide a sub-optimal service to your organization.
So what can you do to govern the relationship while considering the open culture you are trying to achieve? And how do you do that while still having the means to intervene if the performance deteriorates?
In the example of a project being done by one SI, you can use a balanced scorecard that considers a few different aspects:
One of them is delivery; in this area, you care about quality and predictability of delivery. How many defects slip into production, how accurate is the delivery forecast, and how are the financials tracking?
You might also want to add stakeholder evaluations of delivery quality, potentially as an internal net promoter score (NPS, the percentage of promoters minus the percentage of detractors among business stakeholders). Cycle time and batch size are two other metrics you should care about to improve the overall flow of work through your IT.
The second aspect is technical excellence. At a bare minimum, you want to look at the compliance with your engineering methods (unit testing, test automation, continuous integration, continuous delivery …). If your delivery partner is taking shortcuts, technical debt will keep increasing, and at some stage, you will have to pay it down.
In my experience, clients do a good job checking the quality of the end product but often fail to govern the engineering practices that prevent technical debt from accruing.
Providing a clear set of expectations around engineering methods and regular inspection (e.g., showcases for test and deployment automation, code reviews, etc.) reduces the chances of further technical debt. I have had great discussions about engineering strategies with clients during such showcases.
The third aspect is throughput; you can measure this in story points, or even stories, per release. I assume, though, that your releases will change in structure as your capabilities mature.
In light of this, cycle time and batch size are better measures. The interesting aspect of cycle time is that if you optimize for speed, you tend to get quality and cost improvements for free, as discussed earlier.
You should also reserve a section of the scorecard for improvements. Which bottlenecks are you currently improving, and how are you measuring against your predictions? You can add costs per service here (e.g., the cost of a deployment or a go-live) to see specific improvements for salient services in your organization.
Last but not least, you should have a section for the interests of your partner. Career progression, predictability, and market recognition come to mind as some of the non-commercial aspects of the relationship.
Of course, revenue and profitability are two other areas of interest that you want to talk about from a qualitative perspective—is this a relationship that is beneficial for both?
I recommend having a section of the scorecard where you track two or three priorities from the partner perspective and evaluate those together on a regular basis.
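To make the five scorecard sections concrete, here is a minimal sketch of how such a scorecard could be tracked and rolled up. The section names follow the text above, but the individual metrics, the 1–5 rating scale, and the equal section weighting are my own illustrative assumptions, not a standard format:

```python
# Illustrative partner balanced scorecard. Sections mirror the five areas
# discussed above; metric names and the 1-5 scale are assumed conventions.
from dataclasses import dataclass

@dataclass
class ScorecardSection:
    name: str
    metrics: dict  # metric name -> rating on an assumed 1-5 scale

    def average(self) -> float:
        # Roll each section up to a single number
        return sum(self.metrics.values()) / len(self.metrics)

def overall_score(sections: list) -> float:
    # Equal weighting across sections; adjust weights to your priorities
    return sum(s.average() for s in sections) / len(sections)

scorecard = [
    ScorecardSection("Delivery",
        {"defect slippage": 4, "forecast accuracy": 3,
         "financials": 4, "stakeholder NPS": 3}),
    ScorecardSection("Technical excellence",
        {"test automation": 2, "continuous integration": 3,
         "code review compliance": 4}),
    ScorecardSection("Throughput",
        {"cycle time": 3, "batch size": 4}),
    ScorecardSection("Improvements",
        {"bottleneck progress": 3, "cost per deployment": 2}),
    ScorecardSection("Partner interests",
        {"career progression": 4, "market recognition": 3}),
]

print(f"Overall partner score: {overall_score(scorecard):.2f} / 5")
```

A low section average (here, technical excellence and improvements) tells you where the next joint review or workshop should focus.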
First Steps for Your Organization
Horses for Courses—Determining the Partners You Need
This whole blog section is about finding the right partner that fits your ambition and culture. But the truth is that you probably need different partners for different parts of your portfolio.
If you have done the application portfolio activity in blog section 2, this exercise will be easier. There are three different types of applications for the purpose of this exercise:
Differentiator applications: These applications evolve very quickly, are usually directly exposed to your customers, and define how your company is perceived in the marketplace.
Workhorses: These applications drive the main processes in your organization, such as customer relationship management, billing, finance, and supply-chain processes. They are often referred to as enterprise or legacy systems.
They might not be directly exposed to the customer, but the company derives significant value from these applications and continues to make changes to them to support the evolving business needs.
True legacy: These applications are pretty stable and don’t require a lot of changes. In general, they tend to support your more stable, main processes or some fringe aspects of your business.
Based on these classifications, review your partner strategy to see whether you need to change either the partner itself or the way you engage with the existing one. For the first two categories, you want to engage strategic partners. For legacy applications, you are looking for a cost-effective partner who gets paid for keeping the system running.
The incentives for your strategic partners are different. Your partners for the workhorse applications should be evaluated by the efficiencies they can drive into those applications; for the differentiator applications, you want someone who is flexible and will co-invest with you. The outcome of this activity will feed into the second exercise for this blog section.
Run a Strategic Partners Workshop for Your Workhorse Applications
Organizations spend the majority of their money on their workhorse applications. This makes sense, as these applications are the backbone of the business. For this exercise, I want you to invite your strategic partners who support your workhorse applications (and possibly the differentiator ones) to a workshop.
You can do this with all of the partners together, which can be more difficult, or by running separate workshops for each partner. It is important to tell them to assume that the current contract structure is negotiable and to be open-minded for the duration of the workshop.
The structure of this workshop should be as follows:
Explain to your partner what is important for you in regard to priorities in your business and IT.
Discuss how you can measure success for your priorities.
Let your partner explain what is important for them in their relationship with you and what they require in their organization to see the relationship as successful.
Workshop how you can align your interests.
Brainstorm what the blockers are to truly achieving a win-win arrangement between your two organizations.
The key to this workshop is that both sides are open-minded and willing to truly collaborate. In my experience, it will take a few rounds of this before barriers truly break down—don’t be discouraged if all of the problems are not solved in one workshop.
Like everything else we talk about, it will be an iterative process, and it is possible that you will realize that you don’t have the right partners yet and need to make some changes in the makeup of your ecosystem.
Do a Quick Self-Check about Your Partnering Culture
A quick test to evaluate your DevOps culture with your system integrator:
Are you using average daily rate as an indicator of productivity, value for money, and so on?
+1 if you said no.
Do you have a mechanism in place that allows your SI to share benefits with you when he improves through automation or other practices?
+1 if you said yes. You can’t really expect the SI to invest in new practices if there is no upside for him. And yes, there is the “morally right thing to do” argument, but let’s be fair. We all have economic targets, and not discussing this with your SI to find a mutually agreeable answer is just making it a bit too easy for yourself, I think.
Do you give your SI the “wiggle room” to improve and experiment, and do you manage the process together?
+1 if you said yes. You want to know how much time the SI spends on improving things by experimenting with new tools or practices. If she has just enough budget from you to do exactly what you ask her to do, then start asking for an innovation budget and manage it with her.
Do you celebrate or at least acknowledge the failure of experiments?
+1 if you said yes. If you have an innovation budget, are you okay when the SI comes back to let you know that one of the improvements didn’t work? Or are you just accepting successful experiments? I think you see which answer aligns with a DevOps culture.
Do you know what success looks like for your SI?
+1 if you said yes. Understanding the goals of your SI is important, not just financially but also for the people who work for the SI. Career progression and other aspects of HR should be aligned to make the relationship successful.
Do you deal with your SI directly?
+1 if you said yes. If there is another party involved, such as your procurement team or an external vendor, then it’s likely that messages get misunderstood. And there is no guarantee the procurement teams know the best practices for DevOps vendor management. Are you discussing any potential hindrance in the contracting space directly with your SI counterpart?
If you score 0–2 points, you have a very transactional relationship with your SI and should consider getting to know him or her better to improve the relationship.
If you score 3–4 points, you are doing okay but with room for improvement, so you could run a partner workshop to address the other dimensions.
If you score 5 or 6 points, you are ahead of the curve with a real partnership that will support you through your transformation. Well done!
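The six questions and their scoring can be sketched as a small helper. The question keys are my own shorthand for the checklist above, not part of the original text:

```python
# Hypothetical scoring helper for the six-question partnering self-check.
# Keys are shorthand labels for the questions above (assumed names).
def partnering_score(answers: dict) -> str:
    points = sum([
        not answers["daily_rate_as_productivity"],   # Q1: +1 if you said no
        answers["benefit_sharing"],                  # Q2: +1 if you said yes
        answers["innovation_budget"],                # Q3: +1 if you said yes
        answers["celebrate_failed_experiments"],     # Q4: +1 if you said yes
        answers["know_si_success"],                  # Q5: +1 if you said yes
        answers["direct_relationship"],              # Q6: +1 if you said yes
    ])
    if points <= 2:
        return f"{points} points: transactional relationship"
    if points <= 4:
        return f"{points} points: okay, room for improvement"
    return f"{points} points: real partnership"
```

For example, answering every question in the DevOps-aligned direction ("no" to question 1, "yes" to the rest) yields six points and the "real partnership" bracket.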