Code Reading Techniques Tutorial 2019
Object-oriented (OO) languages such as C++, Java, and C# are very popular today. Reading OO code, however, is challenging. Stepwise abstraction has been extended to support OO code reading, and use-cases can be used to direct the reading process so that both the static and the dynamic aspects of OO code are understood.
Object-oriented framework reading is even more challenging, and functionality-based reading is an effective and efficient technique for finding defects in frameworks. This tutorial explains these code reading techniques with examples. It also explains the importance of code reading and how software engineering professionals read code.
Software developers spend much of their professional lives reading other developers’ code, and there is a great deal of legacy code that developers are tasked with maintaining.
Task-directed reading was developed to meet the need of reading such legacy code. To read and understand code, readers must already be familiar with the constructs of the programming language; that topic is not covered here.
Instead, we cover techniques that are applicable to any high-level programming language. Lastly, we examine factors that affect code readability before concluding.
Code Reading As a Professional Skill
Importance of Code Reading
We read programming code for different purposes. As students, we read code in books, magazines, and journals or on the web to learn language constructs and how to master them.
As professionals, we may still read the code for learning, but most of the time it is for other reasons. We read code written by colleagues as part of the code inspection process; in this case, we read and analyze the code to verify its quality and detect possible defects.
We also read the code to identify reuse opportunities or figure out how to use it in our own projects. As software developers, a large part of our professional lives is spent maintaining, adapting, correcting, perfecting, and modifying existing code.
Empirical data show that developers spend about half of their time on reading and comprehending programs during software maintenance.
When modifying code, we want to minimize disruption to existing functionality and preserve its original architectural and stylistic integrity. These purposes fit the two categories identified by Basili et al.: reading for analysis and reading for construction.
To fulfill all these purposes, we must read and achieve a necessary level of understanding of the code. Successful software engineering professionals must learn, cultivate, and master code reading skills.
How Do People Read Code?
Program comprehension is an important activity in software engineering and a central activity during software maintenance, evolution, and reuse. Comprehension is a process in which individuals build their own mental representation of the program.
It has continued to be an active research topic since the 1970s for computer science researchers as well as cognitive psychologists, and there is a rich body of literature in the field.
Although program comprehension is a broad topic, much of the research has focused on code reading and comprehension. Researchers are interested in how people, both experts and novices, read and understand code and what strategies they use to facilitate comprehension. Due to its practical importance, program comprehension remains an active and interesting topic today.
To explain the empirical findings on how people read and understand programs, researchers have put forth numerous, sometimes conflicting, models. These models tend to share a set of common elements: an assimilation process, cognitive structures, and a knowledge base.
The assimilation process is the reading process or strategy the programmer uses to extract information from the code in order to build a mental representation of it. Such strategies include top-down and bottom-up reading. The assimilation process resembles the reading techniques discussed in this tutorial but is less structured.
The cognitive structures may include the programmer’s existing knowledge base about programming and the application domain, as well as his or her mental representation of the program.
Although these elements are common, views on them differ. It is not our aim here to review and analyze cognitive models; interested readers can refer to surveys in the literature. Instead, we summarize the empirical findings related to code reading and comprehension.
The source code is not read like a novel, nor is its meaning determined by seeing how it behaves when run or traced using test data. To read and understand the code, one has to build layered abstractions of the code.
There are various strategies for building these layered abstractions, one of which is top-down reading. Top-down reading mirrors how we write code: we typically follow a divide-and-conquer approach, decomposing a high-level function into multiple lower-level ones, recursively as needed.
In top-down reading, one understands the code by first grasping its overall purpose and then working out how that function is implemented by its constituent components. During this process, the reader repeatedly forms hypotheses about the code, which are subsequently verified, modified, or rejected.
The reader scans the code for familiar clues in the text, called beacons, which signal the presence of program plans. For this strategy to be effective, one needs to know the overall purpose of the program; well-documented code can usually be read top-down.
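As an illustration (our own, not taken from the studies cited), suppose a reader forms the top-down hypothesis “this function sorts its input.” The inner loop over adjacent indices and the element swap act as beacons that support the hypothesis. The comments record the hypotheses a reader might form and verify; the name `mystery` is hypothetical and deliberately uninformative.

```python
def mystery(items):
    # Hypothesis 1 (top-down): returns a sorted copy of the input.
    result = list(items)  # a copy is made, so the caller's list is untouched
    n = len(result)
    for i in range(n):
        # Beacon: the inner loop compares adjacent elements...
        for j in range(n - 1 - i):
            if result[j] > result[j + 1]:
                # Beacon: ...and swaps them when out of order (bubble sort plan).
                result[j], result[j + 1] = result[j + 1], result[j]
    # Hypothesis 1 verified and refined: ascending, stable bubble sort, O(n^2).
    return result
```

For example, `mystery([3, 1, 2])` returns `[1, 2, 3]`. Without the beacons, the reader would have to trace the loops statement by statement to reach the same conclusion.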
Bottom-up reading is the opposite of top-down reading: understanding of the code is accumulated from an understanding of small fragments of code.
The reader recognizes the function of groups of statements as chunks and combines these chunks to explain increasingly larger program fragments. Deep knowledge of the programming language and constructs, and of the application domain, helps the reader read and understand the code.
Identifying and understanding the program’s control flow and data flow greatly facilitates comprehension of the program as a whole. Cross-referencing the program domain with the application domain tends to confirm and deepen understanding, which points to the significant role the reader’s experience and knowledge play in reading and comprehension.
Empirical studies of professional programmers reveal that people do not employ pure top-down or bottom-up strategies but mix them freely. Code reading and comprehension is a hard and time-consuming task, and programmers often adopt “as-needed” (opportunistic) approaches to avoid deep understanding.
They focus on the task at hand and gain just enough knowledge to complete it, seeking a deep understanding of the code only when they have to.
People who adopt the as-needed strategy focus on local program behavior and can fail to make correct modifications, because they miss critical interactions among program components and lack a complete and accurate understanding of the code.
The pragmatic approach to code reading and comprehension has also recently been linked to code reuse. Which strategy to use is typically driven by the question the reader is trying to answer.
For example, to answer a “how” question (how a sorting algorithm is implemented), top-down reading is warranted in order to find out the low-level implementation.
Developers’ workflows, including their reading strategies, vary with their skills, experience, personality, tasks at hand, and the technology used. Tool usage has been found to be low, and some developers were not even aware of certain features of the tools they used daily.
Reading by Stepwise Abstraction
Reading by stepwise abstraction is a bottom-up reading technique that formulates an abstract description of what a fragment of code does from the fragment itself.
It was first presented by Linger and later formalized by Basili and Mills. The technique was subsequently integrated into the Cleanroom process as the verification-based inspection used to establish implementation correctness.
They have argued that program writing is expanding the known function into a program and program reading is abstracting the known program into a function. When reading the code for defect detection, one compares the known functions (design) to their expansions (code).
Code reading is thus recognizing directly what the code does, or mentally transforming it into something that can be recognized directly. The result of this mental transformation is an abstraction, independent of implementation details.
Reading by stepwise abstraction can be applied to reading any programming code. A structured program of any size can be read and understood in a completely systematic manner by reading and understanding its hierarchy of prime programs and their abstractions.
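A prime program is a fragment of code that has one entry and one exit and is irreducible: it contains no smaller non-trivial one-entry, one-exit fragment. Typical prime programs are the if-then-else and while-do constructs.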
The purpose of reading the prime programs is to discover their program functions, and the program functions can be captured as comments in the code. A well-structured and documented program can be read top-down, from overall design to lower levels of details.
For poorly structured and documented code, however, bottom-up reading is a better strategy, which allows one to discover the intermediate abstractions, successively at higher levels. The process of bottom-up reading is called stepwise abstraction.
The literature does not describe reading by stepwise abstraction very clearly. We can, however, distill the reading instructions shown in Panel 1: Instruction for Stepwise Abstraction.
PANEL 1: INSTRUCTION FOR STEPWISE ABSTRACTION
1. Read code line by line to build up a conceptual understanding of code fragments.
2. Connect code fragments to form an overall picture.
3. Compare with specifications to detect defects.
4. Repeat until all code is abstracted and compared with specifications.
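A minimal sketch of Panel 1 applied to a small, hypothetical function. Assume its specification reads “return the average of the non-negative readings, or 0.0 if there are none.” The comments show the abstraction a reader would record for each prime program (step 1), the composed abstraction (step 2), and the comparison with the specification (step 3), which exposes a seeded defect.

```python
def average_reading(readings):
    # Step 1, prime program (sequence): total = 0.0, count = 0.
    total = 0.0
    count = 0
    # Step 1, prime program (loop with nested if-then), abstracted as:
    #   total = sum of the readings r with r > 0; count = how many there are.
    for r in readings:
        if r > 0:  # Step 3 finding: spec says non-negative, so this should be r >= 0
            total += r
            count += 1
    # Step 1, prime program (if-then-else): total / count if count > 0, else 0.0.
    # Step 2, composed abstraction: returns the average of the strictly
    #   positive readings, or 0.0 if there are none.
    # Step 3, comparison with spec: zero readings are wrongly excluded (defect).
    return total / count if count > 0 else 0.0
```

As written, `average_reading([0, 2, 4])` returns 3.0; with the condition changed to `r >= 0` it would return 2.0, as the specification requires. Note that the defect was found by comparing abstractions, not by running the code.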
Since it was proposed in the 1970s, reading by stepwise abstraction has been well studied and compared with dynamic testing techniques such as functional testing and structural testing.
The general consensus is that its effectiveness varies significantly from program to program, and that code reading should be combined with dynamic testing techniques in order to detect different kinds of defects.
Object-Oriented Code Reading
Early code reading techniques were proposed mostly for procedural languages. As OO languages and programming techniques became popular, evidence accumulated that these techniques could not cope with the issues raised by OO programming.
In the following section, we first discuss the challenges raised in OO code reading and then introduce two reading techniques that address the challenges. We conclude our OO code reading with a summary of empirical findings.
Challenges of Object-Oriented Code Reading
OO programming has three hallmarks: encapsulation, inheritance, and polymorphism. These influence how code is created, structured, and executed. The OO paradigm encourages distributing the code elements related to a given functionality across the system.
Understanding a piece of code frequently requires understanding code in other classes, e.g., its base classes or the classes it is composed of. Polymorphism and late binding make the dynamic behavior of the code hard to comprehend. To fully understand the code, one needs to understand both its static and its dynamic behavior.
Soloway and Ehrlich introduced the concept of a programming plan, which is a generic fragment of code that represents typical scenarios in programming.
They observed that when a programming plan is distributed non-contiguously in a program, it becomes hard to comprehend since only a part of the code is seen at a time and the reader has to guess based on local information.
They called this kind of plan delocalized. This delocalized nature is pervasive in OO programming, and Dunsmore et al. named the characteristic delocalization. Effective OO code reading has to address this delocalization.
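A small, hypothetical Python example of delocalization: to know what `Report.render` produces, a reader must read `header` in the base class, then the overriding `title` and `body` in whichever subclass is bound at run time. The class names are invented for illustration.

```python
class Report:
    def render(self) -> str:
        # The plan "produce a report" is split across classes:
        # header() lives here, but title() and body() may be overridden (late binding).
        return self.header() + "\n" + self.body()

    def header(self) -> str:
        return f"== {self.title()} =="

    def title(self) -> str:
        return "Report"

    def body(self) -> str:
        raise NotImplementedError("subclasses must supply the body")


class SalesReport(Report):
    def __init__(self, totals: dict):
        self.totals = totals

    def title(self) -> str:  # silently changes what the base-class header() prints
        return "Sales"

    def body(self) -> str:
        return ", ".join(f"{k}={v}" for k, v in sorted(self.totals.items()))
```

`SalesReport({"east": 3, "west": 5}).render()` returns `"== Sales ==\neast=3, west=5"`, yet no single class contains all the code that determines that string.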
In the following, we discuss two reading techniques: abstraction-driven reading, which addresses the delocalization nature of the OO code, and use-case-driven reading, which is intended to address the difference between static and dynamic behaviors in OO systems.
Dunsmore and colleagues extended the idea of stepwise abstraction to OO code reading; their systematic technique is called abstraction-driven reading. Its essential ingredients are shown in Panel 2: Instruction for Abstraction-Driven Reading.
There are many kinds of dependencies and couplings, such as data dependencies and control dependencies, and Dunsmore et al. did not specify how to quantify them.
Skoglund and Kjellgren used coupling metrics (interaction coupling, component coupling, and inheritance coupling) to measure and rank the classes and methods so that the reading order can be objectively determined.
When developing the abstraction of a method, the reader should identify any changes of state and outputs in terms of inputs and prior states. The specifications should be brief and complete, describing what the method does but not how.
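A sketch of what such an abstract specification might look like for a hypothetical `Account.withdraw` method, recorded as a docstring. It states the output and the new state in terms of the inputs and the prior state (written `balance'` for the value after the call) and says nothing about how the method works.

```python
class Account:
    def __init__(self, balance: int = 0):
        self.balance = balance

    def withdraw(self, amount: int) -> bool:
        """Abstract specification (reverse-engineered while reading):

        Pre:  amount > 0; otherwise raises ValueError.
        Post: if amount <= balance: balance' = balance - amount, returns True
              otherwise:            balance' = balance,          returns False
        """
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            return False
        self.balance -= amount
        return True
```

A reader tracing a call to `withdraw` from another class can rely on these few lines instead of rereading the method body.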
A vigilant reader may have noticed that the abstraction development process is similar to the stepwise abstraction discussed earlier.
Abstraction-driven reading is a systematic approach. It encourages a deep understanding of the code and helps the readers stay focused and on track. The abstract specification generated during reading can be used in future code reading.
It is a promising technique to address the delocalization nature of OO code. However, abstraction-driven reading has its shortcomings. It is often slow and time-consuming, and it is not designed to address the dynamic nature of OO software.
PANEL 2: INSTRUCTION FOR ABSTRACTION-DRIVEN READING
1. Determine the reading order.
a. Analyze the interdependencies and couplings within the whole object-oriented system. Read the classes with the fewest dependencies first.
b. Analyze the methods within classes. Read the methods with the fewest dependencies first.
2. Read using abstraction.
a. For each method, reverse-engineer an abstract specification of the method. The method abstract specification may be used to compare with the class specification; it can also be used to support further reading and understanding of other methods (see the tracing of referenced methods and classes below).
b. Trace and understand all referenced classes during reading. This includes reading methods/classes, documentation, previously created abstractions, etc.
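Step 1a can be approximated mechanically. The sketch below assumes the class-level dependencies have already been extracted into a dictionary (the class names are hypothetical) and simply counts each class’s outgoing dependencies, reading those with the fewest first. This is a simplification of our own: Dunsmore et al. do not prescribe a metric, and Skoglund and Kjellgren’s coupling metrics are richer than a plain count.

```python
def reading_order(deps: dict) -> list:
    """Order classes by number of dependencies (fewest first), ties by name."""
    return sorted(deps, key=lambda cls: (len(deps[cls]), cls))


# Hypothetical dependency data: class -> classes it uses or inherits from.
deps = {
    "OrderService": {"Order", "Customer", "PaymentGateway"},
    "Order": {"LineItem", "Customer"},
    "LineItem": set(),
    "Customer": {"Address"},
    "Address": set(),
    "PaymentGateway": {"Order"},
}
```

Here `reading_order(deps)` yields `['Address', 'LineItem', 'Customer', 'PaymentGateway', 'Order', 'OrderService']`. A plain count does not guarantee that a class is read after everything it depends on (`PaymentGateway` comes before `Order`); a topological sort, for example with Python’s `graphlib.TopologicalSorter`, would.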
The abstraction-driven reading technique has the potential to discover delocalized defects. To deal with the highly dynamic nature of the OO system behavior, however, additional reading techniques are needed.
Use-cases play a significant role in OO system development. For example, they are used to capture system requirements and play a driving role in the Rational Unified Process. It is therefore natural to use them to guide code reading. We describe use-case-driven reading as originally documented by Dunsmore et al.
The aim of use-case-driven reading is to check whether each object behaves correctly in all the ways it can be used. Specifically, we seek answers to the following questions.
Are correct methods called? Are decisions and state changes made within each method correct and consistent? The reading procedure is described in Panel 3: Instruction for Use-Case-Driven Reading.
PANEL 3: INSTRUCTION FOR USE-CASE-DRIVEN READING
1. For each use-case, in turn, devise a set of scenarios that include preconditions, success or failure conditions, and exceptions.
2. For each scenario derived from a use-case:
a. Document the expected outcome (e.g., state changes, outputs).
b. Use a sequence diagram or other diagrams that capture the dynamic aspects of the system. Trace the interactions among participating objects that the scenario dictates by following the message calls.
c. For the class whose code is under reading, verify that the correct methods of the object of that class are called to support the scenario.
d. Note any decision and state changes in the method of the class under reading and verify that they are correct and consistent with respect to the scenario.
e. When reading the method code, follow the call to other methods if any. If the called method is in the class under reading, follow the method call, read the method and verify its correctness in a similar fashion; otherwise return and follow the sequence diagram.
f. At the end of the scenario tracing, make a note on the final outcomes and compare them to the expected ones. If there is any difference or anomaly, note the location of the difference and mark it as a defect.
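Use-case-driven reading traces the code mentally, but the bookkeeping in steps a, b, and f can be sketched in code. Below, a scenario for a hypothetical “place order” use-case records the expected message sequence (as a sequence diagram would) and the expected outcome, and a small recorder logs the calls the code actually makes so the two can be compared. Unlike the reading technique itself, this sketch executes the code; all class and method names are illustrative assumptions.

```python
import functools

CALLS = []  # messages observed while tracing one scenario, in order


def traced(method):
    """Log 'Class.method' each time the decorated method is called."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        CALLS.append(f"{type(self).__name__}.{method.__name__}")
        return method(self, *args, **kwargs)
    return wrapper


class Inventory:
    def __init__(self, stock):
        self.stock = stock

    @traced
    def reserve(self, item):
        # State change: one unit of stock is reserved if available.
        if self.stock.get(item, 0) == 0:
            return False
        self.stock[item] -= 1
        return True


class OrderDesk:
    def __init__(self, inventory):
        self.inventory = inventory

    @traced
    def place(self, item):
        # Decision: confirm only if the inventory could reserve the item.
        return "confirmed" if self.inventory.reserve(item) else "rejected"


def run_scenario(stock, item, expected_calls, expected_outcome):
    """Trace one scenario; return anomalies (empty list if none)."""
    CALLS.clear()
    inventory = Inventory(dict(stock))
    result = OrderDesk(inventory).place(item)
    actual_outcome = (result, inventory.stock)
    anomalies = []
    if CALLS != expected_calls:
        anomalies.append(("message sequence", list(CALLS)))
    if actual_outcome != expected_outcome:
        anomalies.append(("outcome", actual_outcome))
    return anomalies
```

For the success scenario (precondition: the item is in stock), `run_scenario({"book": 1}, "book", ["OrderDesk.place", "Inventory.reserve"], ("confirmed", {"book": 0}))` returns an empty list; any entry in the returned list corresponds to a difference to be noted as a defect in step f.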
In use-case reading, one devises a number of scenarios from a use-case and examines how the classes deal with those scenarios.
It forces the readers to consider object behavior in the given concrete contexts, giving the readers a better idea of whether the code is operating as expected. The readers pay attention to missing/incorrect method calls, erroneous state changes, etc.
The readers compare the sequence diagram and the implementing code to verify whether the correct method is called in the right context and whether a side effect of the method call is consistent between the code implementation and the sequence diagram, etc. The readers shall also trace other method calls and ensure their correctness.
For defect detection, any difference, inconsistency and missing information, as well as its location in code, is noted and analyzed.
Because it is not feasible to exercise all scenarios and use-cases, readers examine only a dynamic slice of the system. In practice, the technique tends to detect fewer defects than other reading techniques; hence, use-case-driven reading is meant to complement other reading methods.
Dunsmore and colleagues introduced two new reading techniques for OO code, abstraction-driven reading and use-case-driven reading, and compared them with other reading methods, namely ad hoc reading and checklist-based reading.
To overcome the known shortcomings of the checklist approach, the authors designed their checklist carefully and based its questions on historical defect data.
The final checklist includes 18 carefully ordered questions, covering “where to look” (class-level, method-level, and method-overriding issues) and “how to detect” components.
There was no significant difference between abstraction-based reading and ad hoc reading in the number of defects discovered, although abstraction-based reading yielded a small improvement.
Readers using ad-hoc reading went through the code two or three times to build up their understanding, while readers using abstraction-based reading read through the code once, at most twice, albeit slowly.
Some defects are completely undetected by all readers using ad-hoc reading, but this was not the case for readers using abstraction-based reading. That is, abstraction-based reading has the potential to detect delocalized defects. Compared with ad hoc reading, abstraction-based reading also helps readers stay focused and on track.
A subsequent study empirically compared the defect detection capabilities of abstraction-driven reading, use-case-driven reading, and checklist-based reading, using experienced students as subjects.
They observed that readers using checklist-based reading found more defects and at a quicker rate. However, the detection performance dropped off sharply after the first 60 minutes.
Abstraction-driven reading and use-case-driven reading showed similar defect detection profiles, both held back by a higher initial overhead. Their detection rates leveled off later than checklist-based reading, though not to the same degree. Readers using the use-case-driven method might have discovered more defects had they been given more time.
In terms of the number of false-positive defects reported, checklist-based reading reported the most false positives and use-case-driven reading reported the least. These results are not totally unexpected.
Abstraction-driven reading is slow and it aims at the full understanding of the code. With use-case-driven reading, one has to generate scenarios before comparing the code and the sequence diagram.
The researchers reported that although the performance of abstraction-driven reading is not as strong as that of checklist-based reading, abstraction-driven reading appears to be effective at detecting delocalized defects (but less effective at detecting other defects). Use-case-driven reading had the worst performance of the three studied.
However, the method deals with behavior in the context of an executing system. Among the three techniques studied, no single method detected all defects, and there was little overlap in the kinds of defects detected, suggesting that a complementary reading approach would work best. This is in line with the underlying idea behind perspective-based reading.
The combination of these three reading techniques has the potential to detect recurring defect types (checklist-based), unusual defects that require deeper understanding (abstraction-driven), and defects specifically associated with OO programming (abstraction-driven and use-case-driven).
Skoglund and Kjellgren obtained experimental results that are inconsistent with those reported by Dunsmore et al. They also reported that abstraction-driven reading gave more support in understanding the code.
Object-Oriented Framework Code Reading
Object-oriented frameworks are increasingly popular. We discussed scope-based reading for OO application construction in blog 6. Here we present functionality-based reading, an OO framework code reading technique for defect detection developed by Abdelnabi et al. (2004).
Why Yet Another Object-Oriented Code Reading Technique?
The OO code reading techniques discussed earlier are presumably applicable to OO framework code reading, so why do we need yet another technique? Application frameworks are generalized from existing applications in a specific domain. Their requirements are loosely defined, with no specific or fixed set of use-cases.
It is not feasible to define all the use-cases a framework will support, because concrete applications have not yet been instantiated while the framework is being actively developed. The dynamic behavior of the framework is necessarily incomplete, since the hotspots will be extended by application developers.
Therefore, the OO code reading techniques discussed earlier are of only limited use. Additionally, reading a framework involves two aspects: code reading and design reading. The latter is crucial, because a poorly designed framework will seriously limit its potential adoption.
Application frameworks have a steep learning curve, and understanding a framework remains a challenging task. When reading framework code for defect detection, it is important to understand the framework’s structure, both static and dynamic. It is therefore wise to use framework understanding to guide framework reading for defect detection.
Functionality-Based Approach to Framework Understanding
An OO application framework can also be understood by first understanding its top-most framework constructs and then general OO constructs. General OO constructs typically include basic constructs (classes and their relationships, such as inheritance and composition) and advanced constructs (e.g., meta-classes and reflection). The top-most framework constructs include:
Components: Framework components are fully implemented functionalities that application developers can reuse directly.
Interfaces: Framework interfaces are a collection of abstract operations called hotspots, which are customized and implemented by application developers without altering the structure and behavior of the basic framework.
Design patterns: A design pattern is a reusable, proven solution to a commonly occurring problem within a given context.
Framelets: Framelets are small frameworks that package components, interfaces, and design patterns. They are used to structure and document large and complex frameworks.
To understand a framework, one must extract and understand its functionalities, which can be traced to an operation or a set of operations the framework provides or supports. A functionality can expand to, be implemented by, or use another functionality. Functionalities supported by a framework can be categorized as:
Do-functionality: A do-functionality is a fully implemented capability that every application instantiated from the framework must have.
Can-functionality: A can-functionality is not fully implemented, and application developers must supply their own specific code at those hotspots.
Offer-functionality: An offer-functionality is a fully implemented capability, but its use is not mandatory in an instantiated application.
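A hypothetical miniature framework, sketched in Python, showing where each kind of functionality might appear. `ReportFramework.run` plays the role of a do-functionality (fully implemented and always used), `format_row` a can-functionality (a hotspot the application must implement), and `to_csv` an offer-functionality (implemented, but optional). The names are invented for illustration.

```python
from abc import ABC, abstractmethod


class ReportFramework(ABC):
    def run(self, rows):
        """Do-functionality: fully implemented; every application uses it."""
        return [self.format_row(r) for r in rows]

    @abstractmethod
    def format_row(self, row):
        """Can-functionality: a hotspot; the application supplies the code."""

    def to_csv(self, rows):
        """Offer-functionality: fully implemented, but using it is optional."""
        return "\n".join(",".join(str(v) for v in r) for r in rows)


class PriceReport(ReportFramework):
    # Application code: fills in the hotspot and nothing else.
    def format_row(self, row):
        name, price = row
        return f"{name}: {price:.2f}"
```

`PriceReport().run([("tea", 2.5)])` returns `["tea: 2.50"]`. Because `format_row` is abstract, `ReportFramework()` itself cannot be instantiated, which is how Python enforces that the hotspot is filled.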
In the functionality-based approach to framework understanding, one reads the framework code with the intent to extract and abstract framework functionalities, trace them to framework operations (methods), and relate them to other functionalities. In the end, the reader compiles “functionality rules.”
A functionality rule categorizes a functionality (do-functionality, can-functionality, or offer-functionality), documents the code locations where the functionality is implemented, provides a concise and precise description of the functionality and lists other functionalities this functionality relates to and the relationship type (use, expand, implemented by).
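One possible way to record a functionality rule as structured data, assuming a Python dataclass is an acceptable documentation format (Abdelnabi et al. do not prescribe one). The fields mirror the description above; the rule contents and code locations are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    DO = "do"
    CAN = "can"
    OFFER = "offer"


class Relation(Enum):
    USES = "use"
    EXPANDS = "expand"
    IMPLEMENTED_BY = "implemented by"


@dataclass
class FunctionalityRule:
    name: str
    category: Category
    locations: list[str]  # code locations where the functionality is implemented
    description: str      # concise, precise statement of what it does
    related: list[tuple[Relation, str]] = field(default_factory=list)

    def related_names(self, relation: Relation) -> list[str]:
        """Names of related functionalities with the given relationship type."""
        return [name for rel, name in self.related if rel is relation]


render = FunctionalityRule(
    name="render report",
    category=Category.DO,
    locations=["ReportFramework.run"],
    description="Formats every input row using the application's row formatter.",
    related=[(Relation.USES, "format row")],
)
```

Keeping the relationships typed makes it easy to list, for instance, every functionality a rule uses when following related functionalities during reading.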
Generating the functionality rules is an additional documentation effort that should happen before code reading takes place.
To develop the functionality rules, one reads the framework documents in a top-down manner, from requirements to designs, and to code as needed. The class source code is read recursively, from the top-most classes in the inheritance hierarchy to the derived ones. The implementation class is read to abstract its function.
Overriding methods are read to verify, refine, and update the abstractions established earlier. During reading and tracing, the relationships with other functionalities are established, particularly when an object sends a message to another. Lastly, the functionality is classified as a do-, can-, or offer-functionality.
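For Python code, a reading order “from the top-most classes in the inheritance hierarchy to the derived ones” can be generated with the built-in `__subclasses__()` method. This breadth-first sketch over a hypothetical hierarchy lists each class before the classes derived from it, so overriding methods are read after the abstractions they refine. The guarantee holds for single inheritance; with multiple inheritance a class may be reached before one of its bases, and a topological sort would be needed.

```python
from collections import deque


def top_down_order(root: type) -> list:
    """Breadth-first walk from root; under single inheritance, bases come first."""
    order, queue, seen = [], deque([root]), set()
    while queue:
        cls = queue.popleft()
        if cls in seen:  # multiple inheritance can reach a class twice
            continue
        seen.add(cls)
        order.append(cls.__name__)
        queue.extend(cls.__subclasses__())  # immediate subclasses only
    return order


class Widget: pass
class Button(Widget): pass
class Label(Widget): pass
class IconButton(Button): pass
```

Here `top_down_order(Widget)` starts with `Widget` and places `Button` before `IconButton`, matching the order in which their abstractions should be established and refined.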
Functionality-based reading is motivated by the functionality-based approach to understanding OO frameworks. Its purpose is to trace each functionality to concrete framework constructs and their associated code.
It is a hybrid reading technique: It uses the functionality rules as guidance (top-down) and reads the code from bottom-up. It has steps as shown in Panel 4: Instruction for Functionality-Based Reading.
PANEL 4: INSTRUCTION FOR FUNCTIONALITY-BASED READING
1. Locate the functionality rules. Arrange for their development if they do not exist.
2. Read the functionality rules in order of categories Do-, Can-, and Offer-functionality.
3. For each functionality rule:
a. Locate the associated method in the lowest level class; read the code with respect to the description of the functionality, and log any discrepancies as a defect.
b. Locate the related functionalities. Read them for defect detection, if not already inspected.
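Steps 2 and 3 of Panel 4 amount to a traversal: read the rules in category order (Do, Can, Offer) and follow each rule’s related functionalities, inspecting each one only once. The sketch below assumes rules are given as plain dictionaries with hypothetical contents; the actual reading and defect logging remain manual and are represented here only by the returned inspection order.

```python
CATEGORY_ORDER = {"do": 0, "can": 1, "offer": 2}


def inspection_order(rules: dict) -> list:
    """Return functionality names in the order Panel 4 would inspect them."""
    inspected, order = set(), []

    def inspect(name):
        if name in inspected:  # step 3b: "if not already inspected"
            return
        inspected.add(name)
        order.append(name)  # step 3a: read code against description, log defects
        for related in rules[name]["related"]:
            inspect(related)  # step 3b: follow related functionalities

    # Step 2: categories in order Do, Can, Offer (sorted() is stable within a category).
    for name in sorted(rules, key=lambda n: CATEGORY_ORDER[rules[n]["category"]]):
        inspect(name)
    return order


rules = {
    "export csv": {"category": "offer", "related": []},
    "render report": {"category": "do", "related": ["format row"]},
    "format row": {"category": "can", "related": []},
    "log run": {"category": "do", "related": ["export csv"]},
}
```

`inspection_order(rules)` returns `["render report", "format row", "log run", "export csv"]`: the can- and offer-functionalities are pulled forward because do-functionalities relate to them.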
Abdelnabi et al. compared the functionality-based reading with checklist-based reading and abstraction-driven reading for defect detection in OO frameworks, using students as subjects.
The inspected artifacts were real, professional C++ OO frameworks with carefully seeded defects of different types. To keep the task manageable, subjects were asked to inspect about 1,000 lines of code.
The researchers concluded that functionality-based reading was significantly more effective (more true defects detected in total) and more efficient (more true defects detected per unit of time) than the other two reading techniques.
Task-Directed Reading
Many legacy software applications are still in service, and software developers are tasked with maintaining them. Legacy code may not be well documented, or its documentation may be outdated or simply inaccurate.
Quite often it is necessary to continuously improve the software quality of a legacy system, particularly for safety-related and mission-critical systems. Kelly and Shepard introduced so-called task-directed inspection for legacy code reading.
Their main idea was to combine code inspection for defect detection with other software development tasks to reduce the potential resistance to the idea of code inspection, thus “task-directed.” They also reported a lightweight process far removed from Fagan-style inspection, which is not discussed here.
Given the particular circumstances in which the technique was introduced, Kelly and Shepard defined three tasks, all aligned with the objective of producing useful documentation for the legacy software system:
Task 1. Create a data dictionary for the module. A dictionary is simply a catalog of all variables in the module, including their definitions, units of measurement if appropriate, and the meaning of each discrete value if applicable.
The roles of these variables in module calling sequences are also of interest. The reader is to confirm that each use of a variable is consistent with its definition.
Task 2. Document the logic of the module and add a description as comments in code files.
Task 3. Compile a cross-reference between the code and specifications. Cross-reference tags are created and embedded in both the code and the specification to signal individual matches. Any mismatches, and any material missing from either the code or the specification, are recorded.
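A hypothetical sketch of the artifacts Tasks 1 and 3 produce. The data dictionary is a mapping from variable names to definitions, units, and the meanings of discrete values; the cross-reference check assumes tags of the form `[SPEC x.y]` embedded in code comments and a set of specification section numbers. The tag syntax, variable names, and module contents are all invented for illustration.

```python
import re

# Task 1: data dictionary entries for a hypothetical module.
DATA_DICTIONARY = {
    "flow_rate": {"definition": "measured coolant inflow", "unit": "kg/s"},
    "pump_mode": {"definition": "pump operating mode",
                  "values": {0: "off", 1: "normal", 2: "boost"}},
}

# Task 3: module source (as text, not executed) with embedded cross-reference tags.
CODE = """
flow_rate = read_sensor(3)      # [SPEC 4.1]
if pump_mode == 2:              # [SPEC 4.2]
    flow_rate *= 1.5            # [SPEC 4.7]
"""
SPEC_SECTIONS = {"4.1", "4.2", "4.3"}


def cross_reference(code: str, spec_sections: set) -> dict:
    """Report tags with no matching spec section, and spec sections never tagged."""
    tags = set(re.findall(r"\[SPEC (\d+(?:\.\d+)*)\]", code))
    return {
        "tag_without_spec": sorted(tags - spec_sections),
        "spec_without_code": sorted(spec_sections - tags),
    }
```

`cross_reference(CODE, SPEC_SECTIONS)` reports `4.7` as a tag with no specification section and `4.3` as a section with no implementing code; both would be recorded as findings. The dictionary, in turn, lets the reader check that `pump_mode == 2` means “boost” wherever it appears.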
To accomplish these tasks, a reader will have to read and understand the code, trace the data flow and control flow, and cross-check the code and specification. Since the readers have clear objectives in mind, they are forced to scrutinize the code and related document closely.
Modules are assigned to readers deliberately, considering their background and expertise. All three tasks associated with the same module are assigned to a single reader, taking advantage of the potential synergies between tasks.
The three tasks give the inspectors different viewpoints on the source code. Readers complete their assigned tasks in parallel, and task-directed reading does not prescribe any interaction among them.
Kelly and Shepard also conducted a case study in an industrial environment using professional developers as subjects. According to the authors, 50,000 lines of legacy scientific code were read and 950 findings were recorded.
Among all these findings, 6% were considered serious defects and received immediate attention for correction, 56% were related to style and maintenance issues, 33% identified inconsistencies between code and specifications, and the remaining 5% were related to enhanced functionalities.
The code was inspected at a rate of 20 lines per hour. At completion, the proportion of comments in the code had increased from 20% to 60%, and the comments were consistent with the code. The case study was considered a success, and the same technique was applied to other software systems.
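Putting the reported figures together (our own arithmetic, assuming the 20 lines/hour rate applies to the whole 50,000 lines and that the reported percentages are rounded):

```latex
\[
\frac{50{,}000\ \text{lines}}{20\ \text{lines/hour}} = 2{,}500\ \text{hours},
\qquad
0.06 \times 950 = 57\ \text{serious defects},
\qquad
\frac{950\ \text{findings}}{50\ \text{KLOC}} = 19\ \text{findings per KLOC}.
\]
```

That is, the technique is labor-intensive, but it yielded a steady stream of findings across the code base.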
Code Readability Factors
We all know that some articles or books are easy to read and understand, while others are not. We also know that the format of the page can affect reading speed. According to Wikipedia, readability is the ease with which a reader can understand written text, and it can be measured in many different ways.
When it comes to code reading, experience tells us that people read code at different speeds with different levels of understanding. However, we do not have a complete account of what impacts code readability.
In the early days of computer programming, researchers tracked readers’ eye movements and focus to figure out how people read code. Based on these observational studies, concrete and sometimes radical changes to code display and formatting were proposed.
There are many factors that can affect code readability. Deimel and Naveda classify them into five categories, and their classification is still relevant today.
Reader characteristics: A reader’s experience and knowledge of programming, programming languages, and application domains play a significant role during code reading and comprehension.
Intrinsic factors: Similar to the intrinsic and accidental complexity of a design, the code can have its own intrinsic and accidental complexity, which affects its readability.
As we learned earlier, object-oriented programming makes delocalization more prevalent, and delocalized code is harder to read and understand.
Representational factors: Representational factors are broad and can include the programming language, whether the code has adequate and accurate comments, the complexity of the design, the naming conventions for variables and methods, etc.
Typographic factors: Typographic factors include font, color-coding of keywords or other programming entities, usage of white space and indentation, etc.
Environmental factors: Environmental factors are meant to contain anything else, e.g., the lighting in the reading spot, the integrated development environment (IDE), etc.
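Delocalization, mentioned under intrinsic factors, means that the pieces needed to understand one behavior are spread over several places in the code. The hypothetical Python classes below illustrate one common source in object-oriented code. The call in `total_area()` is a single line, yet a reader cannot tell which `area()` runs without locating every subclass of `Shape`, which in a real system might live in different files.

```python
import math
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...

# In a real code base these subclasses might be defined far from each other.
class Circle(Shape):
    def __init__(self, r: float) -> None:
        self.r = r

    def area(self) -> float:
        return math.pi * self.r ** 2

class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2

def total_area(shapes: list[Shape]) -> float:
    # Dynamic dispatch: each object's class decides at run time which area() runs,
    # so understanding this line requires reading every Shape subclass.
    return sum(s.area() for s in shapes)

print(total_area([Square(2.0), Square(3.0)]))  # 13.0
```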
Numerous books, discussions, and postings on programming styles, standards, or conventions are available. We don’t want to start another heated debate here. Rather, we make a few suggestions to improve code readability:
Pick a coding standard, including formatting and indentation rules, that most of the team agrees with. Uniformity and consistency improve code readability.
Add comments to the code and keep them up to date. Don’t document facts that are obvious from the code; instead, capture the design rationale, assumptions, and decisions as comments.
Choose your variables, function or method names, and other identifiers carefully and wisely. Make sure the names reflect their intentions. Also, make sure the names are consistent with the usage in the application domain.
Use simple programming structures and follow the KISS (keep it simple, stupid) principle. Stay away from nonstandard language features. Use white space generously, don’t clutter the display, and group related code logically.
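The comment and naming suggestions above can be illustrated with a small, hypothetical Python snippet. It contrasts a comment that merely restates the code with one that records rationale and an assumption, and an opaque name with an intention-revealing one. The names, the 10% rate, and the business rule are invented for the example.

```python
# Before: the comment repeats what the code already says, and "f" and "d" are opaque.
def f(d):
    return d * 0.9  # multiply d by 0.9

# After: the names state the intent, and the comments record the assumption
# behind the constant and the reason for the design decision.
LOYALTY_DISCOUNT_RATE = 0.10  # assumed business rule: members get 10% off

def discounted_price(list_price: float) -> float:
    # The discount is applied at checkout instead of storing discounted prices,
    # so a rate change takes effect without migrating stored data.
    return list_price * (1 - LOYALTY_DISCOUNT_RATE)
```

Both functions compute the same value; only the second tells a later reader why.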
Different stakeholders will read the code you are writing today, including yourself at a later time. It might be true that the code logic is perfectly clear in your mind at the time of writing. However, you will appreciate your efforts to make the code easier to read if you come back to the code again in a few years, or even a few months.
Basic Software Reading Techniques
Software professionals are trained to write software documents. But reading, understanding, analyzing, assessing quality, and utilizing the software document are equally important.
Next, this blog defines software reading and software reading techniques and classifies the software reading techniques based on their characteristics. This blog then discusses ad hoc reading, checklist-based reading, and differential reading, which can be applied to any software artifacts.
Due to its long history, checklist-based reading has a few variations, and the community has accumulated some heuristics on checklist best practices. Ad hoc reading and checklist-based reading are the two most widely practiced reading techniques.
According to a recent industry survey, ad hoc reading is used in 35% of the software reviews and checklist-based reading is used in 50% of the reviews.
Ten percent of the reviews use some specific or advanced reading techniques and the remaining 5% use simulation or other techniques. Ad hoc and checklist-based reading techniques are also frequently chosen as a baseline and other reading techniques are compared to them.
Introduction to Software Reading
Reading is a complex cognitive process of decoding symbols in order to construct or derive meaning. … It is a complex interaction between the text and the reader which is shaped by the reader’s prior knowledge, experiences, attitude, …
The reading process requires continuous practice, development, and refinement. In addition, reading requires creativity and critical analysis. … Readers integrate the words they have read into their existing framework of knowledge and schema.
Based on this discussion, we conclude that (1) reading has a purpose, (2) the reader’s background plays a role in reading, (3) reading techniques can be learned, and (4) reading requires critical analysis. These general characteristics apply to software reading as well.
Definition of Software Reading
The Encyclopedia of Software Engineering defines software reading as the process by which a developer gains an understanding of the information encoded in a work product sufficient to accomplish a particular task. We adopt the same definition.
The “work product” refers to any software artifact, such as requirements specifications, design documentation, code files, test plans, test cases, test reports, and user documentation.
The “particular task” is related to the purpose of reading, whether the reading is for gaining knowledge of the system, detecting defects, or implementing the design. The purposes of reading are systematically treated in the next subsection.
Purposes of Software Reading
We read software artifacts to accomplish a particular task. The task is defined by the purpose of reading. Broadly speaking, we read software for analysis and for construction.
In reading for analysis, we read and understand the document, then analyze and assess the qualities and characteristics of the document. The primary objective of reading for analysis is to detect defects in the document.
When reading the requirements specifications, we may detect various types of requirement errors such as incorrect facts, omission, ambiguity, and inconsistency. When reading the code, we may detect various types of coding errors such as logic errors, assumption errors, incorrect function calls, etc.
Other objectives of reading for analysis include performance prediction, requirements tracking, usability assessment, etc. One of the main reasons to have an architecture document is to support analysis and prediction. For example, we can predict system performance using queueing theory when one or more processing queues are involved.
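As a sketch of such a prediction, assume the simplest single-queue model, M/M/1: Poisson arrivals at rate λ, exponentially distributed service times at rate μ, one server, and λ < μ. Standard results then give utilization ρ = λ/μ, mean number of requests in the system L = ρ/(1 - ρ), and mean response time W = 1/(μ - λ). The rates below are hypothetical, and a real architecture may call for a richer queueing model.

```python
from dataclasses import dataclass

@dataclass
class MM1Prediction:
    utilization: float         # rho = lambda / mu
    mean_in_system: float      # L = rho / (1 - rho)
    mean_response_time: float  # W = 1 / (mu - lambda), in seconds

def predict_mm1(arrival_rate: float, service_rate: float) -> MM1Prediction:
    """Steady-state M/M/1 estimates; only valid when arrival_rate < service_rate."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals outpace service")
    rho = arrival_rate / service_rate
    return MM1Prediction(
        utilization=rho,
        mean_in_system=rho / (1 - rho),
        mean_response_time=1 / (service_rate - arrival_rate),
    )

# Hypothetical component: 80 requests/s arrive, and one server handles 100 requests/s.
p = predict_mm1(80.0, 100.0)
print(p)
```

With these numbers the server is 80% busy, a request spends about 50 ms in the system, and about four requests are present on average, which agrees with Little’s law (L = λW).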
By reading the system requirements specification and the subsystem requirements specification, we can reason about whether the system requirements are sufficiently decomposed into subsystem requirements, whether each subsystem requirement traces back to an originating system requirement, and whether the subsystem requirements sufficiently cover the system requirements.
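The mechanical part of this reasoning can be sketched in Python, assuming that each subsystem requirement records the ID of the system requirement it was derived from. All IDs are hypothetical. The sketch finds subsystem requirements with no valid origin and system requirements that no subsystem requirement covers. Whether a decomposition is sufficient in substance still has to be judged by the reader.

```python
from typing import Optional

system_reqs = {"SYS-1", "SYS-2", "SYS-3"}
# Each subsystem requirement maps to its parent system requirement (None if unstated).
subsystem_parent: dict[str, Optional[str]] = {
    "SUB-1": "SYS-1",
    "SUB-2": "SYS-1",
    "SUB-3": "SYS-2",
    "SUB-4": None,     # orphan: no traceable origin
    "SUB-5": "SYS-9",  # dangling: the parent does not exist
}

def check_traceability(
    system: set[str], parent_of: dict[str, Optional[str]]
) -> tuple[set[str], set[str]]:
    # Backward tracing: each subsystem requirement must point at a real system requirement.
    untraced = {sub for sub, parent in parent_of.items() if parent not in system}
    # Coverage: each system requirement must be decomposed into at least one subsystem requirement.
    covered = {parent for parent in parent_of.values() if parent in system}
    uncovered = system - covered
    return untraced, uncovered

untraced, uncovered = check_traceability(system_reqs, subsystem_parent)
print(sorted(untraced))   # ['SUB-4', 'SUB-5']
print(sorted(uncovered))  # ['SYS-3']
```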
In reading for construction, we attempt to identify whether any requirements, design, code, or test cases can be reused in the same project or in a different project.
We also examine the high-level design document to come up with the low-level design, or read the design document to see how we may implement the design properly. Sometimes we read software just to gain knowledge, although even then we may expect to maintain it in the future.
Taxonomy of Software Reading Techniques
Software reading techniques can be classified along different dimensions. Based on whether the reading technique is structured (systematic) or not, the reading techniques can be put into the following categories:
Unstructured, or unsystematic, reading: Ad hoc reading falls into this category and is discussed in this blog.
Semi-structured reading: Checklist-based reading is in this group and is also discussed in detail in this blog.
Structured or systematic reading: Perspective-based reading falls into this group, along with many other techniques. These reading techniques collect knowledge about the best practices for defect detection into a single procedure.
To some extent, they play a role in reading similar to the role design patterns play in design.
When we discuss the benefits and shortcomings of each reading technique, keep in mind that structures of any kind simultaneously enable and limit human activities. This is known as the Paradox of Structure.
While a structured reading technique enables a reader to find anomalies in a software artifact, the same technique may also limit the reader to finding anomalies only in certain categories.
Reading techniques can be classified according to the software artifacts to which they are applied. For example, checklist-based reading can be applied to review almost every software artifact, while stepwise abstraction, which is discussed in a later blog, is only applicable to code review.
Ad hoc Reading
When the reader is given no specific method for detecting issues or defects in the software artifacts under review, we call it ad hoc reading. This is an unstructured, or unsystematic, reading technique.
The reader simply attempts to uncover as many issues and defects as possible by examining the artifact using whatever intuition, skills, knowledge, and experience he or she may have.
The effectiveness of ad hoc reading depends entirely on the individual reader: individual defect detection performance can vary by a factor of 10 in terms of defects found per unit time. Nevertheless, it is one of the two most common reading techniques.
One of the advantages of ad hoc reading is that the reader needs no training. However, it has many disadvantages, chief among them the wide variability of the results, since the outcome depends largely on the reader’s skills, knowledge, and experience.
Readers acquire expertise slowly, so inexperienced readers are not productive when reviewing software artifacts to uncover issues. Once expertise has been acquired, it is very difficult to teach or transfer it to others.
Since the effectiveness of ad hoc reading depends on individual expertise, readers adopting this technique may miss major areas of concern.
Checklist-based Reading
Checklists are ubiquitous. You may use a shopping list. Grade school kids often have a school supply list for the next school year. If you are planning a family vacation, you most likely have a packing list. The example list goes on and on.
Checklists are widely used in software reading, where they serve a purpose similar to that of the everyday checklists above. In fact, Fagan (1976) suggested using a checklist during software inspection in his seminal paper.
Checklist Definition, Types, and Examples
So what is a checklist? A checklist is a list of questions to provide reviewers with hints and recommendations for finding defects during the examination of software artifacts.
Since a question can be rephrased as an imperative sentence, the checklist does not have to be composed of questions only. The questions or imperative sentences in the checklist draw reviewers’ attention to defect-prone areas based on historical data.
A checklist may also serve other purposes. For example, it can be used to ensure important areas are covered by the artifact under review.
Checklists can be classified into two groups: property-based checklists and artifact-based checklists. Checklists for coding standards and guidelines, standard or process conformance, etc. are property-based.
Checklists for requirements specifications, design documents, code files, or test cases are artifact-based.
Checklists for requirements specifications and design documents may contain items to ensure correctness, consistency, and completeness of the requirements or design, while the checklists for code review may include items for generally accepted programming practices and for particular programming languages.
Panel 1 shows a partial checklist for a requirements review. Note that each checklist item is an imperative sentence. When reviewing a requirements specification document, a reader can check each requirement against the individual items on this checklist.
PANEL 1: A SAMPLE (PARTIAL) CHECKLIST FOR REQUIREMENTS REVIEW
1. Requirements specifications shall be testable.
2. Requirements specifications shall not conflict with other requirements specifications.
3. Conditional requirements specifications shall cover all cases.
4. Numerical values in requirements specifications shall include physical units if applicable.
Panel 2 shows a sample checklist for code review. Here each checklist item is a question. The checklist for code review tends to be big and the items can be grouped into different areas of concern such as control, input/output, performance, etc.
For object-oriented code reading, the areas of concern can be aligned with the features of object-oriented programming, such as encapsulation, inheritance, and polymorphism.
PANEL 2: A SAMPLE (PARTIAL) CHECKLIST FOR CODE REVIEW
1. Have resources (e.g., memory, file descriptor, database connection) been properly freed?
2. Are shared variables protected/thread-safe?
3. Is logging implemented?
4. Are comments updated and consistent with the code?
5. Is data unnecessarily copied, saved, or reloaded?
6. Is the number of cores checked before spawning threads?
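To make items 1 and 2 concrete, the hypothetical Python snippet below shows the kind of code a reviewer would hope to find when working through them. A `with` block guarantees that a file is closed even on error, and a lock serializes updates to a counter shared across threads. The class and function names are invented for the illustration.

```python
import os
import tempfile
import threading

class SharedCounter:
    """A counter that several threads update concurrently."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()  # item 2: guards the shared variable

    def increment(self) -> None:
        with self._lock:  # item 2: the read-modify-write happens under the lock
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value

def count_lines(path: str) -> int:
    # Item 1: "with" closes the file descriptor even if reading raises an exception.
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

counter = SharedCounter()

def worker() -> None:
    for _ in range(1000):
        counter.increment()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000

# Write a small temporary file, count its lines, then remove it.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
line_count = count_lines(tmp.name)
os.remove(tmp.name)
print(line_count)  # 3
```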
Checklists are typically developed based on the analysis of past team defects in the same or different projects. They can also be based on others’ experience, but customized for one’s project team.
Checklists can be tailored to an individual as well. Individuals can have a personal defect checklist that compiles the problematic areas in which the individual tends to make mistakes. The Personal Software Process prescribes the use of such a personal checklist.
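Deriving a team or personal checklist from past defects can be sketched as a frequency count. The defect log, categories, and questions below are hypothetical. The sketch turns the most frequent defect categories into checklist questions, and `max_items` keeps the resulting list short.

```python
from collections import Counter

# Hypothetical defect log from past reviews: (defect category, description).
defect_log = [
    ("resource leak", "file not closed in error path"),
    ("off-by-one", "loop bound uses <= len(items)"),
    ("resource leak", "DB connection not returned to pool"),
    ("null handling", "missing None check on lookup"),
    ("off-by-one", "slice end index wrong"),
    ("resource leak", "socket left open on timeout"),
]

# Hypothetical mapping from defect category to a checklist question.
questions = {
    "resource leak": "Have resources been properly freed on every path?",
    "off-by-one": "Are loop bounds and slice indices correct at both ends?",
    "null handling": "Is every possibly missing value checked before use?",
}

def build_checklist(log, questions, max_items=2):
    """Turn the most frequent past defect categories into checklist items."""
    counts = Counter(category for category, _ in log)
    return [questions[category] for category, _ in counts.most_common(max_items)]

for item in build_checklist(defect_log, questions):
    print("-", item)
```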
Compared to ad hoc reading, checklist-based reading reduces the variability of reading results, i.e., the results are less dependent on the reviewers’ skills, knowledge, and experience. It also ensures coverage of important areas and is thus effective at detecting omissions.
As recent research suggests, in addition to supporting defect detection, checklist-based reading improves software understanding and comprehension, which makes the subsequent software modification easier.
As the Paradox of Structure suggests, checklist-based reading tends to detect only the types of defects covered by the checklist, i.e., those encountered previously and from which the checklist was created. Therefore, subtle defects that require a deep understanding of the artifacts are often missed.
The other disadvantages of checklist-based reading are related to the checklist itself. The checklist often includes generic items that may not be applicable to the project or the artifact.
A lengthy checklist may overwhelm readers. The “best practices of checklists” discussed later can remediate the issues with generic and lengthy checklists.
Checklists with Guidance
Checklist-based software reading is considered semi-structured, as it does not tell the reader how to use the checklist and there is little verification that the reader actually conducts an analysis relating to checklist items.
To remedy this shortcoming, active guidance can be added to the traditional checklist-based reading. Winkler et al. focused on design document inspection. The readers are given a tailored checklist that provides active guidance. The specific checklist leads the reader through the inspection process.
PANEL 3: AN ACTIVE GUIDANCE USED WITH CHECKLIST-BASED READING
1. Analyze requirements and system functions in the requirements document.
2. Prioritize the correlations between requirements and system functions according to the reader’s own knowledge of the application domain.
3. Trace the requirements and functions in the design document according to their priorities or importance.
4. Report any differences as defects.
5. Pick the next most important requirement and repeat steps 3 and 4, until done.
This checklist with active guidance promotes a deep understanding of the specification document, the system requirements, and system functions, which enables the readers to uncover more defects in the design document.
It also allows the reader to focus on the more important requirements, thanks to the prioritization performed before the inspection starts, and thus helps uncover crucial defects. Alternatively, guidance on how to use a checklist can be built implicitly into the checklist itself.
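The control flow of the guided procedure in Panel 3 can be sketched in Python. The sketch assumes step 1 has already produced a set of requirements with reader-assigned priorities and that the design document has been summarized as a mapping from requirement IDs to design elements. All IDs, priorities, and element names are hypothetical, and step 4 is reduced to a presence check, whereas a real reader compares content.

```python
# Hypothetical result of steps 1 and 2: requirement ID -> priority (higher = more important).
requirements = {"R1": 3, "R2": 5, "R3": 1, "R4": 4}
# Hypothetical summary of the design document: requirement ID -> design elements realizing it.
design_trace = {"R1": ["Scheduler"], "R2": ["OrderService", "OrderStore"], "R3": []}

def guided_inspection(requirements: dict[str, int],
                      design_trace: dict[str, list[str]]) -> list[str]:
    defects = []
    # Steps 2 and 5: visit requirements from most to least important.
    for req in sorted(requirements, key=requirements.get, reverse=True):
        # Step 3: trace the requirement into the design document.
        elements = design_trace.get(req)
        # Step 4: report differences as defects (here only missing traces).
        if elements is None:
            defects.append(f"{req}: not addressed in design")
        elif not elements:
            defects.append(f"{req}: traced but no design element realizes it")
    return defects

for defect in guided_inspection(requirements, design_trace):
    print(defect)
```

Because the loop runs in priority order, the defect for the higher-priority R4 is reported before the one for R3, mirroring the focus on important requirements described above.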
Best Practices of Checklists
Over the years, people have come up with a few heuristics on what makes good checklists and what to avoid in checklists. We call this general advice “the best practices of checklists”:
Checklists should be periodically revised based on historical data to include new items and remove outdated ones. Reviewers are more likely to read and use checklists that are updated regularly, and checklists that reflect the most common issues are more likely to help reviewers find defects.
Checklists should be concise and fit on one page. A reviewer is less likely to flip through multiple pages. The single-paged checklist can be hung on the office wall or put on the desk close to where the reviewer is examining the software artifact.
Checklist items should not be too general. A general item is hard to apply or subject to varying interpretation.
Checklist items should not be used for conventions that are better checked or enforced with software tools.
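For example, mechanical layout rules such as the hypothetical ones below are better left to a tool than to a checklist item. The Python sketch flags long lines, trailing whitespace, and tab characters. In practice a team would usually configure an existing linter, such as pycodestyle for Python, rather than write its own.

```python
# Hypothetical conventions: at most 79 characters per line, no trailing whitespace, no tabs.
MAX_LEN = 79

def check_conventions(source: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for mechanical style violations."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LEN:
            problems.append((lineno, f"line longer than {MAX_LEN} characters"))
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
        if "\t" in line:
            problems.append((lineno, "tab character"))
    return problems

sample = "x = 1  \n\ty = 2\n" + "z = '" + "a" * 80 + "'\n"
for lineno, problem in check_conventions(sample):
    print(lineno, problem)
```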
Differential Reading
Many software reading techniques assume that the reader will read the entire document, be it a requirements specification, design document, source code file, or test case. In practice, however, developers often deal with the difference between an existing software artifact and a modified version of it, as in the following situations:
Software applications are frequently released incrementally via different projects. New features and defect fixes are added in a later release. Many software artifacts including requirements, design, code implementation, and testing can be reused. New requirements, design, code, and test cases are typically embedded in the existing documents.
Even for software applications that are started from scratch, an iterative and incremental development process may be adopted. New features and defect fixes are implemented in a later iteration or sprint. Along the way, documentation is also written, revised, expanded, and reviewed incrementally.
Regardless of how the project is structured and what development process is used, once a software artifact is reviewed, rework might be required, and the updated document might then be reviewed again.
In all those situations, there are at least two versions of the software artifacts available. It is not worthwhile for the reader to read the entire document from beginning to end each time, particularly if the reader is already familiar with the previous versions.
There is no published reading technique to deal with the situations above. We adopt the instructions in Panel 4, so that the reader can focus on the changes and assess whether those changes meet the intentions without negative side effects.
We call this differential reading, as it draws the reader’s attention to the changes and examines them in context. Don’t be deceived by a small number of changes, however: in source code, a simple, innocent-looking change may have significant ramifications.
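As a hypothetical illustration in Python, the two versions below differ by a single character, so a diff tool shows just one modified line. Read in context, however, the change alters the result for every value equal to the limit, and the word "below" in the function’s name no longer describes what it does.

```python
# Version 1 (already reviewed): count the values strictly below the limit.
def count_below_v1(values, limit):
    return sum(1 for v in values if v < limit)

# Version 2: a one-character edit ("<" became "<=") that a diff shows as a
# single modified line, yet it changes the result whenever a value equals the limit.
def count_below_v2(values, limit):
    return sum(1 for v in values if v <= limit)

readings = [10, 20, 20, 30]
print(count_below_v1(readings, 20))  # 1
print(count_below_v2(readings, 20))  # 3
```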
Many tools can highlight changes in a document. Microsoft Word is frequently used to capture requirements and describe designs; to keep track of changes across revisions of a document, one can enable “Track Changes”.
Many tools also track source code changes. For example, TortoiseSVN, a Subversion client, is integrated with Windows Explorer, and the Diff command in its context menu highlights code changes against the repository.
PANEL 4: INSTRUCTIONS FOR DIFFERENTIAL READING
1. Become familiar with the existing software artifact, if you are not already.
2. Understand what drives the modification of the existing document, be it new features, defect fixes, or something else.
3. Use a diff tool to highlight what has been changed in the newly updated document.
a. Pick a block of changes to focus on and read the surrounding text that the changes are part of.
b. Pay attention to all change types: addition, deletion, and modification.
c. If the amount of change is significant, consider it new and use any reading techniques available to you or agreed upon by the team.
d. If the amount of change is not significant:
i. Check whether the change is consistent with the change driver and with the assumptions, style, etc. already established in the document. If not, log it as a defect.
ii. Check for side effects. Log any side effect as a defect.
4. Repeat step 3 until all changes have been read and analyzed.
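Steps 3a and 3b can be supported with a few lines of Python, using the standard `difflib` module as a stand-in for a diff tool. The sketch groups changes into additions, deletions, and modifications and returns a little surrounding context for each. The judgment calls in steps 3c and 3d remain with the reader. The sample code fragments are hypothetical.

```python
import difflib

# Map difflib's opcode tags to the change types named in step 3b.
CHANGE_TYPES = {"insert": "addition", "delete": "deletion", "replace": "modification"}

def change_blocks(old: str, new: str, context: int = 1):
    """Yield (change type, old lines, new lines, surrounding lines in the new version)."""
    a, b = old.splitlines(), new.splitlines()
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            continue
        # Step 3a: include the text around the change so it is read in context.
        surrounding = b[max(0, j1 - context):j2 + context]
        yield CHANGE_TYPES[tag], a[i1:i2], b[j1:j2], surrounding

old_version = "limit = 10\ntotal = 0\nfor v in values:\n    total += v\nprint(total)\n"
new_version = "limit = 20\ntotal = 0\ncount = 0\nfor v in values:\n    total += v\n"

blocks = list(change_blocks(old_version, new_version))
for kind, removed, added, surrounding in blocks:
    print(kind, removed, added)
```

The deletion of `print(total)` is the kind of change step 3d.ii targets: the diff is tiny, but anything relying on the printed total is affected.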