04 Metrics and Measurement

Software Engineering: Principles and practices (technical and nontechnical) for confidently building high-quality software.

Measurements and Metrics

  • Measurement: the empirical, objective assignment of numbers, according to a rule derived from a model or theory, to attributes of objects or events, with the intent of describing them

Software Quality Metrics

  • Entities: software product, modules, software development process, people
  • Software qualities: Functionality (e.g., data integrity), Scalability, Security, Extensibility, Bugginess, Documentation, Performance, Installability, Availability, Consistency, Portability, Regulatory compliance
  • Process qualities: Development efficiency, Meeting efficiency, Conformance to processes, Reliability of predictions, Fairness in decision making, Regulatory compliance, On-time release
  • People qualities:
    • Developers: Maintainability • Performance • Employee satisfaction and well-being • Communication and collaboration • Efficiency and flow • Satisfaction with engineering system • Regulatory compliance
    • Customers: Satisfaction • Ease of use • Feature usage • Regulatory compliance
  • Non-trivial qualities: Software: Code elegance, Code maintainability; Process: Fairness in decision making; Team: Team collaboration, Creativity

McNamara fallacy

  • Measure whatever can be easily measured.
  • Disregard that which cannot be measured easily.
  • Presume that which cannot be measured easily is not important.
  • Presume that which cannot be measured easily does not exist.

Code Complexity

  • Lines of Code (LOC): A simple measure of how large a codebase is, but not always indicative of complexity or quality.
  • Halstead Volume: Measures the size and complexity of a program based on the number of operators and operands.
  • Cyclomatic Complexity: Measures the number of independent paths through a program and can indicate the number of test cases required for full coverage (see the sketch after this list).
  • Object-Oriented Metrics: Includes metrics like number of methods per class, depth of inheritance, and coupling between classes.
  • “Allowable mass” proxy: the more complex the software task, the larger the share of the mass budget (in grams) it consumes.
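
As a concrete illustration of cyclomatic complexity, here is a minimal Python sketch (not a production analyzer) that approximates McCabe's metric for a function by counting decision points in its AST. The example function and the chosen set of node types are illustrative assumptions:

```python
import ast

# Node types treated as decision points; a rough approximation of
# McCabe's definition (e.g., each `and`/`or` chain is counted once here).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(tree))

example = '''
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
'''
print(cyclomatic_complexity(example))  # 3: the if and the elif each add a path
```

A result of 3 matches the intuition above: three independent paths through the function, hence at least three test cases for full coverage.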

How to use measurements and metrics?

Goal-Question-Metric (GQM) Framework

  • Goal: Define what you want to achieve.
  • Questions: Determine what you need to answer to know if your goal is met.
  • Metrics: Identify what measurements are necessary to answer the questions.
  • Examples:
    • Goal: Evaluate the effectiveness of a coding standard from the team's perspective.
    • Questions: How comprehensible is the coding standard? What is its impact on team productivity?
    • Metrics: Number of revisions required to achieve compliance, team members' understanding, code size.
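
A GQM plan can be written down as plain data before any tooling exists. The following minimal sketch encodes the example above as a goal-questions-metrics hierarchy; all strings are illustrative:

```python
# A minimal sketch: the GQM plan above as plain data,
# structured goal -> questions -> metrics.
gqm_plan = {
    "goal": ("Evaluate the effectiveness of a coding standard "
             "from the team's perspective"),
    "questions": {
        "How comprehensible is the coding standard?": [
            "team members' understanding (survey score)",
        ],
        "What is its impact on team productivity?": [
            "number of revisions required to achieve compliance",
            "code size",
        ],
    },
}

for question, metrics in gqm_plan["questions"].items():
    print(f"Q: {question}")
    for metric in metrics:
        print(f"  M: {metric}")
```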

Defining Goals

  • Purpose: The reason or aim for defining the goal. It can be to improve, evaluate, or monitor something.
    • Example: Improve the reliability of the software.
  • Issue: The specific aspect or challenge you're addressing. Common issues include reliability, usability, or effectiveness.
    • Example: Address the issue of poor usability in the current software design.
  • Object: The target of the goal. This could be the final product, a specific component, a process, or an activity.
    • Example: Evaluate the usability of the user interface component.
  • Viewpoint: The perspective from which the goal is assessed. It can be from the perspective of any stakeholder (e.g., user, developer, manager).
    • Example: Measure the usability from the perspective of end users.

Measurement for Decision Making

  • Key Decisions Metrics Help With:
    • Fund project?: Should we allocate resources and continue funding this project?
    • More testing?: Is the current testing sufficient, or do we need additional tests to ensure quality?
    • Fast enough? Secure enough?: Is the system performing efficiently and securely?
    • Code quality sufficient?: Is the codebase meeting the desired standards of quality?
    • Which feature to focus on?: Which features require the most attention and improvement, based on their importance?
    • Developer bonus?: Should performance metrics influence developer bonuses?
    • Time and cost estimation? Predictions reliable?: Are our time and cost estimations accurate, and are they backed by reliable data?

Trend Analyses

  • Monitoring Test Result Trends: ![[Screen Shot 2024-10-06 at 21.00.37.png]]
    • The chart shows trends in test results over time, with the green areas representing passing tests and the red spikes indicating failures.
    • Regular trend analysis helps track improvements, regressions, or inconsistencies in the software over time.
    • Purpose: Trend analysis enables early detection of performance degradation or bugs and helps maintain a stable codebase.
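
One simple way to operationalize such trend analysis is to compare each new test run against a recent baseline. A minimal sketch, assuming hypothetical nightly pass rates and an arbitrary drop threshold:

```python
# A minimal sketch: flag a possible regression when the test pass rate
# drops more than a threshold below its recent moving average.
pass_rates = [0.98, 0.97, 0.99, 0.98, 0.90, 0.97]  # one value per nightly run

WINDOW, DROP_THRESHOLD = 3, 0.05

for i in range(WINDOW, len(pass_rates)):
    baseline = sum(pass_rates[i - WINDOW:i]) / WINDOW
    if pass_rates[i] < baseline - DROP_THRESHOLD:
        print(f"run {i}: pass rate {pass_rates[i]:.2f} "
              f"vs recent average {baseline:.2f} -> possible regression")
```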

Benchmarking Against Standards

  • Monitor and Compare: ![[Screen Shot 2024-10-06 at 21.01.19.png]]
    • Monitor multiple projects or modules: Keep track of different projects or code modules to establish typical values for metrics like test-to-code ratios.
    • Report deviations: Identify and report when a project deviates from established benchmarks or standards.
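
A minimal sketch of the same idea, using made-up test-to-code ratios for hypothetical projects and flagging deviations from the typical (median) value:

```python
# A minimal sketch: compare each project's test-to-code ratio against
# the fleet median and report outliers. All numbers are illustrative.
from statistics import median

loc = {  # (test LOC, production LOC) per hypothetical project
    "billing": (12_000, 30_000),
    "search":  (8_000, 40_000),
    "gateway": (1_000, 25_000),
}

ratios = {name: test / prod for name, (test, prod) in loc.items()}
typical = median(ratios.values())

for name, ratio in ratios.items():
    if abs(ratio - typical) > 0.5 * typical:  # deviates >50% from the norm
        print(f"{name}: test-to-code ratio {ratio:.2f} "
              f"deviates from typical {typical:.2f}")
```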

Case study: Autonomous Vehicle Software

By what metrics can we judge AV software (e.g., safety)?

  1. Code Coverage
    • Definition: Code coverage refers to the amount of code that gets executed during testing.
    • Types of coverage:
      • Statement coverage: Ensures individual statements are executed.
      • Line coverage: Measures whether specific lines of code are reached during testing.
      • Branch coverage: Ensures all possible branches in if-else conditions are covered.
    • Example: 75% branch coverage means that 3 out of 4 possible outcomes in an if-else statement have been tested.

  2. Model Accuracy
    • Definition: Evaluates how accurately the machine learning models in AV software recognize objects, make decisions, or navigate environments.
    • Training: The models are trained on labeled data, which includes sensor data (e.g., from cameras, radar) and the corresponding ground truth.
    • Testing: Accuracy is computed on a separate labeled test set.
    • Example: A model with 90% accuracy recognizes objects correctly for 90% of the test inputs.

  3. Failure Rate
    • Definition: Tracks the frequency of crashes or fatalities involving autonomous vehicles.
    • Measurement units: per 1,000 rides, per million miles, or per month (see the sketch after this list).

  4. Mileage
    • Definition: A measure of how many miles an autonomous vehicle has driven, which provides data on the software's real-world performance over time.
    • Importance: The more miles an AV drives, the more data it accumulates to improve its algorithms, reduce failure rates, and increase reliability.
    • Example: Waymo's vehicles have accumulated over 15 billion autonomously driven miles in simulation and over 20 million real-world miles on public roads. Such data is used to demonstrate the safety and reliability of AV software.
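
To make the failure-rate units concrete, here is a minimal sketch with illustrative numbers (the failure count and mileage are invented) that normalizes observed failures to a per-million-miles rate:

```python
# A minimal sketch: normalize failure counts to a common exposure unit
# (failures per million miles). All inputs are hypothetical.
failures = 3              # observed incidents in the reporting period
miles_driven = 4_200_000  # total autonomous miles in the same period

rate_per_million = failures / (miles_driven / 1_000_000)
print(f"{rate_per_million:.2f} failures per million miles")  # ~0.71
```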

Risks and challenges

  • The streetlight effect: A known observational bias. People tend to look for something only where it’s easiest to do so.
  • Making inferences: provide a theory (from domain knowledge, independent of the data); show correlation; demonstrate the ability to predict new cases (replicate/validate).
  • Spurious Correlations:
    • Confounding variables.
    • Berkson's paradox: a spurious (often negative) correlation that appears when a dataset is selected or filtered on a common outcome of the variables being studied.
    • Survivorship bias: occurs when only the successes (or survivors) are considered in analysis. WWII plane example: armor the areas without bullet holes, since planes hit there never returned.
  • Measurement Reliability: The goal is to reduce uncertainty and increase consistency in measurements, which often requires multiple observations to account for variability (see the sketch below).
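
As a small illustration of why repeated observations help, the sketch below (with hypothetical benchmark timings) reports the mean of several runs together with its standard error, which shrinks as the number of observations grows:

```python
# A minimal sketch: repeated measurements of the same quantity and the
# uncertainty of their mean. Timings are hypothetical benchmark runs.
from math import sqrt
from statistics import mean, stdev

timings_ms = [102.1, 98.7, 105.3, 99.8, 101.4]

m = mean(timings_ms)
se = stdev(timings_ms) / sqrt(len(timings_ms))  # standard error of the mean
print(f"mean {m:.1f} ms +/- {se:.1f} ms (n={len(timings_ms)})")
```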

Metrics and incentives

  • Goodhart’s Law: When a measure becomes a target, it ceases to be a good measure (e.g., focusing solely on lines of code might encourage bad coding practices).
  • Incentivizing Productivity: Basing developer rewards on metrics (like number of bugs fixed or lines of code written) can lead to undesirable behaviors, such as writing unnecessary code or overlooking bugs.