AIML, Dynamic Analysis, OSS, Dependencies
1) SE Ethics¶
- Human Flourishing: happiness and life satisfaction, mental and physical health, meaning and purpose, character and virtue, and close social relationships
- Algorithmic bias

Three questions to promote human flourishing:

- Does my software respect the humanity of the users?
    - Humane design guide - consider 6 human sensitivities: Emotional, Attention, Sense making, Decision making, Social Reasoning, and Group Dynamics
        - In what ways does your product/feature currently engage Human Sensitivities?
        - How might your product/feature support or elevate human sensitivities?
        - Action Statement
- Does my software amplify positive behavior, or negative behavior, for users and society at large?
    - Should have real humans to monitor and respond to your community.
    - Should have community policies about what is and isn't acceptable behavior.
    - Should have accountable identities.
    - Should have the technology to easily identify and stop bad behaviors.
    - Should make a budget that supports having a good community, or you should find another line of work.
- Will my software's quality impact the humanity of others?
    - Example: malpractice vs. negligence
2) ML in SE¶
ML Development cycle: Observe, Hypothesize, Predict, Test, Reject or Refine the Hypothesis
Three Fundamental Differences between ML and SE
- Data discovery and management:
    - SE: Emphasis on designing algorithms and software logic; ML: more effort on discovering and transforming data
- Customization and reuse:
    - SE: Can reuse modules; ML: difficult to reuse the same model, since models are specific to particular tasks and datasets
- No modular development of the model itself:
    - SE: Systems can be monolithic or microservices; ML: models are usually monolithic
Feature Engineering: Identify parameters of interest that a model may learn on
- Convert data into a useful form; Normalize data; Include context; Remove misleading things
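A minimal sketch of these steps on a hypothetical "user session" record (all field names and the `max_duration` parameter are made up for illustration):

```python
# Feature-engineering sketch: turn one raw record into a numeric vector.
def make_features(session, max_duration):
    return [
        session["duration_sec"] / max_duration,               # normalize to [0, 1]
        1.0 if session["device"] == "mobile" else 0.0,        # encode a category
        session["clicks"] / max(session["duration_sec"], 1),  # context: click rate
    ]

raw = {"duration_sec": 120, "device": "mobile", "clicks": 30}
print(make_features(raw, max_duration=600))  # [0.2, 1.0, 0.25]
```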
Evaluation: Assess prediction accuracy on unseen data using metrics like false positives vs. false negatives for binary predictors (classification), error distance for numeric (regression), and top-K relevance for ranking.
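For the binary (classification) case, a minimal sketch of counting false positives vs. false negatives on illustrative labels:

```python
# Sketch: confusion counts for a binary classifier (labels are illustrative).
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"precision={tp/(tp+fp):.2f} recall={tp/(tp+fn):.2f}")  # weigh FPs vs. FNs
```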
Mistakes & Mitigation¶
- Mistakes: system outage, model outage, untested model, unreliable deployment and updates, corrupt files, model errors
- Perform hazard analysis: What's the worst thing that can happen? Is there a backup strategy? Is the action undoable? Is there nontechnical compensation?

Mitigating Mistakes
- Investing in ML, e.g., more training data, better data, better features, better engineers
- Less forceful experience:
    - Instead of making the system automatically take actions, ask for user input (e.g., prompt the user for confirmation).
    - Allow options to turn off certain features if they aren't working as expected.
- Adjust learning parameters, e.g., more frequent updates, manual adjustments
- Guardrails, e.g., heuristics and constraints on outputs
- Override errors, e.g., hardcode specific results
Quality attributes of ML models¶
- Interpretability (Explainability), needed for:
    - Model debugging
    - Auditing - fairness, safety, security
    - Trust
    - Actionable insights to improve outcomes - helps extract useful knowledge from the model's decisions to take better actions
    - Regulation
- Fairness (see the sketch after this list):
    - Group unaware: ignore group data (one group could get excluded)
    - Group thresholds: different rules per group (rules differ by group)
    - Demographic parity: same percentage in pool as in outcomes (might result in random selection)
    - Equal opportunity: equal chance of positive outcomes regardless of group (focus on individual, rules differ per group)
    - Equal accuracy: equal chance of both outcomes per group (focus on group, rules differ per group)
- Inference latency
- Inference throughput
- Scalability
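As a sketch of how two of these fairness criteria differ in practice, the code below computes per-group selection rates (what demographic parity compares) and true-positive rates (what equal opportunity compares) on made-up decision data:

```python
# Sketch: checking two fairness criteria on hypothetical decisions.
# Each record: (group, true_label, predicted_label); all values illustrative.
data = [
    ("A", 1, 1), ("A", 0, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 1), ("B", 0, 0), ("B", 0, 0),
]

def selection_rate(group):           # demographic parity compares these
    rows = [r for r in data if r[0] == group]
    return sum(r[2] for r in rows) / len(rows)

def true_positive_rate(group):       # equal opportunity compares these
    pos = [r for r in data if r[0] == group and r[1] == 1]
    return sum(r[2] for r in pos) / len(pos)

for g in ("A", "B"):
    print(g, "selected:", selection_rate(g), "TPR:", true_positive_rate(g))
```

In this toy data both groups have the same selection rate (demographic parity holds) but different true-positive rates (equal opportunity is violated), showing that the criteria can disagree.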
Using LLMs¶
Language Modeling: Measure the probability of a sequence of words (text sequence → most likely next word)
Problems with LLMs¶
- Hallucinations: factually incorrect output
- High latency: output words are generated one at a time; larger models are slower
- Output format: hard to structure output (e.g., extracting a date from text). LLMs generate text in natural-language form, which can make it difficult to extract structured information like dates, numbers, or specific details.
Using LLMs for different daily tasks & Evaluation of Suitability¶
- Alternative solutions: Ask if there's a more reliable or specialized tool than the LLM for your task.
    - Example: For type-checking Java code, a Java compiler is a better choice because it's deterministic and built for the task.
- Error probability: Estimate how often the LLM will give correct results for your problem. This may improve as LLMs get better, but varies by task.
    - Example: Grading mathematical proofs may have a higher error probability due to the complexity of the logic.
- Risk tolerance: Assess the consequences of mistakes made by the LLM.
    - Example: Errors in answering emergency medical questions can be life-threatening, so tolerance for mistakes is low.
- Risk mitigation strategies: Identify ways to minimize errors or their impact.
    - Example: For unit test generation, review and validate the LLM's output manually to catch mistakes before deployment.
Using LLMs as part of the codebase¶
→ Textual comparison tests to check for accuracy:
- Syntactic checks (correct structure and grammar)
- Embeddings (numeric representations of words that capture their meaning in context)
- Cosine similarity (measures how similar two embeddings are based on the angle between them)
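A minimal sketch of such a check, assuming the embedding vectors come from whatever embedding model you use (the vectors and threshold below are stand-ins):

```python
import numpy as np

# Sketch: compare an LLM's answer to a reference output via embeddings.
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

expected = np.array([0.2, 0.7, 0.1])    # stand-in embedding of reference text
actual   = np.array([0.25, 0.65, 0.1])  # stand-in embedding of LLM output
assert cosine_similarity(expected, actual) > 0.95  # accept near-matches
```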
Improving LLMs¶
- Prompt Engineering
- Chain-of-Thought Prompting
- Fine-Tuning
3) Dynamic Analysis / Advanced Testing¶
Correctness – Static Analysis and Testing¶
Robustness – Fuzzing¶
- What: Feed invalid, random, or unexpected inputs to a program to uncover vulnerabilities by observing crashes or abnormal behavior.
- Common bugs:
    - Causes: invalid argument handling, type casting errors, untrusted code execution
    - Effects: buffer overflows, memory leaks, division-by-zero, use-after-free, assertion failures
    - Impact: affects security, reliability, performance, and correctness
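A minimal random-fuzzer sketch: `parse_date` is a hypothetical function under test; clean rejections of bad input are expected, anything else is reported as a potential bug.

```python
import random
import string

def parse_date(s: str):
    month, day = s.split("/")   # hypothetical target; crashes on odd input
    return int(month), int(day)

for _ in range(1000):
    fuzz = "".join(random.choices(string.printable, k=random.randint(0, 20)))
    try:
        parse_date(fuzz)
    except ValueError:
        pass                    # expected rejection of invalid input
    except Exception as e:      # any other exception is a finding
        print(f"crash on {fuzz!r}: {e!r}")
```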
Performance – Profiling¶
- Why? Identify performance bugs (e.g., slowdowns, degradation, cross-version/platform issues).
- Challenges: Define "fast enough," set thresholds, ensure reliable measurements. Bugs are hard to diagnose (e.g., system load, hardware, network, workflows) but impact user experience heavily.
- Profiling: Measures execution time and memory to find slow code (e.g., a "resize image" function taking 80% of the time).
- Tracing: Tracks event sequences to debug crashes or unexpected behaviors (e.g., identifying the step causing a failure).
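A small profiling sketch using Python's built-in `cProfile`, with `resize_image` as a stand-in for an expensive function:

```python
import cProfile
import pstats

def resize_image():
    total = 0
    for i in range(2_000_000):   # stand-in for expensive pixel work
        total += i * i
    return total

def handle_request():
    resize_image()
    return "ok"

# Profile one request, then list where the time actually went.
cProfile.run("handle_request()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)
```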
Scalability – Stress testing¶
- Why? To test the system's behavior beyond the limits of normal operation, often to test the error-handling capabilities of the application. Can apply at any level of system granularity.
- How? Throw large amounts of input/requests at the program and see how it behaves.
- What it tests: the breaking point, and how well the system recovers after failure.
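A minimal load-generation sketch using a thread pool; `handle_request` stands in for a real endpoint. Run the same harness for hours at a moderate rate and it doubles as a soak test.

```python
import concurrent.futures
import time

def handle_request(i: int) -> bool:
    time.sleep(0.01)             # stand-in for real work
    return True

start = time.time()
# Far more concurrency than normal operation: look for the breaking point.
with concurrent.futures.ThreadPoolExecutor(max_workers=200) as pool:
    results = list(pool.map(handle_request, range(5000)))

ok = sum(results)
print(f"{ok}/5000 succeeded in {time.time() - start:.1f}s")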
Resilience – Soak testing¶
- Why: Test system stability under extended, slightly above-normal load to detect issues like memory leaks or resource exhaustion. Useful for major releases or infrastructure changes, despite being time-consuming.
- How: Apply continuous load over time and observe performance.
- Tests: long-term reliability and resilience to prolonged usage without memory leaks or performance degradation.
- Example: simulating 500 users on an e-commerce site for 48 hours to find slowdowns or errors.
Reliability – Chaos Engineering¶
- Why? To simulate a large-scale deployment and induce random failures in various components – test in production with chaos engineering.
- How:
    - Define baseline: understand normal behavior (e.g., response time, uptime).
    - Induce failures: turn off servers, break connections, or overload traffic.
    - Observe response: check if the system recovers gracefully and quickly enough.
    - Fix weaknesses: improve failover, load balancing, or other vulnerabilities.
- Benefits: identify weak points before inevitable real failures; enhance system reliability.
- Examples:
    - Google: terminate networks or data centers to uncover hidden issues.
    - Netflix: use Chaos Monkey to randomly disrupt AWS instances or network links; monitor Stream Starts per Second (SPS) for availability.
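One way to sketch the "induce failures" step in code, with hypothetical service functions: a wrapper randomly fails calls so you can verify the caller degrades gracefully instead of crashing.

```python
import functools
import random

def chaos(failure_rate: float):
    """Decorator that randomly fails calls to simulate a flaky dependency."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if random.random() < failure_rate:
                raise ConnectionError(f"chaos: killed call to {fn.__name__}")
            return fn(*args, **kwargs)
        return inner
    return wrap

@chaos(failure_rate=0.3)
def fetch_recommendations(user: str) -> list:
    return ["item1", "item2"]          # hypothetical downstream service

def homepage(user: str) -> list:
    try:
        return fetch_recommendations(user)
    except ConnectionError:
        return ["fallback_item"]       # graceful degradation under failure

print([homepage("alice") for _ in range(5)])
```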
Usability – A/B testing¶
- What: Controlled experiment with two variants (A = current system, B = new version).
- How: Randomly assign users to A or B and compare outcomes.
- Use: Common for testing web or GUI changes like ads or design layouts.
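A toy A/B harness: hash-based assignment keeps each user in a stable variant, and outcomes are compared per variant. The outcome data here is fabricated; a real analysis would also test whether the difference is statistically significant.

```python
import hashlib

def variant(user_id: str) -> str:
    """Deterministic 50/50 split: a user always lands in the same variant."""
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return "A" if h % 2 == 0 else "B"

views = {"A": 0, "B": 0}
clicks = {"A": 0, "B": 0}

def record(user_id: str, clicked: bool):
    v = variant(user_id)
    views[v] += 1
    clicks[v] += clicked

for i in range(1000):
    record(f"user{i}", clicked=(i % 7 == 0))   # fake outcome data

for v in ("A", "B"):
    print(v, f"conversion rate = {clicks[v] / views[v]:.3f}")
```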
4) Feedback¶
- Feedback is composed of: 1. Appreciation; 2. Coaching (knowledge, skill); 3. Evaluation: where you stand, aligns expectations, and informs decision making
- Effective feedback: goal-oriented; actionable, specific, and timely; supportive, truthful, and useful; delivered privately in a neutral, non-judgemental tone
- Developmental Feedback:
    - Situation: Set the context. Help the person focus on what you are referring to.
    - Behavior: Focus on the objective behavior to be repeated or changed.
    - Impact: Share the direct impact of the behavior.
    - Alternative: Share an alternative behavior to use next time.
5) Technical Debt¶
- Internal quality: code is well structured; code is understandable; codebase is documented
- External quality: software does not crash; software meets requirements; UI is well designed
- Failure: Deviation of the component or system from its expected delivery, service, or result; a manifested inability of a system to perform a required function.
- Fault/defect: Flaw in a component or system that can cause it to fail to perform its required function. A defect, if encountered during execution, may cause a failure of the component or system.
- Error: A human action that produces an incorrect result.
- What are bugs? Defects + errors
Principles of Testing¶
- The absence-of-defects fallacy: Testing reveals defects but cannot guarantee their absence or achieve 100% detection.
- Exhaustive testing is impossible: cannot test all inputs.
- Start testing early: Begin testing early to guide design, get quick feedback, and catch bugs when they're cheapest and least damaging to fix.
- Defects are usually clustered: Focus testing on "hot" components with frequent changes, tricky logic, or high uncertainty, as defects tend to concentrate there.
- The pesticide paradox: Repeating the same tests on evolving software misses subtler bugs, requiring varied testing methods for effectiveness.
- Testing is context-dependent: Test goals and metrics depend on the specific requirements and acceptable risk levels of the project.
- Verification is not validation: Verification checks if the software meets specifications, while validation ensures it meets the user's real needs.
Test Design Techniques¶
- Exploratory Testing: testing without scripts, using intuition to find bugs.
    - Pro: flexible, good for unexpected issues.
- Specification-Based ("Black Box") Testing: tests based on specifications, ignoring code details.
    - Pro: avoids bias, robust to code changes, requires no code familiarity.
- Structural ("White Box") Testing: tests designed with full code knowledge, focusing on structure.
    - Pro: ensures thorough code coverage.
- Exhaustive Testing:
    - Issues: a test suite needs to be small enough to finish in a useful amount of time, yet large enough to provide a useful amount of validation.
    - Alternatives: heuristics (focus on the most likely or important scenarios).
- Equivalence Partitioning:
    - What: group inputs into equivalence classes with similar behavior and test one per class.
    - Equivalence classes are derived from specifications (e.g., cases, input ranges, error conditions, fault models).
    - Pro: reduces test cases; requires domain knowledge.
- Boundary-Value Analysis (see the first sketch after this list):
    - Key insight: errors often occur at the boundaries of a variable's range.
    - For each variable, select: minimum, min+1, a middle value, max-1, maximum; possibly also the invalid values min-1 and max+1.
- Pairwise Testing (can find 50-90% of defects):
    - Key insight: some problems only occur as the result of an interaction between parameters/components.
    - E.g., a bug occurs only for senior citizens traveling on weekends (a pairwise interaction; see the second sketch after this list).
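First sketch: equivalence partitioning plus boundary-value analysis for a hypothetical `is_eligible` function whose valid range is 18-65 (pytest is assumed as the test runner; any runner works):

```python
import pytest

def is_eligible(age: int) -> bool:
    """Hypothetical function under test: valid ages are 18..65 inclusive."""
    return 18 <= age <= 65

# One representative per equivalence class, plus the boundary values.
@pytest.mark.parametrize("age,expected", [
    (17, False),  # invalid class: min - 1
    (18, True),   # boundary: minimum
    (19, True),   # boundary: min + 1
    (40, True),   # middle of the valid class
    (64, True),   # boundary: max - 1
    (65, True),   # boundary: maximum
    (66, False),  # invalid class: max + 1
])
def test_is_eligible(age, expected):
    assert is_eligible(age) == expected
```

Second sketch: enumerating the parameter-value pairs that a pairwise suite must cover (parameter names and values are illustrative):

```python
from itertools import combinations, product

# Hypothetical test parameters for a booking system.
params = {
    "age_group": ["youth", "adult", "senior"],
    "day": ["weekday", "weekend"],
    "payment": ["card", "cash"],
}

# Every pair of values from every pair of parameters must appear in some test.
required_pairs = set()
for (p1, vals1), (p2, vals2) in combinations(params.items(), 2):
    for v1, v2 in product(vals1, vals2):
        required_pairs.add(((p1, v1), (p2, v2)))

# Exhaustive testing needs 3*2*2 = 12 cases; a greedy covering-array
# generator can hit all 16 required pairs with about 6 cases.
print(len(required_pairs), "pairs to cover")  # 16
```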
Technical Debt¶
Organizations need to continuously address the following challenges:

1. Recognizing technical debt
2. Making technical debt visible
3. Deciding when and how to resolve debt
4. Living with technical debt
Types of Technical Debt:¶
- Deliberate:
    - Reckless: "We don't have time for design."
    - Prudent: "We must ship now and deal with consequences later."
- Inadvertent:
    - Reckless: "What's layering?"
    - Prudent: "Now we know how we should have done it."
6) Open Source Software¶
Why Go Open Source (vs. Proprietary) ?¶
- Advantages
    - Transparency: gain user trust
    - Many eyes: crowd-source bug reports and fixes
    - Security: more likely for vulnerabilities to be quickly identified
    - Community and adoption: get others to contribute features, build stuff around you, or fork your project
- Disadvantages
    - Reveal implementation secrets
    - Many eyes: users can find faults more easily
    - Security: more likely for others to find vulnerabilities first
    - Control: you may not be able to influence the long-term direction of your platform
License & Law¶
| License/Law | Key Purpose | Key Features |
|---|---|---|
| Copyright | Protects expressions of work | Automatic for books, music, code; exceptions for trivial ideas. |
| Intellectual Property (IP) | Protects ideas and inventions | Patents, machine designs, algorithms; licenses and expiry dates. |
| GNU General Public License (GPL) | Ensures software freedom | Four freedoms: use, change, share, and share modifications. Requires derivatives to use GPL. |
| Risks of GPL (Copyleft) | Enforces openness but can complicate usage | Derivatives must use the same license; companies may avoid GPL due to viral effects. |
| LGPL (Lesser GPL) | Allows use of libraries in proprietary code | Dynamic linking allowed; no derivative restrictions. |
| MIT License | Simple, commercial-friendly open-source | Must credit author; no liability; no restrictions on usage. |
| Apache License | Industry-friendly open-source | Allows use without source code sharing; no trademark permissions. |
| BSD License | Minimal restrictions | Requires copyright notice; no liability; allows modifications freely. |
| Creative Commons (CC) | Licensing for non-code content | Used for datasets, images, videos, documentation, etc. |
7) Dependency Management¶
What is a Dependency?
- Core of what most build systems do.
- Example: Foo->Bar: to build Foo, you need to build Bar. "Bar" is a dependency of "Foo."
- Scopes:
    - Compile: use Bar's classes/functions during compilation.
    - Runtime: use abstract APIs provided by Bar during execution (e.g., logging, database).
    - Test: use Bar for testing only (e.g., JUnit, mocks).
- Internal: built/maintained by your organization. External: downloaded via package managers.
- Dependencies are typically hosted on servers and downloaded using package managers, requiring unique identifiers for each package.
- Most package managers support custom repositories, which require proper management.
Dependency Pinning vs. Floating¶
- Pinning dependencies (e.g., 1.5.3): a specific version of the dependency, frozen in time.
    - Pro: reproducible builds; stable network effects.
    - Con: can become vulnerable due to dependency bugs; have to keep updating dependents as dependencies evolve.
- Floating dependencies (e.g., 1.x): each build pulls the latest available libFoo version.
    - Pro: latest security patches and bug fixes; less manual maintenance.
    - Con: flaky builds (breaking changes); floats leak transitively (if A pins B but B floats C, A still sees changing versions of C).
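As a concrete illustration in pip's requirements.txt syntax (the package and version numbers are arbitrary; npm, Maven, etc. have equivalent forms):

```text
# Pinned: exactly this version, frozen in time (reproducible, manual updates)
requests==2.31.0

# Floating within a range: each build may pull a newer patch/minor release
requests>=2.31,<3.0
```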
Types of Dependencies¶
- Transitive dependencies: a dependency that your software indirectly relies on (a dependency of your dependency).
- Diamond dependencies: multiple intermediate dependencies share the same transitive dependency.
    - Problem: different intermediate dependencies may require different versions of the same transitive dependency.
- Cyclic dependencies: avoid at all costs, but sometimes unavoidable or intentional (e.g., GCC is written in C, so it needs a C compiler; Apache Maven uses the Maven build system).