Let’s tackle problems in software development in an automated, data-driven and reproducible way!
As developers, we often feel that there might be something wrong with the way we develop software. Unfortunately, a gut feeling alone isn’t sufficient for the complex, interconnected problems in software systems.
We need solid, understandable arguments to win budgets for improvement projects or to defend ourselves against political decisions. But we can help ourselves: every step in the development or use of software leaves valuable digital traces. With clever analysis, this data can show us the root causes of problems in our software and deliver new insights that everybody can understand.
If concrete problems and their impact are known, developers and managers can create solutions and take sustainable actions aligned to existing business goals.
In this meetup, I talk about the analysis of software data by using a digital notebook approach. This allows you to express your gut feelings explicitly with the help of hypotheses, explorations and visualizations step by step.
I show how open source analysis tools (Jupyter, Pandas, jQAssistant and, of course, Neo4j) work together to inspect problems in Java applications and their environment. We look at performance hotspots, knowledge loss and worthless code parts, completely automated from raw data up to visualizations for management.
Participants learn how data analysis can turn their uncertain gut feelings into solid evidence for obtaining budgets for dedicated improvement projects.
Software Analytics with Jupyter, Pandas, jQAssistant and Neo4j
1. Software Analytics
with Jupyter, Pandas,
jQAssistant and Neo4j
Identifying Problems in Software Development
with Data Analysis
Markus Harrer
@feststelltaste
Neo4j Online Meetup
23rd November 2017
2. Markus Harrer
Software Development Analyst
Key Activities
Java Development, Data Analysis in Software
Development
Areas of Interest
Clean Code, Agile, Software Archeology, Software
Revival, Epistemology, Cognitive Psychology
@feststelltaste feststelltaste.de meetup@markusharrer.de
About me
18. Software Analytics is...
“... analytics on software data
for managers and software engineers
with the aim of empowering software
development individuals and teams
to gain and share insight from their data
to make better decisions.”
Tim Menzies, Thomas Zimmermann: Software Analytics - So What? IEEE Software Magazine
19. Right Insights for better Decisions
[Chart: Questions plotted by Frequency against Risk/Value]
Use standard tools
for everyday questions
Use Software Analytics to
tackle high-risk problems
Adapted from Tim Menzies, Thomas Zimmermann: Software Analytics - So What? IEEE Software Magazine
20. Types of Software Data
Community
chronological
Runtime
static
=> Problems are interconnected, so should be the data sources!
22. Why does it work now?
• Domain-Driven Design brings business language into code
• Data Science enables problem analysis for developers
• New Tools can create high-level concepts
Code Problems
Business Language
abstract
detailed
Problems can be connected to concepts in business terms!
23. My implementation of Software Analytics
How can Developers use the Power of Data Analysis in their Daily Work?
24. What can you do today?
• Visualize developer contributions over time
• Identify unused, error-prone or abandoned code
• Create a code and problem inventory for legacy systems
• Find performance bottlenecks by analyzing call trees
• Visualize unwanted dependencies between modules
Make specific problems in your software system visible!
e.g. Race Conditions, Architecture Smells, Build Breakers, Programming Errors
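The first item, developer contributions over time, needs little more than a Git log export. The following is a minimal sketch, not the analysis from the talk: the sample log text, the `author|date` layout (as `git log --pretty=format:"%an|%ad" --date=short` would print it) and the function name `commits_per_author_and_month` are assumptions for illustration. It uses only the Python standard library so that it runs without extra installs; in a notebook the same grouping would typically be done with Pandas.

```python
from collections import Counter

# Hypothetical sample, shaped like the output of
#   git log --pretty=format:"%an|%ad" --date=short
SAMPLE_LOG = """\
Alice|2017-09-03
Bob|2017-09-17
Alice|2017-10-01
Alice|2017-10-20
Bob|2017-11-05
"""

def commits_per_author_and_month(log_text):
    """Count commits per (author, 'YYYY-MM') pair."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        author, date = line.split("|", 1)
        counts[(author, date[:7])] += 1  # keep only year and month
    return counts

contributions = commits_per_author_and_month(SAMPLE_LOG)
for (author, month), n in sorted(contributions.items()):
    print(f"{month}  {author:<6} {n}")
```

The resulting counts can be fed into any plotting library to get a timeline per developer.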
25. Choose known tools
or tools for plan B*
Python
Neo4j, Pandas, Spark
* tools you want to learn or profit from in the near future
on a suitable platform.
Jupyter, Zeppelin
=> Tools shouldn't stand in the way!
26. Notebook: an open dialog with data
Context
Idea
Analysis
Conclusion
Problem
Context documented
Ideas, assumptions and
heuristics communicated
Preprocessing justified
Calculations understandable
Summaries conclusive
Everything automated
28. Python
Data Scientist's Best Friend: Easy, effective, fast programming
language
Pandas
Pragmatic Data Analysis Framework: Great data structures &
integrations with machine learning libraries
D3
Visualization Library for Data-Driven Documents: Just beautiful,
interactive graphics!
Jupyter
Interactive Notebook: Central hub for data analysis and
documentation
Basic Tooling
30. Advanced Tooling: jQAssistant & Neo4j
Main Ideas
• Scan software structures
• Store data in Neo4j database
• Execute queries
• Examine relationships
• Add high-level concepts
• Validate rules via constraints
• Generate reports
31. jQAssistant – Use Cases
Living,
self-validating
architecture
documentation
32. jQAssistant – Use Cases
Java Class
Business Subdomain
Living,
self-validating
architecture
documentation
+
Find design &
code smells
+
Add business
perspectives
33. Neo4j Schema for Software Data
Node Labels
File
Class
Method
Commit
Relationship Types
CONTAINS
DEPENDS_ON
INVOKES
CONTAINS_CHANGE
Properties
name
fqn
signature
message
File Java
key value
name “Pet”
fileName “Pet.java”
fqn “foo.bar.Pet”
Type File
34. Cypher Query
Example
Spring PetClinic
“Give me all database objects”
MATCH
(t:Type)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(a:Type)
WHERE
a.fqn="javax.persistence.Entity"
RETURN t AS JpaEntity
37. Example JaCoCo Pandas D3
Production Coverage
1. Measure code coverage in
production
2. Calculate ratio of covered
lines to all lines
3. Visualize “usage hotspots”
with hierarchical bubble chart
https://www.feststelltaste.de/visualizing-production-coverage-with-jacoco-pandas-and-d3/
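Step 2 can be sketched as follows. This is an illustration under assumptions, not the notebook from the linked post: the sample rows and package names are invented, only the columns used here are shown (named as in JaCoCo's CSV report), and the sketch uses the standard library instead of Pandas so that it runs anywhere.

```python
import csv
import io

# Hypothetical excerpt using column names from a JaCoCo CSV report
# (only the columns needed for the ratio are included).
SAMPLE_CSV = """\
PACKAGE,CLASS,LINE_MISSED,LINE_COVERED
org.example.owner,OwnerController,10,30
org.example.owner,Owner,0,20
org.example.vet,VetController,15,5
"""

def coverage_by_package(csv_text):
    """Return {package: covered lines / all lines}."""
    totals = {}  # package -> [covered, all]
    for row in csv.DictReader(io.StringIO(csv_text)):
        missed = int(row["LINE_MISSED"])
        covered = int(row["LINE_COVERED"])
        entry = totals.setdefault(row["PACKAGE"], [0, 0])
        entry[0] += covered
        entry[1] += covered + missed
    # Skip packages without any lines to avoid division by zero.
    return {p: c / total for p, (c, total) in totals.items() if total}

ratios = coverage_by_package(SAMPLE_CSV)
for package, ratio in sorted(ratios.items()):
    print(f"{package}: {ratio:.0%}")
```

Packages with a low ratio in production are candidates for code that is rarely or never used.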
38. Example Git Pandas D3
Knowledge Island*
1. Take Git log with numstats
2. Calculate proportional
contributions for each
source code file per author
3. Visualize “ownership” with
hierarchical bubble chart
* heavily inspired by Adam Tornhill
https://www.feststelltaste.de/knowledge-islands/
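Steps 1 and 2 can be sketched as follows. The sample log, the `--%an` marker line (as `git log --numstat --pretty=format:"--%an"` would print it) and the choice of added lines as the measure of contribution are assumptions for this illustration. The linked post may weigh contributions differently and works with Pandas rather than the standard library.

```python
from collections import defaultdict

# Hypothetical sample, shaped like the output of
#   git log --numstat --pretty=format:"--%an"
SAMPLE_LOG = (
    "--Alice\n"
    "10\t2\tsrc/Pet.java\n"
    "5\t0\tsrc/Owner.java\n"
    "\n"
    "--Bob\n"
    "30\t1\tsrc/Pet.java\n"
    "-\t-\tlogo.png\n"
)

def ownership(log_text):
    """Return {file: {author: share of added lines}}."""
    added = defaultdict(lambda: defaultdict(int))
    author = None
    for line in log_text.splitlines():
        if line.startswith("--"):
            author = line[2:]  # marker line introduces a new commit
            continue
        parts = line.split("\t")
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank lines and binary files ("-")
        added[parts[2]][author] += int(parts[0])
    shares = {}
    for path, per_author in added.items():
        total = sum(per_author.values())
        if total:
            shares[path] = {a: n / total for a, n in per_author.items()}
    return shares

shares = ownership(SAMPLE_LOG)
for path, per_author in sorted(shares.items()):
    print(path, per_author)
```

A file where one author holds nearly all of the share is a candidate knowledge island.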
39. Example jQAssistant Neo4j Pandas D3
Dependency Analysis between Bounded Contexts
https://www.feststelltaste.de/a-graphical-approach-towards-bounded-contexts/
40. Example jQAssistant Neo4j Pandas D3
Dependency Analysis between Bounded Contexts
MATCH
(s1:Subdomain)<-[:BELONGS_TO]-
(type:Type)-[r:DEPENDS_ON*0..1]->
(dependency:Type)-[:BELONGS_TO]->(s2:Subdomain)
RETURN s1.name as type, s2.name as dep, COUNT(r) as number
https://www.feststelltaste.de/a-graphical-approach-towards-bounded-contexts/
Subdomains => Bounded Contexts that have meaning to business!
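Before the result of such a query can be visualized, it usually has to be brought into matrix form. The sketch below assumes result rows in the shape of the query's RETURN clause (`type`, `dep`, `number`); the subdomain names and counts are invented, and the helper names `dependency_matrix` and `cross_context` are hypothetical.

```python
# Hypothetical result rows: (type, dep, number)
rows = [
    ("owner", "owner", 12),
    ("owner", "visit", 3),
    ("visit", "owner", 5),
    ("vet", "vet", 4),
]

def dependency_matrix(rows):
    """Return sorted subdomain names and a square matrix of counts."""
    names = sorted({name for s, t, _ in rows for name in (s, t)})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, target, number in rows:
        matrix[index[source]][index[target]] += number
    return names, matrix

def cross_context(rows):
    """Keep only dependencies that cross a subdomain boundary."""
    return [(s, t, n) for s, t, n in rows if s != t]

names, matrix = dependency_matrix(rows)
crossing = cross_context(rows)
print(names)
for name, line in zip(names, matrix):
    print(f"{name:<6}", line)
```

The off-diagonal entries are the interesting ones: they show where one subdomain reaches into another.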
41. Example JProfiler jQAssistant Neo4j Pandas
Mining performance hotspots
1. Record Call Trees
2. Identify which parts of
the application code
are responsible for most
of the DB operations
3. Trace problems back
to the root causes
https://www.feststelltaste.de/mining-performance-hotspots-with-jprofiler-jqassistant-neo4j-and-pandas-part-1-the-call-graph/
Requests
Incoming
Outgoing
SQL Calls
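The idea behind step 2 can be sketched on a tiny call tree. Everything in this example is an assumption for illustration: the tree itself, the convention that database calls are nodes whose name starts with `SQL:`, and the helper names. The talk stores the recorded call trees in Neo4j and queries them there.

```python
# Hypothetical call tree as nested (name, children) tuples.
CALL_TREE = ("HTTP /owners", [
    ("OwnerController.find", [
        ("OwnerRepository.findByLastName", [
            ("SQL: SELECT owners", []),
            ("SQL: SELECT pets", []),
        ]),
    ]),
    ("VetController.list", [
        ("SQL: SELECT vets", []),
    ]),
])

def sql_calls(node):
    """Count SQL call nodes in the subtree rooted at node."""
    name, children = node
    own = 1 if name.startswith("SQL:") else 0
    return own + sum(sql_calls(child) for child in children)

def sql_calls_per_child(node):
    """Attribute SQL calls to each direct child of node."""
    _, children = node
    return {child[0]: sql_calls(child) for child in children}

per_entry = sql_calls_per_child(CALL_TREE)
print(per_entry)
```

Sorting the entries by count shows which part of the application code causes most of the database operations.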
42. Example jQAssistant Neo4j Pandas
Recursive Method Calls
MATCH (m:Method)-[:INVOKES*]->(m)
RETURN m
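The pattern `(m)-[:INVOKES*]->(m)` matches every method that can reach itself through one or more calls. To show what that means outside the graph database, here is a minimal sketch of the same check on a small in-memory call graph; the graph and the function name `recursive_methods` are hypothetical.

```python
# Hypothetical call graph: method -> methods it invokes
INVOKES = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],  # a -> b -> c -> a (indirect recursion)
    "d": ["d"],  # direct recursion
    "e": ["a"],  # calls into a cycle but is not part of one
}

def recursive_methods(invokes):
    """Return methods that can reach themselves via one or more calls."""
    result = set()
    for start in invokes:
        stack = list(invokes.get(start, []))
        seen = set()
        while stack:
            current = stack.pop()
            if current == start:
                result.add(start)  # found a path back to the start
                break
            if current in seen:
                continue
            seen.add(current)
            stack.extend(invokes.get(current, []))
    return result

print(sorted(recursive_methods(INVOKES)))
```

The graph query expresses in one line what takes an explicit traversal here, which is the main argument for keeping call structures in Neo4j.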
43. Example jQAssistant Neo4j Pandas
Recursive Method Calls to Database
MATCH (m:Method)-[:INVOKES*]->(m)
-[:INVOKES]->(dbMethod:Method)
<-[:DECLARES]-(dbClass:Class)
WHERE dbClass.name = "Database"
RETURN m, dbMethod, dbClass
44. Example jQAssistant Neo4j Pandas
Identify possible Race Conditions
public class OwnerController {
...
private static int ownersIndexes;
MATCH
(c:Class)-[:DECLARES]->(f:Field)<-[w:WRITES]-(m:Method)
WHERE
EXISTS(f.static) AND NOT EXISTS(f.final)
RETURN c.name, f.name, w.lineNumber, m.name
static = same field for
all instances of that class
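The heuristic behind this query is simple: a static field that is not final and is written by some method is shared, mutable state. A minimal sketch of the same filter on plain records follows. The records, method names and line numbers are invented for illustration; only the field `ownersIndexes` comes from the slide. The sketch uses boolean flags, whereas the query checks whether the `static` and `final` properties exist on the field node.

```python
# Hypothetical field-write facts; method names and line numbers are invented.
FIELD_WRITES = [
    {"class": "OwnerController", "field": "ownersIndexes",
     "static": True, "final": False, "method": "someHandler", "line": 42},
    {"class": "OwnerController", "field": "owners",
     "static": False, "final": True, "method": "<init>", "line": 30},
    {"class": "Constants", "field": "MAX",
     "static": True, "final": True, "method": "<clinit>", "line": 5},
]

def race_condition_candidates(writes):
    """Keep writes to static fields that are not final."""
    return [w for w in writes if w["static"] and not w["final"]]

candidates = race_condition_candidates(FIELD_WRITES)
for w in candidates:
    print(w["class"], w["field"], w["line"], w["method"])
```

Each hit is only a candidate: whether it is a real race condition depends on whether the writing method can run concurrently.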
46. Summary
• Tooling for data analysis in software development is here!
• First analyses are easy to do using tools you already know
• Specific in-depth analyses are powerful and worthwhile
• Connection between business and developers is possible!
• Problems can be attached to code that is business-related
• Making the impact of risk-taking visible is a must-have to improve!
• Jupyter/Pandas & jQAssistant/Neo4j are my favorites
• Provide many ways for identifying problems
• Help to figure out solutions as well!