1. FLYING BLIND ON A ROCKET CYCLE
PIONEERING EXPERIENCE-CENTERED PRODUCT
STRATEGY FOR EMERGING SPACES
2. JOE LAMANTIA
Currently: VP Design & Development @ Bottomline Technologies
Previous 20 years: end-to-end customer experience, all stages of product and
service development, and digital / business transformation, focusing on
emerging business and technology.
Archetype(s): Sometime Entrepreneur / Proto-academic / Arm-chair Pro Cyclist
https://www.linkedin.com/in/digitaljoelamantia/
@mojoe
JoeLamantia.com [joelamantia.net]
3. Businesses around the world, including some of the world's largest banks and private and publicly traded companies, depend on Bottomline Technologies (NASDAQ: EPAY) solutions to help them make complex business payments simple, smart, and secure.
4. This case study describes building a learning-
driven strategy capability to guide an
adventurous product development group
focused on the new domains of big data
analytics and machine intelligence.
I’ll share the outcomes of our efforts to launch
new products chartered directly around
customer experience value; outline the
methods, tools, and perspectives that powered
product discovery and strategic planning; share
a framework and patterns for identifying and
understanding emerging domains; and review
the application of this toolkit to new situations.
12. BUSINESS STRATEGY IS ABOUT
IDENTIFYING YOUR BUSINESS
OBJECTIVES AND DECIDING WHERE TO
INVEST TO BEST ACHIEVE THOSE
OBJECTIVES.
Marty Cagan
http://svpg.com/business-strategy-vs-product-strategy/
13. THE PRODUCT STRATEGY SPEAKS TO
HOW YOU HOPE TO DELIVER ON THE
BUSINESS STRATEGY.
Marty Cagan
http://svpg.com/business-strategy-vs-product-strategy/
18. OPPORTUNITY ASSESSMENT
“I ASK PRODUCT MANAGERS TO ANSWER TEN FUNDAMENTAL QUESTIONS”
1. Exactly what problem will this solve? (value proposition)
2. For whom do we solve that problem? (target market)
3. How big is the opportunity? (market size)
4. What alternatives are out there? (competitive landscape)
5. Why are we best suited to pursue this? (our differentiator)
6. Why now? (market window)
7. How will we get this product to market? (go-to-market strategy)
8. How will we measure success/make money from this product? (metrics/revenue strategy)
9. What factors are critical to success? (solution requirements)
10. Given the above, what's the recommendation? (go or no-go)
http://svpg.com/assessing-product-opportunities/
"Assessing Product Opportunities," Marty Cagan, Dec 13, 2006
19. PRODUCT DISCOVERY
MODERN PRODUCT DISCOVERY
• Introduction [:26]
• Modern Product Discovery [:54]
• The Evolution of Modern Product Discovery [4:15]
• The Agile Manifesto [7:06]
• The Rise of User Experience Design [8:47]
• The Lean Startup: Eric Ries [9:49]
• The Jobs-To-Be-Done Framework: Clayton Christensen and Anthony Ulwick [10:42]
• OKRs and Design Sprints [12:12]
• The Goal of Modern Product Discovery [14:27]
• Putting Discovery Practices Into Context: The Opportunity Solution Tree [21:32]
• The Future of Product Discovery [29:42]
https://www.producttalk.org/2017/02/evolution-product-discovery/
"The Evolution of Modern Product Discovery," Teresa Torres, February 8, 2017
23. PRODUCT STRATEGY CHARTS A DESIRED
SET OF COURSES THROUGH THE SPACE
OF POSSIBLE PRODUCTS FOR A DOMAIN
Joe Lamantia
32. DEEP STRUCTURE
ENTERPRISE / B2B
• Business process
• Activity
• Social structure: Organizational model
• Boundaries
• Regulation
• IT / Systems architecture
• Lifecycle
• Flows: capital, information, people
• Frame: shareholder value, social enterprise
CONSUMER / B2C
• Value scheme: wealth, love,
knowledge, safety
• Demographics
• Boundaries
• Mores
• Culture
• Social structure: community / group
• Frame: active lifestyle, sustainability
34. Information Visibility through Endeca Discovery Applications
[Architecture diagram: the MDEX Engine ingests rapidly changing data and content, large volumes of highly attributed records, and structured and unstructured information.]
• Discovery Applications: an intuitive user experience guides untrained users to discover relationships in data
• Specialized Database: a high-performance database purpose-built for data-driven search, navigation, and analytics
• Flexible Data Integration: consolidates structured and unstructured data to bridge whitespace between enterprise systems
50. EXPLORING HYPOTHESES ABOUT VALUE:
"Automation of reconciliation activities will enable accounts payable groups in mid-market companies to handle 30% more transactions."
51. PRODUCT DEVELOPMENT IMPACT
INNOVATION OPPORTUNITIES
PRODUCT HYPOTHESES FOR VALIDATION
PRODUCT CONCEPTS FOR PROTOTYPING
PLANNING GUIDANCE (ROADMAP > EPIC > QA)
DELIVERY GUIDANCE: FEATURES AND FUNCTIONS
56. Data Scientist
Square - San Francisco Bay Area
Job Description
Square is hiring a Data Scientist on our Risk team. The Risk team at Square is responsible for enabling growth while mitigating financial loss associated with transactions. We work
closely with our Product and Growth teams to craft a fantastic experience for our buyers and sellers.
Desired Skills & Experience
As a Data Scientist on our Risk team, you will use machine learning and data mining techniques to assess and mitigate the risk of every entity and event in our network. You will
sift through a growing stream of payments, settlements, and customer activities to identify suspicious behavior with high precision and recall. You will explore and understand our
customer base deeply, become an expert in Risk, and contribute to a world-class underwriting system that helps Square provide delightful service to both buyers and sellers.
To accomplish this, you are comfortable writing production code in Java and conducting exploratory data analysis in R and Python. You can take statistical and engineering ideas
from prototype to production. You excel in a small team setting and you apply expert knowledge in engineering and statistics.
Responsibilities
1. Investigate, prototype and productionize features and machine learning models to identify good and bad behavior.
2. Design, build, and maintain robust production machine learning systems.
3. Create visualizations that enable rapid detection of suspicious activity in our user base.
4. Become a domain expert in Risk.
5. Participate in the engineering life-cycle.
6. Work closely with analysts and engineers.
Requirements
1. Ability to find a needle in the haystack. With data.
2. Extensive programming experience in Java and Python or R.
3. Knowledge of one or more of the following: classification techniques in machine learning, data mining, applied statistics, data visualization.
4. Concise verbal and written articulation of complex ideas.
Even Better
1. Contagious passion for Square’s mission.
2. Data mining or machine learning competition experience.
Company Description
Square is a revolutionary service that enables anyone to accept credit cards anywhere. Square offers an easy to use, free credit card reader that plugs into a phone or iPad. It's
simple to sign up. There is no extra equipment, complicated contracts, monthly fees or merchant account required.
Co-founded by Jim McKelvey and Jack Dorsey in 2009, the company is headquartered in San Francisco.
59. WHAT SORT OF PERSON?
▸ They seem different than analysts:
▸ problem set
▸ relationship to discovery tools
▸ skills and professional profile
▸ discovery / analytical methods
▸ perspective
▸ workflow and collaboration
▸ Are they? How?
60. AREAS OF INVESTIGATION
▸ Workflow
▸ Environment
▸ Organizational model
▸ Pain points
▸ Tools
▸ Data landscape
▸ Analytical practices
▸ Project structure
▸ Unmet needs
62. DISCUSSION GUIDE
Can you please walk me through a recent or current project?
a. How was the project initiated?
b. How defined was the business problem in the beginning? Did the problem change?
c. Where/who did you obtain data sets from? How did you make the decision?
d. Describe the data you used: What did the data sets look like? How big were they? Were they structured or unstructured?
e. What tools or techniques did you use to do the analyses? Did they map to the specific steps you mentioned just now?
f. How did you decide these were the tools/techniques to use? To what extent were these decisions made by yourself, and to what extent were they standardized by your group/team?
g. How did you present the results of your analyses? What tools did you use? What do you like and dislike about your current tool set?
h. Which stage of this project was the most challenging? To what extent did the tools satisfy what you intended to do? What features were lacking?
i. How much collaboration was there during each stage of the project?
i. Background and role of collaborators
ii. Collaboration modes
iii. Types of information shared
Thinking about the projects you have worked on, is there a common approach you take to address these problems?
How did you decide on this approach/tools?
63. NEEDS
What are the most common and useful statistical techniques you use during discovery and analysis efforts?
"(1) The most commonly used statistical techniques used to date (in our strategic planning work) are: dimensionality reduction (partition clustering, multiple correspondence analysis), factor analysis, partition clustering (k-means, k-medoids, fuzzy clustering), cluster validation techniques (silhouette, Dunn's index, connectivity), multivariate outlier detection, linear regression, and logistic regression."
What statistical capabilities or functions would be very useful if provided within Endeca discovery applications, and where would they be useful?
"(2) Techniques that would assist with identifying outliers or invalid data. Much of this work seems to be done by hand. I believe that we are also getting to the point where we could start using linear regression and splines (for showing trends)."
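The techniques this respondent names are standard library calls today. As a hedged illustration (not from the deck), here is a minimal scikit-learn sketch of two of them, k-means clustering validated with the silhouette coefficient, run on synthetic data:

```python
# A minimal sketch (not from the deck) of two techniques the respondent
# names: k-means clustering validated with the silhouette coefficient.
# Data is synthetic; scikit-learn supplies both routines.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Synthetic "highly attributed records": three blobs in five dimensions.
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 5))
                  for c in (0.0, 3.0, 6.0)])

# Validate candidate cluster counts; higher silhouette = cleaner partition.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    print(f"k={k}: silhouette={silhouette_score(data, labels):.3f}")
```

Run as written, the silhouette score peaks at k=3, matching the three planted blobs, which is the kind of cluster validation the quote describes.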
65. NEEDS
For example, would system-generated descriptive statistical visualizations be useful for whole data sets, or for smaller user-selected groups of attributes?
"With regards to your last question on visualization, we have put in significant effort to use visualization in our Endeca installation. We have built visualizations such as tree maps, flow diagrams, sunburst diagrams, scatter plots showing clusters, and hierarchical edge bundling diagrams to explore our data sets."
Would it be useful for the application to analyze and suggest possible distribution models it sees in the data, for the values of individual attributes and/or for larger sets of data?
"Our data tends to be qualitative rather than quantitative, so this drives much of our visualizations. So yes, interactive descriptive statistical visualization would be helpful – on the complete data set and individual attributes."
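To make "system-generated descriptive statistical visualization" concrete, here is a minimal, hypothetical pandas sketch: summary statistics plus a histogram per attribute, for a whole data set or a user-selected subset of columns. The column names and values are invented:

```python
# A hypothetical sketch of per-attribute descriptive statistics and
# histograms with pandas/matplotlib; column names and values are invented.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "transaction_amount": [120.5, 98.0, 430.2, 87.9, 15000.0, 110.4],
    "days_to_settle": [2, 3, 2, 5, 30, 3],
})

# Summary statistics (count, mean, std, quartiles) for every attribute.
print(df.describe())

# One histogram per attribute; pass a column list to restrict the view
# to a user-selected group of attributes, e.g. df[["days_to_settle"]].
df.hist(bins=10)
plt.tight_layout()
plt.show()
```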
66. Analyst Profile: Scott – Operations Analyst
Education: BA Information Systems (Connecticut State College); MBA Organizational Leadership (Johnson & Wales)
Summary: Scott is a mid-level analyst with a background in Business Information Systems and an MBA in Organizational Leadership. He works on a 6-person team at Cox-New England (Telecommunications). His current role involves conducting data mining analysis to support operations research and organizational decision making/strategic planning.
Scott's work supports both sides of the profit equation: operations research/analysis to support internal cost-cutting and process innovation, and formative/summative evaluation to help drive effective sales/marketing efforts to increase revenue. His group is also given target cost savings goals that they need to help individual departments achieve to fulfill a cost reduction organizational mandate. His group accomplishes this by discovering inefficiencies in process through data mining, predictive modeling, and retrospective data analysis.
Cox has highly attributed enterprise data on customers, marketing campaigns, pricing variants and special offers, demographics, geography of the area, building and home types, school schedules, weather events, etc. that describe customer usage patterns, consumption of media bandwidth, and so on. Each of their products (data, cable, phone, wireless) has different usage profiles that vary along many of the dimensions and variables listed above. His group is focused on residential customers; business customers are handled by a separate unit.
Discovery/Information Needs
Support longer-term strategic planning:
• How can we decrease the time-to-install service for new customers?
• How can we decrease the time it takes to restore service after a storm causes widespread outages?
• How can we decrease operational cost for each department/line of business?
• How many call center representatives do I need in my call center?
• How much offsite technician headcount do we need based on historical/seasonal trends balanced against current customer install base and ongoing sales/marketing efforts?
Evaluate success:
• How effective was a particular marketing campaign?
• How effective is a new training program for call center representatives?
• How effective is a self-install approach?
Understand variables that impact KPIs. KPIs include:
• Call center volume
• % successful resolution by support staff
• Time-to-install
• Sales volume
• Sales revenue
Understand and explain variance using retrospective analyses:
• Why does Connecticut have a shorter time-to-install than Rhode Island?
• Why did two identical marketing campaigns in two different markets have vastly different impact on sales?
• Is the variance significant, or does it represent random deviation? (see the significance-test sketch after this profile)
Ad-hoc reporting:
• How many calls to the call center needed to be escalated to tier 2 support last month?
• How many new customers complained that a technician was late or didn't show up for the install appointment?
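Scott's "significant variance or random deviation?" question maps onto a routine significance test. A hedged sketch, with invented time-to-install samples for the two states (a real analysis would first check the test's assumptions):

```python
# Hedged sketch: is the CT vs RI time-to-install difference significant,
# or random deviation? Samples are synthetic stand-ins; a real analysis
# would check normality/variance assumptions before trusting the test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
ct_days = rng.normal(loc=4.2, scale=1.0, size=200)  # Connecticut installs
ri_days = rng.normal(loc=4.9, scale=1.1, size=200)  # Rhode Island installs

# Welch's t-test: does not assume equal variances between the states.
t_stat, p_value = stats.ttest_ind(ct_days, ri_days, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Unlikely to be random deviation at the 5% level.")
```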
68. 'FIVE THINGS ANALYSTS DO WITH DATA'
▸ Clustering (structure of data)
▸ Dimension Reduction (structure of data)
▸ Anomaly Detection (profile of data)
▸ Characterization (profile of data)
▸ Testing probability model & validation (validity of data)
Source: Frontiers in Massive Data Analysis
http://www.nap.edu/openbook.php?record_id=18374
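As a rough illustration of two of the five activities (not code from the cited report), the sketch below pairs dimension reduction (structure of data) via PCA with a simple distance-based anomaly check (profile of data), on synthetic records:

```python
# Illustrative only: dimension reduction (structure of data) via PCA plus
# a simple distance-based anomaly check (profile of data), on synthetic
# records. Not code from the cited report.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
records = rng.normal(size=(500, 20))   # 500 records, 20 attributes
records[0] += 15                       # plant one gross outlier

# Project onto the two strongest components to expose structure.
projected = PCA(n_components=2).fit_transform(records)

# Flag records unusually far from the center of the projection.
dist = np.linalg.norm(projected - projected.mean(axis=0), axis=1)
outliers = np.where(dist > dist.mean() + 3 * dist.std())[0]
print("Flagged record indices:", outliers)
```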
73. Sense Maker Segment
Sense makers need to create and/or employ insights to accomplish
their business goals and satisfy their responsibilities.
These insights emerge from independent and collaborative discovery
efforts that involve direct interaction with discovery applications, and
participation in discovery environments.
Insight Consumer
Analyst
Casual Analyst
Data Scientist
Analytics Manager
Problem Solver
74. Dana Data Scientist
Creates data-driven insights, offerings, and resources to transform the organization
Job Title: Senior Data Scientist
Company: LinkedIn
Work Experience: 10 years
Education: Ph.D. Statistics, MS Bioinformatics
"I'll do whatever it takes – wrangle, extract, manipulate, analyze, experiment, prototype – to use data to drive value & innovate"
Background
Dana is a Senior Data Scientist who has worked at LinkedIn for 5 years. Dana's education includes a Ph.D. in Statistics and an MS in Bioinformatics. Dana's previous work includes positions in academic research groups as a doctoral candidate and post-doc, as well as software engineering roles in the Internet & technology industries.
Work Context
• Dana works with several other data scientists and her Analytics Manager on a centralized team
• Dana and her colleagues aim to create data-driven insights, features, resources, and offerings that deliver strategic value to LinkedIn
• Dana works with Analysts on other teams to define and create discovery tools, data sets, and methods for use by their groups at LinkedIn
• Dana & team are visible & well established within LinkedIn, and have a voice in product strategy and operational context; they have a high degree of autonomy in defining data science projects
• Dana works with Insight Consumers to suggest and determine potential new data-driven offerings to prototype and evaluate
Typical Discovery Scenarios & Problems
• How can we leverage data to increase online engagement with LinkedIn?
• How should we measure engagement & what factors drive it?
• What aspects of a personal profile are most likely to encourage / discourage new connections between people?
• How can we increase people's activity and contributions to topical discussion groups?
• What factors drive the effectiveness of our marketing campaigns?
• Why did one of our marketing campaigns work exceptionally well?
• How can we leverage data to help recruiters identify and communicate effectively with qualified and potentially available candidates?
Activities
• Mines, analyzes, & experiments with data to identify patterns, trends, outliers, causal factors, predictive models, & opportunities
• Defines and explains newly devised measurements, predictive models, & insights
• Compares effectiveness of operations at achieving company goals for engagement, growth, data quality
• Produces & explores new data sets
• Collaborates with other data scientists to capture new data streams
• Prototypes new data-driven site features/offerings
• Runs data-based experiments to test/evaluate models, hypotheses & prototypes
• Communicates & explains analyses to colleagues & Insight Consumers
Key Goals
• Leverage data to support the org mission
• Enhance products & services with data-driven insights and features
• Use data to identify new opportunities and prototype/drive new customer offerings
• Create useful data sets/streams, measures, & resources (e.g., data models, algorithms, etc.)
Tools
• Open source data manipulation, mining & analysis tools including R, Pig, Hadoop, Python, etc.
• Statistical packages such as SAS, SPSS, etc.
• Custom analytical tools built using open source components and languages
Pain Points
• Defining and capturing useful measures of online attention
• Getting all the data analytic tools to work together properly
• No current workflow support or tools for data wrangling, analysis, experimentation, and prototyping
Wish List
• Effective tools to help experiment with and evaluate value / utility of features and activities for users
• Ability to rapidly prototype data-driven features w/out risk of online service disruptions
Sample Workflow
1. Analyze & identify causal/predictive factors: Who are the best candidates to contact for a job based on recruiter needs and profile content?
2. Prototype & experiment with data-driven feature: How can we prototype/evaluate this w/out disrupting the site?
3. Gather data & analyze results: Use descriptive, inferential, and predictive statistics to evaluate results
4. Summarize & communicate: Review findings with colleagues; summarize, visualize, and communicate key findings to Insight Consumers/decision makers
75. Perspectives
Analytical
The analytical perspective is the center of definition for all analytical roles. Contrast with engineers, who "make stuff". Analytical roles figure things out for some purpose: whether building a model to inform a product prototype or providing insight.
Empirical
The empirical perspective is distinct from the analytical perspective, and marks 'true' data scientists. It revolves around framing and testing hypotheses formally and informally, often requires validation and interrogation of experimental methods and results by others, and expects a significant degree of transparency at all stages of the analytical effort.
76. Empirical Method
[Concept map of the empirical method: an Insight Consumer articulates questions or beliefs that motivate hypotheses; the Data Scientist creates & refines hypotheses, whose predictions are tested by experiments; experiments use analytical methods (implemented as analytical tools: algorithms, scripts, tests) applied to data sets drawn from data sources (a development corpus mirroring the production corpus, plus external sources); results lead to conclusions that generate insights, which inform the domain and the Insight Consumer; models (reference, initial, interim, new) are created, refined, trained, and validated across exploratory, investigative, model-building, training, and validation phases; a Data Engineer implements production models and manages data sets and data products.]
The questions the method cycles through:
• What is the question?
• How will we answer the question?
• What data will we use?
• What analytical method will we use?
• What tools will we use?
• What are the results?
• What do the results mean?
• What did we learn / discover?
• Who should we inform?
• What is the next question?
EMPIRICAL DISCOVERY
"a hybrid, purposeful, applied, augmented, iterative and serendipitous method for realizing novel insights for business, through analysis of large and diverse data sets."
Data Science and Empirical Discovery: A New Discipline Pioneering a New Analytical Method
https://blogs.oracle.com/serendipity/entry/data_science_and_empirical_discovery
78. Analysis Workflow & Activities
• Empirical analysis of subsets of data
– Understand topology of data and its boundaries (sets / subsets, complete corpus, totality of data)
• Outlier identification and profiling
– How significant are outliers to the overall topology?
» Comparative exclusion and profiling of resulting data subsets to understand their role; discover principal components (see the sketch after this list)
• Find and analyze patterns and areas of interestingness / deserving attention
• Find and analyze central actors / factors (in the existing model that produced the source data, in the topology of the working data, in patterns, etc.)
– Identify and understand their impact on local and global data topology and primary metrics, possibly along several axes at the same time
• Discover and analyze relationships amongst central actors
– Understand cycles, trends, and changes (dynamic characteristics) for core actors, topology, patterns, and structure
– Understand causal factors
• Codify / create a new model reflecting insights & outcomes from experiments
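A minimal sketch of the comparative-exclusion step referenced above: profile a metric with and without extreme records to gauge how much outliers shape the overall picture. The column name, values, and z-score threshold are all hypothetical:

```python
# Hypothetical comparative-exclusion sketch: profile a metric with and
# without extreme records to see how much outliers shape the topology.
# Column name, values, and the z-score cutoff are invented.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
amounts = np.append(rng.normal(110.0, 10.0, size=50), 9800.0)  # one extreme
df = pd.DataFrame({"invoice_amount": amounts})

z = (df["invoice_amount"] - df["invoice_amount"].mean()) / df["invoice_amount"].std()
core = df[z.abs() < 2]  # exclude records more than 2 standard deviations out

print(f"With outliers:    mean={df['invoice_amount'].mean():.1f}  std={df['invoice_amount'].std():.1f}")
print(f"Without outliers: mean={core['invoice_amount'].mean():.1f}  std={core['invoice_amount'].std():.1f}")
```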
79. Data Science Workflow
• Frame problem / goal of effort
• Identify and extract data to be used in the effort from the whole corpus / totality of available data
– Exploratory identification and selection of working data for use in experiments
• Define experiment(s): hypothesis / null hypothesis, methods, success criteria
– Derive insight(s)
– Wrangle, process, visualize, interpret
• Codify / create a new model reflecting insights & outcomes from experiments
• Validate new model(s)
• Provision training data
• Train the new model
• Validate the outcomes of model training
• Hand off for implementation on production systems / as production code
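A compressed, hypothetical rendering of the tail of this workflow in scikit-learn: provision training data, train, validate against a success criterion, and persist the artifact for hand-off. Every name in it is invented for illustration:

```python
# A compressed, hypothetical rendering of the tail of this workflow in
# scikit-learn: provision training data, train, validate, and persist the
# model for hand-off. Data and the success criterion are invented.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import joblib

# Stand-in for the working data selected during exploration.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Provision training data; hold out a validation set.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Train the new model, then validate it against the experiment's criterion.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
score = accuracy_score(y_valid, model.predict(X_valid))
print(f"Validation accuracy: {score:.3f}")  # compare to the success criterion

# Hand-off artifact for implementation on production systems.
joblib.dump(model, "validated_model.joblib")
```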
80. THE ESSENCE
▸Empirical perspective
▸Business imperatives drive activities
▸Analytical approach
▸Recipe is always the same
▸Engineering always present
▸Data challenges are paramount
▸consume 60% - 80% of time and effort
▸Data volumes range from huge to moderate (PB > MB)
▸Domain often drives analysis
▸Data scientists already have self-service
▸Some new problems, many the same
▸Use ‘advanced’ analytics, not conventional BA
▸Innovate by applying known analyses to new data
▸Current workflow fragmented across tools and data stores
▸Success can be a model, product, insight, infrastructure, tool
81. Model of Analytical Workflow
Articulates common analytical activities
“realistic” - represents wrangling, some iterative dynamics
bounded - does not represent business perspective
Originated by Ben Lorica - O’Reilly
*consistent with our research*
85. THE ESSENCE
▸Empirical perspective
▸Business imperatives drive activities
▸Analytical approach
▸Recipe is always the same
▸Engineering always present
▸Data challenges are paramount
▸consume 60% - 80% of time and effort
▸Data volumes range from huge to moderate (PB > MB)
▸Domain often drives analysis
▸Data scientists already have self-service
▸Some new problems, many the same
▸Use ‘advanced’ analytics, not conventional BA
▸Innovate by applying known analyses to new data
▸Current workflow fragmented across tools and data stores
▸Success can be a model, product, insight, infrastructure, tool
87. John is tasked with analyzing 30 years of crime data collected by three different authorities. Accordingly, the data arrive in three different formats: one source is a relational database, another is a comma-separated values (CSV)
file, and the third file contains data copied from various tables within a portable document format (PDF) report. Knowing the structure required for his visualization tool, John first reviews the different data sets to identify potential
problems (step 1 in Figure 1).
The relational database allows him to specify a query and generate a file in an acceptable format. For the comma delimited data, the column headings associated with the data were unclear. Using spreadsheet software he adds a
row of header information at the top to fit the format required by the visualization tool. While updating the header, John notices that the location of a given crime is encoded in one column (as ‘City, State’) in the CSV file and
encoded in two columns (one ‘City’ column and one ‘State’ column) in the relational database.
He decides to split the column in the CSV file into two separate columns. John then opens the text file in the spreadsheet but the spreadsheet does not parse the data as desired. After manually moving data fields to appropriate
columns and some other manipulation (step 2), John finally has consistent columns and now combines the three files into one, but then notices that some columns have inconsistently formatted cells.
The ‘Date’ column is formatted as ‘dd/mm/yy’ in some cells and as ‘mm/dd/yyyy’ in others. John returns to the original files, transforms all the dates to the same format, and recombines the files. John loads the merged data file in a
visualization tool (step 3). The tool immediately gives the error message ‘Empty cells in column 3’; it cannot cope with missing data. John returns to the spreadsheet to fill in missing values using a few spreadsheet formulas (back
to step 2). He edits the data by hand; sometimes he transforms the data (e.g. one state reports data only every other year so he uses an average for the missing years). At other times there is nothing he can do after diagnosing a
new problem (i.e. return to step 1). For example, he finds out that survey question 24 did not exist before 2000, and the most recent year of data from Ohio has not been delivered yet, so he tries to pick the best possible value (e.g. −1) to indicate missing values. John detects other, more nuanced, problems; for example, some cells have a blank space instead of being empty. It took hours to notice that difference. John tries to follow a systematic approach
when evaluating the data, but it is difficult to keep track of what he has inspected and how he has modified the data, especially because he discovers different issues across different files. Even after all of this work, he is not sure if
he has examined all of the variables or overlooked any outliers. After a while, the data file seems good enough and he decides to move on.
It took a few days so it is with a great sense of accomplishment that John finally loads the data for the second time into the visualization tool he wants to use (step 3 again). He constructs several views of the data, including a
geospatial representation of the crimes and a scatterplot of age against crime. As soon as he sees the visualized data he realizes that, unfortunately, data quality issues still persist. Extreme outliers appear in the visualization.
Some outliers seem to be valid data (e.g. data from the District of Columbia are very different from data from every other state).
Others seem suspicious (criminals may vary in age from teenagers to older adults, but apparently babies are also committing crimes in certain states). John iteratively removes those outliers he believes to be dirty data (e.g. criminals under 7 and over 120 years old). Time series visualizations indicate that, in 1995, some causes of death disappear abruptly while new ones appear. Two days later, an email exchange with colleagues reveals that the classification of causes of death was changed that year. John writes a transformation script to merge the data so he can analyze distinct terms referring to the same (or at least similar) cause of death.
Although the ‘real’ analysis is just about to start (step 4), John has made dozens of transformations, repeated the process several times, made important discoveries relating to the quality of the data, and made
many decisions impacting the quality of the final ‘clean’ data. He also used visualization repeatedly while walking through the process, but still does not have results to show to his boss. Finally, he is able to work
with the usable data, and useful insights come to the surface, but updated data sets arrive (step 5). Without proper documentation (step 6) of his transformations, John might be forced to repeat many of the
tedious tasks.
“Research directions in data wrangling: Visualizations and transformations for usable and credible data”
“a process of iterative data exploration and transformation that enables analysis.”
WRANGLING SCENARIO
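John's steps translate directly into a few lines of pandas. A hedged sketch (file contents, column names, and per-source date formats are invented; John's real files mixed formats within a single file):

```python
# Hedged pandas sketch of John's wrangling steps; file contents, column
# names, and per-source date formats are invented (his real files mixed
# formats within a single file).
import pandas as pd

db = pd.DataFrame({"City": ["Hartford"], "State": ["CT"],
                   "Date": ["03/01/1995"], "Crimes": [12]})
csv = pd.DataFrame({"Location": ["Providence, RI"],
                    "Date": ["1/3/95"], "Crimes": [8]})

# Split 'City, State' into two columns to match the database schema.
csv[["City", "State"]] = csv["Location"].str.split(", ", expand=True)
csv = csv.drop(columns="Location")

# Normalize the inconsistent date formats ('mm/dd/yyyy' vs 'dd/mm/yy').
for frame, fmt in ((db, "%m/%d/%Y"), (csv, "%d/%m/%y")):
    frame["Date"] = pd.to_datetime(frame["Date"], format=fmt)

# Combine sources and surface empty cells before the visualization tool does.
merged = pd.concat([db, csv], ignore_index=True)
print(merged[merged["Crimes"].isna()])
```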
88. Although the ‘real’ analysis is just about to start (step 4), John has made
dozens of transformations, repeated the process several times, made
important discoveries relating to the quality of the data, and made many
decisions impacting the quality of the final ‘clean’ data.
He also used visualization repeatedly while walking through the process, but
still does not have results to show to his boss.
Finally, he is able to work with the usable data, and useful insights come to the
surface, but updated data sets arrive (step 5).
Without proper documentation (step 6) of his transformations, John might be
forced to repeat many of the tedious tasks.
“Research directions in data wrangling: Visualizations and transformations for usable and credible data”
“a process of iterative data exploration and transformation that enables analysis.”
WRANGLING SCENARIO
89. One or more initial data sets may be used and new versions may
come later. The wrangling and analysis phases overlap.
While wrangling tools tend to be separated from the visual
analysis tools, the ideal system would provide integrated tools
(light yellow). The purple line illustrates a typical iterative process
with multiple back and forth steps.
Much wrangling may need to take place before the data can be loaded
within visualization and analysis tools, which typically
immediately reveals new problems with the data.
Wrangling might take place at all the stages of analysis as users
sort out interesting insights from dirty data, or new data become
available or needed.
At the bottom we illustrate how the data evolves from raw data to
usable data that leads to new insights.
“a process of iterative data exploration and transformation that enables analysis.”
WRANGLING IN THE ANALYTICAL WORKFLOW
91. Discovery in the Analytical Workflow
• Commonly recognizable cycle and focus for discovery activities (subset)
• Explicitly iterative, ad-hoc, dynamic
• Goal = incremental / directional advance in understanding
• Core modes of engagement with data = Explore, Analyze
• Modeling phase does not involve exploration
96. The Language of Discovery:
A concrete descriptive language for human discovery activity in diverse contexts.
A simple and consistent vocabulary that is independent of domain, role, information type, etc.
103. Locate
To find a specific (possibly known) thing
e.g. I need to find a new part with particular technical attributes and then source it from the most qualified supplier - Engineering
Verify
To confirm or substantiate that an item or set of items meets some specific criterion
e.g. How can I determine if I am looking at the latest information for a part or supplier? - Supply Chain Specialist
Monitor
To maintain awareness of the status of an item or data set for purposes of management or control
e.g. I need to monitor at-risk/failing customers/dealers so I can prompt my Account Reps to fix the problems - Sales Manager
104. Compare
To examine two or more things to identify similarities & differences
e.g. I need to compare our module set teardowns with competitive teardown information to see if we’re staying competitive for cost, quality and functionality - Engineering
Comprehend
To generate insight by understanding the nature or meaning of something
e.g. I need to analyze and understand consumer-customer-market trends to inform brand strategy & communications plan – Director, Brand Image
Explore
To proactively investigate or examine something for the purpose of knowledge discovery
e.g. I need to understand the cost drivers for this commodity so I can negotiate better terms with my suppliers and forecast business risk based on market indices - Procurement
105. Analyze
To critically examine the detail of something to identify patterns & relationships
e.g. I need to know the cost drivers for a part such as materials that impact cost. Is the relationship a correlation or step function for a part cost driver? - Engineering
Evaluate
To use judgement to determine the significance or value of something with respect to a specific benchmark or model
e.g. I need to determine my current state in my prints so I can evaluate if I have price variation to negotiate a better price - Procurement
Synthesize
To generate or communicate insight by integrating diverse inputs to create a novel artifact or composite view
e.g. I need to prepare a weekly report for my boss (sales mgr) of how things are going - Account Rep
111. Discovery Modes and Activity
[Diagram: four discovery modes cycling from Begin to Conclude, with two kinds of activity, Sensemaking and Transformation; transformation work runs from data quality to computed / enriched data, new data triggers new cycles, and cumulative change builds direction & momentum.]
• Explore - Goal: understand the nature and usefulness of data for analysis
• Wrangle - Goal: make data useful for analysis
• Analyze - Goal: achieve insights by analyzing data
• Augment - Goal: accumulate insight through iterative analysis
112. Working with Data to Effect Outcomes
[The same mode diagram, annotated with apparent mode and activity affinities: advancing insight ("can't do this…") depends on the underlying data-handling capabilities ("…without these capabilities").]
113. Actual Discovery Modes and Activity Affinities
[The mode diagram revised to match the research: the data runs from source data to source & enriched data; during real wrangling the focus of attention is the organization of the data and its quality issues, while during real analysis it is actual & potential insights; progress is cumulative and incremental, and new data triggers new cycles.]
114. CAPABILITIES FOR VISUAL DISCOVERY & ANALYSIS TOOLS
▸ Explore data corpus
  ▸ via effectively characterized catalog
▸ Explore individual data sets
  ▸ effective preview / sample / subset
▸ Analyze data
  ▸ within ad-hoc data sets, across ad-hoc data sets
▸ Wrangle data
  ▸ within ad-hoc data sets, across ad-hoc data sets
▸ Verify outcomes: insights, models, data products
▸ Synthesize outcomes
  ▸ distinct types = insights, model, data product (project)
▸ Publish outcomes
  ▸ distinct types = insight, data product, model (project)
▸ Integrate specialized / external analytical tools {augment}
  ▸ analysis tools (R, Python), reference models, validation tools
▸ Integrate external workflow tools {enhancing}
  ▸ e.g. figshare, model management, projects
▸ Support analytical workflow {enhancing}
128. Tools on the Market Now
[The mode diagram annotated with current tools: Paxata and Trifacta, the wave 1 wrangling tools now in market; Beyond Core (?), OSS / hand-rolled tooling, and EID 3.x elsewhere on the map; no good exploration tool in market.]
129. Tools on the Market Now
[The same diagram annotated: Alteryx and Datameer, offering modest exploration capabilities.]
130. Tools on the Market Now
[The same diagram annotated: Alteryx and Qlik, offering modest exploration capabilities.]
131. Tools on the Market Now
[The same diagram annotated: Tableau and Platfora, the wave 1 visual analysis tools now in market, offering modest wrangling capabilities.]
135. VISUAL DISCOVERY AND ANALYSIS TOOLS: WAVE 1
Definition: traditional discovery & analysis made possible on Hadoop stores
Value prop = easy access to Hadoop stores for analysts without a data engineer
In / coming to market now: Platfora, Datameer, ClearStory, SiSense, etc.
The segment is viable (people understand the need & have the problem)
Tool maturity will increase incrementally, and in customary ways:
• alignment to workflow particulars
• nuanced and compelling UX
• broader footprint of supporting capabilities: provenance, publishing, collaboration
• integration with the ecosystem of related tools for the activity
This class of tools competes with & may replace / displace existing non-Hadoop-native tools that are still rising with the general analytics wave: Qlik, Tableau, MicroStrategy.
Firms making new investments (for new stacks) will try / buy this new generation; firms extending existing investments are less likely to buy new.
Long view = tools in this segment could 'eat' BI market share by adding reporting and other structured analytical capabilities that capture customers who do not have large BI stacks now, begin investing here, and subsequently need BI capability.
154. DEEP STRUCTURE <> ANALYTICAL WORKFLOW
CHANGE VECTORS <> BIG DATA TECHNOLOGIES
EARLY SIGNALS <> RISE OF DATA SCIENCE
INFLECTION POINTS <> DATA SCIENCE MOMENT
EMERGING SPACES <> EMPIRICAL DISCOVERY
HOLISTIC EXPERIENCES <> VISUAL DISCOVERY TOOL
156. VISUAL DISCOVERY & ANALYSIS TOOLS: WAVE 2
Definition: augmented discovery & analysis across the full business data corpus
Value prop = deeper insights from more diverse data, and faster insights, effected via a mixed toolkit of (semi-)automated analytical techniques (clustering, machine learning, regression / correlation, etc.) that enhances and directs analyst attention
Vectors of augmentation: data types, degree of automation
• data = text / lingual, location / spatial, native graph, native stream
• automation = which specific activities are augmented, and to what degree
Wave 2 is at the 'pioneer' stage: the specifics of capability, value, and implementation are unknown.
Limiting factors:
• Domain specificity: the value of general discovery analytics drops once domain boundaries are reached; tools need to align specifically to the domain's view of the world. Expect verticalization of all analytics.
• Low / no tolerance for black boxes: deeper insights require transparency
• Analytical literacy: levels are increasing, but orgs can't benefit from advanced analytical techniques they don't understand & trust
198. WORKING THE ECOSYSTEM
• Oracle = an ecosystem
• ML = commoditizing
• Someone will ‘generate the electricity’ = provide
ML capability within the Oracle ecosystem
• Everyone’s going to need it…
222. The Language of Discovery
Category: Primary Research, Design Systems
Outcomes: Building on already-published original applied research into information retrieval and usage, the Language of Discovery posits a domain-independent framework describing the activity primitives of discovery in terms of 'modes'. Succeeding professional and industry publications outline the application of this descriptive vocabulary in settings including product design and development, product strategy, and information management.
Reference:
• Russell-Rose, T., Lamantia, J. and Burrell, M. 2011. A Taxonomy of Enterprise Search and Discovery. Proceedings of EuroHCIR 2011, London, UK. http://ceur-ws.org/Vol-763/paper4.pdf
• Russell-Rose, T., Lamantia, J. and Burrell, M. 2011. A Taxonomy of Enterprise Search and Discovery. Proceedings of HCIR 2011, California, USA. https://docs.google.com/a/kent.edu/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NzdmYjc3OWY2ZjQ2Zjg4MQ
• Russell-Rose, T. and Makri, S. 2012. A Model of Consumer Search Behavior. Proceedings of EuroHCIR 2012, Nijmegen, NL.
• Designing the Search Experience: http://www.amazon.com/Designing-Search-Experience-Information-Architecture/dp/0123969816
• Presentation - Strata: http://conferences.oreilly.com/strata/stratany2012/public/schedule/detail/25411
• Presentation - UX Lisbon conference: http://www.joelamantia.com/user-experience-ux/slides-for-uxlx-talk-the-language-of-discovery-a-grammar-for-designing-big-data-interactions
223. Domain & Market Study: Data Science
Outcomes: Comprehensive portrait of all major facets of a new analytical discipline, including its practices, roles, methodology, tools and technologies, workflows, organizational models, skillsets, alignment with business, areas of innovation, and relation to the landscape of business analytics.
Research outcomes and synthesized insights guided product design, management, and strategy efforts including: opportunity identification and profiling, landscape / competitive modeling, technology lifecycle and evolution models, product discovery, concept creation and evaluation, and prototyping.
Notable aspects: Consistently delivered insights twelve or more months ahead of leading industry analysts pursuing similar agendas.
Artifacts & Synthesis
• Data Science Highlights: http://www.joelamantia.com/user-research/data-science-highlights-an-investigation-of-the-discipline
• Empirical Discovery Concept and Workflow Model: https://blogs.oracle.com/serendipity/entry/empirical_discovery_concept_and_workflow
• Empirical Discovery: A New Discipline: https://blogs.oracle.com/serendipity/entry/data_science_and_empirical_discovery
• Defining Discovery: Core Concepts: https://blogs.oracle.com/serendipity/entry/defining_discovery_core_concepts
• Discovery and the Age of Insight: http://www.joelamantia.com/language-of-discovery/discovery-and-the-age-of-insight
• Big Data Is Not Enough: http://www.joelamantia.com/user-experience-ux/big-data-is-not-the-insight-slides-from-enterprise-search-europe
226. DEEP STRUCTURES
ENTERPRISE / B2B
• Business process
• Activity
• Social structure: Organizational model
• Boundaries
• Regulation
• IT / Systems architecture
• Lifecycle
• Flows: capital, information, people
• Frame: shareholder value, social enterprise
CONSUMER / B2C
• Value scheme: wealth, love,
knowledge, safety
• Demographics
• Boundaries
• Mores
• Culture
• Social structure: community / group
• Frame: active lifestyle, sustainability