This document provides guidance for starting a data science startup, including key focus areas, funding stages, business planning steps, challenges, and a reference architecture. It recommends initially focusing on research, defining the problem and business model, and identifying the target market. Key planning steps include analyzing business needs, evaluating and selecting technologies, and obtaining funding and sponsorships. Challenges may include time pressures, lack of resources, and integrating with existing systems. The reference architecture outlines technology options at different layers, from infrastructure to analytics and decision-making tools.
1. BIG DATA & DATA SCIENCE
START-UP FOCUS POINTS
+ BUSINESS AND TECHNOLOGY
REFERENCE ARCHITECTURE
@TomZorde
2. I HAVE AN IDEA FOR A DATA SCIENCE START-UP
• Use these slides to focus conversation
• What stage are you at?
• What is the problem you’re trying to solve?
• What type of business model would work?
• Tools? – A rapidly evolving space.
• Reference Architecture helps identify what level of the stack
we’re talking about.
3. AREAS OF EARLY FOCUS
SEED STAGE - Research & Development
1. Research & Define Concept, business model, internal & sourced capabilities
2. Define customer value proposition and identify target market
ANGEL – Business Planning & Product Development
1. Identify services and products required and evaluate gaps for go-to-market readiness
2. Source funding partner to build minimum viable product and get commitment for round 2 funding
3. Assemble team and build MVP prototype exceeding expectations
ROUND 1 / SERIES A FUNDING – Commercially operational
ROUND 2 / SERIES B FUNDING – Fully Operational
ROUND 3 / SERIES C FUNDING – Expansion
IPO / ACQUISITION
4. BUSINESS PLANNING & DEVELOPMENT - LOGICAL STEPS
1. Full business needs and information requirements
analysis. Business Drivers
• Revenue generation? Cost reduction? Customer
retention? Compliance?
• Process Improvement? Fraud detection?
Analytics? Dashboard?
• Solving a tough problem? Retiring/replacing
assets, technologies and systems?
2. Technology Evaluation and Selection
• Define requirements and objective first
• Evaluate a variety of technology stacks –
develop a framework first
3. Board Support for Start-up Resources
4. Prototyping, Discovery, and Planning
• Rent Infrastructure in Cloud – VMware, AWS, MS
Azure and others
• Use Spare Hardware and Network Bandwidth
• Assessment, Proposal. Project/Program Plan for
next steps
• Start small and keep delivering
5. Architecture Design, Estimation, Business Case
6. Obtain funding and executive sponsorships,
owners, etc.
7. SDLC: don’t forget Hardware, Security, Testing,
Data governance, etc.
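Step 2 above asks for an evaluation framework before comparing technology stacks. One common way to do that is a weighted scoring matrix; the sketch below is only an illustration of the idea, not a method prescribed by these slides. The criterion names, weights, stack names, and 1-5 ratings are all hypothetical placeholders to be replaced with your own requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # relative importance; normalised inside weighted_score


def weighted_score(criteria, ratings):
    """Weighted average of the ratings; raises KeyError if a rating is missing."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive number")
    return sum(c.weight * ratings[c.name] for c in criteria) / total_weight


def rank_stacks(criteria, candidates):
    """Return (stack name, score) pairs, best score first."""
    scores = {name: weighted_score(criteria, r) for name, r in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical criteria and weights (assumptions, not from the slides).
CRITERIA = [
    Criterion("maturity", 3),
    Criterion("security", 3),
    Criterion("integration_effort", 2),
    Criterion("team_skills", 2),
]

# Hypothetical 1-5 ratings for two placeholder stacks (higher is better).
CANDIDATES = {
    "stack_a": {"maturity": 4, "security": 3, "integration_effort": 2, "team_skills": 5},
    "stack_b": {"maturity": 3, "security": 4, "integration_effort": 4, "team_skills": 2},
}

if __name__ == "__main__":
    for name, score in rank_stacks(CRITERIA, CANDIDATES):
        print(f"{name}: {score:.2f}")
```

Agreeing the criteria and weights before rating any stack keeps the comparison tied to the requirements and objectives defined first, rather than to whichever tool was seen last.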
5. FORESEEABLE CHALLENGES
Business urgency, time to market pressures
• A Big Data / Data Science start-up needs careful planning
• Big Data needs infrastructure, software stacks, people, and a start-up plan
Lack of Big Data Resources, Lack of Sponsorships (except in some companies)
• Big Data is complex and requires multiple skill sets (mostly new to many companies) – Infrastructure, Administration,
Security, Programming, Testing, etc.
• Skepticism about Big Data
Integration with Existing Technologies and Systems
• Cannot develop big data solutions in isolation
• Integration with existing systems will be a top challenge (requires both sides to do additional work)
Open Source: Stability, Maturity, and Security
6. INFORMATION AS A PRODUCT/SERVICE
TYPES OF RELEVANT BUSINESS MODELS
Differentiation
New Services
Customer Experience
Contextual Relevance
Brokering
Raw Data
Benchmarking
Analysis and Insight
(Meta Data)
Delivery
Market Place
Facilitator
Advertising
7. REFERENCE ARCHITECTURE
Decisions & Insight
Analytics & Discovery
Data Access and Distribution
Data Collection & Organisation
Infrastructure Platform
Monitoring, Alerts, Tools,
Security, Governance
• The technology stack is rapidly evolving with all traditional as well as new vendors providing offerings
• Open source tools remain at the foundation layers.
• Different use cases will require different technology tools.
8. REFERENCE ARCHITECTURE
Decisions & Insight
• IBM Watson
• Industry Specific
Analytics & Discovery
• SAP Business Objects
• IBM Cognos
• SAS Analytics
• Dell Statistica
• Oracle Hyperion
• Microsoft BI
• KNIME
• Pentaho
• Informatica
9. REFERENCE ARCHITECTURE
Data Access and Distribution
• Document: MongoDB, CouchDB
• Graph: Neo4j, Titan
• Key Value Pair: Riak, Redis
• Columnar: Cassandra, HBase
• Search: Lucene, Solr, Elasticsearch
Monitoring, Alerts, Tools, Security, Governance:
• Hadoop: Apache, Cloudera, Hortonworks,
MapR, IBM
• SQL Mapping: Hive
• Big Data Transformation: Pig
• Hadoop Load: Sqoop
• Realtime-ETL: Storm
• Cluster Computing: Apache Spark
• Languages: Python, Java, R, Scala
10. REFERENCE ARCHITECTURE
Data Collection & Organisation (Batch & Real-Time)
• Hadoop
• Hadoop MapReduce
• Mahout
Infrastructure Platform
• AWS
• Azure
• Mortar
• Google BigQuery
• Qubole
• Dell
• HP
• IBM
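The reference architecture is meant to show which level of the stack a conversation is about. A minimal sketch of that idea is a lookup table from layer to example tools. The layer names and tools below are a subset copied from the slides; the function names `layer_of` and `coverage` are hypothetical, and the table is not a complete or authoritative catalogue.

```python
# Layers ordered from the top of the stack (business decisions) to the bottom.
REFERENCE_ARCHITECTURE = {
    "Decisions & Insight": ["IBM Watson"],
    "Analytics & Discovery": [
        "SAP Business Objects", "IBM Cognos", "SAS Analytics", "KNIME", "Pentaho",
    ],
    "Data Access and Distribution": [
        "MongoDB", "CouchDB", "Neo4j", "Titan", "Riak", "Redis",
        "Cassandra", "HBase", "Lucene", "Solr", "Elasticsearch",
    ],
    "Data Collection & Organisation": ["Hadoop", "Hadoop MapReduce", "Mahout"],
    "Infrastructure Platform": ["AWS", "Azure", "Google BigQuery", "Qubole"],
}


def layer_of(tool):
    """Return the layer that lists `tool` (case-insensitive), or None if unlisted."""
    wanted = tool.casefold()
    for layer, tools in REFERENCE_ARCHITECTURE.items():
        if any(wanted == listed.casefold() for listed in tools):
            return layer
    return None


def coverage(chosen_tools):
    """Group a proposed tool list by layer; empty lists reveal uncovered layers."""
    result = {layer: [] for layer in REFERENCE_ARCHITECTURE}
    for tool in chosen_tools:
        layer = layer_of(tool)
        if layer is not None:
            result[layer].append(tool)
    return result


if __name__ == "__main__":
    # Hypothetical proposal: which layers does it leave open?
    for layer, tools in coverage(["AWS", "Hadoop", "MongoDB"]).items():
        print(f"{layer}: {', '.join(tools) or '(nothing chosen yet)'}")
```

Running `coverage` on a proposed tool list makes gaps visible early: a plan that names only storage and infrastructure tools has said nothing yet about analytics or decision support.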