2. #DPL15 | @sarahcat21
Mattermark is a deal intelligence platform and private company database used by:
● investors
● business and corporate development
● sales
6. #DPL15 | @sarahcat21
Stealth
● Private companies do not have strong incentives (e.g. legal obligations) to share data. Many may have competitive incentives to obfuscate information.
● Investors may request non-disclosure.
8. #DPL15 | @sarahcat21
Software-oriented approach
● A must, due to the scale of our dataset
○ 1.3 million companies
○ 16.5k investors
○ 110k funding events
● Leverage a lean data team
9. #DPL15 | @sarahcat21
Data collection strategy
● Web scraping
● Machine learning
● Direct submission
● Manual data entry
11. #DPL15 | @sarahcat21
Investors ask questions like:
What startups might raise capital in the next 6 months?
What startups is Stephanie Palmeri investing in?
12. #DPL15 | @sarahcat21
Our data analysts seek to understand:
● Why does this question matter?
● What data is required to answer this question?
● Where can this data be accessed?
13. #DPL15 | @sarahcat21
Next, data analysts:
1. Define repeatable processes for data collection.
2. Determine whether processes can be replicated through web scraping and/or machine learning algorithms to collect data at scale.
3. Write functional specifications, reviewed by sales and engineering team members.
14. #DPL15 | @sarahcat21
Next, web and/or machine learning engineers:
1. Write dev designs, reviewed by data analysts.
2. Upon implementation and marketing release, this data becomes available to customers.
3. New questions arise and the cycle starts again.
16. #DPL15 | @sarahcat21
Investors ask questions like:
How much funding has a company already raised?
Who were the investors at each of those rounds?
17. #DPL15 | @sarahcat21
Problems with existing sources
Many rely on wiki-style data collection, so the credibility of sources cannot be confirmed.
News reports are better, but:
● facts are harder to extract
● different sources report different figures
18. #DPL15 | @sarahcat21
Solution: funding automation
A new framework for collecting and synthesizing funding data.
1. News article fact extraction (machine learning)
2. Funding override system (web engineering)
3. Funding confirmation email campaign (marketing)
21. #DPL15 | @sarahcat21
1. News article fact extraction
● Identify sentences containing information about investors, amount, and/or series
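The sentence-identification step can be illustrated with a small sketch. The deck describes this as a machine learning task; the version below swaps in hand-written regular expressions purely for illustration. The cue patterns and the function name `funding_sentences` are assumptions, not Mattermark's implementation.

```python
import re

# Hypothetical cue patterns (assumptions for illustration only).
# A production system would more likely learn these from labeled articles.
CUES = {
    "amount": re.compile(r"\$\s?\d[\d,.]*\s?(?:million|billion|m|b|k)?\b", re.IGNORECASE),
    "series": re.compile(r"\b(?:seed|series\s+[a-h])\b", re.IGNORECASE),
    "investors": re.compile(r"\b(?:led by|participation from|investors?)\b", re.IGNORECASE),
}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def funding_sentences(text):
    """Return (sentence, cue_names) pairs for sentences with at least one cue."""
    hits = []
    for sentence in split_sentences(text):
        found = {name for name, pattern in CUES.items() if pattern.search(sentence)}
        if found:
            hits.append((sentence, found))
    return hits


article = (
    "Acme was founded in 2010. "
    "Acme raised $12 million in a Series B led by Example Ventures. "
    "The team is based in Austin."
)
hits = funding_sentences(article)
```

Only the middle sentence of `article` is flagged here, and it carries all three cue types. A rule-based filter like this could also serve as a baseline for evaluating a learned classifier.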
22. #DPL15 | @sarahcat21
1. News article fact extraction
● Extract facts
● Match companies and investors to entities in our database
○ 30% of extracted articles are entered automatically
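A minimal sketch of the extract-and-match step, under stated assumptions: amounts and series are pulled out with regular expressions, and names are matched to known entities with the standard library's `difflib`. The function names, the 0.85 similarity cutoff, and the sample entity list are hypothetical. A record that fails to match confidently would presumably go to manual review rather than being entered automatically.

```python
import re
from difflib import get_close_matches

AMOUNT = re.compile(r"\$\s?(\d+(?:\.\d+)?)\s?(million|billion)", re.IGNORECASE)
SERIES = re.compile(r"\b(seed|series\s+[a-h])\b", re.IGNORECASE)
SCALE = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_facts(sentence):
    """Pull a dollar amount and a round label out of one sentence, if present."""
    facts = {}
    amount = AMOUNT.search(sentence)
    if amount:
        facts["amount_usd"] = int(float(amount.group(1)) * SCALE[amount.group(2).lower()])
    series = SERIES.search(sentence)
    if series:
        # Normalize whitespace and capitalization, e.g. "series  b" -> "Series B".
        facts["series"] = " ".join(series.group(1).split()).title()
    return facts


def match_entity(name, known_names, cutoff=0.85):
    """Return the closest known entity name, or None if nothing is similar enough.

    The cutoff is a hypothetical threshold, not a value from the talk.
    """
    by_lower = {known.lower(): known for known in known_names}
    close = get_close_matches(name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None


facts = extract_facts("Acme raised $12.5 million in a Series B led by Example Ventures.")
company = match_entity("Acme Inc", ["Acme Inc.", "Apex Labs"])
unknown = match_entity("Zebra Corp", ["Acme Inc.", "Apex Labs"])
```

Returning `None` for weak matches keeps uncertain records out of the automatic path, which fits the deck's note that only a share of extracted articles are entered automatically.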
23. #DPL15 | @sarahcat21
2. Funding override system
● Identify reports about the same funding event
● Combine information from multiple reports using the Wongi rules engine
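The two override steps above can be sketched in plain Python. This is not the Wongi rules engine named on the slide (Wongi is a Ruby library); it is a simplified stand-in. The grouping key, the 30-day window, the source trust ranking, and all field names are assumptions made for illustration.

```python
from datetime import date

# Hypothetical trust ranking: higher values override lower ones.
SOURCE_TRUST = {"founder_confirmation": 3, "regulatory_filing": 2, "news": 1}


def group_reports(reports, window_days=30):
    """Group reports that share company and series and fall within
    window_days of the earliest report in the group."""
    groups = []
    for report in sorted(reports, key=lambda r: r["date"]):
        for group in groups:
            first = group[0]
            same_round = (first["company"], first["series"]) == (report["company"], report["series"])
            if same_round and (report["date"] - first["date"]).days <= window_days:
                group.append(report)
                break
        else:
            groups.append([report])
    return groups


def merge_group(group):
    """Combine one group: the most trusted source wins on amount, investors are unioned."""
    with_amount = (r for r in group if r.get("amount_usd") is not None)
    best = max(with_amount, key=lambda r: SOURCE_TRUST.get(r["source"], 0), default=None)
    return {
        "company": group[0]["company"],
        "series": group[0]["series"],
        "amount_usd": best["amount_usd"] if best else None,
        "investors": sorted({name for r in group for name in r.get("investors", [])}),
    }


reports = [
    {"company": "Acme", "series": "Series B", "date": date(2015, 1, 10),
     "source": "news", "amount_usd": 12_000_000, "investors": ["Example Ventures"]},
    {"company": "Acme", "series": "Series B", "date": date(2015, 1, 20),
     "source": "founder_confirmation", "amount_usd": 12_500_000, "investors": ["Sample Capital"]},
    {"company": "Acme", "series": "Series A", "date": date(2013, 6, 1),
     "source": "news", "amount_usd": 3_000_000, "investors": ["Example Ventures"]},
]
merged = [merge_group(g) for g in group_reports(reports)]
```

With these sample reports, the two Series B reports collapse into one event whose amount comes from the founder confirmation, while the Series A report stays separate. Keeping the precedence rules in a data table such as `SOURCE_TRUST` lets them be adjusted without touching the merge logic.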
24. #DPL15 | @sarahcat21
3. Funding confirmation email campaign
Use CRM and HubSpot to automatically send emails to founders after equity financing.
26. #DPL15 | @sarahcat21
Where we struggled
Our initial implementation of a funding override
system was inefficient. Why?
Because our data analysts and developers were
not aligned on functional requirements.
27. #DPL15 | @sarahcat21
Solution
● Analysts must work closely with developers
○ Pre-spec check-ins
○ Analysts review dev designs to ensure that the system design addresses the use case.
● Analysts must avoid being prescriptive
● Analysts must understand data mining and machine learning concepts
28. #DPL15 | @sarahcat21
Where we succeeded
Implementation of news article fact extraction
was successful. Why?
Because data analysts and developers worked as
service providers to each other.
30. #DPL15 | @sarahcat21
1. Tighter Analyst + Dev Communication
Tiger teams: 1 ML developer, 1 web/infrastructure
developer, 1 data analyst, 1 project lead
Define milestones & hold daily stand-ups.
31. #DPL15 | @sarahcat21
3. Track II interactions reinforce a symbiotic relationship
● Devs lead a Python learning group
● Data analysts hold seminars on topics like admin tooling and alternative assets