Integrating Disparate Data
May 27, 2010
Steve Newman, CTO, Gist.com

The Why: What We Believe In
  • All your important people already reside in email, calendars, contact lists, and social sites.
  • The web is a rich source of information about the people you care about.
  • One tool should exist that can pull all of this together into a single, rich, integrated experience.

Pain Points (External)
  • Disparate data/API sources and protocols, e.g. GNIP
  • Change notification (when/what), e.g. Linked Open Data Dataset Dynamics, PubSubHubbub
  • Standard entity data structures, e.g. Portable Contacts, vCard, hCard

The Problem (Internal)
  • Need a single, disambiguated set of entities, where each entity contains accurate, disambiguated attributes.
  • Entity attributes can be sourced from one or more endpoints:
      • Email
      • Twitter/Facebook
      • Calendar
      • Google Contacts, Outlook Contacts, Plaxo
      • Google Social Graph API
      • Rapleaf API

The Problem (Internal)
  • Now that we have this data, we need to process it and make sense of it.
  • Need to support recurring updates.
  • Need merge and unmerge support.
  • Recursive derivation is a huge win if done correctly.
  • Historical tracking is necessary both to drive operations and for debugging (and it’s a cool user feature).

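To make the merge/unmerge and historical-tracking points concrete, here is a minimal sketch in Python. It is not Gist's implementation: `Entity`, `EntityStore`, and the attribute names are hypothetical, and the merge rule (keep the surviving entity's value on conflict) is an assumption chosen for brevity. The point it illustrates is that recording what a merge changed is what makes an unmerge possible.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    attributes: dict = field(default_factory=dict)  # attribute name -> value


class EntityStore:
    """Hypothetical store that logs each merge so it can be undone later."""

    def __init__(self):
        self.entities = {}
        self.history = []  # one record per merge, newest last

    def add(self, entity):
        self.entities[entity.entity_id] = entity

    def merge(self, keep_id, absorb_id):
        keep = self.entities[keep_id]
        absorbed = self.entities.pop(absorb_id)
        before = dict(keep.attributes)  # snapshot used by unmerge
        for name, value in absorbed.attributes.items():
            # Assumed rule: the surviving entity's value wins on conflict.
            keep.attributes.setdefault(name, value)
        self.history.append((keep_id, before, absorbed))

    def unmerge(self):
        """Undo the most recent merge using the recorded snapshot."""
        keep_id, before, absorbed = self.history.pop()
        self.entities[keep_id].attributes = before
        self.entities[absorbed.entity_id] = absorbed


store = EntityStore()
store.add(Entity("a", {"email": "brad@example.com"}))
store.add(Entity("b", {"twitter": "@example"}))
store.merge("a", "b")
print(store.entities["a"].attributes)
store.unmerge()
print(sorted(store.entities))
```

A snapshot-based undo like this discards any attribute written to the surviving entity after the merge; a production system would more likely log per-attribute changes.
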
How We Did It
  • Enhancers
      • Execute the request and create the attribute data
      • Can be called synchronously or asynchronously
      • Cached, logged, rate limited
  • Metadata about attributes
      • Source, source type, when created, derived?, derived source, score
  • Rules for ‘enhancement’
  • Rules for recursion
  • Scoring methodology (accuracy and relative prioritization)

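The enhancer and attribute-metadata ideas above might be sketched as follows. This is an illustration under assumptions, not the Gist codebase: the `Attribute` field names simply mirror the metadata listed on the slide, the `Enhancer` class and its `fetch` callback are hypothetical, and the rate limit is reduced to a fixed call budget to keep the example short.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Attribute:
    """One attribute value plus the metadata named on the slide."""
    name: str
    value: str
    source: str                 # e.g. "email"
    source_type: str            # hypothetical label, e.g. "message_header"
    created_at: datetime
    score: float                # accuracy / relative priority
    derived: bool = False
    derived_source: Optional[str] = None


class Enhancer:
    """Wraps a fetch function with a cache, a call log, and a call budget."""

    def __init__(self, source: str,
                 fetch: Callable[[str], List[Attribute]],
                 max_calls: int):
        self.source = source
        self._fetch = fetch
        self._max_calls = max_calls   # simplistic stand-in for rate limiting
        self._calls = 0
        self._cache: Dict[str, List[Attribute]] = {}
        self.log: List[str] = []

    def enhance(self, key: str) -> List[Attribute]:
        if key in self._cache:
            self.log.append(f"cache hit: {key}")
            return self._cache[key]
        if self._calls >= self._max_calls:
            raise RuntimeError(f"{self.source}: rate limit reached")
        self._calls += 1
        self.log.append(f"fetch: {key}")
        attrs = self._fetch(key)
        self._cache[key] = attrs
        return attrs
```

An asynchronous variant would expose the same `enhance` contract behind a queue or coroutine; only the synchronous path is shown here.
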
Example: Email Enhancer
  • “Brad Feld” vs. “Brad”
  • Table columns: Date/Time, Score, State, Value

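The slide's table did not survive extraction, so the following is only a sketch of how a score could settle "Brad Feld" versus "Brad". The `Candidate` fields follow the four column names; the scores, timestamps, state labels ("new", "active", "inactive"), and the tie-break on recency are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Candidate:
    seen_at: datetime   # Date/Time column
    score: float        # Score column (values below are made up)
    state: str          # State column (labels are hypothetical)
    value: str          # Value column


def select_active(candidates: List[Candidate]) -> Candidate:
    """Mark the highest-scoring candidate active; ties go to the most recent."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    winner = max(candidates, key=lambda c: (c.score, c.seen_at))
    for c in candidates:
        c.state = "active" if c is winner else "inactive"
    return winner


names = [
    Candidate(datetime(2010, 5, 1, 9, 0), 0.4, "new", "Brad"),
    Candidate(datetime(2010, 5, 3, 14, 30), 0.9, "new", "Brad Feld"),
]
print(select_active(names).value)  # Brad Feld
```

Keeping the losing candidate around as "inactive" rather than deleting it is what lets a later, higher-scoring observation or an unmerge change the outcome.
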
Key Takeaways
  • Worry about integration both external and internal to your application.
  • Lots of good work has been done on the external issues; take advantage of it!
  • Create a strong object model for internal data representation (workers, metadata, engines) so you can perform concise, discrete operations.

Additional Info
  • GIST API coming out this summer
  • Direct interface to Fragments
  • Standard and third-party Enhancer support
  • @stevepnewman, @gist

"We know now that the source of wealth is something specifically human: knowledge. Applied to tasks that we already know how to do, it becomes 'productivity'. Applied to tasks that are new and different, we call it 'innovation'. Only knowledge allows us to achieve these two goals."
Peter Drucker, Management Challenges for the 21st Century, 1999

Glue Conference
