Overview of modern software ecosystem for big data analysis


Brief summary of the software available today for building the core infrastructure needed to collect and analyze big data from sensors (the internet of everything). Presented at the Dec 2015 Trillion Sensors Summit in Orlando, FL.

Overview of Modern Software Ecosystem
for Big Data Analysis
Michael Bryzek
mbryzek@alum.mit.edu / @mbryzek
Co-Founder and Chairman Flow Commerce
Co-Founder and ex-CTO Gilt Groupe
Trillion Sensors Summit - Dec 9 2015

Goals
● Overview of modern practices related to software architecture for high-volume big data applications
● Encourage reuse of infrastructure that has already been built so you can focus on analysis and information

Representational State Transfer (REST)
a uniform connector interface
● Resources - “nouns”
● Clear set of limited methods
● Standard (e.g. authorization)
Cost of integrating the nth service approaches 0
Reference: Roy Thomas Fielding’s dissertation - https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf
Examples: Stripe, Twilio, GitHub
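
A minimal sketch of the uniform interface described above, assuming an in-memory store instead of a real HTTP server. The resource names ("sensors", "readings"), the `ResourceCollection` class, and the `request` helper are hypothetical and only illustrate the idea: every noun answers to the same small set of methods, so adding one more service is a single registry entry.

```python
from typing import Any, Dict, Optional, Tuple


class ResourceCollection:
    """One resource type (a "noun") exposed through the same limited methods."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}
        self._next_id = 1

    def handle(self, method: str, item_id: Optional[str] = None,
               body: Optional[Dict[str, Any]] = None) -> Tuple[int, Any]:
        # Collection-level operations: list everything or create a new item.
        if item_id is None:
            if method == "GET":
                return 200, list(self._items.values())
            if method == "POST":
                new_id = str(self._next_id)
                self._next_id += 1
                self._items[new_id] = dict(body or {}, id=new_id)
                return 201, self._items[new_id]
            return 405, None
        # Item-level operations: read, replace, or delete one item.
        if item_id not in self._items:
            return 404, None
        if method == "GET":
            return 200, self._items[item_id]
        if method == "PUT":
            self._items[item_id] = dict(body or {}, id=item_id)
            return 200, self._items[item_id]
        if method == "DELETE":
            del self._items[item_id]
            return 204, None
        return 405, None


# Hypothetical registry: integrating the nth service is one more entry here.
SERVICES: Dict[str, ResourceCollection] = {
    "sensors": ResourceCollection(),
    "readings": ResourceCollection(),
}


def request(method: str, path: str,
            body: Optional[Dict[str, Any]] = None) -> Tuple[int, Any]:
    """Route "/noun" or "/noun/id" to the matching collection."""
    parts = [p for p in path.split("/") if p]
    if not parts or len(parts) > 2 or parts[0] not in SERVICES:
        return 404, None
    item_id = parts[1] if len(parts) == 2 else None
    return SERVICES[parts[0]].handle(method, item_id, body)
```

A client that knows how to talk to one collection already knows how to talk to the next one, which is the sense in which the integration cost shrinks as services are added.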

Frameworks for REST
API first - the most critical design element
● http://apidoc.me *
● http://swagger.io
● http://apiary.io
Some companies (incl. Amazon) focus on API
and care very little about the implementation.
* my personal open source project

JSON - The “fat-free” alternative to XML

JavaScript Object Notation (JSON)
It’s just JavaScript - the most widely adopted programming language in the world
Pros
● Simple
● Readable
● Dense
Cons
● Still verbose
● No strong typing
● CPU overhead
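
A short sketch of these trade-offs using Python's standard `json` module. The sensor record, its field names, and the helper functions `key_overhead` and `check_types` are assumptions made up for illustration.

```python
import json

# Hypothetical sensor reading used only for illustration.
reading = {"sensor_id": 17, "timestamp_ms": 1449678600000, "temperature_c": 21.5}

# Simple and readable: the encoded form is plain text a person can inspect.
text = json.dumps(reading)
decoded = json.loads(text)


def key_overhead(record):
    """Fraction of the encoded text spent on quoted field names."""
    encoded = json.dumps(record)
    key_chars = sum(len(json.dumps(name)) for name in record)
    return key_chars / len(encoded)


def check_types(record, expected):
    """Return the field names whose values do not have the expected type.

    JSON itself carries no schema, so a check like this has to be written
    (or generated) by the application.
    """
    return [name for name, kind in expected.items()
            if not isinstance(record.get(name), kind)]


EXPECTED = {"sensor_id": int, "timestamp_ms": int, "temperature_c": float}

# The parser happily accepts a string where a number was intended.
suspect = json.loads(
    '{"sensor_id": "17", "timestamp_ms": 1449678600000, "temperature_c": 21.5}'
)

print(text)
print(round(key_overhead(reading), 2))
print(check_types(suspect, EXPECTED))
```

Because every record repeats its field names, a large share of each message is keys rather than data, and type errors only surface if the application checks for them.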

Binary Protocols - Ideal for sensor data
Key Features
● Language to describe schema
● Space efficient
● Fast serialization / deserialization
Leading Protocols
● Protocol Buffers https://developers.google.com/protocol-buffers
● Avro https://avro.apache.org/ - tight integration with Hadoop
● Thrift https://thrift.apache.org/
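
A rough sketch of the three key features, assuming a hypothetical `SensorReading` record. The schema is written in Avro's JSON schema syntax, but the encoding below uses Python's `struct` module with fixed-width fields as a stand-in. It is not Avro's actual wire format (Avro encodes `int` and `long` as variable-length values) and only illustrates how a schema lets the payload omit field names.

```python
import json
import struct

# Hypothetical schema, expressed in Avro's JSON schema syntax.
SCHEMA = {
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "sensor_id", "type": "int"},
        {"name": "timestamp_ms", "type": "long"},
        {"name": "temperature_c", "type": "double"},
    ],
}

# Fixed-width stand-ins for the primitive types (an assumption of this sketch).
STRUCT_CODES = {"int": "i", "long": "q", "float": "f", "double": "d"}


def layout(schema):
    """Derive a little-endian struct format string from the schema's fields."""
    return "<" + "".join(STRUCT_CODES[f["type"]] for f in schema["fields"])


def encode(schema, record):
    """Pack the record's values in schema order; field names are not sent."""
    values = [record[f["name"]] for f in schema["fields"]]
    return struct.pack(layout(schema), *values)


def decode(schema, payload):
    """Unpack a payload and reattach the field names from the schema."""
    values = struct.unpack(layout(schema), payload)
    return {f["name"]: v for f, v in zip(schema["fields"], values)}


reading = {"sensor_id": 17, "timestamp_ms": 1449678600000, "temperature_c": 21.5}
binary = encode(SCHEMA, reading)
as_json = json.dumps(reading).encode("utf-8")

print(len(binary), "bytes as schema-driven binary")
print(len(as_json), "bytes as JSON")
```

In practice the encode and decode functions would come from the protocol's own library or code generator rather than being written by hand.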

Data processing and analytics
From https://aws.amazon.com/iot/

Data Platforms
● https://aws.amazon.com/iot/ - Amazon Kinesis, S3, Redshift, IoT
● http://influxdata.com - open source time series database + analytics platform *
● http://confluent.io - data pipeline / real time processing built by Jay Kreps
● http://spark.apache.org/ - UC Berkeley / Cloudera led effort
Currently seeing high activity and investment in both open source and commercial
ventures.
* I am an investor in influx

Summary and Recommendation
Learning from the history of how software evolved on the internet…
● Define standards for interconnectivity (à la REST)
○ Avoid standards for data types (e.g. ECG)
● Choose simplicity as number one requirement
○ Avoid XML
● Adopt existing binary protocols, w/ code generation at boundaries
○ Avoid creating new protocols focused on last 5-10% improvement
● Adopt existing messaging / storage platforms for large data sets
Keeping up to date: https://www.thoughtworks.com/radar

Thank you
Michael Bryzek
mbryzek@alum.mit.edu / @mbryzek
Co-Founder and Chairman Flow Commerce
Co-Founder and ex-CTO Gilt
Trillion Sensors Summit - Dec 9 2015
