Big Data in Today’s Businesses
Presenter: Salman Jaffer, CFA
March 22nd 2018
REUTERS / Salman Jaffer, CFA
Big Data in Today’s Businesses – Salman Jaffer, CFA. Thomson Reuters
Contents
• Introduction
• What is Big Data?
• Big, Open and Linked Data (BOLD)
• Application Programming Interfaces
Introduction
Salman Jaffer, CFA
Singapore Technology Lead, TMS
Office: (65) 6870 3563
salman.jaffer@thomsonreuters.com
Salman is a Chartered Financial Analyst who has led many financial services risk, trading and technology implementations from concept to completion across the globe for over 15 years.
Previously, as Head of Data Science at Sentifi, Salman combined strong knowledge of technology and finance to formulate and deliver unique solutions to classical machine learning and advanced deep learning problems. Salman holds a degree and a number of professional qualifications in the fields of Computer Science, Machine Learning, Big Data and Finance.
He is currently the Head of TMS / BOLD in Singapore, focusing on NLP problems for clients.
In his spare time, Salman likes rock climbing, Muay Thai and running.
Figure 1: What is Data Science? (Diagram labels: Domain Expertise, Statistical Skills, Software Engineering Skills, Data Science + some other skills)
What is Big Data?
• 2.5 million price updates per second
• More than 3,000 data experts managing TR Data globally
• Over 12,000 software engineers, systems architects, operations experts, information security specialists, technical support analysts and data scientists
• Over 30 billion triples in our Knowledge Graph
• 1 billion people worldwide read or see Reuters news every day
• Over 30 years of expertise managing People data such as PEPs
• Training Intelligent Tagging Models since 2007
• FX Trading Community of 4,000+ institutions and 15,000+ users in more than 120 countries
• 2,500 journalists in 200 locations worldwide in 16 languages
• 60,000 terabytes of data in our data centers (the U.S. Library of Congress contains 200 terabytes of data, and the total size of Wikipedia is 3 terabytes)
• Over 50,000 developers use our APIs globally
• 850,000 photos and images are captured and published by Reuters every year
• Thomson Reuters Regulatory Intelligence includes global coverage of over 750 regulatory bodies
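To put the storage and throughput figures above in proportion, the short calculation below uses only the numbers quoted on the slide. The variable names are illustrative, and the per-day figure assumes the quoted per-second rate is sustained around the clock, which the slide does not state.

```python
# Figures quoted on the slide (terabytes, and price updates per second).
TR_DATA_TB = 60_000
LIBRARY_OF_CONGRESS_TB = 200
WIKIPEDIA_TB = 3
PRICE_UPDATES_PER_SECOND = 2_500_000

# How many times larger the quoted data-center volume is than each comparison.
loc_ratio = TR_DATA_TB / LIBRARY_OF_CONGRESS_TB
wiki_ratio = TR_DATA_TB / WIKIPEDIA_TB

# Assumption: the per-second rate holds for all 86,400 seconds of a day.
SECONDS_PER_DAY = 60 * 60 * 24
updates_per_day = PRICE_UPDATES_PER_SECOND * SECONDS_PER_DAY

print(f"{loc_ratio:.0f}x the Library of Congress figure")
print(f"{wiki_ratio:.0f}x the Wikipedia figure")
print(f"{updates_per_day:,} price updates per day at the quoted rate")
```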
Big, Open and Linked Data
• BOLD stands for Big, Open and Linked Data
• Big. Large sets of data. Reuters has over 60,000 TBs of data
• Open. Publicly available data such as Google, Reddit, Stanford Core NLP, Thomson Reuters NewsScope Data
• Linked. Using PermID and methodologies to link Organizations, People, Topics, Events and Facts
• Data. News Feeds, Analyst Research, Global Filings, Call Transcripts
Thomson Reuters Intelligent Tagging (TRIT)
• Open Calais
• PermID
• Contextual Tagging
• DIY
Knowledge Graph: available in RDF via the Graph Feed
• Subject-Predicate-Object triples
• Draw Relationships
• Distance and Relevance
• Provide via a Graph Feed
Data Fusion: graph management, integration and analytics platform
• Map. Public and Private Data
• Stitch. Structured and Unstructured Data
• Tag. Using TRIT
• Index. Speed and Scale
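A knowledge graph of this kind stores facts as triples and lets you draw relationships and measure the distance between entities. The sketch below is a minimal illustration of that idea in plain Python. The entity and predicate names are invented for the example (they are not PermID data), the triples use the conventional RDF subject, predicate, object order, and breadth-first search is one simple way to measure distance, not necessarily the method used in the Graph Feed.

```python
from collections import deque

# Hypothetical triples in (subject, predicate, object) form; not real PermID data.
TRIPLES = [
    ("CompanyA", "hasCEO", "PersonX"),
    ("PersonX", "boardMemberOf", "CompanyB"),
    ("CompanyB", "supplierOf", "CompanyC"),
]

def build_adjacency(triples):
    """Index triples as an undirected graph so links can be walked both ways."""
    adjacency = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)
    return adjacency

def distance(triples, start, goal):
    """Return the number of hops between two entities, or None if unconnected."""
    adjacency = build_adjacency(triples)
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None

print(distance(TRIPLES, "CompanyA", "CompanyC"))
```

A shorter distance can then serve as one input to a relevance score, for example when ranking which organizations are most closely connected to a person.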
Application Programming Interfaces
Big Data in Businesses Today:
Salman JAFFER, CFA
Technology Lead, TMS
Thomson Reuters
Big Data Landscape
2018
Contents
• Why Now for Big Data?
• Trends in Big Data
• Big Data Technologies
• Applications of Big Data Technologies
https://xkcd.com/1897/
Why Now for Big Data?
Evolution of Storage, Processing, Networks and Bandwidth
Evolution of Mobile Hardware
Evolution of Internet of Things
Trends in Big Data
Trends in Big Data – Mobile Technology
• Accelerometer
• Gyroscope
• Linear Acceleration
• Magnetometer
• GPS
• Barometer
• Proximity Sensor
• Ambient Light Sensor
• Infrared Sensor
• Ambient Temperature
• Relative Humidity
• Fingerprint
• Microphone
• Camera
Combine sensor data from:
• Watches
• Phones
• Computers
• Home
• Transport
• Work
• Environment
• Interactions
• Internet
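Combining sensor data from several devices usually starts with putting the readings on one timeline. The sketch below is a minimal illustration that assumes each device already delivers time-sorted readings. The device names, sensor names and values are invented for the example.

```python
import heapq

# Hypothetical readings: (timestamp_seconds, device, sensor, value), each list time-sorted.
watch = [
    (1.0, "watch", "heart_rate", 72),
    (3.0, "watch", "heart_rate", 75),
]
phone = [
    (0.5, "phone", "accelerometer", 0.02),
    (2.0, "phone", "barometer", 1013.2),
]

def combine(*streams):
    """Merge already time-sorted streams into a single timeline ordered by timestamp."""
    return list(heapq.merge(*streams, key=lambda reading: reading[0]))

timeline = combine(watch, phone)
for timestamp, device, sensor, value in timeline:
    print(timestamp, device, sensor, value)
```

`heapq.merge` keeps the merge lazy and cheap for sorted inputs. Unsorted or late-arriving readings would need buffering or a sort first.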
Internet vs. Artificial Intelligence Era
• Strategic Data Acquisition - Almost any reputable data science team can get great computing power via Nvidia, AWS, GCP or Azure, and papers are widely published on approaches to building the deepest and widest neural networks. What AI companies such as Google, WeChat, Baidu and Facebook have done differently is build a moat around themselves by capturing their users' data
• Many of the ways in which organizations need to approach these problems have changed. Just as the shift from waterfall to agile took a generation, the shift from the internet era to the AI era will also take time
Internet Era | AI Era
A/B Testing | Strategic Data Acquisition
Bricks and Mortar -> eCommerce | Tech Company -> AI Company
Decision making from CxOs | Decisions made by Product Managers and Engineers
Internet Company | AI Company
Many Databases | Unified Data Lakes
Short Cycles | Training Epochs
Traditional Job Descriptions | New Job Descriptions
Wireframes | Design Thinking
Big Data Technologies
Big Data Technologies – Elasticsearch
• Distributed search and analytics engine designed for horizontal scalability
• Important Terms
• Cluster – Collection of nodes
• Node – Single server, part of a cluster
• Index – Collection of shards, akin to a `database`
• Shard – Collection of documents
• Type – Category within an index, akin to a `database table`
• Document – A record, stored as a JSON object
More Info: slideshare.net/duydo/elasticsearch-for-data-engineers
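The terms above can be made concrete with a toy in-memory model. This is a sketch for intuition only and does not use the Elasticsearch client or API. The `ToyIndex` class and its methods are hypothetical, clusters and nodes are not modelled, and `zlib.crc32` is a stand-in for whatever hash the real engine uses to route a document ID to a shard.

```python
import json
import zlib

class ToyIndex:
    """In-memory stand-in for an index: a fixed number of shards holding JSON documents."""

    def __init__(self, name, num_shards=3):
        self.name = name
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # Assumption: crc32 is used only because it is deterministic; it is not the real routing hash.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index_document(self, doc_id, document):
        # Round-trip through JSON to confirm the record is a valid JSON object.
        stored = json.loads(json.dumps(document))
        self.shards[self.shard_for(doc_id)][doc_id] = stored

    def get(self, doc_id):
        return self.shards[self.shard_for(doc_id)].get(doc_id)

    def count(self):
        return sum(len(shard) for shard in self.shards)

news = ToyIndex("news", num_shards=3)
news.index_document("1", {"headline": "Markets open higher", "language": "en"})
# The second document carries an extra field; nothing had to be declared up front.
news.index_document("2", {"headline": "Rates unchanged", "language": "en", "topic": "central banks"})
print(news.get("2")["topic"], news.count())
```

Because the shard is derived from the document ID, a lookup touches only one shard, which is the property that lets an index spread across many nodes.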
Relational | Non-Relational
SQL | NoSQL
SQL Server, Oracle | Bigtable, Elasticsearch
Enterprise | Open Source
Pre-defined Sizes | Elastic Scalability
Column-Row Store | Document Store
Pre-defined Data Model | Dynamic Mapping
ACID
• Atomicity
• Consistency
• Isolation
• Durability
Brewer’s CAP
• Consistency
• Availability
• Partition Tolerance
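Atomicity, the first of the ACID properties, means a transaction either applies completely or not at all. The sketch below illustrates this with Python's built-in `sqlite3` module. The `accounts` table, the names and the amounts are invented for the example, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move funds inside one transaction; True if committed, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft, so the credit is undone as well.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

overdraft_committed = transfer(conn, "alice", "bob", 500)
print(overdraft_committed, balances(conn))
```

The credit to the receiver runs first on purpose: when the debit then violates the constraint, the rollback shows that the earlier, successful statement is undone too.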
Applications of Big Data Technologies
Computer Vision
• CAPTCHA
• OCR
• Images
• Video
Speech Recognition
• Speech to Text
• Sentiment Analysis
• Security
Natural Language Processing
• Named Entity Recognition
• Sentiment Analysis
• Natural Language Generation
Robotics
• Warehouse Logistics
• Assisted Living
• Drones
Robot and Frank, 2012
www.loseit.com
Applications of Big Data in Financial Services
• Satellite Images
• Credit Card Transaction Data
• Trade Processing
• High Frequency Trading
• Algorithmic Trading
• Investment decision making support
• Customer churn prediction
• Retail sales trends analysis and prediction
• Research Automation
• Fraud Analysis
How can I learn more about Big Data?
Question: I want to learn more about…
• Big Data Leaders
– Andrew Ng
– Fei-Fei Li
– Yann LeCun
– Richard Socher
– Demis Hassabis
– Jürgen Schmidhuber
– Alex Pentland
– Corinna Cortes
– Daphne Koller
– Hilary Mason
– Doug Cutting
– Kirk Borne
– Gilberto Titericz Jr.
– Stanislav Semenov
– Monica Rogati
– Chris Manning
– Yoshua Bengio
– Geoff Hinton
– David Blei
– Nando de Freitas
– Andrej Karpathy
– Ian Goodfellow
– Ilya Sutskever
– Daniela Rus
– Yoav Goldberg
• Research Papers, Blogs and Videos
– CBInsights.com
– arxiv.org - Recent trends in Deep Learning Based NLP
– NLP News
– Heroes of Deep Learning - YouTube
– Thomson Reuters – Harvard Business Publishing
– Stanford University School of Engineering
• Online Courses and Competitions
– Coursera
– Stanford Online
– Kaggle
– DataCamp
– Hortonworks
– Codecademy
• Tools
– Jupyter Notebook
– Google Cloud Platform
– Amazon SageMaker
THANK YOU
