1. Big Data in Today’s Businesses
Presenter: Salman Jaffer, CFA
March 22nd 2018
REUTERS / Salman Jaffer, CFA
2. Big Data in Today’s Businesses – Salman Jaffer, CFA. Thomson Reuters
Contents
• Introduction
• What is Big Data?
• Big, Open and Linked Data (BOLD)
• Application Programming Interfaces
3. Introduction
4. Introduction
Salman Jaffer, CFA
Singapore Technology Lead, TMS
Office: (65) 6870 3563
salman.jaffer@thomsonreuters.com
Salman is a Chartered Financial Analyst who, for over 15 years, has led financial services risk, trading and
technology implementations across the globe, from concept to completion.
Previously, as Head of Data Science at Sentifi, Salman drew on his uncommon combination of technology and
finance expertise to design and deliver solutions to classical machine learning and advanced deep learning
problems. Salman holds a degree and a number of professional qualifications in Computer Science, Machine
Learning, Big Data and Finance.
He is currently the Head of TMS / BOLD in Singapore, focusing on NLP problems for clients.
In his spare time, Salman likes rock climbing, Muay Thai and running.
Figure 1: What is Data Science? (Venn diagram: Domain Expertise, Statistical Skills and Software Engineering Skills, overlapping in "Data Science + some other skills")
5. What is Big Data?
• 2.5 million price updates per second
• More than 3,000 data experts managing TR data globally
• Over 12,000 software engineers, systems architects, operations
experts, information security specialists, technical support analysts
and data scientists
• Over 30 billion triples in our Knowledge Graph
• 1 billion people worldwide read or see Reuters news every day
• Over 30 years of expertise managing people data such as politically exposed persons (PEPs)
• Training Intelligent Tagging models since 2007
• FX Trading Community of 4,000+ institutions and 15,000+ users in more than 120 countries
• 2,500 journalists in 200 locations worldwide, reporting in 16 languages
• 60,000 terabytes of data in our data centers (for comparison, the U.S. Library of
Congress contains 200 terabytes of data, and the whole of Wikipedia is 3 terabytes)
• Over 50,000 developers use our APIs globally
• 850,000 photos and images are captured and published by
Reuters every year
• Thomson Reuters Regulatory Intelligence includes global
coverage of over 750 regulatory bodies
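A quick back-of-the-envelope check puts the storage and price-update figures in proportion. This sketch only uses numbers quoted on this slide; the daily total additionally assumes, purely for illustration, that the 2.5 million updates per second rate is sustained around the clock, which the slide does not claim.

```python
# Figures quoted on the "What is Big Data?" slide
REUTERS_TB = 60_000
LIBRARY_OF_CONGRESS_TB = 200
WIKIPEDIA_TB = 3
PRICE_UPDATES_PER_SECOND = 2_500_000

SECONDS_PER_DAY = 24 * 60 * 60


def storage_ratios() -> dict:
    """How many times larger the Reuters data centers are than each reference corpus."""
    return {
        "library_of_congress": REUTERS_TB / LIBRARY_OF_CONGRESS_TB,
        "wikipedia": REUTERS_TB / WIKIPEDIA_TB,
    }


def updates_per_day(rate_per_second: int = PRICE_UPDATES_PER_SECOND) -> int:
    """Daily volume, ASSUMING the per-second rate is sustained for 24 hours."""
    return rate_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    print(storage_ratios())  # {'library_of_congress': 300.0, 'wikipedia': 20000.0}
    print(f"{updates_per_day():,}")  # 216,000,000,000
```

In other words, the quoted archive is about 300 times the Library of Congress figure and 20,000 times Wikipedia.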
6. Big, Open and Linked Data
• BOLD stands for Big, Open and Linked Data
• Big. Large data sets; Reuters has over 60,000 TB of data
• Open. Publicly available data and tools such as Google, Reddit, Stanford CoreNLP and Thomson Reuters NewsScope data
• Linked. Using PermID and related methodologies to link organizations, people, topics, events and facts
• Data. News Feeds, Analyst Research, Global Filings, Call Transcripts
TRIT: Thomson Reuters Intelligent Tagging
• Open Calais
• PermID
• Contextual Tagging
• DIY
Knowledge Graph (in RDF, available via the Graph Feed)
• Subject-Predicate-Object triples
• Draw relationships
• Distance and relevance
• Provide via a Graph Feed
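To make the subject-predicate-object idea concrete, here is a minimal in-memory sketch of a triple store, with a simple pattern query and a breadth-first "distance" between entities. All entity and predicate names are hypothetical, and hop count is only one possible notion of distance; this is not how the Thomson Reuters Knowledge Graph or PermID actually computes relevance.

```python
from collections import defaultdict, deque
from typing import Optional

# Hypothetical triples: (subject, predicate, object)
TRIPLES = [
    ("AcmeCorp", "hasCEO", "JaneDoe"),
    ("AcmeCorp", "subsidiaryOf", "AcmeHoldings"),
    ("AcmeHoldings", "listedOn", "ExampleExchange"),
    ("JaneDoe", "boardMemberOf", "BetaBank"),
]


def build_adjacency(triples):
    """Treat every triple as an undirected edge so relationships can be walked both ways."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triples:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


def objects_of(triples, subject, predicate):
    """Answer a simple pattern query: (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def hop_distance(triples, start, goal) -> Optional[int]:
    """Breadth-first search: number of relationships between two entities, or None."""
    adjacency = build_adjacency(triples)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


if __name__ == "__main__":
    print(objects_of(TRIPLES, "AcmeCorp", "hasCEO"))  # ['JaneDoe']
    print(hop_distance(TRIPLES, "BetaBank", "ExampleExchange"))  # 4
```

Breadth-first search guarantees the first time the goal is reached is via the fewest relationships, which is why it is a natural starting point for a "distance" measure.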
Data Fusion: graph management, integration and analytics platform
• Map. Public and private data
• Stitch. Structured and unstructured data
• Tag. Using TRIT
• Index. Speed and scale
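The Map, Stitch, Tag and Index steps can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions: the entity IDs (a stand-in for PermID), records and documents are hypothetical, and naive name matching stands in for TRIT, which in practice does far more (disambiguation, contextual tagging). It tags unstructured text against known entity names, stitches the tags to structured records by ID, and builds an inverted index for fast lookup.

```python
import re
from collections import defaultdict

# Map: structured records keyed by a hypothetical entity ID
STRUCTURED = {
    "ID-001": {"name": "Acme Corp", "sector": "Industrials"},
    "ID-002": {"name": "Beta Bank", "sector": "Financials"},
}

# Unstructured documents, e.g. news snippets (hypothetical)
DOCUMENTS = {
    "doc-1": "Acme Corp raised guidance after a strong quarter.",
    "doc-2": "Beta Bank and Acme Corp signed a financing deal.",
}


def tag(text, records):
    """Tag: naive exact-name matching, a placeholder for a real tagging service."""
    return sorted(eid for eid, rec in records.items() if rec["name"] in text)


def stitch(documents, records):
    """Stitch: attach the structured record of every entity tagged in each document."""
    return {
        doc_id: [{**records[eid], "id": eid} for eid in tag(text, records)]
        for doc_id, text in documents.items()
    }


def build_index(documents, records):
    """Index: inverted index from lower-cased words and entity IDs to document IDs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
        for eid in tag(text, records):
            index[eid].add(doc_id)
    return index


if __name__ == "__main__":
    index = build_index(DOCUMENTS, STRUCTURED)
    print(sorted(index["ID-001"]))  # ['doc-1', 'doc-2']
```

Indexing by entity ID as well as by word is what lets a query for a company find every document that mentions it, regardless of how the text phrases the name once a proper tagger resolves it.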
16. Trends in Big Data – Mobile Technology
• Accelerometer
• Gyroscope
• Linear Acceleration
• Magnetometer
• GPS
• Barometer
• Proximity Sensor
• Ambient Light Sensor
• Infrared Sensor
• Ambient Temperature
• Relative Humidity
• Fingerprint
• Microphone
• Camera
Combine sensor data from:
• Watches
• Phones
• Computers
• Home
• Transport
• Work
• Environment
• Interactions
• Internet
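Combining sensor data usually starts with putting readings from different devices on one timeline. A minimal sketch, assuming each device already delivers its readings sorted by timestamp against a shared, synchronised clock (device names, timestamps and values are hypothetical), merges the streams lazily with `heapq.merge` and then groups readings into fixed time windows.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class Reading(NamedTuple):
    timestamp: float  # seconds since a shared epoch (assumes synchronised clocks)
    device: str
    sensor: str
    value: float


# Hypothetical per-device streams, each already sorted by timestamp
watch = [Reading(0.2, "watch", "heart_rate", 72.0), Reading(1.1, "watch", "heart_rate", 75.0)]
phone = [Reading(0.5, "phone", "accelerometer", 0.98), Reading(1.4, "phone", "gps_speed", 1.3)]
home = [Reading(0.9, "home", "ambient_temperature", 21.5)]


def merge_streams(*streams):
    """Merge already-sorted streams into one timestamp-ordered iterator."""
    return heapq.merge(*streams, key=lambda r: r.timestamp)


def window(readings, size_seconds=1.0):
    """Group readings into fixed windows: window index -> list of readings."""
    buckets = defaultdict(list)
    for r in readings:
        buckets[int(r.timestamp // size_seconds)].append(r)
    return dict(buckets)


if __name__ == "__main__":
    merged = list(merge_streams(watch, phone, home))
    print([r.device for r in merged])  # ['watch', 'phone', 'home', 'watch', 'phone']
```

`heapq.merge` never loads whole streams into memory, which matters when the inputs are long-running feeds rather than small lists.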
17. Internet vs. Artificial Intelligence Era
• Strategic Data Acquisition. Almost any reputable data science team can get access to substantial computing
power via Nvidia, AWS, GCP or Azure, and papers on building ever deeper and wider neural networks are widely
published. What AI companies such as Google, WeChat, Baidu and Facebook have done is build a moat around
themselves by capturing their users’ data
• Many of the ways organizations need to approach these problems have changed. Just as the shift from
waterfall to agile took a generation, the shift from the internet era to the AI era will also take time
Internet Era | AI Era
A/B Testing | Strategic Data Acquisition
Bricks and Mortar -> eCommerce | Tech Company -> AI Company
Decisions made by CxOs | Decisions made by Product Managers and Engineers
Internet Company | AI Company
Many Databases | Unified Data Lakes
Short Cycles | Training Epochs
Traditional Job Descriptions | New Job Descriptions
Wireframes | Design Thinking
20. Big Data Technologies – Elasticsearch
• Distributed search and analytics engine designed for horizontal scalability
• Important Terms
• Cluster – Collection of nodes
• Node – Single server, part of a cluster
• Index – Collection of shards akin to a `database`
• Shard – Collection of documents
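• Type – Category within an index akin to a `database table` (Elasticsearch 6.0 limits each new index to a single type, and later versions remove mapping types)
• Document – A record, stored as a JSON object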
More Info: slideshare.net/duydo/elasticsearch-for-data-engineers
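How documents end up in shards can be sketched with the classic routing rule described in the Elasticsearch documentation, shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id (newer versions refine this slightly). In this minimal sketch `zlib.crc32` is a stand-in hash chosen for illustration only; Elasticsearch itself uses Murmur3, and the shard count and document IDs are hypothetical.

```python
import zlib
from collections import defaultdict

NUMBER_OF_PRIMARY_SHARDS = 3  # hypothetical; fixed when the index is created


def shard_for(routing: str, num_shards: int = NUMBER_OF_PRIMARY_SHARDS) -> int:
    """shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is a STAND-IN hash for illustration; Elasticsearch uses Murmur3.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def distribute(documents, num_shards=NUMBER_OF_PRIMARY_SHARDS):
    """Place each JSON-like document on a shard, routing by its _id (the default)."""
    shards = defaultdict(list)
    for doc in documents:
        shards[shard_for(doc["_id"], num_shards)].append(doc)
    return dict(shards)


if __name__ == "__main__":
    docs = [{"_id": f"news-{i}", "headline": "..."} for i in range(10)]
    for shard, members in sorted(distribute(docs).items()):
        print(shard, [d["_id"] for d in members])
```

Because the shard is a pure function of the routing value and the shard count, any node can locate a document without a lookup table; the flip side is that changing the number of primary shards would move most documents, which is why that setting cannot simply be edited on a live index.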
Relational | Non-Relational
SQL | NoSQL
SQL Server, Oracle | Bigtable, Elasticsearch
Enterprise | Open Source
Pre-defined Sizes | Elastic Scalability
Column-Row Store | Document Store
Pre-defined Data Model | Dynamic Mapping
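"Dynamic Mapping" means a document store can infer a schema from the documents it receives instead of requiring one up front. The sketch below is a simplified approximation of Elasticsearch's default dynamic field mapping (booleans to `boolean`, whole numbers to `long`, decimals to `float`, strings to `text` with a `keyword` sub-field, nested objects to an object with its own properties); date detection, numeric detection and many other rules are deliberately left out, and the sample document is hypothetical.

```python
def infer_mapping(document: dict) -> dict:
    """Infer a simplified, Elasticsearch-style mapping from one JSON-like document."""
    properties = {}
    for field, value in document.items():
        if value is None:
            continue  # a null value adds no field to the mapping
        if isinstance(value, list):
            # Arrays take the type of their first non-null element (simplification)
            value = next((v for v in value if v is not None), None)
            if value is None:
                continue
        properties[field] = infer_type(value)
    return {"properties": properties}


def infer_type(value) -> dict:
    """Map a single JSON value to a field definition."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        return infer_mapping(value)  # object fields get their own properties
    raise TypeError(f"unsupported value: {value!r}")


if __name__ == "__main__":
    print(infer_mapping({"ticker": "ACME", "price": 12.5, "volume": 1000, "halted": False}))
```

A relational database would reject the first row until a `CREATE TABLE` declared these columns; here the first document effectively is the schema declaration.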
ACID
• Atomicity
• Consistency
• Isolation
• Durability
Brewer’s CAP theorem
• Consistency
• Availability
• Partition Tolerance
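Atomicity, the "A" in ACID, is easy to see with Python's built-in `sqlite3` module (used here only because it ships with Python; the same idea applies to SQL Server or Oracle). In this minimal sketch with a hypothetical `accounts` table, a transfer either applies both the credit and the debit or neither: the connection used as a context manager commits on success and rolls back if an exception escapes.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount) -> bool:
    """Move money atomically; return False if the transaction was rolled back."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            # The CHECK constraint fails here if src would go negative,
            # and the credit above is rolled back together with it.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, "alice", "bob", 30)   # succeeds
    transfer(conn, "bob", "alice", 500)  # fails the CHECK, fully rolled back
    print(balances(conn))  # {'alice': 70, 'bob': 80}
```

The credit is deliberately executed first so the failed debit shows the rollback undoing work that had already been applied inside the transaction.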
22. Applications of Big Data Technologies
Computer Vision
• Captcha
• OCR
• Images
• Video
Speech Recognition
• Speech to Text
• Sentiment Analysis
• Security
Natural Language Processing
• Named Entity Recognition
• Sentiment Analysis
• Natural Language Generation
Robotics
• Warehouse Logistics
• Assisted Living
• Drones
Robot and Frank, 2012
www.loseit.com
23. Applications of Big Data in Financial Services
• Satellite Images
• Credit Card Transaction Data
• Trade Processing
• High Frequency Trading
• Algorithmic Trading
• Investment decision making support
• Customer churn prediction
• Retail sales trends analysis and prediction
• Research Automation
• Fraud Analysis
24. How can I learn more about Big Data?
• Big Data Leaders
– Andrew Ng
– Fei-Fei Li
– Yann LeCun
– Richard Socher
– Demis Hassabis
– Jürgen Schmidhuber
– Alex Pentland
– Corinna Cortes
– Daphne Koller
– Hilary Mason
– Doug Cutting
– Kirk Borne
– Gilberto Titericz Jr.
– Stanislav Semenov
– Monica Rogati
– Chris Manning
– Yoshua Bengio
– Geoff Hinton
– David Blei
– Nando de Freitas
– Andrej Karpathy
– Ian Goodfellow
– Ilya Sutskever
– Daniela Rus
– Yoav Goldberg
• Research Papers, Blogs and Videos
– CBInsights.com
– arxiv.org – Recent Trends in Deep Learning Based NLP
– NLP News
– Heroes of Deep Learning (YouTube)
– Thomson Reuters – Harvard Business Publishing
– Stanford University School of Engineering
• Online Courses and Competitions
– Coursera
– Stanford Online
– Kaggle
– DataCamp
– Hortonworks
– Codecademy
• Tools
– Jupyter Notebook
– Google Cloud Platform
– Amazon SageMaker