Structuring big data
 Mark Wilson
 January 2012




#CloudCamp              UNCLASSIFIED   © Copyright 2012 Fujitsu Services Limited
The problem with big data, and a solution
The problem:
        “New reference architectures will include both big data and enterprise
         data warehouses”
                                                              [IDC, 19 January 2012]
        Two worlds: structured and unstructured data (plus external data
         sources, documents stored in structured databases, etc.)
        Siloes create issues with management, integration, etc.
The solution:
        Linked data – a single reference point for all data in the enterprise




Some history



               Fixed structure
                   Difficult to change schema
               Simple reporting capabilities
                   Complex to create new reports




Some history


                   Completed
                    transactions
                    transferred to separate
                    database for analysis
                       “Data warehouse”
                   Better reporting, data
                    mining, etc.
                       Still highly structured
                   Data is historical
                       May be aggregated




The smart guys



Real-time update of completed
 transactions
        Transactions moved to data warehouse
         upon completion
        Smaller transactional database
Allows alerts to be generated when
 specific conditions are met, and action
 taken
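
The flow on this slide could be sketched roughly as follows. This is a minimal illustration, assuming in-memory Python lists stand in for the transactional database and the data warehouse; the `settle` function, the record fields and the `large_payment` rule are hypothetical and not taken from the talk.

```python
def settle(transactional, warehouse, alert_rules):
    """Move completed transactions to the warehouse and collect alerts."""
    alerts = []
    still_open = []
    for txn in transactional:
        if txn["status"] == "completed":
            warehouse.append(txn)  # transferred upon completion
            for name, condition in alert_rules.items():
                if condition(txn):
                    alerts.append((name, txn["id"]))
        else:
            still_open.append(txn)
    # Only open transactions remain, so the transactional store stays small.
    transactional[:] = still_open
    return alerts


# Hypothetical data and alert rule, for illustration only.
transactional = [
    {"id": 1, "status": "completed", "amount": 120.0},
    {"id": 2, "status": "open", "amount": 15000.0},
    {"id": 3, "status": "completed", "amount": 25000.0},
]
warehouse = []
rules = {"large_payment": lambda t: t["amount"] > 10000}

alerts = settle(transactional, warehouse, rules)
print(alerts)
```

Evaluating the rules at the moment of transfer is one possible design; a real system might instead rely on database triggers or a stream processor.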




A third “data silo”



                      Masses of unstructured/semi-
                       structured data being processed in
                       NoSQL databases
                      May or may not be transferred
                       to/from structured databases
                          Time-consuming and inefficient
                      Three types of data, each with its
                       own limitations and its own
                       management considerations




Data everywhere!




Linked Data
Tie records together – even from separate data sets
We can express these links as triples with a specific grammar (subject, predicate, object):




Build up a graph to show machine-readable data in
 human-readable form
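
As a rough illustration of the idea on this slide, the sketch below uses plain Python tuples as subject-predicate-object triples and walks them as a graph. All identifiers (`crm:cust42`, `orders:1001`, `catalog:widget` and the predicate names) are hypothetical; real linked data would typically use RDF with URIs rather than short strings.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple.
# All identifiers are hypothetical, for illustration only.
triples = [
    ("crm:cust42", "hasName", "Alice Example"),
    ("crm:cust42", "placed", "orders:1001"),  # link into a second data set
    ("orders:1001", "contains", "catalog:widget"),
    ("catalog:widget", "hasLabel", "Widget"),
]


def build_graph(facts):
    """Index triples by subject so the records form a traversable graph."""
    graph = defaultdict(list)
    for subject, predicate, obj in facts:
        graph[subject].append((predicate, obj))
    return graph


def describe(graph, node, depth=0, lines=None):
    """Render outgoing edges from `node` as indented text.

    Assumes the graph has no cycles; a cyclic graph would need a visited set.
    """
    if lines is None:
        lines = []
    for predicate, obj in graph.get(node, []):
        lines.append("  " * depth + f"{node} --{predicate}--> {obj}")
        describe(graph, obj, depth + 1, lines)
    return lines


graph = build_graph(triples)
readable = describe(graph, "crm:cust42")
print("\n".join(readable))
```

The second triple is the interesting one: it ties a customer record to an order held in a separate data set, which is what lets the graph span sources.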




Then add lots more data…




Source: http://lod-cloud.net/
        Each node is itself another graph (zoom in)
Aren’t we missing a trick?
Use linked data as the
 optimal reference source
        Broker of all data sources
Single view on structured and
 unstructured data
        Bring in external sources too
Mapping, interconnecting,
 indexing and feeding
        In real time
Query linked data to derive
 new value from old
        Infer relationships
        Gain new insights
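
The points above could be sketched along these lines: a hedged, minimal example that assumes triples from the three silos have already been mapped into one set. The identifiers, the `match` helper and the `bought` inference rule are all hypothetical; a production system would more likely use an RDF store queried with SPARQL.

```python
# Hypothetical facts drawn from three silos.
triples = {
    ("oltp:order7", "placedBy", "crm:cust42"),       # transactional database
    ("oltp:order7", "contains", "catalog:widget"),
    ("dw:cust42_2011", "summarises", "crm:cust42"),  # data warehouse
    ("nosql:tweet9", "mentions", "catalog:widget"),  # unstructured source
    ("nosql:tweet9", "postedBy", "crm:cust42"),
}


def match(facts, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in facts
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def infer_bought(facts):
    """Rule: order placedBy C and order contains P implies C bought P."""
    inferred = set()
    for order, _, customer in match(facts, predicate="placedBy"):
        for _, _, product in match(facts, subject=order, predicate="contains"):
            inferred.add((customer, "bought", product))
    return inferred


new_facts = infer_bought(triples)
all_facts = triples | new_facts

# Single view: everything that refers to the customer, whatever the silo.
about_customer = sorted(match(all_facts, obj="crm:cust42"))
print(new_facts)
print(about_customer)
```

The inferred `bought` fact is stored in none of the source silos; it only appears once the order and catalogue records are linked.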


About the author
Mark Wilson, Strategy Manager, Fujitsu
Mark is an analyst working within Fujitsu’s UK and
Ireland Office of the CTO, providing thought
leadership both internally and to customers,
shaping business and technology strategy. He has
17 years' experience of working in the IT industry,
12 of which have been with Fujitsu. Mark has a
background in leading large IT infrastructure
projects with customers in the UK, mainland
Europe and Australia. He has a degree in
Computer Studies from the University of
Glamorgan. Mark is also active in social media and
won the Individual IT Professional (Male) award in
the 2010 Computer Weekly IT Blog Awards. Mark
may be found on Twitter @markwilsonit.

If you would like to comment on the topics in this
presentation, Mark would welcome your feedback,
by email to mark.a.wilson@uk.fujitsu.com.

Editor's notes

  1. Everyone's talking about big data, but the bulk of the conversation seems to focus on a new level of business intelligence and an ever-increasing volume of data organised into OLTP, OLAP and NoSQL siloes. In this talk, Mark Wilson puts forward the view that the real value comes not from the big data itself but from how we can employ linked data concepts to integrate structured, unstructured and semi-structured data sets, and then use this unified data source to derive new value.