Hadoop In Education: The advent of data-driven
applications



                  © 2010 Apollo Group – Confidential & Proprietary
Online Learning is in high demand

 • Adults learn at the online University of Phoenix on their
   own schedules, and the number who prefer that modality
   over the ground (“traditional”) equivalent is on the rise.
 • Online students and faculty do not have to be
   geographically co-located as in traditional
   settings, allowing for richer and more diverse interactions
   across geographical boundaries and time differences.
 • As people spend more time online, it is only natural to
   expect that learners will want their education online as
   well.
 • Recent press on huge enrolment in MOOCs (Massive
   Open Online Courses) again points to a great
   latent demand for online courses.
What should online learning look like?




Online Education challenges

 • Every learner is unique in aptitude, preparation, and
   motivation.
 • A good teacher continuously observes and
   intervenes appropriately to keep the learners engaged
   and learning.
     – If we just take the traditional classroom online, all the
       visual and audio feedback is taken away from the
       trained teacher!




Online Education Opportunities

 • What if, instead,
     – We collect detailed interaction data sets and
       convert them into actionable insights for the
       teacher, so that (s)he can focus only where (s)he is
       needed and not exhaust her/himself by being the filter?
     – With algorithms, we harness the best practices that
       are working for students and teachers, recommend
       them in appropriate contexts, and take away
       unnecessary and inefficient guessing?
 • Wait, would that not be Web 2.0 in Education?
With top-name universities, start-up companies, learning
platform or learning content companies … this innovation
race is already on!

Data driven learner guidance

[Diagram. Data-driven apps: assignments, discussions, faculty guidance, recommendations. Data-driven apps for effectiveness/recommendation of content and assessments, used by faculty and the instruction designer. Data shown: student logs, student assessment logs, faculty interaction, processed content usage, student/faculty interaction logs.]
System Architecture




New Learning System Architecture

[Diagram: browser client and mobile client; RESTful services (curriculum, class, quizzes, discussions, content); log data flowing into the data collection and log processing pipeline.]



Considerations

Enable logging without much effort from developers
   • Built GWT/JavaScript framework to automatically enable client-side logging.
   • Automatically enable server-side logging using servlet filters.

Common pipelines for processing log data
   • Used canonical log records with Avro as the serialization format.
   • Service-specific information logged as JSON and processed using Hive UDFs.

Time-sync clients and server to simplify log ordering
   • Server responds with its timestamp on every call.
   • Client includes this information in the next call.
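
The time-sync exchange described above can be sketched roughly as follows. This is only an illustration in Python, not the GWT/servlet code the slides refer to: the class names, the field names (`echoed_server_ts_ms`, `client_elapsed_ms`), and the idea of adding client-side elapsed time to the echoed server timestamp are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Server:
    """Toy server: every response carries the server's current timestamp."""
    clock_ms: int = 1_000_000
    received: List[dict] = field(default_factory=list)

    def handle(self, request: dict) -> dict:
        self.clock_ms += 50  # pretend 50 ms of server time pass per call
        self.received.append(dict(request, server_received_ms=self.clock_ms))
        return {"server_ts_ms": self.clock_ms}


@dataclass
class Client:
    """Toy client with an unsynchronised local clock."""
    local_ms: int = 5_000
    last_server_ts_ms: Optional[int] = None
    local_ms_at_last_response: Optional[int] = None

    def tick(self, ms: int) -> None:
        self.local_ms += ms

    def call(self, server: Server, action: str) -> None:
        # Time elapsed on the client since the last server response, if any.
        elapsed = None
        if self.local_ms_at_last_response is not None:
            elapsed = self.local_ms - self.local_ms_at_last_response
        response = server.handle({
            "action": action,
            "echoed_server_ts_ms": self.last_server_ts_ms,  # from previous call
            "client_elapsed_ms": elapsed,
        })
        # Remember the server timestamp so the next call can echo it back.
        self.last_server_ts_ms = response["server_ts_ms"]
        self.local_ms_at_last_response = self.local_ms


def event_time_on_server_clock(record: dict) -> int:
    """Place a client event on the server timeline (hypothetical estimator)."""
    if record["echoed_server_ts_ms"] is None:
        return record["server_received_ms"]  # first call: nothing to echo yet
    return record["echoed_server_ts_ms"] + record["client_elapsed_ms"]


server, client = Server(), Client()
client.call(server, "open_quiz")
client.tick(30)
client.call(server, "select_answer")
times = [event_time_on_server_clock(r) for r in server.received]
```

Because every record carries a server-issued timestamp, log ordering never has to trust the device clock, only the differences between readings of it.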


Client/Server – Built for Log Collection


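[Diagram. Client: view, controller, and model connected by an event bus; the model makes API calls. Server: an instrumentation filter in front of the RESTful services. Both sides emit log data into a canonical log file (Avro), which feeds the log and data processing pipeline.]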
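
One way to read the client side of this design is that logging hangs off the event bus, so individual views and controllers need no logging code of their own. The sketch below illustrates that idea in Python under stated assumptions: `EventBus`, `make_log_tap`, and the record fields are hypothetical names, and a plain dictionary with a JSON string stands in for the Avro-serialized canonical record.

```python
import json
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, dict], None]


class EventBus:
    """Minimal publish/subscribe bus with global 'taps' that see every event."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._taps: List[Handler] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def tap(self, handler: Handler) -> None:
        self._taps.append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        for tap in self._taps:           # logging happens for every event
            tap(topic, payload)
        for handler in self._handlers[topic]:
            handler(topic, payload)


def make_log_tap(sink: List[dict], user_id: str, session_id: str) -> Handler:
    """Build a tap that appends one canonical-style record per event."""
    def log_tap(topic: str, payload: dict) -> None:
        sink.append({
            "user_id": user_id,
            "session_id": session_id,
            "event": topic,
            # Service-specific details kept as an opaque JSON string.
            "details_json": json.dumps(payload, sort_keys=True),
        })
    return log_tap


bus = EventBus()
log: List[dict] = []
bus.tap(make_log_tap(log, user_id="u1", session_id="s1"))

selected: List[str] = []
bus.subscribe("answer_selected", lambda topic, p: selected.append(p["choice"]))
bus.publish("answer_selected", {"question": "q7", "choice": "B"})
```

Keeping the common fields fixed and pushing everything service-specific into one JSON field mirrors the split the slides describe between canonical records and JSON processed later with Hive UDFs.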


Connecting the Data and Processing Pipeline

[Diagram. Application servers and log collection servers (~7 TB/week/class); S3; log processing pipeline; Oozie workflows; HBase tables and Hive tables (~700 GB/week/class); services, dashboards, and M/L tools; RDBMS; traditional BI tools.]

Considerations


User session split across multiple devices
   • User in a discussion forum in a browser
   • User receives grade notifications on mobile phone
   • User views notification

Merging and ordering of events
   • Only partial ordering of events possible without application-specific information
   • Full ordering required to extract features from logs
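
A minimal sketch of the merge step, assuming each device's log is already sorted by a server-aligned timestamp. The `Event` type, the sample actions, and the timestamps are made up for illustration. Merging on timestamps alone gives only a partial order: events from different devices with equal timestamps cannot be ranked without application knowledge.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    ts_ms: int
    device: str
    action: str


def merge_sessions(*streams: Iterable[Event]) -> List[Event]:
    """Merge per-device streams that are each sorted by timestamp."""
    return list(heapq.merge(*streams, key=lambda e: e.ts_ms))


def ambiguous_pairs(events: List[Event]) -> List[Tuple[str, str]]:
    """Adjacent events from different devices whose timestamps tie."""
    return [
        (a.action, b.action)
        for a, b in zip(events, events[1:])
        if a.ts_ms == b.ts_ms and a.device != b.device
    ]


browser = [
    Event(100, "browser", "open_forum"),
    Event(300, "browser", "post_reply"),
]
mobile = [
    Event(200, "mobile", "receive_grade_notification"),
    Event(300, "mobile", "view_notification"),
]

merged = merge_sessions(browser, mobile)
ties = ambiguous_pairs(merged)
```

Here `ties` flags the two events at 300 ms: their relative order in `merged` is an artifact of argument order, not of anything the user did.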



Feature Extraction after Joins – Some challenges

User Interaction         Partial Event Order      Reordered Events
View Question            Get Question             Request Question
Select Answer            Submit Answer            Get Question
View Hint                Request Question         Display Question
Select Another Answer    Display Question         Select Answer
Submit Answer            Select Answer            View Hint
Receive Feedback         View Hint                Select Answer
                         Select Answer            Submit Answer
                         Submit Answer            Submit Answer
                         Question Feedback        Question Feedback

 • Exploring generic alignment algorithms that use declared application
   semantics
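
As a rough illustration of what "declared application semantics" could mean, the sketch below breaks timestamp ties using declared "must come before" rules. This is not the alignment algorithm the presenters were exploring, which the slides do not describe. The `PRECEDES` table is a hypothetical declaration for a single question lifecycle, and the approach only resolves ties between different action types.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical declared semantics: action -> actions that must precede it.
PRECEDES: Dict[str, Set[str]] = {
    "Get Question": {"Request Question"},
    "Display Question": {"Get Question"},
    "Select Answer": {"Display Question"},
    "Submit Answer": {"Select Answer"},
    "Question Feedback": {"Submit Answer"},
}

# A topological order of the declared rules gives each action a rank.
RANK: Dict[str, int] = {
    action: i
    for i, action in enumerate(TopologicalSorter(PRECEDES).static_order())
}


def align(events: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Sort by timestamp, breaking ties with the declared action rank.

    Actions without a declared rank sort after ranked ones at the same
    timestamp, keeping their original relative order (sorted() is stable).
    """
    return sorted(events, key=lambda e: (e[0], RANK.get(e[1], len(RANK))))


partial = [
    (10, "Get Question"),
    (10, "Request Question"),
    (10, "Display Question"),
    (20, "Submit Answer"),
    (20, "Select Answer"),
    (30, "Question Feedback"),
]
reordered = [action for _, action in align(partial)]
```

The appeal of a declarative table is that the same generic sorter can serve every service; only the rules change.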



Data Driven
Applications




A Data Driven Application: The Faculty
                Dashboard for Action




Story 1: How detailed logs help




Story 2: The Carnegie Learning Math Tutor
 • Enhanced Activities: Adaptive
  CL’s Cognitive Tutor provides adaptive online curriculum in high school and middle school math.
    – Interactive lessons
    – Practice problems
    – Response-sensitive feedback and support (e.g. hints, examples)
    – Intelligent guidance through curricular units, with detailed tracking of skill proficiency
    – Personalized preferences




Example Features from Detailed Logs from the Math Tutor
 • Baker et al., Towards Sensor-Free Affect Detection in Cognitive Tutor
   Algebra, retrieved from http://users.wpi.edu/~rsbaker/publications.html

Frustration
 • The percent of past actions on the skills involved in the clip that were incorrect.
 • Were there any actions in the clip where the student made a wrong answer rather than requesting help when their probability of knowing the skill was under 0.7?

Engaged Concentration
 • The minimum number of previous incorrect actions and help requests for any skill in the clip.
 • Among the skills involved in the clip, the minimum value for previous incorrect actions and help requests for that skill.
 • The duration (in seconds) of the fastest action in the clip.
 • The percentage of clip actions involving a hint followed by an error.
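
To make a few of these features concrete, here is a small sketch of how they might be computed from an ordered action log. The `Action` record, the outcome labels, and the exact definitions (for example, counting a hint only when the very next action is an error) are assumptions for illustration and may differ from the definitions used by Baker et al.

```python
from typing import List, NamedTuple


class Action(NamedTuple):
    skill: str
    outcome: str       # one of "correct", "error", "hint"
    duration_s: float


def pct_past_incorrect(history: List[Action], clip: List[Action]) -> float:
    """Percent of past actions on the clip's skills that were errors."""
    skills = {a.skill for a in clip}
    past = [a for a in history if a.skill in skills]
    if not past:
        return 0.0
    return 100.0 * sum(a.outcome == "error" for a in past) / len(past)


def fastest_action_s(clip: List[Action]) -> float:
    """Duration in seconds of the fastest action in the clip."""
    return min(a.duration_s for a in clip)


def pct_hint_then_error(clip: List[Action]) -> float:
    """Percent of clip actions that are a hint immediately followed by an error."""
    pairs = sum(
        1
        for prev, nxt in zip(clip, clip[1:])
        if prev.outcome == "hint" and nxt.outcome == "error"
    )
    return 100.0 * pairs / len(clip)


history = [
    Action("fractions", "error", 4.0),
    Action("fractions", "correct", 6.0),
    Action("fractions", "correct", 5.0),
    Action("fractions", "correct", 7.0),
    Action("decimals", "error", 3.0),   # different skill, ignored below
]
clip = [
    Action("fractions", "hint", 8.0),
    Action("fractions", "error", 2.5),
    Action("fractions", "correct", 6.0),
    Action("fractions", "hint", 3.0),
]

features = {
    "pct_past_incorrect": pct_past_incorrect(history, clip),
    "fastest_action_s": fastest_action_s(clip),
    "pct_hint_then_error": pct_hint_then_error(clip),
}
```

Every one of these features depends on the order of actions within a clip, which is why the full event ordering discussed earlier matters.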




Why the features matter

From Stephen Fancsali, Variable Construction and Causal Discovery for
Cognitive Tutor Log Data: Initial Results, Educational Data Mining 2012




   Helps design “intervention” features in the data driven math product to help the
   learner
Questions?





 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Dernier (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Hadoop in Education

  • 1. Hadoop In Education: The advent of data-driven applications © 2010 Apollo Group – Confidential & Proprietary
  • 2. Online Learning is in high demand
      • Adults learn at the online University of Phoenix on their own schedules, and the number who prefer that modality over the ground ("traditional") equivalent is on the rise.
      • Online students and faculty do not have to be geographically co-located as in traditional settings, allowing for richer and more diverse interactions across geographical boundaries and time differences.
      • As people spend more time online, it is only natural to expect that learners will want their education online as well.
      • Recent press on huge enrolment in MOOCs (Massive Open Online Courses) again indicates that there is great latent demand for online courses.
  • 3. What should online learning look like?
  • 4. Online Education challenges
      • Every learner is unique in aptitude, preparation, and motivation.
      • A good teacher continuously observes and intervenes appropriately to keep the learners engaged and learning.
          – If we just take the traditional classroom online, all the visual and audio feedback is taken away from the trained teacher!
  • 5. Online Education Opportunities
      • What if, instead,
          – we collected detailed interaction data sets and converted them into actionable insights for the teacher, so that (s)he can focus only where (s)he is needed and not exhaust her/himself by being the filter?
          – we used algorithms to harness the best practices that are working for students and teachers, recommended them in appropriate contexts, and took away unnecessary and inefficient guessing?
      • Wait, would that not be Web 2.0 in Education?
      With top-name universities, start-up companies, learning platform or learning content companies … this innovation race is already on!
  • 6. Data driven learner guidance
      [Diagram. Labels: Faculty; Student; Instruction Designer; Data Driven Apps: Assignments, Discussions, Faculty Guidance, Recommendations; Processed Logs; Content Usage Logs; Student/Faculty Interaction Logs; Student Assessment Logs; Data Driven Apps for Effectiveness/Recommendation of Content & Assessments]
  • 7. System Architecture
  • 8. New Learning System Architecture
      [Diagram. Labels: Browser Client; Mobile Client; RESTful Services; Class; Discussions; Quizzes; Curriculum; Content; Log data; Data Collection and Log Processing Pipeline]
  • 9. Considerations
      • Enable logging without much effort from developers
          – Built a GWT/JavaScript framework to automatically enable client-side logging.
          – Automatically enable server-side logging using servlet filters.
      • Common pipelines for processing log data
          – Used canonical log records with Avro as the serialization format.
          – Service-specific information is logged as JSON and processed using Hive UDFs.
      • Time-sync clients and server to simplify log ordering
          – Server responds with its timestamp on every call.
          – Client includes this information in the next call.
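The time-sync consideration above can be sketched roughly as follows. This is only an illustration under assumptions: the class name `SyncedClientClock`, its fields, and the idea of adding locally elapsed time to the last server timestamp are hypothetical and are not taken from the slides, which say only that the server returns its timestamp on every call and the client sends it back on the next one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncedClientClock:
    """Hypothetical client-side helper; names are illustrative only."""
    last_server_ts: Optional[float] = None  # server clock value from the last response
    local_at_sync: Optional[float] = None   # client clock value when that response arrived

    def on_response(self, server_ts: float, local_now: float) -> None:
        # Remember the server timestamp returned with every call.
        self.last_server_ts = server_ts
        self.local_at_sync = local_now

    def stamp(self, event: str, local_now: float) -> dict:
        # Attach the last known server timestamp to the next log record.
        # The estimate assumes the client clock runs at the same rate as
        # the server clock, even if its absolute value is wrong.
        if self.last_server_ts is None:
            estimate = None
        else:
            estimate = self.last_server_ts + (local_now - self.local_at_sync)
        return {
            "event": event,
            "client_ts": local_now,
            "last_server_ts": self.last_server_ts,
            "est_server_ts": estimate,
        }


clock = SyncedClientClock()
clock.on_response(server_ts=1000.0, local_now=50.0)
record = clock.stamp("select_answer", local_now=52.5)
```

With records stamped this way, a downstream job could sort client and server events on one clock instead of trusting each device's own time.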
  • 10. Client/Server – Built for Log Collection
      [Diagram. Labels: View; Controller; Model; Event Bus; API Calls; Instrumentation Filter; Log Data; RESTful Services; Log and Data Processing Pipeline; Canonical Log File (Avro)]
  • 11. Connecting the Data and Processing Pipeline
      [Diagram. Labels: Application Server & Server Log Collection Servers; S3; Log Processing Pipeline; Oozie Workflows; ~7 TB/Week/Class; HBase Tables; Hive Tables; ~700 GB/Week/Class; Services, Dashboards, M/L Tools; RDBMS; Traditional BI Tools]
  • 12. Considerations
      • User session split across multiple devices
          – User is in a discussion forum in a browser.
          – User receives grade notifications on a mobile phone.
          – User views the notification.
      • Merging and ordering of events from logs
          – Only partial ordering of events is possible without application-specific information.
          – Full ordering is required to extract features.
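A minimal sketch of the merging step, assuming each device's log is already sorted by a timestamp on a shared clock. The field names (`ts`, `device`, `event`) and the sample events are hypothetical. The point is that a timestamp merge alone yields only a partial order: events with equal timestamps cannot be ordered without application-specific information.

```python
import heapq


def merge_device_logs(*device_logs):
    """Merge per-device event lists (each sorted by 'ts') into one stream."""
    return list(heapq.merge(*device_logs, key=lambda e: e["ts"]))


def ambiguous_pairs(merged):
    """Adjacent events with equal timestamps: their relative order is unknown."""
    return [
        (a["event"], b["event"])
        for a, b in zip(merged, merged[1:])
        if a["ts"] == b["ts"]
    ]


# Hypothetical session split across a browser and a phone.
browser = [
    {"ts": 10, "device": "browser", "event": "open_forum"},
    {"ts": 30, "device": "browser", "event": "post_reply"},
]
phone = [
    {"ts": 20, "device": "phone", "event": "grade_notification"},
    {"ts": 30, "device": "phone", "event": "view_notification"},
]

merged = merge_device_logs(browser, phone)
unresolved = ambiguous_pairs(merged)
```

Here `unresolved` flags the two events at `ts == 30`; deciding which came first is where application semantics would have to come in.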
  • 13. Feature Extraction after Joins – Some challenges
      • User Interaction: View Question, Select Answer, View Hint, Select Another Answer, Submit Answer, Receive Feedback
      • Partial Event Order: Get Question, Submit Answer, Request Question, Display Question, Select Answer, View Hint, Select Answer, Submit Answer, Question Feedback
      • Reordered Events: Request Question, Get Question, Display Question, Select Answer, View Hint, Select Answer, Submit Answer, Submit Answer, Question Feedback
      • Exploring generic alignment algorithms that use declared application semantics
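One way to picture alignment driven by declared application semantics is the greedy sketch below. It is not the algorithm from the talk, which the slide describes only as under exploration. The split of events into a client stream and a server stream, the `triggered_by` declaration, and the function name `align` are all assumptions made for illustration; the event names are the ones on the slide.

```python
from collections import deque


def align(client_events, server_events, triggered_by):
    """Interleave two internally ordered streams using declared triggers.

    triggered_by maps a server event name to the client event name that
    must immediately precede it (a hypothetical form of declared semantics).
    """
    pending = deque(server_events)
    ordered = []
    for name in client_events:
        ordered.append(name)
        # Emit the next server event right after the client event that triggers it.
        if pending and triggered_by.get(pending[0]) == name:
            ordered.append(pending.popleft())
    # Server events with no matching trigger are kept at the end, not dropped.
    ordered.extend(pending)
    return ordered


client = ["Request Question", "Display Question", "Select Answer", "View Hint",
          "Select Answer", "Submit Answer", "Question Feedback"]
server = ["Get Question", "Submit Answer"]
declared = {"Get Question": "Request Question", "Submit Answer": "Submit Answer"}

reordered = align(client, server, declared)
```

A production version would need to handle repeated triggers, missing events, and more than two streams; this sketch handles only the simple case where each server event follows exactly one client event.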
  • 14. Data Driven Applications
  • 15. A Data Driven Application: The Faculty Dashboard for Action
  • 16. A Data Driven Application: The Faculty Dashboard for Action
  • 17. Story 1: How detailed logs help
  • 18. Story 1: How detailed logs help
  • 19. Story 2: The Carnegie Learning Math Tutor
      • Enhanced Activities: Adaptive. CL's Cognitive Tutor provides an adaptive online curriculum in high school and middle school math.
          – Interactive lessons
          – Practice problems
          – Response-sensitive feedback and support (e.g. hints, examples)
          – Intelligent guidance through curricular units, with detailed tracking of skill proficiency
          – Personalized preferences
  • 20. Example Features from Detailed Logs from the Math Tutor
      • Source: Baker et al., Towards Sensor-Free Affect Detection in Cognitive Tutor Algebra, retrieved from http://users.wpi.edu/~rsbaker/publications.html
      • Frustration
          – The percent of past actions on the skills involved in the clip that were incorrect.
          – Were there any actions in the clip where the student made a wrong answer rather than requesting help when their probability of knowing the skill was under 0.7?
      • Engaged Concentration
          – The minimum number of previous incorrect actions and help requests for any skill in the clip.
          – Among the skills involved in the clip, the minimum value for previous incorrect actions and help requests for that skill.
          – The duration (in seconds) of the fastest action in the clip.
          – The percentage of clip actions involving a hint followed by an error.
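To make a few of these features concrete, here is a minimal sketch of how they might be computed from an ordered clip of logged actions. The `Action` record, its field names, and the exact counting rules (for example, excluding past hint requests from the "past actions" denominator) are assumptions for illustration and are not taken from the cited paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    skill: str
    kind: str       # assumed vocabulary: "correct", "error", or "hint"
    seconds: float  # time spent on the action


def fastest_action_seconds(clip: List[Action]) -> float:
    """Duration (in seconds) of the fastest action in the clip."""
    return min(a.seconds for a in clip)


def pct_hint_then_error(clip: List[Action]) -> float:
    """Percentage of clip actions that are a hint immediately followed by an error."""
    pairs = sum(
        1 for a, b in zip(clip, clip[1:])
        if a.kind == "hint" and b.kind == "error"
    )
    return 100.0 * pairs / len(clip)


def pct_past_incorrect(history: List[Action], clip: List[Action]) -> float:
    """Percent of past attempts on the clip's skills that were incorrect.

    Assumption: hint requests in the history are not counted as attempts.
    """
    skills = {a.skill for a in clip}
    attempts = [a for a in history if a.skill in skills and a.kind != "hint"]
    if not attempts:
        return 0.0
    errors = sum(1 for a in attempts if a.kind == "error")
    return 100.0 * errors / len(attempts)


history = [
    Action("fractions", "error", 12.0),
    Action("fractions", "correct", 8.0),
    Action("decimals", "error", 5.0),
    Action("fractions", "correct", 6.0),
    Action("fractions", "error", 20.0),
]
clip = [
    Action("fractions", "hint", 4.0),
    Action("fractions", "error", 2.5),
    Action("fractions", "correct", 9.0),
    Action("fractions", "hint", 3.0),
]
```

Each function needs the events in their true order, which is why full ordering of the merged logs matters before feature extraction.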
  • 21. Why the features matter
      • From Stephen Fancsali, Variable Construction and Causal Discovery for Cognitive Tutor Log Data: Initial Results, Educational Data Mining 2012
      • Helps design "intervention" features in the data-driven math product to help the learner
  • 22. Questions?
  • 23. Sessions will resume at 4:30pm