The keynote presented at DMS2016 explored the following critical issues:
how to successfully extract, prepare and manage consistent data and multimedia content from distributed multimedia systems,
how to implement next-generation data discovery techniques on data and multimedia content,
how modern data science is evolving to deliver more agile, high-value analytics on multimedia information to all users.
17. Bottom Line:
Data usage should be faster and cost less, with minimal data movement requirements
• materialize reality and language in a consistent database
• couple language and reality using keyback features
• bind external algorithms using Open (Standard?) User Exits
• foster holistic views of data through Grid Data Unification
19.
Data source
name  city   DateBirth
Aldo  Miami  11/1/90
Sara  NYC    12/2/89
Anna  Rome   1.1.68
Sara  NYC    31-1-61

Fractal conversion

Map
rowId  Nname  Ncity
1      1      1
2      2      2
3      3      3
4      2      2

Dictionary
Key   Value  NValue
Name  Aldo   1
Name  Sara   2
Name  Anna   3
City  Miami  1
…     …      …

Transform DateBirth
DateBirth  UDateB   Age
11/1/90    1/11/90  26
12/2/89    2/12/89  26
1.1.68     1/1/68   48
31-1-61    1/31/61  56

Add Geo classification
Ncity  city   state
1      Miami  Fl
2      NYC    NY
3      Rome   Italy

Other diagram labels: Luggage hierarchy, Data complex, Storage group
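The tables on slide 19 can be read as a dictionary encoding of the data source plus a date clean-up step. The sketch below is only an illustration of that reading, not the ADC algorithm itself: the helper names `encode` and `unify_date` are hypothetical, and it assumes that input dates are day-first with a two-digit year, which is what the DateBirth and UDateB columns suggest. Age is left out because it depends on a reference date the slide does not state.

```python
import re

# Source rows as shown on the slide: (name, city, DateBirth).
SOURCE = [
    ("Aldo", "Miami", "11/1/90"),
    ("Sara", "NYC", "12/2/89"),
    ("Anna", "Rome", "1.1.68"),
    ("Sara", "NYC", "31-1-61"),
]


def encode(rows, columns):
    """Hypothetical helper: replace each value with a per-column integer code.

    Returns (dictionary, mapped) where dictionary is {column: {value: code}}
    and mapped is a list of (rowId, code, code, ...) tuples.
    """
    dictionary = {col: {} for col in columns}
    mapped = []
    for row_id, row in enumerate(rows, start=1):
        codes = []
        for col, value in zip(columns, row):
            known = dictionary[col]
            # First occurrence gets the next free code; repeats reuse it.
            codes.append(known.setdefault(value, len(known) + 1))
        mapped.append((row_id, *codes))
    return dictionary, mapped


def unify_date(text):
    """Hypothetical helper: rewrite a day-first date as month/day/year.

    Assumes the separators seen on the slide ('/', '.', '-') and a
    two-digit year; no century is inferred.
    """
    day, month, year = (int(part) for part in re.split(r"[/.\-]", text))
    return f"{month}/{day}/{year:02d}"


dictionary, mapped = encode([row[:2] for row in SOURCE], ("Name", "City"))
unified = [unify_date(row[2]) for row in SOURCE]
```

With the four slide rows, `mapped` reproduces the rowId / Nname / Ncity table (the second "Sara NYC" row reuses codes 2 and 2), and `unified` reproduces the UDateB column.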
ADC is a fractal-like algorithm that converts raw input data and the related data processing into a set of chained binary blocks, formulas and long pointers.
"We show that ADC represents an important set of computations… The advantages of ADC are that: it is described by a small number of parameters and has a priori known sizes of the views, the views can be generated independently, the overhead of combining the generated views is predictable, the data set can be partitioned into a number of independently generated subsets, the elements of the data set are pseudo random.
These properties make ADC a strong candidate for a data intensive grid benchmark." <M. Frumkin, NASA NAS Division>
22. MATERIAL TESTING
• Complex JSON, Oracle, CSV and WMV data
• Manual data processing executed using MATLAB
• Hours of scientist work to detect outliers
• Impossible to replicate tests with the same results
• Scarce capitalization of know-how
• Data blending happens at narrative-writing time