This document discusses opening government data and realising benefits from open data. It provides tips for ensuring benefits, such as adopting a user-focused approach, publishing high-value data, and automating updates wherever possible. Additional sections discuss preparing data for opening by cleaning sheets and standardising formats, automating reporting and updates, and measuring success through data-driven decision making. The goal is to improve services, policy outcomes, and opportunities for collaboration through open data.
Making your data lovely with open data benefits and best practices
Making your data lovely!
Prioritising, cleaning, extraction, transformation, automation
Pia Waugh
Director of Gov 2.0 and Data
Department of Finance
Soon to be Department of the Prime Minister & Cabinet
Key Benefits to the Public Service in Opening Data
• Efficiencies from proactively publishing common requests
• Cheaper and more modular service delivery
• Reduced regulatory burden through machine-readable data supporting compliance and automated reporting
• Better policy outcomes by leveraging cross-agency data
• More consistency & less duplication across government
• Improved opportunities to leverage innovation and collaboration (citizens, industry, other departments)
• Opportunities to improve data quality through verifiable public contributions
Tips for ensuring benefits realisation of open data
• Adopt an approach of “data user and developer empathy”
• Build data publishing into your BAU (business as usual) processes
• Initially focus on data that helps you build capability
• Consume your own data APIs (apps, datavis, BI, etc.)
• Ensure you consider:
• Quality – no one can use bad data, but perfect is the enemy of the good
• Currency – is it up to date? How often is it updated?
• APIs – is it programmatically available?
• Publishing – have you provided supporting materials (taxonomies)?
• Discoverability – is it hosted or linked on data.gov.au?
• Reusability – have you tested it with data users?
• Licensing – Creative Commons Attribution (CC BY) is the default
• Automation wherever possible!
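Some of the checklist questions can be answered automatically from dataset metadata. A minimal Python sketch, assuming metadata shaped like a CKAN `package_show` result (the `metadata_modified`, `license_id` and `resources` fields). The update windows, the set of "machine-readable" formats and the `cc-by` licence id are assumptions for illustration, not data.gov.au rules; quality and reusability still need human review and testing with data users.

```python
from datetime import datetime, timezone

# Assumed update windows in days; use whatever frequency the dataset commits to.
UPDATE_WINDOW_DAYS = {"daily": 1, "weekly": 7, "monthly": 31, "annually": 366}

# Assumed set of formats treated as machine readable for this rough check.
MACHINE_READABLE_FORMATS = {"CSV", "JSON", "GEOJSON", "XML"}


def check_dataset(package, frequency, now):
    """Run a few checklist questions against a CKAN-style package dict.

    Covers currency, licensing and a rough machine-readability proxy only;
    the other checklist items need human review or testing with data users.
    """
    modified = datetime.fromisoformat(package["metadata_modified"])
    if modified.tzinfo is None:
        # CKAN reports metadata_modified as a UTC timestamp without an offset.
        modified = modified.replace(tzinfo=timezone.utc)
    age_days = (now - modified).days
    return {
        "currency": age_days <= UPDATE_WINDOW_DAYS[frequency],
        # Assumption: "cc-by" is the portal's licence id for CC BY.
        "licensing": package.get("license_id") == "cc-by",
        "machine_readable": any(
            (resource.get("format") or "").upper() in MACHINE_READABLE_FORMATS
            for resource in package.get("resources", [])
        ),
    }


# Hypothetical package metadata, shaped like a CKAN package_show result.
sample = {
    "metadata_modified": "2014-05-20T03:55:43.371911",
    "license_id": "cc-by",
    "resources": [{"format": "csv"}, {"format": "PDF"}],
}
print(check_dataset(sample, "monthly", datetime(2014, 6, 10, tzinfo=timezone.utc)))
```

Running a check like this on a schedule, rather than once, is what turns the checklist into part of BAU.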
Data on the inside
• Do you know what data you have internally?
• Are you considering all data types?
• How embedded is data driven decision making?
• How can you upskill the whole organisation?
• Do you know what your external data needs are?
• How are you measuring and monitoring success?
Data infrastructure to support your organisation
should be extendable to support sharing/publishing
Rub a dub data
• If a machine can’t read it, a machine can’t make an API
• Some data has specialised data formats, some commonalities
• Tabular, spatial, real time, unstructured, etc
• Most data comes from somewhere: use the source, Luke!
• Machines and humans have different needs
What you need is clean sheets
• Don’t merge cells. Sorting and other manipulations people may want to apply to your data assume
that each cell belongs to one row and column.
• Don’t mix data and metadata (e.g. date of release, name of author) in the same sheet.
• The first row of a data sheet should contain column headers. None of these headers should be
duplicates or blank. The column header should clearly indicate which units are used in that column,
where this makes sense.
• The remaining rows should contain data, one datum per row. Don’t include aggregate statistics such
as TOTAL or AVERAGE. You can put aggregate statistics in a separate sheet, if they are important.
• Numbers in cells should just be numbers. Don’t put commas in them, or stars after them, or
anything else. If you need to add an annotation to some rows, use a separate column.
• Use standard identifiers: e.g. identify countries using ISO 3166 codes rather than names.
• Don’t use only colour or other stylistic cues to encode information. If you want to colour cells
according to their value, use conditional formatting.
• Leave the cell blank if a value is not available.
• If you provide pivot tables, make sure the underlying data is available separately too.
• If you also want to create a human-friendly presentation of the data, do so by creating another sheet
in the same workbook and referencing the appropriate cells in the canonical data sheet
http://www.clean-sheet.org/
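Several of these rules can be checked mechanically once a sheet is exported to CSV. A minimal Python sketch using only the standard library: it flags blank or duplicate headers, TOTAL/AVERAGE rows, and numeric cells that contain commas, stars or other decoration, while allowing blank cells for missing values. Rules about merged cells, colour and pivot tables cannot be seen in a CSV export, so they are not covered; the function name, column names and messages are illustrative.

```python
import csv
import io
import re

AGGREGATE_LABELS = {"TOTAL", "AVERAGE"}
# A plain number: optional minus sign, digits, optional decimal part. No commas or stars.
PLAIN_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def check_clean_sheet(text, numeric_columns):
    """Return a list of clean-sheet problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["sheet is empty"]
    header, problems = rows[0], []
    if any(not h.strip() for h in header):
        problems.append("blank column header")
    if len(set(header)) != len(header):
        problems.append("duplicate column header")
    indices = {}
    for col in numeric_columns:
        if col in header:
            indices[col] = header.index(col)
        else:
            problems.append(f"missing column {col!r}")
    for row_no, row in enumerate(rows[1:], start=2):
        if any(cell.strip().upper() in AGGREGATE_LABELS for cell in row):
            problems.append(f"row {row_no}: aggregate row (TOTAL/AVERAGE)")
        for col, idx in indices.items():
            value = row[idx].strip() if idx < len(row) else ""
            # A blank cell is fine: it means the value is not available.
            if value and not PLAIN_NUMBER.fullmatch(value):
                problems.append(f"row {row_no}: {col} is not a plain number: {value!r}")
    return problems


messy = 'country_iso3166,population (persons)\nAUS,"23,490,000"\nNZL,4500000*\nTOTAL,27990000\n'
for problem in check_clean_sheet(messy, ["population (persons)"]):
    print(problem)
```

The annotation that motivated the star would move to its own column, and the total to a separate sheet, after which the check returns an empty list.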
Automating updates
Automation involves system-to-system updates that save you time and money.
Three broad approaches:
1. Write scripts to push or pull data updates using an API directly from
the source. Usually doesn’t require much data manipulation.
2. Adopt a tool like Taverna, FME or Splunk to extract, clean/manipulate,
and then push data to the data.gov.au (CKAN/geoserver) API directly.
3. Use data.gov.au (CKAN) to schedule pulls of updates from your data source,
although most agencies don’t, as they prefer to push updates.
The data.gov.au team strongly encourage you to have at least one geek in your
data team so you can experiment with code and tools to best meet your needs.
“With much help and encouragement from the support team at data.gov.au, we dipped our toes into the CKAN API waters. As a
DotNet shop we were keen to limit the technology landscape and sought to automate the upload using DotNet. The CKAN API is
refreshingly lightweight with a simple authentication process and messaging.” -- ABN Lookup Team
Code at https://github.com/datagovau/ckan-api-examples
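Approach 1 (pushing an update through the API) can be sketched in a few lines. This is a minimal Python sketch rather than the ABN Lookup team's DotNet code, assuming a CKAN instance that exposes the `resource_patch` action (CKAN 2.3 and later) and accepts an API key in the `Authorization` header. The base URL, API key, resource id and file URL are placeholders; check the examples repository above for what data.gov.au actually recommends.

```python
import json
import urllib.request


def build_resource_patch(base_url, api_key, resource_id, new_url):
    """Build (but don't send) a CKAN action API request pointing a resource at a new file.

    resource_patch only changes the fields supplied, unlike resource_update,
    which replaces the whole resource.
    """
    body = json.dumps({"id": resource_id, "url": new_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/3/action/resource_patch",
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return CKAN's 'result' payload, raising if CKAN reports failure."""
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if not reply.get("success"):
        raise RuntimeError(f"CKAN call failed: {reply.get('error')}")
    return reply["result"]


# Usage (placeholders, not real identifiers):
# request = build_resource_patch("https://data.gov.au", "YOUR-API-KEY",
#                                "hypothetical-resource-id",
#                                "https://example.gov.au/exports/latest.csv")
# result = send(request)
```

Keeping request construction separate from sending makes it easy to log or test what would be pushed before running the job from a scheduler.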
Support
• http://toolkit.data.gov.au is updated regularly. Recent updates include:
• How to automate data updates to data.gov.au with FME
• Improved information on how to clean data
• How to manage your own catalogue harvesting
• Government data landscape to identify projects of use
• Open Data Community Forum – soon to be moved to analyticsspace
• Talk to your colleagues across government(s)
• Other sources
• Communities of interest: Data Science Meetup groups, Data
Analytics Centre of Excellence, Linked Data Working Group,
National Statistical Service, etc
• GovHack Developers Kit: Become a data scientist in an hour, data
tools, APIs, datavis, spatial, mashup techniques, statistical
Quality – improve over time
The 5 Star Data Quality standard developed by
Sir Tim Berners-Lee will be used on data.gov.au in
the coming month or two to indicate data quality.
Aim for quality web services.
API quality will also be looked at
soon, including potentially
a 5 star API standard.
http://5stardata.info/en/
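The 5 star scheme at 5stardata.info is cumulative: each star assumes the ones below it. A minimal Python sketch of that scoring rule, using the scheme's five levels; how data.gov.au's planned quality indicator would actually score datasets is not described here, so treat the function as illustrative.

```python
# The five cumulative levels of the 5 star open data scheme.
LEVELS = (
    "on the web under an open licence",
    "structured data (e.g. a spreadsheet, not a scanned table)",
    "non-proprietary open format (e.g. CSV, not Excel)",
    "uses URIs to denote things",
    "linked to other data for context",
)


def star_rating(criteria_met):
    """Return the number of stars: count met levels up to the first unmet one."""
    if len(criteria_met) != len(LEVELS):
        raise ValueError("expected one True/False value per level")
    stars = 0
    for met in criteria_met:
        if not met:
            break
        stars += 1
    return stars


# A CSV published under CC BY, without URIs or links to other data.
stars = star_rating((True, True, True, False, False))
print(stars, "stars")
for level in LEVELS[:stars]:
    print("*", level)
```

Because the rating stops at the first unmet level, a linked dataset without an open licence still scores zero, which matches the scheme's intent.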
Data integration and aggregation
• Challenging, but with great potential to improve policy and services.
• Unit record sharing is complex, with privacy concerns for personal data.
• Personal unit record data is mostly useful to researchers, who need appropriate
access mechanisms with legal, technical and ethical constraints.
• Data aggregated by common spatial boundaries is comparable across
datasets and over time.
• Unfortunately, data owners traditionally aggregate to boundaries that
constantly change (electorates, postcodes, etc).
• The Australian Statistical Geography Standard (ASGS) provides a
consistent set of spatial boundaries that can be mapped to other needs.
• On-the-fly anonymisation APIs also provide a mechanism for appropriate
public/agency access to unit record level data (e.g. ABS.Stat)
http://statistical-data-integration.govspace.gov.au/
https://toolkit.data.gov.au/index.php?title=Definitions#Types_of_data
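Moving data from a changing boundary (such as postcodes) onto a stable one (such as ASGS regions) is usually done with a correspondence file that gives, for each source area, the share falling in each target area. A minimal Python sketch, assuming the correspondence has already been loaded into a dictionary; the postcodes, region codes and ratios below are hypothetical. Apportioning by ratio assumes whatever is being counted is spread across each source area in the same proportions as the ratio, which is an approximation.

```python
import math
from collections import defaultdict


def apportion(counts_by_source, correspondence):
    """Re-aggregate counts from source areas onto target areas.

    correspondence maps each source area to (target_area, ratio) pairs whose
    ratios sum to 1.
    """
    totals = defaultdict(float)
    for source, count in counts_by_source.items():
        pairs = correspondence[source]
        if not math.isclose(sum(ratio for _, ratio in pairs), 1.0):
            raise ValueError(f"ratios for {source} do not sum to 1")
        for target, ratio in pairs:
            totals[target] += count * ratio
    return dict(totals)


# Hypothetical postcodes, region codes and ratios, for illustration only.
counts = {"2600": 100, "2601": 50}
correspondence = {
    "2600": [("REGION_A", 0.75), ("REGION_B", 0.25)],
    "2601": [("REGION_B", 1.0)],
}
print(apportion(counts, correspondence))  # {'REGION_A': 75.0, 'REGION_B': 75.0}
```

Once two datasets are expressed on the same regions, they can be compared directly, and re-running the apportioning with a newer correspondence keeps time series consistent when the source boundaries change.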
data.gov.au
Free, cloud-based, scalable, API-enabled platform for hosting government data.
Staged approach
1. Publishing (2013 – mid 2014)
Improving the functionality and ease of
publishing for agencies with training and
documentation
2. Value realisation (2014-2015)
Providing useful front end tools for data.gov.au
including data visualisation and analysis tools.
Publishing quality data is a prerequisite.
3. Data quality (2014-2015)
Looking at ways to provide agencies the ability
to accept iterative data improvements in a
verifiable way
Features
• Support for tabular, spatial and data models
• Options for hosting, linking or catalogue harvesting
• Manual and automated publishing options
• API access to government data
• Easy to publish, download & interact
• Use cases and site|data|org analytics
• Data Request Site
• Metadata harvesting from gov data gateways
• National Map integration
• Federated search for discoverability
In Planning
• 5 star quality plugin
• Selective crowdsourcing for updates
• League Table
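Because data.gov.au is built on CKAN, its catalogue can be queried through the CKAN action API, which is also one way an agency can consume its own data. A minimal Python sketch, assuming the standard `package_search` action at `/api/3/action/package_search`; the network call is left as a comment so the parsing can be tried on a saved response, and the sample response below is made up.

```python
import json
import urllib.parse


def package_search_url(base_url, query, rows=10):
    """Build a CKAN action API URL for a keyword search of the catalogue."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    reply = json.loads(response_text)
    if not reply.get("success"):
        raise RuntimeError(f"search failed: {reply.get('error')}")
    return [package["title"] for package in reply["result"]["results"]]


print(package_search_url("https://data.gov.au", "budget 2014", rows=5))
# https://data.gov.au/api/3/action/package_search?q=budget+2014&rows=5

# Made-up response, shaped like CKAN's {"success": ..., "result": {...}} envelope.
sample_reply = '{"success": true, "result": {"count": 1, "results": [{"title": "Budget 2014"}]}}'
print(dataset_titles(sample_reply))  # ['Budget 2014']

# To query a live catalogue (needs network access):
# import urllib.request
# with urllib.request.urlopen(package_search_url("https://data.gov.au", "budget")) as r:
#     print(dataset_titles(r.read().decode("utf-8")))
```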
Some Case Studies
• Publishing Budget 2014 Data Report
• Open data – Transforming the Provider / Stakeholder Paradigm
• On the Value of Open Roof Prints
• 100 years of patent and IP data released on data.gov.au
More available along with tech support at http://toolkit.data.gov.au
Other Australian case studies/documentation
• SA Open Data Toolkit
• QLD Government Case Studies
• Victorian Government Showcase
• NSW Apps Showcase
• ACT examples
The future is here...
And it is already widely distributed
http://www.flickr.com/photos/mr_matt/35688926
Challenge #1: Collaborate
Challenge #2: Share
Challenge #3: Measure
Challenge #4: Play
Questions?
@piawaugh
@datagovau
data.gov.au
toolkit.data.gov.au