Schedule

Coursera is the primary source of schedule information for this class.

Note: To access the slides you may need to enable Box for your NetID.

| Module | Topic | Slides |
| --- | --- | --- |
| | Course Overview | pdf |
| 1 | Introduction to Data Curation | |
| | Introduction to Data Curation | pdf |
| | Data Curation Universe | pdf |
| | Data Lifecycle Models | pdf |
| | The Curated Data Lake | pdf |
| | Curation Profile: GHCN | pdf |
| | Curation Profile: Common Crawl | pdf |
| 2 | Ethics, Law, and Policy | |
| | Introduction to Ethics, Laws, and Policies | pdf |
| | Research and Data Ethics | pdf |
| | Privacy Laws | pdf |
| | De-Identification Methods | pdf |
| | Intellectual Property Laws | pdf |
| | AI Law and Data Curation | pdf |
| | Curation Profile: Census ACS | pdf |
| 3 | Data Abstractions - Relations | |
| | Data Models | pdf |
| | The Problem | pdf |
| | The Relational Model | pdf |
| | How is the Relational Model Implemented? | pdf |
| | Abstraction, Indirection & Data Independence | pdf |
| | Relational Model and Curation Activities | pdf |
| | Tidy Data | pdf |
| 4 | Data Abstractions - Trees | |
| | Text and Documents | pdf |
| | The Problem | pdf |
| | The Solution: (1) Descriptive Markup | pdf |
| | The Solution: (2) Trees | pdf |
| | Why The Solution Works | pdf |
| | Implementing The Solution with XML and JSON | pdf |
| 5 | Data Abstractions - Ontologies | |
| | The Problem: Connecting Data to Information | pdf |
| | The Solution: Ontologies | pdf |
| | An ER/Ontology Example: FRBR | pdf |
| | Implementing Ontologies in RDF/RDFS | pdf |
| | Practical Ontologies with JSON-LD | pdf |
| 6 | Data Integration | |
| | Data Cleaning, Data Integration | pdf |
| | Managing Heterogeneity | pdf |
| | Schema Integration | pdf |
| | Schema Integration: an example | pdf |
| | Example: The Curated Data Lake | pdf |
| 7 | Data Concepts | |
| | What is data? A first attempt | pdf |
| | The Identity Problem | pdf |
| | Some Ontological Analysis | pdf |
| | A Way Forward: Roles and Types | pdf |
| | An Ontology for Data Concepts | pdf |
| | What is data? | pdf |
| 8 | Metadata | |
| | What is Metadata? | pdf |
| | Metadata Schemas | pdf |
| | Common Metadata Ambiguities | pdf |
| | How Does Metadata Support Data Curation? | pdf |
| | Metadata in Practice | pdf |
| 9 | Identity | |
| | Why is Identification Important? | pdf |
| | What Are We Identifying? | pdf |
| | How Do We Identify? | pdf |
| | Canonicalization | pdf |
| | Identifiers and Identifier Systems | pdf |
| 10 | Preservation | |
| | Introduction to Data Preservation Challenges | pdf |
| | What is Data Preservation? | pdf |
| | The Preservation Integration Parallels | pdf |
| | Standard Data Preservation Strategies | pdf |
| | Two Data Preservation Standards | pdf |
| 11 | Standards | |
| | Standards and Standards Organizations | pdf |
| | Some Standard Standards Maneuvers | pdf |
| | Compatibility | pdf |
| | Standards Organizations | pdf |
| 12 | Workflow, Provenance and Reproducibility | |
| | Workflow | pdf |
| | Provenance | pdf |
| | Workflow Systems | pdf |
| | Provenance Standards | pdf, extended |
| | Computational Reproducibility | pdf |
| 13 | Data Practices | |
| | Data Practices | pdf |
| | What’s Going on in the Lab? | pdf |
| | Data Sharing | pdf |
| | Data Reuse | pdf |
| | Trends in Data Curation Research | pdf |
| 14 | Fall Break | |
| 15 | Communication | |
| | Communication and Data Curation | pdf |
| | Information Overload | pdf |
| | Limited Access to Research | pdf |
| | Research Integrity | pdf |
| | Beyond the PDF | pdf |
| 16 | Course Review | pdf |