Schedule

Update

Course website and materials are currently being revised in preparation for Fall 2025.

Note: To access the slides you may need to enable Box for your NetID.

Module

Lecture #

Topic

Slides

Quizzes

Course Overview

pdf

Orientation Quiz

1

1

Introduction to Data Curation

Module 1 Quiz

1.1

What is Data Science?

pdf

1.2

What is Data Curation?

pdf

1.3

Objectives, Activities and Methods

pdf

1.4

Organizations, Conferences and Literature

pdf

1.5

Data Curation Perspectives

pdf

1.6

Trends in Data Curation

pdf

2

2

Data Abstractions - Relations

Module 2 Quiz

2.1

Data Models

pdf

2.2

The Problem

pdf

2.3

The Relational Model

pdf

2.4

How is the Relational Model Implemented?

pdf

2.5

Abstraction, Indirection & Data Independence

pdf

2.6

Relational Model and Curation Activities

pdf

2.7

Tidy Data

pdf

3

3

Data Abstractions - Trees

Module 3 Quiz

3.1

Text and Documents

pdf

3.2

The Problem

pdf

3.3

The Solution: (1) Descriptive Markup

pdf

3.4

The Solution: (2) Trees

pdf

3.5

Why The Solution Works

pdf

3.6

Implementing The Solution with XML and JSON

pdf

4

4

Data Abstractions - Ontologies

Module 4 Quiz

4.1

The Problem: Connecting Data to Information

pdf

4.2

The Solution: Ontologies

pdf

4.3

An ER/Ontology Example: FRBR

pdf

4.4

Implementing Ontologies in RDF/RDFS

pdf

4.5

Practical Ontologies with JSON-LD

pdf

5

5

Data Integration

Module 5 Quiz

5.1

Data Cleaning, Data Integration

pdf

5.2

Managing Heterogeneity

pdf

5.3

Schema Integration

pdf

5.4

Schema Integration: an example

pdf

5.5

Example: The Curated Data Lake

pdf

6

6

Data Concepts

Module 6 Quiz

6.1

What is data? A first attempt

pdf

6.2

The Identity Problem

pdf

6.3

Some Ontological Analysis

pdf

6.4

A Way Forward: Roles and Types

pdf

6.5

An Ontology for Data Concepts

pdf

6.6

What is data?

pdf

7

7

Metadata

Module 7 Quiz

7.1

What is Metadata?

pdf

7.2

Metadata Schemas

pdf

7.3

Common Metadata Ambiguities

pdf

7.4

How Does Metadata Support Data Curation?

pdf

7.5

Metadata in Practice

pdf

8

8

Identity

Module 8 Quiz

8.1

Why is Identification Important?

pdf

8.2

What Are We Identifying?

pdf

8.3

How Do We Identify?

pdf

8.4

Canonicalization

pdf

8.5

Identifiers and Identifier Systems

pdf

9

9

Preservation

Module 9 Quiz

9.1

Introduction to Data Preservation Challenges

pdf

9.2

What is Data Preservation?

pdf

9.3

The Preservation Integration Parallels

pdf

9.4

Standard Data Preservation Strategies

pdf

9.5

Two Data Preservation Standards

pdf

10

10

Standards

Module 10 Quiz

10.1

Standards and Standards Organizations

pdf

10.2

Some Standard Standards Maneuvers

pdf

10.3

Compatibility

pdf

10.4

Standards Organizations

pdf

11

11

Workflow, Provenance and Reproducibility

Module 11 Quiz

11.1

Workflow

pdf

11.2

Provenance

pdf

11.3

Workflow Systems

pdf

11.5

Provenance Standards

pdf, extended

11.6

Computational Reproducibility

pdf

12

12

Ethics, Law, Governance, and Policy

Module 12 Quiz

12.1

Definitions, types, scope, issues

pdf

12.2

Research and Data Ethics

pdf

12.3

Privacy Laws

pdf

12.4

De-Identification Methods

pdf

12.5

Intellectual Property Laws

pdf

12.7

AI Law and Data Curation

pdf

13

13

Data Practices

Module 13 Quiz

13.1

Data Practices

pdf

13.2

What’s Going on in the Lab?

pdf

13.3

Data Sharing

pdf

13.4

Data Reuse

pdf

13.5

Trends in Data Curation Research

pdf

14

-

Fall Break

15

15

Communication

Module 15 Quiz

15.1

Communication and Data Curation

pdf

15.2

Information Overload

pdf

15.3

Limited Access to Research

pdf

15.4

Research Integrity

pdf

15.5

Beyond the PDF

pdf

16

16

Course Review

pdf