Syllabus#
Update
Course website and materials are currently being revised in preparation for Fall 2025.
Course Description#
Data curation, the management of data in support of analysis and use, is a critical activity within data science. Without adequate data curation, data cannot be used effectively, efficiently, or reliably. Because of its importance and its challenging nature, data curation typically involves more time, attention, staff, and expenditures than does data analytics. Activities of particular concern to data curation are data modeling, integration, workflow management, format conversion, provenance documentation, preservation, integrity and validity determination, data cleaning, standards conformance, identifier management, metadata management, retrieval support, re-use issues, governance, regulatory compliance, and security, among others. This course provides a survey of theoretical and practical topics in data curation.
Textbook#
There is no required textbook for this course. Weekly readings supplement lecture videos and can be found on the overview page for each week. Concepts covered in required readings are included in quizzes and the final exam.
Time commitment#
This 4-credit hour course is 16 weeks long. You should invest 10-12 hours every week in this course.
Course Reviews (Previous offering)#
Fall 2022: Anonymous Student Reviews
Expected Prior Knowledge#
Completion of an introductory databases course such as Database Systems (CS411) or related experience:
Database basics: database design, schemas, constraints, queries
Conceptual modeling, entity relationship diagrams
Ability to create and manipulate commonly used data formats including JSON and XML.
Basic Python programming skills including familiarity with Pandas, use of Python libraries, and web-based APIs.
Course Expectations#
Unlike other courses in data science programs, introductory data curation courses are not primarily concerned with learning analytic techniques and instead focus on many other aspects of the data lifecycle.
Course Objectives#
Upon successful completion of this course, you will:
Recognize the importance of physical and logical data independence as critical system requirements and be able to identify data curation strategies that use abstraction and indirection to achieve those requirements.
Recognize how key data abstraction strategies are related to each other both at the same level of abstraction and also across levels of abstraction.
Understand the role of conceptual modeling, ontologies, and knowledge graphs in data modeling.
Learn a framework of fundamental concepts of data representation.
Recognize the common challenges in data integration, and how they can be addressed.
Recognize the importance of workflow management and provenance documentation in ensuring efficiency, trustworthiness, and reproducibility.
Understand the trade-offs for common data preservation strategies.
Recognize the complex problems with data set identity at different levels of abstraction and be familiar with challenges in determining both data identity and data representation identity.
Understand the cross-cutting importance of metadata in data curation activities.
Recognize the different kinds and levels of data standards and the different levels of both data conformance and processor conformance.
Recognize the relevance of empirical research on the information behavior of data creators, managers, analysts, and users.
Recognize the distinctive data curation needs of machine learning.
Recognize how machine learning can help address curatorial problems.
Be familiar with the significance of governance, policy, law, and ethics in data science.
Elements of this Course#
The course comprises the following elements:
Lecture Videos. For each module, the concepts you need to know will be presented through a collection of short video lectures. You may stream these videos for playback within the browser by clicking on their titles or download the videos to watch locally. The videos usually total 1.5 to 2 hours for each module. You generally should spend at least the same amount of time digesting content in the video.
Readings. Readings are included in each module to supplement and reinforce lecture materials. Concepts from required readings are included in module quizzes. Supplemental readings are optional and provided for students who want to explore a particular area in more depth.
Module Quizzes. Each module concludes with a graded quiz to help ensure you understood that module’s content. You will be allowed only three attempts for each quiz.
Assignments. There are four assignments for you to complete in this course. Assignments are delivered via the PrairieLearn system.
Final Exam. The course concludes with a comprehensive final exam.
Participation. Student participation will be based on engagement via Campuswire and office hours.
Extra Credit. At least two extra credit opportunities will be available worth approximately 3% of the final grade.
Grading#
Your final grade will be calculated based on the activities listed below. Your official final course grade will be listed in Enterprise. The course grade you see displayed in Coursera may not match your official final course grade.
Quizzes (20%): There are 14 weekly quizzes, and the lowest quiz grade will be dropped
Assignments (15% each): There will be 4 assignments worth a total of 60% of the final grade
Final exam (20%): There will be a cumulative final exam
Grade distribution (square brackets indicate inclusive endpoints and parentheses indicate exclusive endpoints). The maximum grade is 105%, given 2% extra credit for the survey and 3% for Campuswire reputation level.
A+ [97-105]
A [93-97)
A- [90-93)
B+ [87-90)
B [83-87)
B- [80-83)
C+ [77-80)
C [73-77)
C- [70-73)
D+ [67-70)
D [63-67)
D- [60-63)
F [0-60)
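The weighting and grade bands above can be sketched in a few lines of Python. This is only an illustrative calculation, not the official grading procedure: the function names are hypothetical, all scores are assumed to be percentages from 0 to 100, extra credit is assumed to be added as percentage points capped at 5, and the [63-67) band is assumed to correspond to a D.

```python
from bisect import bisect_right

# Inclusive lower bounds of each grade band, in ascending order.
CUTOFFS = [60, 63, 67, 70, 73, 77, 80, 83, 87, 90, 93, 97]
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+",
           "B-", "B", "B+", "A-", "A", "A+"]


def final_percentage(quizzes, assignments, final_exam, extra_credit=0.0):
    """Combine component scores (each 0-100) into a final percentage.

    Assumes: the single lowest quiz is dropped, each assignment is worth
    15%, and extra credit is given in percentage points (capped at 5).
    """
    kept = sorted(quizzes)[1:]           # drop the lowest quiz score
    quiz_avg = sum(kept) / len(kept)
    assignment_part = sum(0.15 * score for score in assignments)
    total = 0.20 * quiz_avg + assignment_part + 0.20 * final_exam
    return total + min(extra_credit, 5.0)


def letter_grade(percentage):
    """Map a percentage to a letter using half-open bands [low, high)."""
    return LETTERS[bisect_right(CUTOFFS, percentage)]


if __name__ == "__main__":
    # Hypothetical student: one missed quiz, 90% on every assignment.
    pct = final_percentage([100] * 13 + [0], [90, 90, 90, 90], 85)
    print(round(pct, 2), letter_grade(pct))
```

Using `bisect_right` on the lower bounds means a score exactly on a boundary (for example 93) falls into the higher band, which matches the inclusive lower endpoint in the list above.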
Assignment Deadlines#
For all assignment deadlines, please refer to the Course Assignment Deadlines, Late Policy, and Academic Calendar page in Coursera.
Late Policy#
Unless otherwise specified, all assignments are due at 11:59 PM US Central Time on the due date. We encourage everyone to finish the assignments before the deadline.
[Update 9/13/2024] Beginning Fall 2024, auto-graded assignments will receive an automatic 5% penalty for up to 10 days after the due date. After 10 days no points will be awarded. Note that this does not apply to quizzes. (See Campuswire #237).
If you encounter special circumstances that prevent you from completing assignments on time, please reach out to course staff via Campuswire. Note that you may be required to submit an Absence Letter.
No assignments will be accepted after the last day of class.
For course extensions or incompletes, please email mcs-support@illinois.edu.
Campuswire Policy#
Do not post complete solutions publicly. While the intent is to get help, posting solutions may encourage cheating.
Submit to Instructors and TAs for detailed feedback on your solution to a problem. Course staff will provide private feedback.
If content posted to Campuswire is too detailed, course staff reserve the right to remove content or change the post to private at any time.
Student Code and Policies#
A student at the University of Illinois at the Urbana‑Champaign campus is a member of a University community of which all members have at least the rights and responsibilities common to all citizens, free from institutional censorship; affiliation with the University as a student does not diminish the rights or responsibilities held by a student or any other community member as a citizen of larger communities of the state, the nation, and the world. See the University of Illinois Student Code for more information.
Academic Integrity#
All students are expected to abide by campus regulations on academic integrity found in the Student Code of Conduct. These standards will be enforced and infractions of these rules will not be tolerated in this course. Sharing, copying, or providing any part of a homework solution or code is an infraction of the University’s rules on academic integrity. We will be actively looking for violations of this policy in homework and project submissions. Any violation will be punished as severely as possible with sanctions and penalties typically ranging from a failing grade on this assignment up to a failing grade in the course, including a letter of the offending infraction kept in the student’s permanent university record. Again, a good rule of thumb: Keep every typed word and piece of code your own. If you think you are operating in a gray area, you probably are. If you would like clarification on specifics, please contact the course staff.
Disability Accommodations#
Students with learning, physical, or other disabilities requiring assistance should contact the instructor as soon as possible. If you’re unsure if this applies to you or think it may, please contact the instructor and Disability Resources and Educational Services (DRES) as soon as possible. You can contact DRES at 1207 S. Oak Street, Champaign, via phone at (217) 333-1970, or via email at disability@illinois.edu.
Mental Health#
Significant stress, mood changes, excessive worry, substance/alcohol misuse, or interferences in eating or sleep can have an impact on academic performance, social development, and emotional wellbeing. The University of Illinois offers a variety of confidential services including individual and group counseling, crisis intervention, psychiatric services, and specialized screenings, which are covered through the Student Health Fee. If you or someone you know experiences any of the above mental health concerns, you are strongly encouraged to contact or visit any of the University’s resources provided below. Getting help is a smart and courageous thing to do for yourself and for those who care about you.
Counseling Center (217) 333-3704
McKinley Health Center (217) 333-2700
National Suicide Prevention Lifeline (800) 273-8255
Rosecrance Crisis Line (217) 359-4141 (available 24/7, 365 days a year)
If you are in immediate danger, call 911.
Acknowledgements#
This course includes material adapted from work by Carole Palmer, Melissa Cragin, David Dubin, Karen Wickett, Bertram Ludaescher, Ruth Duerr, and Simone Sacchi.