Choosing a Narrative and Dataset for a Lesson

Writing your lesson as a story helps learners stay motivated and engaged, and can prevent you from leaping from one topic to the next without covering a step that learners will need later. A narrative can also help learners see how the skills they are learning could be useful after the workshop. By creating a narrative that resembles a situation learners might encounter in their own work, you enable them to connect what they learn in your lesson with that work.

For many lessons developed in The Carpentries community, the narrative is closely tied to the example data used in the lesson. A good example dataset makes it easier to teach the relevant skills and helps learners manage their cognitive load by focusing on what is most important. As with the narrative, finding the right dataset involves striking a balance between authenticity and clarity.

Top Tips: Lesson Narrative

When choosing a narrative, consider:
- Complexity: how easy will it be for learners to follow the narrative of the lesson?
- Order: when should you introduce each of the key concepts to manage cognitive load most effectively? Could you sacrifice some realism to keep things simple at the beginning? How can you position the most important things as early as possible in the lesson?
- Authenticity: will learners be able to relate to the story being told and the examples being used?
- Efficiency: can you avoid or remove tangents and “side quests” that might distract learners (and instructors) from the lesson’s objectives?

Devote time at the beginning of the lesson to a short introduction to the narrative. An effective introduction should help learners understand how the story relates to their work, the problems they encounter, and the things they want to do with the skills they will learn.

Make use of images and figures to enhance your narrative. Consider the licensing and terms of reuse of those images, and respect the intellectual property rights of image creators. Images can be distracting when they appear in the wrong place, or when they are irrelevant or contradictory to the lesson content they accompany.

A single narrative throughout the lesson is often preferable but can be difficult to achieve. Use multiple mini-narratives for individual episodes or groups of episodes if you need to. Where you want learners to develop an understanding of more abstract concepts, provide a variety of examples with a common theme. This could be achieved by providing multiple opportunities to recognise and apply the concept in taught examples, activities, and exercises.

Look for example data that will support your chosen narrative (see below).

Examples of Lesson Narratives

The Software Carpentry Git lesson uses the story of Alfredo, a chef working with his team to create a repository of his favorite recipes.
Data Carpentry for Ecologists uses a narrative of working through a data analysis project from data organization to data cleaning to data manipulations and visualizations.
Building Better Research Software uses the narrative of a poorly designed software project (which analyses NASA’s open data on spacewalks undertaken by astronauts from 1965 to 2013) that over the course of the lesson gets improved in terms of code accessibility, readability, correctness and reusability.

Finding Images

Copying an image from a website is technologically simple but can be legally and ethically complex. Images are intellectual property and are subject to intellectual property laws including, but not limited to, copyright and trademark laws. These laws differ by country but are consistent in theme: do not use intellectual property that does not belong to you without permission.

When looking for images that illustrate the narrative of your lesson, avoid copying images that do not include a reuse license. Assume that you cannot reuse these images unless you seek written permission from the image creator or owner. Instead, look for images that indicate that they are in the public domain or carry a permissive reuse license such as CC0 or CC-BY. Public domain images can be freely reused and adapted. Images carrying a reuse license can be used and adapted in accordance with their license terms.

If you cannot find reusable images that match your narrative, you can create your own images or seek help from others in The Carpentries community. When incorporating original images into your lesson, be sure to license them compatibly with the license on the rest of your lesson materials.

The guidance in this section is not a substitute for legal advice.

Compatible Licenses

Creative Commons offers a chart that identifies which CC licenses are compatible with each other for adaptation (“remix”) purposes: Creative Commons license comparison chart.

GNU offers commentary about a variety of licenses for free software; this resource may be valuable when considering a license for code: GNU: Various Licenses and Comments about Them.

Catalogues of Open Images

The following repositories are good places to start looking for openly licensed images to use in your lesson.

Top Tips: Example Datasets

Examples of Example Datasets

The Ecology Data Carpentry curriculum’s dataset comes from the Portal Project Teaching Database. This dataset is an actual ecological research project’s data that was simplified for teaching. The reuse of this dataset throughout the Data Carpentry Ecology lessons helps stitch together the process of data analysis throughout the workshop, from data entry and cleaning to analysis and visualisation.
The Social Sciences Data Carpentry curriculum’s dataset is the teaching version of the full Studying African Farmer-Led Irrigation (SAFI) dataset. The SAFI dataset represents interviews of farmers in Mozambique and Tanzania, conducted between November 2016 and June 2017. The interviews surveyed household features (e.g. construction materials used for dwellings, number of household members), agricultural practices (e.g. water usage) and assets (e.g. number and types of livestock). The teaching version of the SAFI dataset has been simplified and intentionally “messed up” so that it can be used to demonstrate data cleaning issues commonly found in real-life data.
The patient inflammation dataset, used in the Software Carpentry Python novice lesson and the incubating intermediate lessons, is used to study the effect of a new treatment for arthritis by analysing the inflammation levels in patients who have been given this treatment.
A river catchment dataset from the Lowland Catchment Research (LOCAR) Datasets is used in the Earth and Environmental Sciences Intermediate Python lesson to analyse hydrological, hydrogeological, geomorphological and ecological interactions within permeable catchment systems.
Data Carpentry’s Astronomical Data Science with Python lesson uses two astronomical datasets, from the Gaia satellite and the Pan-STARRS photometric survey, to reproduce part of an analysis described in a published article.

Public Repositories for Data

The following repositories are good places to start looking for example data to use in your lesson, and/or to deposit the example data you produce.

GitHub is not a good place to store data, especially when it is large and/or does not consist of text files. Instead, we recommend that you publish your example data elsewhere and link to it from your lesson website. This has the added advantages that you can publish the data under its own license (ideally CC0, as discussed above), obtain a separate DOI for it, and create another backup of your data. Dryad, Figshare, the Open Science Framework, and Zenodo are good general platforms for publishing data. However, if your lesson covers a particular domain with its own established standard for publishing data, we recommend that you use that. The Generalist Repositories Ecosystem Initiative (GREI) includes several more general options, and provides a decision tree to help you choose the most appropriate location for your data.

When you publish the data for your lesson, make sure to include:

- a description of each of the files included.
- information about the provenance of those files.
- the lesson in which those files are used.
- the license terms.
- anything else you think people need to know about the data.

See the Figshare entry of data used in Data Carpentry Image Processing workshops for an example.
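One way to make sure none of the items in the checklist above is forgotten is to assemble the description from a small script before depositing the data. The following is a minimal sketch in Python, not a format required by any repository: the `DatasetReadme` class, its field names, and the file names and texts in the usage example are all hypothetical placeholders chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetReadme:
    """Hypothetical container for the checklist items (illustrative names only)."""

    files: dict[str, str]  # file name -> short description of that file
    provenance: str        # where the files came from and how they were changed
    lesson: str            # the lesson in which the files are used
    license: str           # the license terms for the data
    notes: list[str] = field(default_factory=list)  # anything else worth knowing

    def missing_items(self) -> list[str]:
        """Return the names of required checklist items that are still empty."""
        missing = []
        if not self.files:
            missing.append("file descriptions")
        if not self.provenance.strip():
            missing.append("provenance")
        if not self.lesson.strip():
            missing.append("lesson")
        if not self.license.strip():
            missing.append("license terms")
        return missing

    def render(self) -> str:
        """Build a plain-text README with one section per checklist item."""
        lines = ["Files"]
        for name, description in sorted(self.files.items()):
            lines.append(f"{name}: {description}")
        lines += ["", "Provenance", self.provenance]
        lines += ["", "Used in", self.lesson]
        lines += ["", "License", self.license]
        if self.notes:
            lines += ["", "Notes"] + list(self.notes)
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; replace them with the details of your own data.
    readme = DatasetReadme(
        files={"example_data.csv": "placeholder description of the file"},
        provenance="placeholder: source of the data and any simplification applied",
        lesson="placeholder lesson title",
        license="CC0 1.0",
    )
    problems = readme.missing_items()
    if problems:
        print("Still missing:", ", ".join(problems))
    else:
        print(readme.render())
```

The text returned by `render()` can be pasted into the description field of whichever repository you choose, or saved alongside the data files as a README.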