james baker and I recently launched the sourcecaster, a command line resource for working with digital primary sources.
commands fall into the following categories:
casting – changing one type of data to another type (e.g. PDF to TXT for text analysis purposes)
wrangling – manipulating and navigating data (e.g. remove punctuation, normalize case)
getting – grabbing data from various locations (e.g. webscraping all relevant images from portions of a website)
managing – editing and managing your work with data (e.g. save command line history)
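to give a feel for the kind of task the wrangling category covers, here is a minimal sketch in python. it is not taken from the sourcecaster, which works at the command line; the function name `wrangle` is hypothetical. it assumes plain-text input and strips only ASCII punctuation (python's `string.punctuation`), so curly quotes and other non-ASCII marks would pass through untouched.

```python
import string


def wrangle(text: str) -> str:
    """Remove ASCII punctuation and normalize case (hypothetical helper)."""
    # maketrans with a third argument maps every listed character to None,
    # so translate() deletes them from the string
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).lower()


if __name__ == "__main__":
    sample = "Hello, World! It's 1848."
    print(wrangle(sample))  # hello world its 1848
```

the same two steps can be chained at the shell with the standard `tr` utility, e.g. `tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]'`.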
the hope is that this resource will continue to grow. feel free to contribute your solutions by emailing me directly or opening an issue on github.