English-language resources have dominated the discourse of digital humanities across the globe. This course takes a broader view, focusing on the methods, tools, and discourse of digital humanities as applied to textual materials in languages other than English. Students will develop practical skills in applying digital humanities research methodologies to texts in any language of their choosing. In addition, students will become familiar with major digital humanities scholarly organizations, movements, and debates that have their origins in different linguistic and cultural identities. No prior technical or digital humanities experience is required, but students must have a reading knowledge of at least one non-English language (modern or historical).
Credits / Grading
This course will be offered for 1 to 5 credits, and for either a letter grade or credit/no credit. Students who wish to take it as part of the DH Minor must choose 5 credits (if taken as a core course) or 3 credits (if taken as an elective) and must take it for a letter grade.
The course will use contract grading, where students choose what grade they wish to receive, and write a contract (within defined parameters) at the beginning of the quarter that lays out the requirements for receiving that grade. Individual assignments will receive extensive feedback but will be graded as satisfactory / unsatisfactory. Students will have one week to revise unsatisfactory assignments to fulfill the terms of their contract. If a student is unable to fulfill the terms of their original contract, they will meet with the instructor and sign a new contract for a different grade. Parameters for “A” grade contracts at each of the credit levels are included in an appendix to this draft syllabus, and the full rubrics for each level will be available on the first day of class.
This course will involve reading less primary and secondary literature than most courses in the humanities, and there will be neither papers nor exams. Instead, you’ll get experience with reading and producing different kinds of scholarly production: digital projects, conference proposals, blog posts, posters, tutorials, and technical documentation. While the word count for each of these is much less than a typical paper assignment, you may find that writing clearly and succinctly is a greater challenge than putting together a 10-page paper. The number of assignments required will depend on how many credits you’re enrolled for, and what grade you’ve contracted for (see appendix). In lieu of extensive readings for each class, you’ll be expected to spend some time experimenting with your text corpus and the tools and methods we’ve discussed (or others you’ve found). We’ll begin each class talking about challenges and breakthroughs you’ve experienced. Contributing to these discussions is an important part of class participation.
Weekly schedule (tentative)
Tuesday, January 8: Introductions
We’ll go over the syllabus, make sure everyone is signed up for the right number and type of credits, and talk about contract grading. By way of context for the course, I’ll share a few things about my own background with non-English digital humanities. We’ll talk about definitions of digital humanities, and touch on what this course won’t be covering. Finally, we’ll start discussing some of the kinds of research questions that the tools and methods in this course can help address.
Thursday, January 10: DH and disciplinary (sub-)cultures
We’ll continue our discussion of the kinds of research questions that this course can help you learn how to answer. We’ll also talk about some of the sub-cultures and divisions within DH, as well as unifying values shared by much of DH. We’ll touch on how DH intersects with data science, statistics, and programming. We’ll also look at examples of tutorials, recipes, and documentation to lay a foundation for a future tutorial assignment.
Tuesday, January 15: Digitizing and digitized text (hands-on, please bring laptop)
Digital text is a prerequisite for any of the analytical and visualization tools that we’ll be looking at, but it’s not where many projects begin. We’ll talk about scanning or photographing textual documents and using Optical Character Recognition (OCR) to convert them to digital text. We’ll talk about how good is “good enough” for OCR quality, and how you can improve it. Once you have digital text, we’ll look at Voyant, a straightforward tool for looking at your text in new ways – but it requires that your text have words separated by spaces. We’ll talk about word segmenter tools that can make Voyant and similar tools usable for languages that don’t use spaces.
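To make the segmentation problem concrete, here is a minimal sketch of one classic approach, greedy forward maximum matching, which inserts spaces so that a space-based tool can count words. The function name `max_match` and the four-word lexicon are invented for illustration; the segmenters we’ll try in class rely on much larger dictionaries or trained models and will behave differently.

```python
def max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest lexicon word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicon (hypothetical); a real one would hold many thousands of entries.
LEXICON = {"我", "喜欢", "数字", "人文"}

sentence = "我喜欢数字人文"
segmented = " ".join(max_match(sentence, LEXICON))
print(segmented)  # 我 喜欢 数字 人文
```

The space-separated output is the kind of text you could paste into a tool that splits on spaces. Greedy matching makes mistakes whenever the longest match is not the right one, which is one reason segmentation quality deserves the same scrutiny as OCR quality.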
Thursday, January 17: “Data” and documentation
We’ll consider whether (and how) DH has “data”, and what the difference is between a text and data. We’ll follow up on the earlier examples of tutorials by looking at technical documentation, and how to go about decoding these texts written for a programmer audience. We’ll discuss how to talk with programmers, including how (and when) to file a bug report for code or software.
Tuesday, January 22: Unicode
If you work with a language that doesn’t use unaccented Latin characters, you can now be fairly confident that your digital text will be readable on most or all devices. That wasn’t always the case, and we have the Unicode Consortium to thank for the tremendous progress made over the last 30 years. We’ll have a special guest lecture from Debbie Anderson, a researcher in linguistics at UC Berkeley, and the director of the Script Encoding Initiative, a project devoted to the preparation of formal proposals for the encoding of scripts and script elements not yet supported in Unicode. She’ll explain what Unicode is, how it works, and will talk about the research, design, and community consensus work that goes into adding characters to Unicode.
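As a small preview of why this matters in practice, the sketch below uses Python’s standard `unicodedata` module to show that the “same” accented letter can be stored in two different ways, and that normalization reconciles them. The variable names are ours; this is an illustration, not part of the guest lecture.

```python
import unicodedata

composed = "\u00e9"     # é stored as one precomposed code point
decomposed = "e\u0301"  # e followed by a combining acute accent

# The two strings look identical on screen but are not equal as data.
print(composed == decomposed)  # False

# Inspect the code points that make up the decomposed form.
for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Normalizing both to the same form (NFC or NFD) makes them comparable.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)  # True True

# The forms also differ in size once encoded as UTF-8.
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3
```

If a search or word count behaves strangely on your corpus, mixed normalization forms are one possible cause worth checking.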
Thursday, January 24: Getting, using, and sharing texts
Text digitization – be it through OCR or transcription – is a time-consuming prerequisite for digital research. Especially in smaller fields that are less likely to receive grants for large-scale digitization, it’s valuable to share texts that you’ve prepared. We’ll talk about some options for how to do that, best practices for file formats and documentation, and citation and credit for reusing others’ texts. We’ll touch on copyright and data ownership, and how to deal with those constraints. We’ll look at national text corpora and HathiTrust, and how to get access to those corpora. We may also have a guest lecturer who can talk about how the Stanford Libraries can help you acquire texts.
Tuesday, January 29: Thematic research collections
Thematic research collections that make an argument and/or provide a resource are a longstanding group of digital projects. We’ll take a look at some examples and the tools used to create them. We’ll also look at approaches to developing thematic research collections that are influenced by postcolonial studies, including those that respect other forms of knowledge and defer to other permission systems.
Thursday, January 31: TEI
The Text Encoding Initiative (TEI) is a set of guidelines for encoding information about the structure and/or contents of a text within the text itself. It is explicitly not a standard, and we’ll talk about the implications of guidelines vs standard. TEI has been the recommended approach for creating critical editions for decades, and we’ll look at examples of that kind of project and the tools that do, and don’t, exist to make it easier to do this kind of work. We’ll also touch on the human aspect of this infrastructure, and how decisions get made about what changes should be made to TEI.
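For a sense of what “information within the text itself” looks like, here is a minimal sketch that reads a tiny TEI fragment with Python’s standard library and pulls out the tagged people and places. The sample letter is invented for illustration, and real TEI projects usually carry far richer headers and markup.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this XML namespace, so queries must name it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# An invented, minimal TEI document (hypothetical content).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished classroom example.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><persName>Anna</persName> wrote from <placeName>Kyiv</placeName>
      to <persName>Boris</persName> in <placeName>Riga</placeName>.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Pull structured information back out of the encoded text.
title = root.find(".//tei:title", TEI_NS).text
people = [el.text for el in root.iterfind(".//tei:persName", TEI_NS)]
places = [el.text for el in root.iterfind(".//tei:placeName", TEI_NS)]

print(title)   # Sample letter
print(people)  # ['Anna', 'Boris']
print(places)  # ['Kyiv', 'Riga']
```

The same encoded file can feed a reading edition, an index of names, or a map, which is part of the appeal of encoding once and reusing many times.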
1/31 - Due date: project proposals (question, method, source)
All students taking the course for 3 or more credits will submit a short proposal (1/2 page) stating the question their project will answer, what text(s) they’ll use to answer it, and what tools / methods they’ll use.
Thursday, February 7: Topic modeling (hands-on, please bring laptop)
Topic modeling is a computational technique that generates clusters of words (“topics”), though it’s left as an exercise to the researcher to interpret what those topics mean. Using our digitized texts, we’ll run a tool called Mallet that implements the LDA algorithm for topic modeling, and attempt to make sense of the results.
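Before Mallet can build topics, it needs the corpus in a form it can import. One option is a single file with one document per line, laid out as a name, a label, and then the text. The sketch below prepares such a file; the document names, the “ru” label, and the helper `to_mallet_line` are illustrative assumptions, and the exact import and training options should be checked against the Mallet documentation for the version you install.

```python
import re
import tempfile
from pathlib import Path


def to_mallet_line(name, label, text):
    """Format one document as 'name<TAB>label<TAB>text' on a single line."""
    # Collapse all whitespace, including newlines, so the document stays on one line.
    flat = re.sub(r"\s+", " ", text).strip()
    return f"{name}\t{label}\t{flat}"


# Two invented documents (hypothetical); names must not contain spaces.
docs = {
    "doc1": "Первый документ\nо цифровых гуманитарных науках.",
    "doc2": "Второй   документ о текстах.",
}
lines = [to_mallet_line(name, "ru", text) for name, text in docs.items()]

# Write as UTF-8 into a temporary folder so the example leaves no stray files.
out_path = Path(tempfile.mkdtemp()) / "corpus.txt"
out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")

print(out_path.read_text(encoding="utf-8"))
```

How you split your texts into “documents” (chapters, pages, paragraphs) is a research decision that shapes the topics you get, so it is worth recording in your project notes.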
Tuesday, February 12: Communities: Disciplinary, DH, Identity
There are many facets to your identity as a scholar, and you may need to adjust the way you frame your work based on the venues and communities where you are presenting it. We’ll talk about your experience (if any) with conferences in your discipline, and how disciplinary conferences differ from DH ones. We’ll reflect on the differences in the conception of “diversity” in the United States vs. Europe, and how those differences have played out in the international DH conference. We’ll also look at some of the DH organizations, events, and communities that are centered on different aspects of identity, including nation, language, race, and gender. This session will lay the groundwork for the paired conference proposals assignment (for students taking the course for 5 credits).
2/12 - Project proposals returned w/ feedback
Thursday, February 14: NLP (hands-on, please bring laptop)
We’ll cover a natural language processing (NLP) approach/tool in a hands-on manner. I am currently consulting with a number of NLP experts about the options that would be the best fit for this course.
Tuesday, February 19: Design and DH
When and how does design matter when you’re working with texts? We’ll talk about design in the context of personal and project web presences, as well as conference poster design, and the difference between attended and unattended posters.
Thursday, February 21: NLP (hands-on, please bring laptop)
We’ll cover a second NLP approach/tool in a hands-on manner. I am currently consulting with a number of NLP experts about the options that would be the best fit for this course.
Tuesday, February 26: Stylometry (hands-on, please bring laptop)
Computational stylistics has been used for authorship attribution and detecting changes in authorship, looking at the impact of gender or background on literary production, and analyzing genre. We’ll use the R programming language to run the stylo package on your text corpus, or an example corpus, if yours isn’t a good fit.
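To show the intuition behind one widely used stylometric distance, Burrows’s Delta, here is a rough Python sketch on three invented toy “texts”. It is not the stylo implementation: it splits on spaces, uses the population standard deviation, and the function names and toy corpus are our own assumptions. Real analyses use hundreds of frequent words and much longer texts.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(text, vocab):
    """Share of the text's tokens taken up by each vocabulary word."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def burrows_delta(texts, top_n=3):
    """Return {(a, b): delta} for every pair of texts in the corpus."""
    # 1. Pick the most frequent words across the whole corpus.
    totals = Counter()
    for text in texts.values():
        totals.update(text.lower().split())
    vocab = [w for w, _ in totals.most_common(top_n)]

    # 2. Relative frequency of each of those words in each text.
    freqs = {name: relative_freqs(text, vocab) for name, text in texts.items()}

    # 3. Convert frequencies to z-scores, word by word, across the corpus.
    zscores = {name: [] for name in texts}
    for i in range(len(vocab)):
        column = [freqs[name][i] for name in texts]
        mu, sigma = mean(column), pstdev(column)
        for name in texts:
            z = (freqs[name][i] - mu) / sigma if sigma else 0.0
            zscores[name].append(z)

    # 4. Delta is the mean absolute difference between two texts' z-scores.
    names = list(texts)
    return {
        (a, b): mean(abs(x - y) for x, y in zip(zscores[a], zscores[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented toy corpus (hypothetical): a1 and a2 use function words similarly.
corpus = {
    "a1": "la la la y y de la y la de",
    "a2": "la la y la de de la la y de",
    "b": "de de de y de la de de y de",
}
deltas = burrows_delta(corpus)
for pair, value in deltas.items():
    print(pair, round(value, 3))
```

A smaller Delta means two texts use the most frequent words in more similar proportions; whether that similarity reflects authorship, genre, or something else is a matter of interpretation.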
Thursday, February 28: Creating and cleaning structured data (hands-on, please bring laptop)
Some kinds of analysis and visualization require some kind of structured data as an input: you can’t simply use a full digital text. We’ll talk about some common kinds of structured data, and how to use OpenRefine to reduce or eliminate inconsistencies in that data (“cleaning” it). We’ll try transforming a list of places into geographic coordinates, and talk about the limitations of geocoding. Finally, we’ll take our data (either newly-created from your own texts, or example data) and try it with Palladio, a simple visualization tool designed for humanistic inquiry.
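One way to see what “cleaning” involves is to group spellings that differ only in case, punctuation, spacing, or word order. The sketch below is a simplified take on the key-collision idea behind OpenRefine’s fingerprint clustering; it is not OpenRefine’s own code, and unlike OpenRefine’s method it deliberately keeps accents and non-Latin characters. The place list is invented for illustration.

```python
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified key: normalize, case-fold, drop punctuation, sort unique tokens."""
    value = unicodedata.normalize("NFC", value).casefold().strip()
    # Remove every character whose Unicode category is punctuation (P*).
    value = "".join(ch for ch in value if not unicodedata.category(ch).startswith("P"))
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups with variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [group for group in groups.values() if len(group) > 1]


# Invented example data (hypothetical) with typical inconsistencies.
places = ["Санкт-Петербург", "санкт-петербург ", "São Paulo", "Paulo, São", "Kraków"]

for group in cluster(places):
    print(group)
```

A tool can only suggest that these variants are the same place; you decide whether to merge them, since over-eager merging can erase distinctions that matter in your sources.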
Tuesday, March 5: Palladio & visualization (hands-on, please bring laptop)
We’ll continue our exploration of Palladio and discuss other general-purpose visualization tools. We’ll also discuss accessibility, particularly as it relates to visualization, and some visualization tools designed for accessibility.
Thursday, March 7: Network analysis (hands-on, please bring laptop)
Palladio can turn a spreadsheet of data into a network that you can see. To turn a spreadsheet of data into a quantified network that you can meaningfully compare with other data, you need a network analysis tool, and a basic understanding of network analysis. We’ll talk about some major concepts behind network analysis, and will try our example data with Gephi, one of the most widely-used pieces of software for network analysis.
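To illustrate what “quantified” means here, the sketch below computes three basic measures (degree, degree centrality, and density) for a tiny undirected network. The correspondents and the `edges` list are invented for illustration; network analysis software reports these and many more measures, and may define or label them slightly differently.

```python
from collections import defaultdict

# Each pair is one undirected connection, e.g. two people who exchanged letters.
edges = [
    ("Anna", "Boris"),
    ("Anna", "Clara"),
    ("Boris", "Clara"),
    ("Clara", "Dmitri"),
]

# Build a neighbor set for every node.
neighbors = defaultdict(set)
for source, target in edges:
    neighbors[source].add(target)
    neighbors[target].add(source)

n = len(neighbors)

# Degree: how many distinct nodes each node is connected to.
degree = {node: len(linked) for node, linked in neighbors.items()}

# Degree centrality: degree as a share of the maximum possible (n - 1).
degree_centrality = {node: d / (n - 1) for node, d in degree.items()}

# Density: actual connections as a share of all possible pairs of nodes.
unique_edges = {frozenset(edge) for edge in edges}
density = 2 * len(unique_edges) / (n * (n - 1))

print(degree)             # {'Anna': 2, 'Boris': 2, 'Clara': 3, 'Dmitri': 1}
print(degree_centrality)  # Clara scores 1.0: she is linked to everyone else
print(round(density, 3))  # 0.667
```

Because these numbers are normalized by network size, they can be compared across networks of different sizes, which a picture of the network alone does not allow.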
3/12 – Due date: posters
Students who have signed up to do a poster as part of their grade contract must submit their final poster by 10 AM on 3/12 to have it printed in time for the poster exhibition.
Tuesday, March 12: Going beyond borders
After familiarizing ourselves with Gephi in the last class, we’ll look at a network visualization of conference presenters from DH 2014, and what it tells us about how one sample of the field clustered. We’ll talk about professional organizations in DH, and the borders they cross and those they reinforce. We’ll cover DARIAH, the European research infrastructure for digital humanities, and their efforts to engage with scholars beyond Europe. Finally, we’ll reflect on the role that multilingual, culturally-aware scholars can play in bridging DH communities.
Thursday, March 14: Poster session
Stanford’s literature departments, CESTA, and the Library will be invited to a poster session featuring students’ final posters. Students with posters will present them to attendees; students without posters will demo their tutorial and/or talk about their project. Students without a poster or tutorial will listen to their classmates’ presentations and ask questions.
3/15 – Recommended due date: tutorials & conference proposals
Students who have signed up to do a tutorial or conference proposal as part of their contract are advised to turn it in by 3/15, to ensure they have enough time for a revision if one is needed for the work to be considered satisfactory. Students who submit their work by 3/15 will receive feedback by 3/18.
Tutorials and conference proposals can be submitted as late as 3/18 for feedback by 3/20. Work submitted after 3/18 will not receive feedback or an opportunity for revision.
All work and revisions must be submitted by 3/22.