Class 16 Notes
Last edited by Alan Liu 8 years, 11 months ago
Last Phase of Project Work
1. Preparing Public Presentation Pages for Project
2. Current State of Our Corpora
   - 1880s Children's Fiction Corpus (134 works)
     - Subcategories:
       - All
       - European
       - American
       - Female
       - Male
       - British Female
       - British Male
     - Processed versions of works:
       - Full plain text
       - "Scrubbed" (Jockers 2014 stoplist applied; punctuation removed except for internal apostrophes; numerals removed; converted to all lower-case)
       - "Scrubbed and chunked" (each work chunked into segments of 1,000 words)
   - 1880s British Adult Fiction Corpus (451 works)
     - Subcategories:
       - All (female and male)
       - Female
       - Male
     - Processed versions of works:
       - Full plain text
       - "Scrubbed" (Jockers 2014 stoplist applied; punctuation removed except for internal apostrophes; numerals removed; converted to all lower-case)
       - "Scrubbed and chunked" (each work chunked into segments of 1,000 words)
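The "scrubbed" and "scrubbed and chunked" processing steps listed above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the script actually used to prepare the corpora: `STOPLIST` is a short hypothetical placeholder standing in for the Jockers 2014 stoplist, the function names `scrub` and `chunk` are hypothetical, and the notes do not say how a final segment shorter than 1,000 words is handled (this sketch simply keeps it as a shorter last chunk).

```python
import re

# Hypothetical placeholder: the real Jockers 2014 stoplist is much longer
# and is NOT reproduced here.
STOPLIST = {"the", "and", "of", "a", "to", "in"}

# A "word" is a run of letters, optionally joined by internal apostrophes
# (straight ' or curly \u2019), e.g. "dog's". Digits, underscores, and other
# punctuation are never matched, so they are dropped.
WORD = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def scrub(text, stoplist=STOPLIST):
    """Lower-case the text, drop numerals and punctuation (keeping
    word-internal apostrophes), then remove stoplisted words."""
    tokens = WORD.findall(text.lower())
    return [t for t in tokens if t not in stoplist]


def chunk(tokens, size=1000):
    """Split a token list into consecutive segments of `size` words.
    Assumption: a shorter final segment is kept rather than discarded."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    words = scrub("The Dog's 3 bones, and 'the' cat's toys!")
    print(words)
    segments = chunk(words * 600)  # 2,400 words -> chunks of 1000, 1000, 400
    print([len(s) for s in segments])
```

Each chunk can be written out as its own plain-text file (for example with `" ".join(segment)`) so that downstream tools treat every 1,000-word segment as a separate document.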
3. Text Analysis Work
   - Team: Ginny, Sinead, Eve, Jennifer
4. Topic Modeling Work