Split the script holy_grail into lines using the newline ('\n') character. Use re.sub() inside a list comprehension to r


小小草 LV1, posted September 23, 2023, 23:40

Here's the completed code:

```
# Import the necessary modules
import re
from nltk.tokenize import regexp_tokenize
import matplotlib.pyplot as plt

# Read in the script as a string
with open('holy_grail.txt', 'r') as file:
    holy_grail = file.read()

# Split the script into lines: lines
lines = holy_grail.split('\n')

# Remove the speaker prefix (e.g. "ARTHUR:") from each script line
# Raw strings keep \s, \d and \w from being read as string escapes
pattern = r"[A-Z]{2,}(\s)?(#\d)?([A-Z]{2,})?:"
lines = [re.sub(pattern, '', line) for line in lines]

# Tokenize each line: tokenized_lines
tokenized_lines = [regexp_tokenize(s, r'\w+') for s in lines]

# Make a frequency list of lengths: line_num_words
line_num_words = [len(t_line) for t_line in tokenized_lines]

# Plot a histogram of the line lengths
plt.hist(line_num_words)

# Show the plot
plt.show()
```

The code reads in the script from a file called `holy_grail.txt` and splits it into lines using the newline character. It then uses
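To see what the speaker pattern in the code above actually removes, here is a minimal sketch run on a few made-up lines. The sample lines are hypothetical and not taken from `holy_grail.txt`, and `re.findall` stands in for `regexp_tokenize` so that the snippet needs only the standard library.

```python
import re

# Same speaker pattern as in the answer, written as a raw string
pattern = r"[A-Z]{2,}(\s)?(#\d)?([A-Z]{2,})?:"

# Hypothetical sample lines (assumed for illustration only)
sample_lines = [
    "ARTHUR: Whoa there!",
    "SOLDIER #1: Halt! Who goes there?",
    "KING ARTHUR: It is I",
    "SCENE 1: [wind]",
]

# Strip the speaker prefix from each line
stripped = [re.sub(pattern, '', line) for line in sample_lines]

# Count word tokens per line; re.findall is a stdlib stand-in for regexp_tokenize
word_counts = [len(re.findall(r'\w+', line)) for line in stripped]

for before, after, n in zip(sample_lines, stripped, word_counts):
    print(repr(before), '->', repr(after), '| words:', n)
```

The pattern strips single-word speakers, numbered speakers such as `SOLDIER #1:`, and two-word speakers such as `KING ARTHUR:`. A heading like `SCENE 1:` is left untouched because a bare digit without `#` does not fit the pattern, so its words still count toward that line's length in the histogram.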