Finding the k Most Frequent Words in a Dataset in Python

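If you need to find, say, the 10 most frequently used words in a dataset, Python's collections module can help. The module provides a Counter class that takes a list of words and returns the count of each word. Its most_common method then returns as many of the top words as the program asks for.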
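
As a minimal sketch of the two pieces just described, the snippet below counts a small hand-made word list. The list and the variable names are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical word list, used only for illustration
words = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(words)        # maps each distinct word to its count
print(counts["red"])           # 3
print(counts.most_common(2))   # [('red', 3), ('blue', 2)]
```

most_common returns a list of (word, count) tuples sorted from the highest count down.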

Example

In the example below, we take a paragraph and first apply split() to create a list of words. We then pass that list to Counter to count every word. Finally, most_common returns the requested number of highest-frequency words along with their counts.

from collections import Counter
word_set = (
    " This is a series of strings to count "
    "many words . They sometime hurt and words sometime inspire "
    "Also sometime fewer words convey more meaning than a bag of words "
    "Be careful what you speak or what you write or even what you think of. "
)

# Create list of all the words in the string
word_list = word_set.split()

# Get the count of each word.
word_count = Counter(word_list)

# most_common(3) returns the three (word, count) pairs with the highest counts
print(word_count.most_common(3))

Output

Running the above code produces the following result:

[('words', 4), ('sometime', 3), ('what', 3)]
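
Two details of this result are worth noting. First, 'you' also appears three times in the paragraph, but most_common(3) stops at three entries, and on Python 3.7 and later ties are kept in the order the words were first seen. Second, a plain split() treats "of." and "of" as different words and counts the stray "." as a word. The sketch below is one possible way to normalize the text first. It is not part of the original example: the function name top_k_words, the sample sentence, and the rule that a word is a run of lowercase letters, digits, or apostrophes are all assumptions.

```python
import re
from collections import Counter


def top_k_words(text, k):
    """Return the k most frequent words in text as (word, count) pairs.

    Assumption: a "word" is a run of letters, digits, or apostrophes,
    and letter case is ignored.
    """
    # Lowercase first so "Of" and "of" are counted together,
    # then keep only word-like runs, which drops punctuation.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens).most_common(k)


# Hypothetical sample text, used only for illustration
sample = "Of mice and men. Of course, men differ; of that I am sure."
print(top_k_words(sample, 2))  # [('of', 3), ('men', 2)]
```

Whether to lowercase or strip punctuation depends on the dataset, so treat these rules as a starting point rather than a fixed recipe.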