Separating words in Python with translate and punctuation

가누 2017. 7. 12. 00:05

Python's str type has a method called translate.

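It is normally used to substitute characters through a translation table, but it can also be used another way: to delete characters outright. In Python 2, passing None as the table and a string of characters as the second argument removes those characters without substituting anything.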
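As a minimal sketch of these two uses, assuming Python 3 (where str.translate takes a single table built with str.maketrans, rather than the Python 2 form used in this post): the first call substitutes characters through a table, and the second deletes them. The sample strings are made up for illustration.

```python
# Python 3 sketch; the sample strings are illustrative only.

# Substitution: map each character of 'abc' to the character
# at the same position in 'xyz'.
table = str.maketrans('abc', 'xyz')
substituted = 'aabbcc'.translate(table)

# Deletion: the third argument of maketrans lists characters to remove.
delete_table = str.maketrans('', '', ',!')
deleted = 'Hello, world!'.translate(delete_table)

print(substituted)  # xxyyzz
print(deleted)      # Hello world
```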

After import string, running print string.punctuation shows that it holds a string of punctuation symbols.
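For reference, a short sketch of what that constant contains. It is written with the Python 3 print() function, which is an assumption on my part, since the post itself uses the Python 2 print statement.

```python
import string

# string.punctuation is a plain str of ASCII punctuation characters.
punct = string.punctuation
print(punct)       # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(len(punct))  # 32

# Membership tests work character by character.
print(',' in punct, 'a' in punct)  # True False
```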

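Let's see how this string and the translate method can be used together to clean up a text before separating its words. The program below strips the punctuation from romeo-full.txt and then counts how often each remaining character occurs (ignoring spaces, newlines, and digits).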

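# -*- coding: CP949 -*-
# Python 2 code: it relies on the print statement and on the
# two-argument form str.translate(table, deletechars).

import string

# Read the whole file into one string; 'with' closes the file afterwards.
with open('romeo-full.txt', 'r') as rhandle:
    s = rhandle.read()

# With None as the table, translate() substitutes nothing and only
# deletes every character listed in string.punctuation.
s = s.translate(None, string.punctuation)

# Count each remaining character case-insensitively,
# skipping spaces, newlines and digits.
dic = {}
for i in s:
    if i == ' ' or i == '\n' or '0' <= i <= '9':
        continue
    dic[i.lower()] = dic.get(i.lower(), 0) + 1

# Build (count, character) pairs so that sorting orders by count.
tmp = []
for i, j in dic.items():
    tmp.append((j, i))

# Most frequent first.
tmp.sort(reverse=True)

for i in tmp:
    print i[1], i[0]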
 
// This source code Copyright belongs to Crocus
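In Python 3, the call s.translate(None, string.punctuation) raises a TypeError, because str.translate accepts only a table. The following is a rough Python 3 sketch of the same idea, not the author's code: it strips punctuation and then splits on whitespace so that whole words are counted. The count_words helper and the sample text are hypothetical stand-ins (the sample replaces romeo-full.txt so the snippet runs on its own).

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter of lowercase words in text, ignoring punctuation."""
    # Build a table that deletes every punctuation character.
    table = str.maketrans('', '', string.punctuation)
    cleaned = text.translate(table)
    # split() with no arguments splits on any run of whitespace.
    return Counter(word.lower() for word in cleaned.split())


# Hypothetical sample text standing in for romeo-full.txt.
sample = ("But soft! What light through yonder window breaks?\n"
          "It is the east, and Juliet is the sun.")
counts = count_words(sample)

# most_common() orders entries by count, highest first.
for word, n in counts.most_common(3):
    print(word, n)
```

To run it on the original file, read the contents with open('romeo-full.txt') and pass them to count_words.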