How do you search and replace text in Python?

Problem

You want to search for and replace a text pattern in a string.

If the pattern is a simple literal, the str.replace() method is usually the simplest solution.

Example

def sample():
    yield 'Is'
    yield 'USA'
    yield 'Colder'
    yield 'Than'
    yield 'Canada?'

text = ' '.join(sample())
print(f"Output \n {text}")

Output

Is USA Colder Than Canada?

First, let us look at how to search for text.

# search for exact text
print(f"Output \n {text == 'USA'}")

Output

False

You can search for text with basic string methods such as str.find(), str.endswith(), and str.startswith().
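
As a quick sketch of what these methods return, including the case where the text is absent, consider the snippet below. The variable name question and the search term 'Mexico' are illustrative and not part of the original example.

```python
question = 'Is USA Colder Than Canada?'

# find() returns the index of the first match, or -1 when the text is absent
print(question.find('USA'))       # 3
print(question.find('Mexico'))    # -1

# the `in` operator answers the yes/no question directly
print('USA' in question)          # True

# startswith() and endswith() also accept a tuple of alternatives
print(question.startswith(('Is', 'Are')))   # True
print(question.endswith(('?', '!')))        # True
```

Because find() signals "not found" with -1 rather than False, compare its result against -1 (or use `in`) instead of testing it for truthiness.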

# text start with
print(f"Output \n {text.startswith('Is')}")

Output

True

# text ends with
print(f"Output \n {text.endswith('Canada?')}")

Output

True

# search text with find
print(f"Output \n {text.find('USA')}")

Output

3

If the text you need to search is more complex, you can use regular expressions and the re module.

# Let us create a date in string format
date1 = '22/10/2020'

# Check whether the text starts with three groups of digits separated by slashes.
# \d+ - match one or more digits
import re
if re.match(r'\d+/\d+/\d+', date1):
    print('yes')
else:
    print('no')

Output

yes
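
One detail worth illustrating is that re.match() only tries the pattern at the start of the string. The sketch below, which uses a made-up sentence and the hypothetical names date_pattern and found, contrasts it with re.search() and re.fullmatch().

```python
import re

date_pattern = r'\d+/\d+/\d+'
sentence = 'Date is 22/11/2020.'

# re.match() only tries the pattern at the start of the string
print(re.match(date_pattern, sentence))            # None

# re.search() scans the whole string for the first match
found = re.search(date_pattern, sentence)
print(found.group())                               # 22/11/2020

# re.fullmatch() requires the entire string to match the pattern
print(bool(re.fullmatch(date_pattern, '22/10/2020')))         # True
print(bool(re.fullmatch(date_pattern, '22/10/2020 extra')))   # False
```

If the date may appear anywhere in the text, re.search() is probably the better fit; re.fullmatch() suits validating a whole field.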

Now back to replacing text. If both the text to replace and the replacement string are simple, use str.replace().

Example

print(f"Output \n {text.replace('USA', 'Australia')}")

Output

Is Australia Colder Than Canada?
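
A small sketch of two behaviours of str.replace() that are easy to miss: it replaces every occurrence by default, and it returns a new string rather than modifying the original. The string repeated is invented for illustration.

```python
repeated = 'USA, USA, USA'

# by default every occurrence is replaced
print(repeated.replace('USA', 'Canada'))      # Canada, Canada, Canada

# the optional third argument caps the number of replacements
print(repeated.replace('USA', 'Canada', 1))   # Canada, USA, USA

# strings are immutable, so the original is unchanged
print(repeated)                               # USA, USA, USA
```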

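If you need to search for and replace a more complex pattern, you can use the sub() function of the re module.

The first argument to sub() is the pattern to match, the second is the replacement pattern, and the third is the string to search.

The example below finds date fields in dd/mm/yyyy format and rewrites them in yyyy-dd-mm format. A backslash followed by a digit, such as \3, refers to the capture group with that number in the pattern.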

import re
sentence = 'Date is 22/11/2020. Tommorow is 23/11/2020.'
# sentence
replaced_text = re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', sentence)
print(f"Output \n {replaced_text}")

Output

Date is 2020-22-11. Tommorow is 2020-23-11.
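
Numbered groups can become hard to follow as patterns grow. As an alternative sketch (not the approach used in the example above), the same substitution can be written with named groups, referenced in the replacement as \g<name>. The group names day, month, and year are chosen here for illustration.

```python
import re

sentence = 'Date is 22/11/2020. Tommorow is 23/11/2020.'

# (?P<name>...) defines a named capture group
named_pattern = re.compile(r'(?P<day>\d+)/(?P<month>\d+)/(?P<year>\d+)')

# same yyyy-dd-mm output as the numbered version
same_order = named_pattern.sub(r'\g<year>-\g<day>-\g<month>', sentence)
print(same_order)   # Date is 2020-22-11. Tommorow is 2020-23-11.

# reordering to yyyy-mm-dd only requires changing the replacement
iso_order = named_pattern.sub(r'\g<year>-\g<month>-\g<day>', sentence)
print(iso_order)    # Date is 2020-11-22. Tommorow is 2020-11-23.
```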

Another approach is to compile the expression first, which can perform better when the same pattern is used repeatedly.

Example

pattern = re.compile(r'(\d+)/(\d+)/(\d+)')
replaced_pattern = pattern.sub(r'\3-\1-\2', sentence)
print(f"Output \n {replaced_pattern}")

Output

Date is 2020-22-11. Tommorow is 2020-23-11.
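
A compiled pattern is most useful when it is applied to many strings. The sketch below reuses one compiled object across a list; the list contents and the names date_pattern, lines, and converted are made up for illustration. Note that the re module also caches recently used patterns internally, so the speed-up for a handful of calls is likely to be modest.

```python
import re

# compile once, outside the loop
date_pattern = re.compile(r'(\d+)/(\d+)/(\d+)')

lines = ['Start: 01/02/2021', 'End: 15/03/2021', 'No date here']

# reuse the same compiled object for every line
converted = [date_pattern.sub(r'\3-\1-\2', line) for line in lines]
print(converted)
# ['Start: 2021-01-02', 'End: 2021-15-03', 'No date here']
```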

re.subn() returns the replaced text together with the number of substitutions that were made.

Example

output, count = pattern.subn(r'\3-\1-\2', sentence)
print(f"Output \n {output}")

Output

Date is 2020-22-11. Tommorow is 2020-23-11.

Example

print(f"Output \n {count}")

Output

2
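
The count makes it easy to tell whether anything changed. Below is a minimal sketch built around a hypothetical helper, convert_dates, that returns both values; it also shows the optional count argument, which limits how many matches are replaced.

```python
import re

pattern = re.compile(r'(\d+)/(\d+)/(\d+)')

def convert_dates(line):
    """Hypothetical helper: return (converted_line, number_of_replacements)."""
    return pattern.subn(r'\3-\1-\2', line)

with_dates = convert_dates('Date is 22/11/2020. Tommorow is 23/11/2020.')
print(with_dates)
# ('Date is 2020-22-11. Tommorow is 2020-23-11.', 2)

without_dates = convert_dates('No dates in this line')
print(without_dates)
# ('No dates in this line', 0)

# count=1 stops after the first replacement
first_only = pattern.subn(r'\3-\1-\2', 'a 1/2/3 b 4/5/6', count=1)
print(first_only)
# ('a 3-1-2 b 4/5/6', 1)
```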