Downloading files from the web using Python


Python offers several modules for downloading files from the web, such as the standard-library urllib and the third-party requests package. Here I use the requests library to download files from a URL.

Let's walk through the procedure for downloading a file from a URL with the requests library, step by step:

1. Import the module

import requests

2. Fetch the link or URL

url = 'https://www.facebook.com/favicon.ico'
r = requests.get(url, allow_redirects=True)

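3. Save the content under a file name

# The with block closes the file as soon as the write finishes
with open('facebook.ico', 'wb') as f:
    f.write(r.content)

This saves the file as facebook.ico.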

Example

import requests

# Fetch the icon, following any redirects
url = 'https://www.facebook.com/favicon.ico'
r = requests.get(url, allow_redirects=True)

# Write the raw response bytes to disk
with open('facebook.ico', 'wb') as f:
    f.write(r.content)

Result


The downloaded file (the icon) now appears in the current working directory.

We may also need to download other kinds of files from the web, such as images, text, or video. So let's first find out what type of data the URL points to.

>>> r = requests.get(url, allow_redirects=True)
>>> print(r.headers.get('content-type'))
image/png

There is a smarter way, however: fetch only the headers of the URL before actually downloading it. This lets us skip URLs that do not point to the kind of file we want to download.

>>> print(is_downloadable('https://www.youtube.com/watch?v=xCglV_dqFGI'))
False
>>> print(is_downloadable('https://www.facebook.com/favicon.ico'))
True
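
The session above calls is_downloadable without showing its definition. A minimal sketch of what such a helper might look like, assuming it issues a HEAD request and treats any text or HTML Content-Type as a page rather than a file. The split into looks_downloadable (a hypothetical helper name) and the 10-second timeout are illustrative choices, not part of the original:

```python
def looks_downloadable(headers):
    """Return False when the Content-Type suggests a web page, True otherwise."""
    # A missing header becomes '', which this heuristic treats as downloadable.
    content_type = (headers.get('content-type') or '').lower()
    return 'text' not in content_type and 'html' not in content_type


def is_downloadable(url):
    """Fetch only the headers of url with a HEAD request and inspect them."""
    # requests is third-party; imported here so looks_downloadable()
    # can be used (and tested) without it.
    import requests
    h = requests.head(url, allow_redirects=True, timeout=10)
    return looks_downloadable(h.headers)
```

Under this heuristic a text/plain file is rejected as well, so the check would need loosening if text files are among the things you want to download.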

To restrict downloads by file size, we can read the file size from the content-length header and then act on it as required.

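# Runs inside a function such as is_downloadable(); header holds the response headers
contentLength = header.get('content-length', None)
# Header values are strings, so convert before comparing
if contentLength and int(contentLength) > 2e8:  # roughly 200 MB
    return False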
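
Header values arrive as strings, so the size has to be converted before it can be compared. A small sketch of the same check as a standalone function, assuming headers is the mapping returned by a HEAD request; within_size_limit and max_bytes are hypothetical names chosen for illustration:

```python
def within_size_limit(headers, max_bytes=200_000_000):
    """Return False only when Content-Length is present, numeric and above max_bytes."""
    content_length = headers.get('content-length')
    if content_length is None:
        return True  # size unknown; the caller decides whether to proceed
    try:
        size = int(content_length)  # the header value is a string
    except ValueError:
        return True  # malformed header; treated as unknown here
    return size <= max_bytes
```

Treating an unknown size as acceptable is a design choice; a stricter caller could refuse to download when the header is missing.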

Get the file name from a URL

We can parse the URL to get the file name. Below is a sample routine that takes the last string after the final forward slash (/).

url = "https://www.computersolution.tech/wp-content/uploads/2016/05/tutorialspoint-logo.png"
# str.find() returns -1 (which is truthy) when nothing is found, so test membership instead
if '/' in url:
    print(url.rsplit('/', 1)[1])
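
Splitting the raw URL keeps any query string or fragment attached to the name. A possible alternative, sketched here with the standard-library urllib.parse and posixpath modules, looks only at the path component; filename_from_url is a hypothetical name, not something from the original:

```python
import posixpath
from urllib.parse import urlsplit, unquote


def filename_from_url(url):
    """Return the last path segment of url, or None when there is none."""
    path = urlsplit(url).path        # leaves out the query string and fragment
    name = posixpath.basename(path)  # text after the final '/'
    return unquote(name) or None     # decode %20 and similar escapes
```

For example, filename_from_url('https://example.com/files/report.pdf?download=1') returns 'report.pdf', where a plain rsplit would return 'report.pdf?download=1'.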

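Splitting the URL on its last slash gives the file name it contains. However, there are many cases where the URL carries no file name information (for example, https://url.com/download). In such cases we need to read the Content-Disposition header, which contains the file name information.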

import requests
import re

def getFilename_fromCd(cd):
    """
    Get filename from content-disposition
    """
    if not cd:
        return None
    fname = re.findall(r'filename=(.+)', cd)
    if len(fname) == 0:
        return None
    return fname[0]


url = 'https://google.com/favicon.ico'
r = requests.get(url, allow_redirects=True)
filename = getFilename_fromCd(r.headers.get('content-disposition'))
if filename is None:
    # No usable Content-Disposition header: fall back to the last URL segment
    filename = url.rsplit('/', 1)[1]
with open(filename, 'wb') as f:
    f.write(r.content)

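Combined with the URL-parsing code above, this program yields a usable file name most of the time: from the Content-Disposition header when it is present, and from the URL otherwise.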
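
The two approaches can be folded into one helper. A rough sketch, assuming a simple filename= parameter (quoted or unquoted) and ignoring the encoded filename*= form; choose_filename and the 'download.bin' default are hypothetical names used only for illustration:

```python
import posixpath
import re
from urllib.parse import urlsplit


def choose_filename(url, content_disposition=None, default='download.bin'):
    """Prefer the Content-Disposition name, then the URL path, then a default."""
    if content_disposition:
        # Matches filename=name and filename="name"; filename*= is not handled.
        match = re.search(r'filename="?([^";]+)"?', content_disposition)
        if match:
            # basename() discards any directory part sent by the server
            name = posixpath.basename(match.group(1).strip())
            if name:
                return name
    # Fall back to the last segment of the URL path
    name = posixpath.basename(urlsplit(url).path)
    return name or default
```

With a response r from requests.get(url), the call would be choose_filename(url, r.headers.get('content-disposition')).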