Python에서 DOM API로 XML 구문 분석

<시간/>

문서 개체 모델("DOM")은 XML 문서에 액세스하고 수정하기 위한 W3C(World Wide Web Consortium)의 교차 언어 API입니다.

DOM은 랜덤 액세스 애플리케이션에 매우 유용합니다. SAX를 사용하면 한 번에 문서의 한 비트만 볼 수 있습니다. 하나의 SAX 요소를 보고 있다면 다른 요소에 액세스할 수 없습니다.

다음은 XML 문서를 빠르게 로드하고 xml.dom 모듈을 사용하여 minidom 개체를 만드는 가장 쉬운 방법입니다. minidom 개체는 XML 파일에서 DOM 트리를 빠르게 생성하는 간단한 파서 메서드를 제공합니다.

샘플 구문은 minidom 객체의 parse( file [,parser] ) 함수를 호출하여 file 로 지정된 XML 파일을 DOM 트리 객체로 파싱합니다.

#!/usr/bin/python
from xml.dom.minidom import parse
import xml.dom.minidom
# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
   print "Root element : %s" % collection.getAttribute("shelf")
# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")
# Print detail of each movie.
for movie in movies:
print "*****Movie*****"
   if movie.hasAttribute("title"):
      print "Title: %s" % movie.getAttribute("title")
   type = movie.getElementsByTagName('type')[0]
   print "Type: %s" % type.childNodes[0].data
   format = movie.getElementsByTagName('format')[0]
   print "Format: %s" % format.childNodes[0].data
   rating = movie.getElementsByTagName('rating')[0]
   print "Rating: %s" % rating.childNodes[0].data
   description = movie.getElementsByTagName('description')[0]
   print "Description: %s" % description.childNodes[0].data

이것은 다음과 같은 결과를 낳을 것입니다 -

Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom

DOM API 문서에 대한 자세한 내용은 표준 Python API를 참조하세요.