XML stands for Extensible Markup Language. XML is a very popular format, commonly used for sharing data on the internet. This chapter explains how to parse an XML file and extract the necessary information from it.
HTML is a forgiving language. It tolerates a host of sins, from imprecise markup to altogether missing elements, and can still generate a web page in the browser. XML, on the other hand, is basically a tyrant. Violate even the most trivial rule, and a conforming parser will reject the whole document, so the browser or your application reports an error instead of guessing what you meant. Some people find comfort in the uncompromising nature of XML, because it won’t work unless you build it correctly. It’s great to get instant feedback when you do something wrong!
The main features or advantages of XML are given below.
1) XML separates data from HTML: If you need to display dynamic data in your HTML document, it will take a lot of work to edit the HTML each time the data changes.
With XML, data can be stored in separate XML files. This way you can focus on using HTML/CSS for display and layout, and be sure that changes in the underlying data will not require any changes to the HTML.
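As a rough sketch of this separation, the following Python snippet uses the standard xml.etree.ElementTree module to turn XML data into an HTML list. The catalog data, the element names, and the render_book_list helper are all assumptions made up for illustration; in practice the XML would live in its own file.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data; normally this would be read from a separate .xml file.
CATALOG_XML = """<catalog>
  <book><title>XML Basics</title><price>19.95</price></book>
  <book><title>Parsing &amp; Beyond</title><price>24.50</price></book>
</catalog>"""

def render_book_list(xml_text):
    """Build an HTML list from the XML data; the HTML template itself never changes."""
    root = ET.fromstring(xml_text)
    items = []
    for book in root.findall("book"):
        title = book.findtext("title")
        price = book.findtext("price")
        # Escape the values so characters such as & stay valid in HTML.
        items.append(f"<li>{escape(title)}: {escape(price)}</li>")
    return "<ul>" + "".join(items) + "</ul>"

html_output = render_book_list(CATALOG_XML)
print(html_output)
```

Changing a price or adding a book only touches the XML data; the function that produces the HTML stays the same.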
2) XML simplifies data sharing: In the real world, computer systems and databases contain data in incompatible formats.
XML data is stored in plain text format. This provides a software- and hardware-independent way of storing data.
This makes it much easier to create data that can be shared by different applications.
3) XML simplifies data transport: One of the most time-consuming challenges for developers is to exchange data between incompatible systems over the Internet.
Exchanging data as XML greatly reduces this complexity, since the data can be read by different incompatible applications.
4) XML simplifies platform changes:
Upgrading to new systems (hardware or software platforms) is always time consuming. Large amounts of data must be converted, and incompatible data is often lost.
XML data is stored in text format. This makes it easier to expand or upgrade to new operating systems, new applications, or new browsers, without losing data.
5) XML increases data availability: Different applications can access your data, not only in HTML pages, but also from XML data sources.
With XML, your data can be available to all kinds of “reading machines” (handheld computers, voice machines, news feeds, etc.), which also makes it more accessible to blind people and people with other disabilities.
6) XML can be used to create new internet languages: A lot of new internet languages are created with XML.
Here are some examples: XHTML; WSDL for describing available web services; WML as a markup language for handheld (WAP) devices; RSS for news feeds; RDF and OWL for describing resources and ontologies; and SMIL for describing multimedia for the web.
There are nine basic rules for building good XML:
- All XML must have a root element.
- All tags must be closed.
- All tags must be properly nested.
- Tag names have strict limits.
- Tag names are case sensitive.
- Tag names cannot contain spaces.
- Attribute values must appear within quotes (" or ').
- White space is preserved.
- HTML tags should be avoided (optional).
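XML that follows these rules is said to be “well formed.” But don’t confuse well-formed XML with valid XML! Valid XML is well formed and also conforms to a DTD or schema that defines which elements and attributes are allowed.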
Now let’s look at the rules with some examples.
Rule 1: All XML Must Have a Root Element
A root element is the single outermost element that contains all of your other XML content. Every XML document has exactly one.
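A quick way to see this rule in action is to hand both kinds of document to a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the parses helper and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One root element (<library>) wraps everything else.
one_root = "<library><book>Dune</book><book>Emma</book></library>"
# Two top-level elements: there is no single root.
two_roots = "<book>Dune</book><book>Emma</book>"

print(parses(one_root))   # True
print(parses(two_roots))  # False
```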
Rule 2: All Tags Must Be Closed
When a tag is declared (opened), it must also be closed. Any unclosed tags will break the code. Even tags that don’t need to be closed in HTML must be closed in XML or XHTML. To open a tag, type the name of the element between less-than (<) and greater-than (>) characters, like this opening tag: <p>
To close a tag, repeat the tag name exactly, but insert a slash in front of it, like this closing tag: </p>
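Even empty tags, such as <hr> and <br>, must be closed, either with a separate closing tag (<br></br>) or with the self-closing form (<br/>).
Incorrect: <p>Roses are Red
Correct: <p>Roses are Red</p>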
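The following sketch shows how a parser reacts to unclosed tags, again assuming Python's xml.etree.ElementTree. The first_error helper and the <poem> wrapper element are hypothetical, and the exact wording of the error message depends on the underlying parser.

```python
import xml.etree.ElementTree as ET

def first_error(xml_text):
    """Return the parser's error message, or None if the text parses cleanly."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

unclosed_p = "<poem><p>Roses are Red</poem>"           # <p> is never closed
unclosed_br = "<poem><p>Roses are Red</p><br></poem>"  # HTML-style <br>
all_closed = "<poem><p>Roses are Red</p><br/></poem>"  # self-closing <br/>

for sample in (unclosed_p, unclosed_br, all_closed):
    print(first_error(sample))
```

The first two samples produce an error message, while the last one prints None because every tag is closed.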
Rule 3: All Tags Must Be Properly Nested
When you insert (nest) one tag within another, pay attention to the order in which you open each tag, and then close the tags in the reverse order. If you open element A and then element B, you must first close B before closing A. Even HTML tags that usually will work without a strict structure must follow the stricter XML rules when they’re used within an XML file.
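The "close in reverse order" requirement is the behavior of a stack: each opening tag is pushed, and each closing tag must match the most recently opened one. The sketch below illustrates that idea only. It is a simplified checker with a hypothetical regular expression and function name, and it ignores comments, CDATA sections, and processing instructions, so it is not a substitute for a real parser.

```python
import re

# Matches <name ...>, </name>, and <name ... />; an illustrative simplification.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")

def properly_nested(text):
    """Check that tags are closed in the reverse of the order they were opened."""
    stack = []
    for closing, name, self_closing in TAG.findall(text):
        if self_closing:
            continue  # <br/> opens and closes in one step
        if closing:
            # A closing tag must match the most recently opened element.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack  # anything left on the stack was never closed

print(properly_nested("<a><b>text</b></a>"))  # True
print(properly_nested("<a><b>text</a></b>"))  # False
```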
Rule 4: Tag Names Have Strict Limits
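A tag name must start with a letter or the underscore character (_). It can’t start with a number or with other punctuation such as a hyphen or period, although digits, hyphens, and periods are allowed after the first character.
Names that begin with the letters xml, in any combination of uppercase and lowercase, are reserved by the XML specification, so don’t use them to start your own tag names.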
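These naming limits can be approximated with a small check. The sketch below is an ASCII-only simplification with a hypothetical function name; the actual XML specification also permits many non-ASCII letters, and the colon, which is left out here because it is reserved for namespaces.

```python
import re

# First character: letter or underscore. Later characters may also be
# digits, hyphens, or periods. ASCII only, as a simplifying assumption.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def is_safe_tag_name(name):
    """Return True if the name follows the (simplified) tag-name limits."""
    if name.lower().startswith("xml"):
        return False  # names starting with xml, in any case, are reserved
    return NAME.fullmatch(name) is not None

for candidate in ["artist", "_note", "track-01", "2pac", "-dash", "XMLdata"]:
    print(candidate, is_safe_tag_name(candidate))
```

The first three names pass; the last three fail because they start with a digit, start with a hyphen, or begin with the reserved letters.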
Rule 5: Tag Names Are Case Sensitive
Uppercase and lowercase matter in XML. Opening and closing tags must match exactly. For example, <ROOT>, <Root>, and <root> are three different tags.
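Case sensitivity affects both parsing and lookups. The sketch below assumes Python's xml.etree.ElementTree; the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# <Item> and <item> are two different elements in the same document.
doc = ET.fromstring("<Root><Item>a</Item><item>b</item></Root>")
print(doc.findtext("Item"))  # a
print(doc.findtext("item"))  # b
print(doc.find("ITEM"))      # None: no element has this exact name

# A closing tag whose case differs from the opening tag does not match it.
try:
    ET.fromstring("<Root>text</root>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False
print(mismatch_accepted)     # False
```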
Rule 6: Tag Names Cannot Contain Spaces
Spaces in tag names can cause all sorts of problems with data-intensive applications, so they’re prohibited in XML.
Rule 7: Attribute Values Must Appear Within Quotes
Attribute values modify a tag or help identify the type of information being tagged. If you’re a web designer, you may be used to the flexibility of HTML, in which some attributes don’t require quotes. In XML, all attribute values must appear within quotes. For example:
<artist title="author" nationality="USA">
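The sketch below checks a few quoting styles with Python's xml.etree.ElementTree. The accepted helper is hypothetical, and the element is written in self-closing form so that each sample is a complete document. Note that only straight quotes count; typographic ("curly") quotes pasted from a word processor are not quote characters as far as the parser is concerned.

```python
import xml.etree.ElementTree as ET

def accepted(xml_text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

double_quoted = '<artist title="author" nationality="USA"/>'
single_quoted = "<artist title='author' nationality='USA'/>"
unquoted = "<artist title=author nationality=USA/>"
curly_quoted = "<artist title=\u201dauthor\u201d/>"  # typographic quotes

for sample in (double_quoted, single_quoted, unquoted, curly_quoted):
    print(accepted(sample))

# Once parsed, attribute values are available by name.
artist = ET.fromstring(double_quoted)
print(artist.get("nationality"))  # USA
```

Only the first two samples are accepted.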
Rule 8: White Space Is Preserved
If you’re in the habit of adding extra spaces and hard returns in your HTML code, watch out! Such spacing is preserved by the XML parser and passed along to your application, where it can play havoc. Use extra spacing judiciously.
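To see what preserved white space looks like to an application, the sketch below parses two small documents with Python's xml.etree.ElementTree. The element names are hypothetical, and collapsing the spacing with split and join is just one possible cleanup, not a required step.

```python
import xml.etree.ElementTree as ET

# Line breaks and repeated spaces inside an element are kept as-is.
verse = ET.fromstring("<verse>\n  Roses   are Red\n</verse>")
print(repr(verse.text))  # '\n  Roses   are Red\n'

# The application has to normalize the spacing itself if it wants to.
normalized = " ".join(verse.text.split())
print(normalized)        # Roses are Red

# Indentation between elements also reaches the application as text.
outer = ET.fromstring("<a>\n  <b/>\n</a>")
print(repr(outer.text))     # '\n  '
print(repr(outer[0].tail))  # '\n'
```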