XML Basics: A Developer's Guide to Syntax, Schemas, and Best Practices

Published: October 20, 2025 · Updated: February 26, 2026 · 18 min read

XML (eXtensible Markup Language) was designed in the late 1990s as a universal format for structured data. While JSON has taken over many of XML's roles in web development, XML remains the foundation of critical systems: enterprise application integration (EAI), document publishing (DITA, DocBook, EPUB), office file formats (OOXML, ODF), vector graphics (SVG), financial messaging (ISO 20022), healthcare data exchange (HL7, FHIR), and configuration for build tools (Maven, Ant, MSBuild) and application servers (Tomcat, JBoss, IIS).

Understanding XML is essential for any developer working with enterprise systems, data interchange, or document processing. This guide covers everything from basic syntax to schemas, namespaces, XPath, and XSLT — with practical examples you can use immediately.

1. What Is XML and Why Does It Matter?

XML is a markup language that defines rules for encoding documents in a format that is both human-readable and machine-readable. Unlike HTML, which has predefined tags (<div>, <p>, <table>), XML lets you define your own tags to describe any kind of data.

XML was created by the World Wide Web Consortium (W3C) and became a W3C Recommendation in 1998. It was designed to be simple, general-purpose, and usable across the internet. These principles made it the foundation for dozens of standards and formats that are still in active use today.

Key characteristics of XML:

  • Self-describing: Tag names describe the data they contain, making documents understandable without external documentation
  • Platform-independent: Plain text format works on any operating system, programming language, or hardware
  • Extensible: You define your own vocabulary of tags for your specific use case
  • Hierarchical: Tree structure naturally represents parent-child relationships in data
  • Validatable: Schemas (XSD, DTD, RELAX NG) let you enforce structure and data types at the document level

2. XML Syntax Rules

XML syntax is strict compared to HTML. Browsers will forgive malformed HTML, but XML parsers will reject any document that violates these rules:

  • Every opening tag must have a closing tag: <name>John</name>, or use self-closing syntax <br/>
  • Tags are case-sensitive: <Name> and <name> are different elements
  • Proper nesting is required: <b><i>text</i></b> is valid; <b><i>text</b></i> is not
  • Exactly one root element: The entire document content must be inside a single root element
  • Attribute values must be quoted: Use double or single quotes, e.g., id="42"
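  • Special characters must be escaped: Use &lt; for < and &amp; for & everywhere in text and attribute values; &gt; for > is only strictly required in the sequence ]]>, but escaping it is always safe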

A complete, well-formed XML document:

<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book isbn="978-0-13-468599-1">
    <title>The Pragmatic Programmer</title>
    <author>David Thomas</author>
    <author>Andrew Hunt</author>
    <year>2019</year>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book isbn="978-0-596-51774-8">
    <title>JavaScript: The Good Parts</title>
    <author>Douglas Crockford</author>
    <year>2008</year>
    <publisher>O&apos;Reilly Media</publisher>
  </book>
</library>
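As a quick check that the document above really is well-formed, it can be loaded with a standard parser. The following is a minimal sketch using Python's built-in xml.etree.ElementTree module; the function name list_books and the dictionary layout are illustrative choices, not part of any standard.

```python
import xml.etree.ElementTree as ET

# The sample document from above, passed as bytes so the parser reads the
# encoding from the XML declaration.
LIBRARY_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book isbn="978-0-13-468599-1">
    <title>The Pragmatic Programmer</title>
    <author>David Thomas</author>
    <author>Andrew Hunt</author>
    <year>2019</year>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book isbn="978-0-596-51774-8">
    <title>JavaScript: The Good Parts</title>
    <author>Douglas Crockford</author>
    <year>2008</year>
    <publisher>O&apos;Reilly Media</publisher>
  </book>
</library>
"""


def list_books(xml_bytes):
    """Parse the library document and return one dict per <book> (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)  # raises ET.ParseError if not well-formed
    books = []
    for book in root.findall("book"):
        books.append({
            "isbn": book.get("isbn"),                           # attribute value
            "title": book.findtext("title"),                    # text of first <title>
            "authors": [a.text for a in book.findall("author")],
            "year": int(book.findtext("year")),                 # XML text is a string; convert explicitly
            "publisher": book.findtext("publisher"),            # &apos; is decoded to '
        })
    return books


books = list_books(LIBRARY_XML)
for entry in books:
    print(entry["title"], entry["year"])
```

Note that the parser hands back every value as text; converting the year to an integer is the application's job unless a schema-aware tool does it.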

3. Elements, Attributes, and Text Content

XML documents are built from three main building blocks:

Elements

Elements are the primary carriers of data. They have a start tag, content (text, child elements, or both), and an end tag. The element name should describe the data it holds.

Attributes

Attributes are name-value pairs on the start tag. They're best used for metadata that qualifies or identifies an element rather than carrying primary data content. Good candidates for attributes: IDs, status codes, language codes, version numbers, units of measurement.

Text content

The text between opening and closing tags. Text can appear alone or mixed with child elements (mixed content).

<!-- Element with text content -->
<title>Clean Architecture</title>

<!-- Element with attributes -->
<price currency="EUR" tax="included">29.99</price>

<!-- Element with child elements -->
<address>
  <street>123 Main St</street>
  <city>Springfield</city>
  <zip>62704</zip>
</address>

<!-- Mixed content (text + child elements) -->
<description>This book is <em>essential</em> reading.</description>
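To see how these building blocks surface in an API, here is a small sketch with Python's standard xml.etree.ElementTree. It reuses the price and description snippets above; the variable names are illustrative. In this particular API, text that follows a child element inside mixed content is exposed as that child's tail.

```python
import xml.etree.ElementTree as ET

# Element with attributes and text content.
price = ET.fromstring('<price currency="EUR" tax="included">29.99</price>')
price_info = {
    "tag": price.tag,                    # element name
    "attributes": dict(price.attrib),    # attribute name/value pairs
    "amount": float(price.text),         # text content, converted by hand
}

# Mixed content: text interleaved with a child element.
description = ET.fromstring(
    "<description>This book is <em>essential</em> reading.</description>"
)
em = description.find("em")
leading_text = description.text          # text before the first child
emphasized = em.text                     # text inside <em>
trailing_text = em.tail                  # text after </em>, still inside <description>
full_text = "".join(description.itertext())  # all text in document order

print(price_info)
print(repr(leading_text), repr(emphasized), repr(trailing_text))
print(full_text)
```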

4. The XML Declaration

The XML declaration, when present, must be the very first thing in the document, with no whitespace or comments before it. It specifies the XML version and character encoding. It is optional in XML 1.0, but including it is strongly recommended for interoperability.

The common form:

<?xml version="1.0" encoding="UTF-8"?>

With the optional standalone attribute:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

  • version: Almost always "1.0". XML 1.1 exists but is rarely used.
  • encoding: The character encoding. UTF-8 is the recommended choice, and parsers assume it when there is no declaration and no byte order mark. Other options include UTF-16 and ISO-8859-1.
  • standalone: "yes" declares that the document does not rely on external markup declarations (such as an external DTD subset). "no" means it may.
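The effect of the encoding attribute is easiest to see by round-tripping bytes. The sketch below uses Python's standard xml.etree.ElementTree (the xml_declaration argument of tostring assumes Python 3.8 or later). It writes a document with a declaration, then parses ISO-8859-1 bytes that are decoded correctly only because the declaration names their encoding. The element name note and the helper name are made up for illustration.

```python
import xml.etree.ElementTree as ET


def serialize_with_declaration(root):
    """Serialize an element to UTF-8 bytes, prefixed with an XML declaration."""
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Writing: the serializer emits the declaration for us.
note = ET.Element("note")
note.text = "Jos\u00e9"  # "José"
utf8_bytes = serialize_with_declaration(note)

# Reading: these bytes are ISO-8859-1, not UTF-8. The parser relies on the
# declaration to decode the byte 0xE9 as "é".
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><note>Jos\u00e9</note>'
).encode("iso-8859-1")
decoded = ET.fromstring(latin1_bytes).text

print(utf8_bytes)
print(decoded)
```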

5. Well-Formed vs. Valid XML

These two concepts are often confused but represent different levels of correctness:

Well-Formed

The document follows XML syntax rules: matching tags, single root, proper nesting, quoted attributes. Every XML parser requires well-formedness. A document that isn't well-formed will raise a fatal parsing error.

Valid

The document is well-formed AND conforms to a schema (DTD, XSD, or RELAX NG). Validation checks element names, ordering, data types, required fields, and cardinality. Validation is optional but important for data exchange.

Think of it this way: well-formed means the XML syntax is correct (like grammatically correct English). Valid means the content also follows specific rules (like a properly formatted business letter with all required sections).
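A well-formedness check needs nothing more than a parser. The sketch below wraps Python's standard xml.etree.ElementTree in a hypothetical helper, check_well_formed, and runs it on a few of the rule violations from section 2. It does not test validity: the standard library parser does not validate against DTD or XSD, so that step would need a separate schema-aware tool.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


samples = {
    "properly nested": "<b><i>text</i></b>",
    "mis-nested tags": "<b><i>text</b></i>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<item id=42/>",
    "case mismatch": "<Name>x</name>",
}

for label, xml_text in samples.items():
    ok, message = check_well_formed(xml_text)
    print(label, "->", "well-formed" if ok else "rejected: " + message)
```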

6. DTD: Document Type Definitions

DTDs were the original schema language for XML. They define which elements and attributes are allowed, their structure, and their cardinality. DTDs use a compact, non-XML syntax:

<!-- DTD definition -->
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book (title, author+, year)>
  <!ATTLIST book isbn CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>

DTD limitations: DTDs don't support data types (no distinction between strings and numbers), don't support namespaces well, and use non-XML syntax. For these reasons, XSD has largely replaced DTDs in modern systems, though DTDs are still found in legacy documents. (HTML5's <!DOCTYPE html> keeps the doctype syntax but does not reference a DTD.)

7. XSD: XML Schema Definition

XSD (XML Schema Definition) is the modern, powerful successor to DTD. XSD schemas are themselves written in XML and support rich data types, complex structures, inheritance, and namespace-aware validation.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="author" type="xs:string"
                          maxOccurs="unbounded"/>
              <xs:element name="year" type="xs:gYear"/>
            </xs:sequence>
            <xs:attribute name="isbn" type="xs:string"
                          use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

XSD supports over 40 built-in data types (string, integer, decimal, date, dateTime, boolean, etc.) and lets you create custom types with restrictions (minimum/maximum values, patterns, enumerations). This makes XSD the standard choice for B2B data exchange, SOAP services, and regulated industries.

8. Namespaces Explained

Namespaces solve the problem of tag-name collisions when you combine XML from different sources. Without namespaces, a <title> from a book catalog and a <title> from an employee record would be ambiguous.

<!-- Using namespace prefixes -->
<catalog xmlns:book="http://example.com/books"
         xmlns:hr="http://example.com/hr">
  <book:title>XML Handbook</book:title>
  <hr:title>Senior Developer</hr:title>
</catalog>

<!-- Using a default namespace -->
<catalog xmlns="http://example.com/books">
  <title>XML Handbook</title>         <!-- in the books namespace -->
  <author>Jane Smith</author>         <!-- also in books namespace -->
</catalog>

Namespace URIs don't need to resolve to actual web pages — they're just unique identifiers. By convention, organizations use their domain name as part of the URI to ensure global uniqueness.
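Namespace-aware code matches on the URI, not on the prefix. The sketch below shows this with Python's standard xml.etree.ElementTree, using the two catalog examples above. The NS mapping is local to the query code and could use any prefixes, as long as the URIs match those in the document.

```python
import xml.etree.ElementTree as ET

PREFIXED = """<catalog xmlns:book="http://example.com/books"
         xmlns:hr="http://example.com/hr">
  <book:title>XML Handbook</book:title>
  <hr:title>Senior Developer</hr:title>
</catalog>"""

DEFAULT_NS = """<catalog xmlns="http://example.com/books">
  <title>XML Handbook</title>
  <author>Jane Smith</author>
</catalog>"""

# Prefix-to-URI mapping used only by the queries below.
NS = {"book": "http://example.com/books", "hr": "http://example.com/hr"}

root = ET.fromstring(PREFIXED)
book_title = root.findtext("book:title", namespaces=NS)
job_title = root.findtext("hr:title", namespaces=NS)
# Internally, ElementTree stores names as "{uri}local".
expanded_name = root.find("hr:title", NS).tag

# With a default namespace, unprefixed elements are still namespaced,
# so a plain "title" lookup finds nothing.
default_root = ET.fromstring(DEFAULT_NS)
plain_lookup = default_root.find("title")
qualified_lookup = default_root.findtext("{http://example.com/books}title")

print(book_title, "|", job_title, "|", expanded_name)
print(plain_lookup, "|", qualified_lookup)
```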

9. CDATA Sections and Special Characters

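Five characters have special meaning in XML and have predefined escape sequences. In text content, < and & must always be escaped; > only needs escaping in the sequence ]]>, and the two quote characters only inside an attribute value delimited by the same quote. Escaping all five is always safe:

Character   Escape Sequence   Name
<           &lt;              Less than
>           &gt;              Greater than
&           &amp;             Ampersand
'           &apos;            Apostrophe
"           &quot;            Quotation mark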

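CDATA sections provide an alternative: a block in which < and & are treated as literal text rather than markup, so you can include raw text (HTML, code, mathematical expressions) without escaping. The only sequence that cannot appear inside a CDATA section is ]]>, which ends it: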

<script><![CDATA[
  function compare(a, b) {
    if (a < b && b > 0) {
      return a & b;
    }
  }
]]></script>
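Escaping and CDATA are two spellings of the same text; after parsing, an application sees identical content either way. The sketch below demonstrates this with Python's standard library, using xml.sax.saxutils.escape for the escaping step. The code string is adapted from the example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

code = "if (a < b && b > 0) { return a & b; }"

# Option 1: escape the markup characters (escape handles &, < and >).
escaped_xml = "<script>" + escape(code) + "</script>"

# Option 2: wrap the raw text in a CDATA section (safe here because the
# text does not contain "]]>").
cdata_xml = "<script><![CDATA[" + code + "]]></script>"

from_escaped = ET.fromstring(escaped_xml).text
from_cdata = ET.fromstring(cdata_xml).text

# Quotes are not escaped by default; pass extra entities when needed.
quoted = escape('"hi" & \'bye\'', {'"': "&quot;", "'": "&apos;"})

print(escaped_xml)
print(from_escaped == from_cdata == code)
print(quoted)
```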

10. XPath: Querying XML Data

XPath is a query language for selecting nodes from an XML document. It's used extensively in XSLT, XQuery, and by XML processing tools. XPath expressions describe paths through the XML tree:

Given this XML:
<library>
  <book isbn="978-0-13-468599-1">
    <title>The Pragmatic Programmer</title>
    <year>2019</year>
  </book>
  <book isbn="978-0-596-51774-8">
    <title>JavaScript: The Good Parts</title>
    <year>2008</year>
  </book>
</library>

XPath expressions:
/library/book          → selects all <book> elements
/library/book[1]       → selects the first book
/library/book/title    → selects all <title> elements
//title                → selects <title> anywhere in the document
/library/book[@isbn]   → selects books that have an isbn attribute
/library/book[year>2010]/title → titles of books after 2010

XPath is invaluable when you need to extract specific data from complex XML documents. Most programming languages have XPath support: Python's lxml, Java's javax.xml.xpath, JavaScript's document.evaluate, and .NET's XPathNavigator. Learning XPath is essential for anyone working with XML data processing.
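The expressions above can be tried without installing anything, with one caveat: Python's standard xml.etree.ElementTree implements only a subset of XPath. It takes paths relative to an element rather than absolute ones, and it has no numeric comparisons such as year>2010. In the sketch below, that last query is therefore done with an ordinary Python filter; a full XPath engine such as lxml would accept the expression directly.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library>
  <book isbn="978-0-13-468599-1">
    <title>The Pragmatic Programmer</title>
    <year>2019</year>
  </book>
  <book isbn="978-0-596-51774-8">
    <title>JavaScript: The Good Parts</title>
    <year>2008</year>
  </book>
</library>"""

root = ET.fromstring(LIBRARY_XML)  # root is the <library> element

all_books = root.findall("book")                              # /library/book
first_book = root.find("book[1]")                             # /library/book[1]
titles = [t.text for t in root.findall("book/title")]         # /library/book/title
titles_anywhere = [t.text for t in root.findall(".//title")]  # //title
with_isbn = root.findall("book[@isbn]")                       # /library/book[@isbn]

# /library/book[year>2010]/title is outside the supported subset,
# so compare the year in Python instead.
recent_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if int(book.findtext("year")) > 2010
]

print(titles)
print(recent_titles)
```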

11. XSLT: Transforming XML

XSLT (eXtensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents, HTML, plain text, or any text-based format. It's like a template engine specifically designed for XML.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/library">
    <html>
      <body>
        <h1>Book Catalog</h1>
        <ul>
          <xsl:for-each select="book">
            <li>
              <strong><xsl:value-of select="title"/></strong>
              (<xsl:value-of select="year"/>)
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

XSLT is used for generating HTML views from XML data, converting between different XML schemas, creating PDF reports (via XSL-FO), and migrating data between systems with different XML formats. While it has a learning curve, XSLT is extremely powerful for XML-heavy workflows.
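To make the stylesheet's effect concrete, here is the same transformation written imperatively. This is not XSLT: Python's standard library has no XSLT processor, so the sketch below rebuilds the output tree by hand with xml.etree.ElementTree. Whitespace in the result may differ slightly from what an XSLT processor would emit. The function name library_to_html is hypothetical.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library>
  <book><title>The Pragmatic Programmer</title><year>2019</year></book>
  <book><title>JavaScript: The Good Parts</title><year>2008</year></book>
</library>"""


def library_to_html(xml_text):
    """Hand-written equivalent of the <xsl:template match="/library"> rule."""
    library = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = "Book Catalog"
    ul = ET.SubElement(body, "ul")
    for book in library.findall("book"):            # xsl:for-each select="book"
        li = ET.SubElement(ul, "li")
        strong = ET.SubElement(li, "strong")
        strong.text = book.findtext("title")        # xsl:value-of select="title"
        # Text after </strong> belongs to the <strong> element's tail.
        strong.tail = " (" + book.findtext("year", default="") + ")"
    return ET.tostring(html, encoding="unicode", method="html")


html_output = library_to_html(LIBRARY_XML)
print(html_output)
```

The declarative stylesheet expresses the same mapping in less code and can be swapped without touching application logic, which is the main argument for XSLT in XML-heavy pipelines.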

12. XML in the Real World (2026)

Despite JSON's dominance in web APIs, XML is far from dead. Here's where it's still the standard:

Office Documents

Microsoft Office (OOXML), LibreOffice (ODF), and Google Docs exports all use XML-based formats internally. Every .docx and .xlsx file is a ZIP archive containing XML files.

Vector Graphics (SVG)

Scalable Vector Graphics is XML. Every icon, illustration, and interactive graphic on the modern web can be described in SVG's XML format.

Build Tools & CI/CD

Maven (pom.xml), Ant (build.xml), and MSBuild project files (.csproj) use XML, and Jenkins stores job configuration in config.xml files. Android projects use AndroidManifest.xml.

Healthcare & Finance

HL7 CDA, FHIR (supports both XML and JSON), ISO 20022 financial messages, and FpML derivatives trading all rely on XML schemas.

Syndication Feeds

RSS 2.0 and Atom feeds remain XML. Podcasting directories (Apple Podcasts, Spotify) require XML feeds for content submission.

SOAP & Enterprise Integration

Banking, insurance, supply chain, and government systems still run on SOAP/XML web services. These systems process billions of transactions annually.

13. XML vs JSON: Quick Comparison

Aspect              XML                                           JSON
Verbosity           More verbose (opening + closing tags)         Compact (braces and brackets)
Data types          All text; types via schemas                   Native strings, numbers, booleans, null
Schema validation   XSD, DTD, RELAX NG (very mature)              JSON Schema (growing adoption)
Attributes          Native support                                Modeled via conventions (@, $)
Mixed content       Supported natively                            Awkward, requires conventions
Query language      XPath, XQuery                                 JSONPath, jq
Transformation      XSLT                                          Custom code / template engines
Best for            Documents, enterprise, regulated industries   Web APIs, configs, real-time data

For a deep comparison, read our dedicated XML vs JSON article.
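Two rows of the comparison, data types and attributes, show up as soon as XML is converted to JSON. The sketch below is one possible mapping, not a standard. It assumes the common convention of prefixing attribute names with "@" and storing text that sits beside attributes under "#text". It ignores mixed content and namespaces. The function name element_to_dict is hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to JSON-friendly data ("@" marks attributes)."""
    node = {"@" + name: value for name, value in elem.attrib.items()}
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        if not node:
            return text                  # plain text element becomes a string
        if text:
            node["#text"] = text         # text alongside attributes
        return node
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated child names become a JSON array.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


book = ET.fromstring(
    '<book isbn="978-0-13-468599-1">'
    "<title>The Pragmatic Programmer</title>"
    "<author>David Thomas</author><author>Andrew Hunt</author>"
    "<year>2019</year>"
    "</book>"
)
as_dict = element_to_dict(book)
print(json.dumps(as_dict, indent=2))
```

The year comes out as the string "2019" rather than a number: without a schema, XML carries no type information, so any typing has to be applied by the converter or the consumer.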

14. Best Practices and Common Mistakes

Best practices

  • Always include the XML declaration with UTF-8 encoding
  • Use meaningful, descriptive element names (not <d1>, <field3>)
  • Prefer elements for data content and attributes for metadata
  • Define and validate against XSD schemas for data exchange
  • Use namespaces when combining vocabularies from different sources
  • Keep consistent naming conventions (camelCase, PascalCase, or kebab-case — pick one)
  • Format documents with consistent indentation for readability
  • Use CDATA only when embedding code or markup that would require extensive escaping

Common mistakes

Missing closing tags or mismatched nesting

XML parsers are unforgiving. A single missing closing tag causes a fatal error. Use a validating editor or linter.

Unescaped ampersands in text or URLs

URLs with query parameters (like &page=2) contain ampersands that must be escaped to &amp;page=2 in XML.

Multiple root elements

XML allows exactly one root element. Wrapping everything in a single container element solves this.

Using attributes for large or structured data

Attributes should hold simple values. Complex or multi-line data belongs in child elements.
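Several of these mistakes disappear when XML is produced through a serializer instead of string concatenation. The sketch below uses Python's standard xml.etree.ElementTree and a made-up URL on example.com to show the ampersand case: the hand-built string is rejected by the parser, while the API-built element is escaped automatically and round-trips intact.

```python
import xml.etree.ElementTree as ET

URL = "https://example.com/search?q=xml&page=2"  # hypothetical URL


def raw_ampersand_is_rejected():
    """String concatenation leaves '&' unescaped, so parsing fails."""
    try:
        ET.fromstring('<link href="' + URL + '"/>')
    except ET.ParseError:
        return True
    return False


def build_link(url):
    """Build the element through the API; the serializer escapes '&'."""
    link = ET.Element("link")
    link.set("href", url)
    return ET.tostring(link, encoding="unicode")


serialized = build_link(URL)
round_tripped = ET.fromstring(serialized).get("href")

print(raw_ampersand_is_rejected())
print(serialized)
print(round_tripped == URL)
```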

Next Steps

Now that you understand XML fundamentals, try converting XML data to JSON using our XML to JSON converter. For practical conversion examples, see our XML to JSON mapping guide.

If you're working with JSON and want to learn that format in depth, check out our comprehensive JSON guide.