1. XML Parsers (DOM, SAX)
Configuration files, application file formats, even database access layers make use of XML-based documents. Fortunately, several high-quality implementations of the standard APIs for handling XML are available. Unfortunately, these APIs are large and therefore provide a formidable hurdle for the beginner.
XML is becoming increasingly popular in the developer community as a tool for passing, manipulating, storing, and organizing information. If you are one of the many developers planning to use XML, you must carefully select and master the XML parser.
The parser, one of XML's core technologies, is your interface to an XML document, exposing its contents through a well-specified API. Confirm that the parser you select has the functionality and performance your application requires. A poor choice can result in excessive hardware requirements, poor system performance, lower developer productivity, and stability issues. There are two main types of XML parser API: DOM (Document Object Model) and SAX (Simple API for XML).
Which parser should I use, DOM or SAX?
The proper choice depends mostly on the requirements of the application. This note lists some properties of each parser to help you decide which one to use.
The DOM parser always reads the whole XML document. It either throws an exception when it encounters an error during parsing, or returns a complete DOM tree representing the XML document.
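A minimal sketch of the DOM behaviour just described, using the JDK's built-in JAXP classes (javax.xml.parsers). The element name `book` and its `title` attribute are hypothetical sample data. Note that `parse()` either hands back the finished tree or throws; there is no intermediate state the application can see.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomExample {
    // Parses the whole document into an in-memory tree, then reads the
    // "title" attribute of every <book> element from that tree.
    public static List<String> bookTitles(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // parse() returns only once the complete document has been read;
        // a malformed document makes it throw a SAXException instead.
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList books = doc.getElementsByTagName("book");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            titles.add(((Element) books.item(i)).getAttribute("title"));
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book title='DOM'/><book title='SAX'/></library>";
        System.out.println(bookTitles(xml)); // prints [DOM, SAX]
    }
}
```

Once `parse()` returns, the tree can be walked in any order and as often as needed, which is what makes DOM convenient.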
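In contrast, the SAX parser works incrementally and generates events that are passed to the application. An application receives these events by implementing handler callbacks (for example, the methods of a ContentHandler, usually by subclassing DefaultHandler) and registering that handler with the reader, such as SAX2XMLReader in Xerces-C++ or XMLReader in Java.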
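For comparison, a minimal SAX sketch using the JDK's `SAXParser` with a `DefaultHandler` subclass, which is the Java counterpart of registering a handler with Xerces-C++'s SAX2XMLReader. The handler keeps only a counter rather than a tree, so its memory use does not grow with the document. The element names are hypothetical sample data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxExample {
    // Receives parser events; keeps only a running count, never a tree.
    static class BookCounter extends DefaultHandler {
        int books = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            // Called once per start tag, in document order.
            if ("book".equals(qName)) {
                books++;
            }
        }
    }

    public static int countBooks(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BookCounter handler = new BookCounter();
        // parse() pushes events to the handler while it reads the input.
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.books;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countBooks("<library><book/><book/><magazine/></library>")); // prints 2
    }
}
```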
What are the pros and cons?
The DOM parser offers a convenient way of reading, analyzing, manipulating and writing back XML files. Because it always reads the whole file before any further processing can take place, and keeps the complete tree in memory, the DOM parser may cause difficulties when processing huge XML files.
The SAX parser, on the other hand, does not build a data representation of the XML content, so it requires more programming than the DOM parser. However, when the application demands it, the SAX parser enables stream processing and partial processing of XML sources, neither of which the DOM parser can do.
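Partial processing can be sketched with SAX by aborting the parse once the wanted data has been seen. This relies on a common idiom rather than a dedicated API: the handler throws a private `SAXException` subclass (the name `FoundIt` is hypothetical) and the caller catches it, so the rest of the input is never parsed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEarlyExit {
    // Marker exception used only to unwind the parser early.
    private static class FoundIt extends SAXException {
        final String value;
        FoundIt(String value) { super("found"); this.value = value; }
    }

    // Returns the title of the first <book>, or null if there is none.
    // Parsing stops as soon as that element is reported.
    public static String firstTitle(String xml) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) throws SAXException {
                if ("book".equals(qName)) {
                    throw new FoundIt(atts.getValue("title"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (FoundIt found) {
            return found.value;
        }
        return null; // reached the end without seeing a <book>
    }
}
```

With DOM the same lookup would first build the tree for the entire document.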
As a rule of thumb, check the following when choosing a parser:
1. Whenever you need stream-processing or partial processing of XML files, you need the SAX parser.
2. Whenever you need a complete representation of the XML content, you should prefer the DOM parser.
3. Still undecided? Then try the DOM parser first, since it is more convenient than the SAX parser.
XPath
XPath (XML Path Language) is a language for addressing parts of an XML document, that is, for selecting nodes from it. In addition, XPath can be used to compute values (strings, numbers, or boolean values) from the content of an XML document.
XPath is used to navigate through the elements and attributes of an XML document to find information. It is a major element of the W3C's XSLT standard, and XQuery and XPointer are both built on XPath expressions, so an understanding of XPath is fundamental to much advanced XML usage.
The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though not in the official specification), an XPath expression is often referred to simply as an XPath.
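A minimal sketch of both uses of XPath (selecting nodes and computing values) with the JDK's `javax.xml.xpath` API over a DOM tree. The `staff`/`employee` document is hypothetical sample data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathExample {
    // Hypothetical sample data, used only for illustration.
    public static final String STAFF =
          "<staff>"
        + "<employee dept='IT'><name>Ann</name><age>64</age></employee>"
        + "<employee dept='HR'><name>Bob</name><age>31</age></employee>"
        + "<employee dept='IT'><name>Cid</name><age>45</age></employee>"
        + "</staff>";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Selecting nodes: returns the text of every node the expression matches.
    public static List<String> select(String xml, String expr) throws Exception {
        NodeList nodes = (NodeList) XPATH.evaluate(expr, parse(xml), XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    // Computing values: evaluates the expression to an XPath number.
    public static double number(String xml, String expr) throws Exception {
        return (Double) XPATH.evaluate(expr, parse(xml), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select(STAFF, "/staff/employee[@dept='IT']/name")); // prints [Ann, Cid]
        System.out.println(number(STAFF, "count(//employee[age > 40])"));      // prints 2.0
    }
}
```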
XQuery
The best way to explain XQuery is to say that XQuery is to XML what SQL is to database tables: it was designed to query XML data. XML is a versatile markup language, capable of labeling the information content of diverse data sources, including structured and semi-structured documents, relational databases, and object repositories.
A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. The W3C XQuery specification describes such a language, designed to be broadly applicable across many types of XML data sources.
XQuery provides the means to extract and manipulate data from XML documents or from any data source that can be viewed as XML, such as relational databases or office documents. XQuery uses XPath expression syntax to address specific parts of an XML document, and supplements it with a SQL-like "FLWOR expression" for performing joins. A FLWOR expression is constructed from the five clauses after which it is named: FOR, LET, WHERE, ORDER BY, RETURN.
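The JDK does not ship an XQuery processor (running real XQuery from Java needs a third-party engine such as Saxon), so the sketch below does not execute XQuery. It only mirrors what each FLWOR clause does, using Java streams over a DOM tree; the query in the comment and the `staff` data are hypothetical examples.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FlworSketch {
    // Hypothetical sample data, deliberately not in name order.
    public static final String STAFF =
          "<staff>"
        + "<employee><name>Dora</name><age>62</age></employee>"
        + "<employee><name>Bob</name><age>31</age></employee>"
        + "<employee><name>Ann</name><age>64</age></employee>"
        + "</staff>";

    /* The XQuery this sketch imitates (not executed here; minAge plays the role of 40):
         for $e in /staff/employee
         let $age := xs:integer($e/age)
         where $age > 40
         order by $e/name
         return <retiree>{ $e/name/text() }</retiree>
    */
    public static List<String> retirees(String xml, int minAge) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList employees = doc.getElementsByTagName("employee");
        return IntStream.range(0, employees.getLength())
                .mapToObj(i -> (Element) employees.item(i))                 // for
                .map(e -> Map.entry(text(e, "name"),
                                    Integer.parseInt(text(e, "age"))))      // let
                .filter(entry -> entry.getValue() > minAge)                 // where
                .sorted(Map.Entry.comparingByKey())                         // order by
                .map(entry -> "<retiree>" + entry.getKey() + "</retiree>")  // return
                .collect(Collectors.toList());
    }

    // Text content of the first descendant element with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

For the sample data, `retirees(STAFF, 40)` yields the Ann and Dora entries in name order, even though Dora appears first in the document.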
XSLT
XSLT stands for XSL Transformations. XSLT is a language for transforming XML documents into other XML documents. XSLT is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary.
XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformations that are needed when XSLT is used as part of XSL.
Extensible Stylesheet Language Transformations (XSLT) is an XML-based language used for the transformation of XML documents into other XML or "human-readable" documents. The original document is not changed; rather, a new document is created based on the content of an existing one. The new document may be serialized (output) by the processor in standard XML syntax or in another format, such as HTML or plain text. XSLT is most often used to convert data between different XML Schemas, to convert XML data into HTML or XHTML documents for (possibly dynamically generated) web pages, or to produce an intermediate XML format that can be converted to PDF documents.
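A minimal sketch of an XML-to-HTML transformation with the JDK's built-in XSLT 1.0 processor (`javax.xml.transform`). The stylesheet and the `library`/`book` input are hypothetical examples; the source string is only read, and the result is a new document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltExample {
    // Hypothetical stylesheet: renders each <book> title as an HTML list item.
    public static final String TO_HTML =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/library'>"
        + "<ul><xsl:for-each select='book'>"
        + "<li><xsl:value-of select='@title'/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies a stylesheet to a source document and returns the serialized result.
    public static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book title='DOM'/><book title='SAX'/></library>";
        System.out.println(transform(xml, TO_HTML));
    }
}
```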
XSLT vs. XQuery: XQuery is for querying, XSLT is for transformation.
XSLT and XQuery have a lot in common, so let's focus on the differences.
Firstly, in functionality alone, there is no doubt that XSLT 2.0 wins over XQuery 1.0. There are many jobs that XSLT 2.0 can do easily that are really difficult in XQuery 1.0. Many of these fall into the categories of up-conversion or rendition applications, but there are plenty of others. A stylesheet/query that copies a document except for its NOTE attributes illustrates the point: XSLT handles it with an identity template plus one empty template, while XQuery 1.0, which has no built-in way to copy a document with changes, needs a hand-written recursive function.
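The XSLT side of that copy-except-NOTE example can be sketched with the standard identity-template idiom, run here through the JDK's XSLT 1.0 processor. The input document is hypothetical. The empty `@NOTE` template wins over the identity template because a named attribute test has a higher default priority than `@*` or `node()`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DropNotes {
    public static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity template: copies every attribute and node unchanged...
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // ...except NOTE attributes, which hit this empty template and are dropped.
        + "<xsl:template match='@NOTE'/>"
        + "</xsl:stylesheet>";

    public static String dropNotes(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dropNotes("<doc NOTE='top'><p NOTE='x' id='1'>Hi</p></doc>"));
    }
}
```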
Secondly, it's probably true at present that XSLT is better at manipulating documents, and XQuery is better at manipulating data. Both languages should be able to do both jobs, but they seem to be better at some aspects of the job than others.
The extra verbosity of XSLT (which still applies although to a lesser extent with XSLT 2.0) is probably most noticeable with very simple queries ("count how many employees will retire this month"). I find myself increasingly using XQuery for such one-liners in preference to XSLT. This applies whether it's an ad-hoc throwaway query, or something built into a Java application. In many such cases, in fact, all one needs is an XPath expression, and of course XPath is a pure subset of XQuery.
If you are building XML databases, whether "native" XML databases or XML-over-relational databases, XQuery is certainly the language of choice. If you are transforming XML documents in filestore or in memory, I think it's much harder to justify preferring XQuery over XSLT at this stage of the game. In a year's time, perhaps there will be more data to justify making this choice especially for data-oriented applications, but my feeling is that anyone who does so today is probably attaching rather too much weight to subjective criteria.
I would actually encourage any serious XML developer to have both tools in their kitbag. Once you have learnt one, it's easy enough to learn the other. I think that with time, there will be a good level of interoperation between XSLT and XQuery products, so using one language for one task doesn't get in the way of using the other language for another part of the same application. XQuery clearly wins for the database access, XSLT for the presentation side of the application; there are other bits, such as the business logic, where in many cases either language will do the job and it becomes a matter of personal preference.
Links:
http://www.xml.com/pub/a/2001/02/14/perlsax.html
http://www.onjava.com/pub/a/onjava/2002/06/26/xml.html
http://www.devx.com/xml/Article/16922
http://www.idealliance.org/proceedings/xtech05/papers/02-03-01/
XML - http://www.w3schools.com/
XML Schema - http://www.w3schools.com/Schema/schema_schema.asp
XSD Facets (restricting the range of values for a data type) - http://www.w3schools.com/Schema/schema_facets.asp
2. XML Data Binding (C++, Java)
XML Data Binding provides a simple and direct way to use XML in your applications. With data binding your application can largely ignore the actual structure of XML documents, instead working directly with the data content of those documents. This isn't suitable for all applications, but it is ideal for the common case of applications that use XML for data exchange.
Data binding can also provide other benefits beyond programming simplicity. Since it abstracts away many of the document details, data binding usually requires less memory than a document model approach (such as DOM or JDOM) for working with documents in memory. You'll also find that the data binding approach gives you faster access to data within your program than you would get with a document model, since you don't need to go through the structure of the document to get at the data. Finally, special types of data such as numbers and dates can be converted to internal representations on input, rather than being left as text; this allows your application to work with the data values much more efficiently.
You might be wondering, if data binding is such great stuff, when would you want to use a document model approach instead? Basically there are two main cases:
1. When your application is really concerned with the details of the document structure. If you're writing an XML document editor, for instance, you'll want to stick to a document model rather than using data binding.
2. When the documents that you're processing don't necessarily follow fixed structures. For example, data binding wouldn't be a good approach for implementing a general XML document database.
Data binding dictionary
Grammar is a set of rules defining the structure of a family of XML documents. One type of grammar is the Document Type Definition (DTD) format defined by the XML specification. Another increasingly common type is the W3C XML Schema (Schema) format defined by the XML Schema specification. Grammars define which elements and attributes can be present in a document, and how elements can be nested within the document (often including the order and number of nested elements). Some types of grammars (such as Schema) also go much further, allowing specific data types and even regular expressions to be matched by character data content. In this article I'll often use the term description as an informal way to refer to the grammar for a family of documents.
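A minimal sketch of checking a document against a W3C XML Schema grammar, using the JDK's `javax.xml.validation` API. The one-element schema is a hypothetical example; its `minInclusive`/`maxInclusive` facets show the kind of data-type restriction that a DTD cannot express.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaExample {
    // Hypothetical grammar: the document is a single <age> holding an integer 0..150.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='age'>"
        + "<xs:simpleType><xs:restriction base='xs:integer'>"
        + "<xs:minInclusive value='0'/><xs:maxInclusive value='150'/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:element>"
        + "</xs:schema>";

    public static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // structure or data-type rule violated
        }
    }
}
```

Here `<age>42</age>` passes, while `<age>200</age>` (facet violated) and `<age>old</age>` (not an integer) are rejected.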
Marshalling is the process of generating an XML representation for an object in memory. As with Java object serialization, the representation needs to include all dependent objects: objects referenced by our main object, objects referenced by those objects, and so on.
Unmarshalling is the reverse process of marshalling, building an object (and potentially a graph of linked objects) in memory from an XML representation.
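A hand-written sketch of marshalling and unmarshalling for a single flat object. Data binding frameworks (Castor, XMLBeans, JAXB, and others) generate or automate this code; JAXB is no longer bundled with the JDK since Java 11, so this sketch uses only DOM and `javax.xml.transform` to stay self-contained. The `Employee` record and its XML layout are hypothetical, and the sketch does not follow references to dependent objects. It does show the typed conversion described earlier: `age` and `hired` become an `int` and a `LocalDate` once, on input.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BindingExample {
    // Hypothetical bound class: fields carry real types, not raw text.
    public record Employee(String name, int age, LocalDate hired) {}

    // Marshalling: object in memory -> XML text.
    public static String marshal(Employee e) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("employee");
        root.setAttribute("name", e.name());
        root.setAttribute("age", Integer.toString(e.age()));
        root.setAttribute("hired", e.hired().toString()); // ISO-8601, e.g. 2001-10-24
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Unmarshalling: XML text -> object; text is converted to int/LocalDate here.
    public static Employee unmarshal(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return new Employee(root.getAttribute("name"),
                            Integer.parseInt(root.getAttribute("age")),
                            LocalDate.parse(root.getAttribute("hired")));
    }

    public static void main(String[] args) throws Exception {
        Employee ann = new Employee("Ann", 64, LocalDate.of(2001, 10, 24));
        String xml = marshal(ann);
        System.out.println(xml);
        System.out.println(unmarshal(xml).equals(ann)); // prints true
    }
}
```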
Links:
XML-JAVA Bindings
XML Data Binding - http://www-128.ibm.com/developerworks/library/x-databdopt/
XML Data Binding with Castor - http://www.onjava.com/pub/a/onjava/2001/10/24/xmldatabind.html
XML-Java Data Binding Using XMLBeans - http://www.onjava.com/pub/a/onjava/2004/07/28/XMLBeans.html
What tool for xml binding - http://www.theserverside.com/news/thread.tss?thread_id=30658
XML-C++ Bindings
XML Data binding with gSOAP - http://www.genivia.com/Products/gsoap/features.html
XML Data binding with LMX - http://tech-know-ware.com/lmx/