Journal of Theoretical & Computational Science

Open Access

ISSN: 2376-130X

Research Article - (2015) Volume 2, Issue 3

XML Files Types and Their Differences and Denominators with Plain TXT File

Research Center, 420-Oman-Muscat, Oman

Abstract

Files need to be easy to transform, share and store among programs, platforms and machines. XML formats and plain (flat) TXT files are among the formats that serve this purpose. Complex file formats are difficult to handle, for instance when transforming files, whereas simple files can be handled without compatibility requirements and without virus effects. The method of this research is a comparison of content size between XML file formats and the flat TXT file. XML file formats are simple in structure and easy to transform and convert among various applications. This study found that XML files are structured, programmed content of modest size: for content consisting of a text and a picture, the XML Spreadsheet format took just 3 KB, whilst the XML PowerPoint presentation took 97 KB. Both XML and TXT are text-based, but the plain TXT file is simpler than the XML file: the same content converted to TXT format took just 1 KB. The plain text file is largely unorganized, unstructured text-based content, except for the head section of the TXT file.


Keywords: Simplicity; XML; Flat TXT File; Transformation; Size; Structure

Introduction

XML stands for Extensible Markup Language, a file format used to create, share and transform information, and to share both the format and the data on the World Wide Web (internet) using standard plain text such as ASCII.

XML aims to simplify and generalize the exchange of data across the internet. It is a textual data format with strong support for encoding data in different human languages, and it is widely used to represent data, for example in web services.

XML is similar to Hypertext Markup Language (HTML). Both XML and HTML contain markup symbols to describe the contents of a page or file.

Norman Walsh (1998) defines XML as a markup language for documents containing structured information. This structured information contains words and pictures. Within an XML document, content in a section heading has a different meaning from content in a footnote, which in turn means something different than content in a figure caption or in a database table. He also said that all documents have some structure [1].

XML files sample

<note>
  <to>Tovee</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don’t forget me this weekend!</body>
</note>
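As a minimal sketch of how such a sample can be read by a program, the Python snippet below parses a small note document with the standard library. The element names (note, to, from, heading, body) are assumptions chosen for illustration; only the values come from the sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the sample; the element names are assumed.
NOTE_XML = """<note>
  <to>Tovee</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# Each child element carries both a name (the markup) and a value (the data).
fields = {child.tag: child.text for child in root}

for name, value in fields.items():
    print(f"{name}: {value}")
```

The point of the sketch is that the markup gives every value a name, so a program can ask for the heading or the body directly instead of guessing from position.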

 

XML file formats transformation with office programs

According to Frank Rice (2006), in the 1990s, after the advent of XML, corporate customers came to realize the value of open file formats and basic standardization in computer-related products and applications.

As an example from Microsoft Corporation, IT specialists benefit from file formats that can be opened as XML for two reasons: first, the smaller file size, and second, the ease with which many applications, operating systems and internet browsers can read them.

Historically, the transition began when developers saw the need to move from binary file formats to XML formats. Office programs can perform many functions, but they face challenges such as transporting and moving data easily between operating systems and various applications.

Frank Rice also mentioned that Microsoft Office 2007 can convert formats such as DOC, PPT and XLS to XML file formats. The new file formats are called the Office Open XML formats. Files in the Office Open XML formats can be accessed and supported by any application that supports XML [2].

TXT file format

TXT is a file extension for a text file, used by a variety of text editors such as Notepad. Text is a human-readable sequence of characters and the words they form that can be encoded into a computer-readable format. There is no standard definition of a text file, though there are several common formats, including ASCII and ANSI.

According to Varun Agarwal (2008), a flat file is generally a delimited file with a specific delimiter such as a semicolon or comma. The delimiter can be specified in the map properties tab in the Syndicator. For an XML file, by contrast, a schema must be specified to define the structure of the XML file [3].
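The idea of a delimited flat file can be sketched in a few lines of Python. The file content, the column names and the helper `read_flat` are hypothetical and serve only to show that the delimiter and the header row are the whole "structure" of such a file.

```python
import csv
import io

# Hypothetical flat-file content: one record per line, fields split by ';'.
FLAT_TEXT = "name;format;size_kb\nreport;TXT;1\nreport;XML;57\n"

def read_flat(text, delimiter=";"):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

rows = read_flat(FLAT_TEXT)
for row in rows:
    print(row["name"], row["format"], row["size_kb"])
```

Every value comes back as a string, and nothing in the file itself says which delimiter was used; the reader has to be told, which is the contrast with a schema-described XML file.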

TXT files content sample

=== Plugin Name ===

Contributors: (this should be a list of wordpress.org user ID’s)

Donate link: http://example.com

Tags: comments, spam

Requires at least: 3.0.1

Tested up to: 3.4

Stable tag: 4.3

License: GPL v2 or later

License URL: http://www.gnu.org/license/gpl-2.0.html

Here is a short description of the plugin. This should be no more than 150 characters. No markup here.

Description

This is the long description. No limit and can use Markdown (as well as in the following sections).

For backward compatibility, if this section is missing, the full length of the short description will be used, and markdown parsed.

A few notes about the sections above:

“Contributors” is a comma separated list of wp.org/wp-plugin.org usernames

“Tags” is a comma separated list of tags that apply to the plugin

“Requires at least” is the lowest version that the plugin will work on

“Tested up to” is the highest version that you’ve successfully used to test the plugin. Note that it might work on higher versions… this is just the highest one you’ve verified.

“Stable tag” should indicate the Subversion “tag” of the latest stable version, or “trunk,” if you use /trunk/ for stable.
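The head section of this sample is the one part of the TXT file that follows a regular "Key: value" pattern, so it can be read mechanically. The following Python sketch, with a hypothetical helper `parse_header`, illustrates this on a few of the header lines quoted above.

```python
# Header lines taken from the sample above; only "Key: value" lines are parsed.
README_HEAD = """\
Tags: comments, spam
Requires at least: 3.0.1
Tested up to: 3.4
Stable tag: 4.3
License: GPL v2 or later
"""

def parse_header(text):
    """Return a dict of header fields from 'Key: value' lines."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields

header = parse_header(README_HEAD)
tags = [t.strip() for t in header["Tags"].split(",")]
print(header["Stable tag"], tags)
```

The free-text description that follows the header has no such pattern, which is why only the head of the file can be treated as structured.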

Application case of XML files use

One application case is the analysis of DrugBank data, which describes drugs in detail, including chemical and pharmacological compounds, using XML documents. Each drug is described with more than 150 data fields. The XML files are loaded into an application named KNIME. The XML processing nodes extract information such as compound name and compound description from the XML file using the XPath node. The row filter extracts the information on approved drugs. A meta node extracts further textual features from the XML file, such as organisms and manufacturers, along with other free-text fields such as description and pharmacology. These are analyzed in the text mining part of the analysis (Koetter, 2012) [4].
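The extraction step described above can be approximated outside KNIME. The sketch below uses Python's standard library rather than KNIME nodes, and the XML is a hypothetical, heavily simplified stand-in that does not follow the real DrugBank schema; it only illustrates filtering approved drugs with an XPath-style expression and pulling out the name and description fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified drug records; not the real DrugBank schema.
DRUGS_XML = """<drugs>
  <drug><name>Compound A</name><group>approved</group>
        <description>Example description A.</description></drug>
  <drug><name>Compound B</name><group>experimental</group>
        <description>Example description B.</description></drug>
  <drug><name>Compound C</name><group>approved</group>
        <description>Example description C.</description></drug>
</drugs>"""

root = ET.fromstring(DRUGS_XML)

# XPath-style predicate: keep only drugs whose <group> child equals 'approved'.
approved = root.findall("./drug[group='approved']")

# Pull the name and description fields out of each matching record.
records = [(d.findtext("name"), d.findtext("description")) for d in approved]
print(records)
```

Because each field is addressed by name, the same query keeps working when further fields are added to a record, which is hard to guarantee for a positional flat file.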

Results

Content size

The XML files of the Word document, Excel workbook and PowerPoint presentation contain the same content: a text defining a computer and a figure of the computer, as shown in Figure 1.

Figure 1: A computer generally means a programmable machine. The two principal characteristics of a computer are that it responds to a specific set of instructions in a well-defined manner and that it can execute a prerecorded list of instructions (a program).

Converting the Office documents to XML files produced the sizes shown in Table 1.

XML Spreadsheet    XML Word document    XML PowerPoint presentation
3 KB               57 KB                97 KB

Table 1: Sizes of the XML files produced by converting the Office documents.

Size of XML word document and TXT plain text file

The sizes of the same content as an XML Word document and as a TXT plain text file are shown in Table 2.

TXT file    XML Word document
1 KB        57 KB

Table 2: Sizes of the same content as a TXT plain text file and as an XML Word document.
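Why markup makes a file larger than its plain text can be illustrated with a small Python sketch. The wrapper elements (`document`, `paragraph`) and the `style` attribute are hypothetical; real Office XML adds far more markup for styles, layout and embedded images, so the byte counts printed here are not comparable with the measured sizes in the tables.

```python
import xml.etree.ElementTree as ET

TEXT = ("A computer generally means a programmable machine. "
        "It responds to a specific set of instructions in a well-defined manner.")

# Plain text: the bytes on disk are just the characters themselves.
txt_bytes = TEXT.encode("utf-8")

# Hypothetical minimal XML wrapper; element and attribute names are assumed.
doc = ET.Element("document")
para = ET.SubElement(doc, "paragraph", {"style": "Normal"})
para.text = TEXT
xml_bytes = ET.tostring(doc, encoding="utf-8")

# Every tag and attribute is extra bytes on top of the content.
overhead = len(xml_bytes) - len(txt_bytes)
print(len(txt_bytes), len(xml_bytes), overhead)

# Ratio reported in Table 2: 57 KB of XML for 1 KB of plain text.
reported_ratio = 57 / 1
```

Even this minimal wrapper adds bytes to the same content, and the overhead grows with every style, layout or image element the format records.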

TXT and Word XML document

TXT file:

When the XML Word document with the same content, containing text and one picture, is converted to TXT and opened, the whole file appears as text only, as shown in Figure 2.

Figure 2: Conversion of open XML word document.

XML Word document:

When the same file, containing text and one picture, is converted to an XML file, the document appears as a structured and organized XML file, as shown in Figure 3.

Figure 3: Structured and organized XML file.

XML document file in Notepad:

Opening the XML document in Notepad loses the organized, hierarchical presentation of the XML file, which then appears like a flat TXT file (Figure 4).

Figure 4: The XML file appears like a flat TXT file.
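The difference between viewing an XML file as text and viewing it as a hierarchy can be reproduced with a short Python sketch. The one-line document below is hypothetical; the sketch shows that the hierarchy is still present in the flat text and becomes visible again once the file is parsed and indented.

```python
import xml.dom.minidom

# Hypothetical one-line XML, as a plain text editor might display it.
FLAT_XML = ("<document><title>Computer</title>"
            "<body><p>A programmable machine.</p></body></document>")

# Viewed as text, the file is just one flat sequence of characters.
print(len(FLAT_XML.splitlines()))

# Parsing recovers the hierarchy; pretty-printing shows it as indentation.
dom = xml.dom.minidom.parseString(FLAT_XML)
pretty = dom.toprettyxml(indent="  ")
print(pretty)
```

The structure therefore belongs to the markup, not to the layout on screen: a text editor that does not interpret the tags shows the same bytes as a flat file.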

Discussion

XML files are used for analyzing DrugBank data, which is stored in XML format. The data include medicine descriptions, medicine names, and chemical and pharmacological compounds [4].

On the Stackoverflow.com website, Pascal Martin (2015), in a discussion titled «XML file and text file», said that an XML file is a structured document with standard validation mechanisms (DTD, schema) and a standardized way of parsing its structure. On the other hand, a text file is easier to write, because there is no need to respect and write any structured tags, and a text file does not need to have a well-defined structure [5].

Jacek Konieczny (2015), commenting on the same discussion on Stackoverflow.com, noted that an XML file is a text file, but with a well-structured framework that provides ways to represent complicated data structures.
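The contrast between a structured XML file and a free-form text file can be sketched with a well-formedness check in Python. The helper `is_well_formed` and the sample strings are hypothetical, and the check covers only well-formedness; validation against a DTD or schema is a further step that this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<note><to>Jani</to></note>"
bad = "<note><to>Jani</note>"      # <to> is never closed
plain = "Just a line of plain text."

print(is_well_formed(good), is_well_formed(bad), is_well_formed(plain))
```

Any sequence of characters is an acceptable text file, whereas an XML parser rejects content that breaks the tag structure; this strictness is what lets different applications rely on the structure.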

XML files also have drawbacks such as verbosity and higher processing costs. XML is a good choice when data are exchanged between different applications or systems and the data are preferably well-structured [6].

Coombs, James and Allen (1987) said that files containing markup or other metadata can be considered flat or plain text as long as they remain readable as text [6].

Flat or plain text files are used to recover and preserve data against virus trouble and data loss: they can be saved on a protected disk or any other medium and kept for cases of incompatibility. According to the Unicode Standard, plain text is a pure sequence of character codes. Styled text, also known as rich text, is any text representation that contains plain text plus information such as font size, color and hyperlinks [7].

Vangie Beal (2015) also said that in a text (TXT) file each byte represents one character according to the ASCII (American Standard Code for Information Interchange) code. Beal also mentioned that files formatted with a word processor must be stored as binary files to preserve the formats.

ASCII files hold textual information, or TXT data. Plain text files are supported by many applications on many machines because a TXT file does not include any commands in its content.
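The one-byte-per-character property of ASCII text can be checked directly. The sample string in this Python sketch is arbitrary and used only for illustration.

```python
text = "Plain TXT file"

# In ASCII every character is stored as exactly one byte.
data = text.encode("ascii")
print(len(text), len(data))

# Each byte is the character's ASCII code, e.g. 'P' is 80.
codes = list(data[:5])
print(codes)

# Decoding the bytes gives back the same text, with no formatting commands.
restored = data.decode("ascii")
```

Since the bytes contain nothing but character codes, any application that knows ASCII can display the file, which is the portability argument made above.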

Similarity factors between TXT file and XML file

Three factors of similarity can be found between XML and TXT files:

An XML file is an organized and neat document, while a TXT file is fairly organized: a TXT file is somewhat neat because some statements in the plain text file and its content are organized with commands, much as XML is an organized textual file.

XML and TXT files are both easy to store on any storage device and easy to transport among applications and from one platform to another.

An XML file has an organized and neat structure with commands, while the flat plain text file is not wholly organized. When an XML file is opened with Notepad, it appears unorganized, like a TXT file. Meanwhile, a plain text file appears unorganized and not neat except for the head statement.

Paul Murrell (2015), writing about “plain text”, said that the easiest way to save data and information is as a plain text file, in which everything is stored in plain text format. He mentioned that the text file format is a common denominator for storing many file formats in many applications. He also said, with reference to databases, that data are stored as simple or plain text; even numbers are saved in a flat file as plain text [8].
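The remark that even numbers are saved as plain text can be illustrated with a brief Python sketch. The temporary file and the variable names are hypothetical; the values are the sizes in KB reported in Table 1.

```python
import os
import tempfile

values = [3, 57, 97]  # sizes in KB from Table 1

# Write the numbers to a flat file; on disk they are just digit characters.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("\n".join(str(v) for v in values))
    path = fh.name

with open(path) as fh:
    raw = fh.read().split("\n")

os.remove(path)

# The reader gets strings back and must convert them explicitly.
parsed = [int(s) for s in raw]
print(raw, parsed)
```

The file carries no record of the fact that its content was numeric; that knowledge lives in the program that reads it, which is both the simplicity and the limitation of the flat format.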

When the file containing the computer definition text and the figure of the computer is converted to a Word XML file, an Excel XML file and a PowerPoint XML file, the sizes differ: the Excel file is the smallest at 3 KB, while the PowerPoint file takes 97 KB, larger than the other Office formats, as befits a near-multimedia file.

In contrast, when the Word XML file is compared with a text (TXT) file made from the same text and the same figure of the computer, the TXT file takes only 1 KB. A 1 KB file is a very simple file, which gives the plain TXT file an advantage in storing and transporting. The XML Word file takes 57 KB, which is much bigger than the TXT file. Part of the difference arises because the TXT file keeps only the text: as Figure 2 shows, the converted file appears as text alone.

Conclusion

XML is an easy and simple file format in which many files can be stored and transported across applications and platforms, Office programs among them. Many applications can open XML files, which is an advantage of XML, whereas the TXT file is just an unstructured plain text file.

The same file, containing text and pictures, can be opened through many applications such as the Office programs (Word, Excel and PowerPoint). Saved as XML, it took 3 KB as an Excel file, the smallest among the Office programs, whilst the largest was PowerPoint at 97 KB, which reflects the nature of PowerPoint as a multimedia application. The XML file is structured and organized with commands; in contrast, the flat plain text file is not fairly organized and not fairly structured. This appears when an XML file is opened with Notepad: it is shown without any neat, structured layout. An XML file is simple and does not take a huge amount of data, but a plain text file is simpler and easier than XML.

Both XML and TXT files are text-based and both can be stored and transported easily; in addition, XML files are structured documents with commands. The TXT file is not exactly organized, except that the header of the TXT plain text file is structured and some commands appear in neat shape.

Paul Murrell identified as an attribute of plain text files that everything saved as TXT is flat text, even numbers and any other values.

As seen when the text and figure were saved as an XML Spreadsheet, the file took just 3 KB; as an XML Word document it took 57 KB, and as a text (TXT) file just 1 KB.

References

Citation: AlHashami Z (2015) XML Files Types and Their Differences and Denominators with Plain TXT File. J Theor Comput Sci 2:130.

Copyright: © 2015 AlHashami Z, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.