What is a PDF file?

Because PDF files are so pervasive, many people take their rise for granted. PDF stands for Portable Document Format, a technology created and introduced by Adobe Systems in 1993. The idea behind it, however, was outlined in 1991 by John Warnock, co-founder and former CEO of Adobe Systems, in his seminal paper “The Camelot Project”. In this paper, Warnock states: “this project’s goal is to solve a fundamental problem that confronts today’s companies. The problem is concerned with our ability to communicate visual material between different computer applications and systems … there is no universal way to communicate and view this printed information electronically”. At the time, the technology was referred to as IPS, short for Interchange PostScript.

Despite being developed as a standard for sharing documents across all operating systems, PDF took time to be adopted by computer users. Early versions were not widely distributed, and PDF files initially offered no support for external hyperlinks, which limited their utility. Furthermore, because PDF documents were larger than plain text documents, downloading them was impractical: modems were the most common way to access the Internet in those days, and even a 56k modem could download only about 5 kilobytes per second. In addition, rendering documents into PDF files was a sluggish process because of the limited processing power of computers at the time. Another crucial reason for the slow adoption of PDF technology was the price tag Adobe placed on its products. However, with the release of version 2.0, Adobe began to distribute its Acrobat Reader, now known as Adobe Reader, free of charge while continuing its support for the original PDF. This move helped solidify PDF as the standard for

What is a PDF exactly?

PDF files are intended to preserve the integrity of the original document. If, for example, you work in the publishing industry or need to see the exact presentation of a document, a PDF is a handy tool. Before PDF, numerous other page description languages existed, such as PostScript. PostScript was popular in the print community but was not suitable for onscreen use. By building on PostScript technology and making some additional changes, Adobe created the structured language PDF. Essentially, a PDF captures all the elements of a document and stores them as an electronic image with a fixed layout that is viewable on computers and, now, on many other electronic devices. PDF files can be printed and sent to others, and because they are compact they are ideal for such purposes. As a file format, PDF is universally compatible, since it works on all computers and platforms. To open a PDF file one simply needs a PDF viewer, such as the Adobe Reader mentioned above. PDF viewers are available for all computers and platforms, allowing users to view PDF documents whether they are offline or online. Since PDF has become such a ubiquitous technology, most web browsers have embedded PDF readers, allowing the user to view PDF files online.

Functionality and Competition

Despite being the most commonly known and used format for document transmission, PDF has had numerous competitors, such as DjVu, the now defunct Envoy and even its sibling PostScript. Before comparing PDF to the others, however, it is helpful to list some of its advantages. PDF files allow for random access and linearization, which basically means that any object within a file, whether a page or a graphic, can be reached at will in constant time, in contrast to PostScript. PDF files also allow for embedded fonts, so all fonts should be rendered correctly no matter which fonts are installed on the computer used to view the file. PDF files can also contain searchable text, for example through an OCR text layer, which saves time if one wishes to locate a certain passage in a book or copy and paste a section. Despite the popularity of PDF, some computer users prefer the alternative file format DjVu, which is a pure raster format, whereas PDF can contain both raster and vector graphics. DjVu was developed by AT&T Labs in 1996 in order to distribute high-resolution images of various digital documents, everything from photographs to newspapers, magazines and books. Nonetheless, PDF has never relinquished its dominant position since its initial release in 1993. Version 1.1, released in 1994, added an encryption option (40-bit) and hyperlinks. Version 1.2, released in 1996, added Unicode support. Version 1.3, released four years later, added prepress support, digital signatures, embedded files and annotations. Later editions brought better encryption options as well as OpenType fonts, and in 2008 the PDF file format was released as an open standard. Besides Adobe Reader, one can also read PDF files with Preview, the application pre-installed on macOS. For Unix there is Xpdf, and there are command-line options for all operating systems, including pdftk, a free, open source command-line tool that lets users build their own PDF files. As of 2014, PDF is still the world’s preferred electronic document format.
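
The version numbers discussed here are recorded in the file itself: a PDF begins with a header comment of the form %PDF-1.x. The following Python sketch reads that header. The function name pdf_version and the sample bytes are made up for illustration, and the sketch ignores the fact that newer files may declare a later version elsewhere in the document.

```python
import re

def pdf_version(data: bytes):
    """Return the version from a header such as b'%PDF-1.4', or None if absent.

    pdf_version is a hypothetical helper name, not part of any PDF library.
    """
    # Only the first few bytes are inspected; the header opens the file.
    match = re.match(rb"%PDF-(\d+\.\d+)", data[:16])
    return match.group(1).decode("ascii") if match else None

# Hypothetical first bytes of a PDF 1.3 file, followed by a non-PDF sample.
sample = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
print(pdf_version(sample))            # 1.3
print(pdf_version(b"\xff\xd8\xff"))   # None
```

A real viewer does far more than this, but the header check is a quick way to tell which generation of the format a file claims to follow.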


How did the JPG file become the de facto standard?

JPEG History

Before delving into the history of the image compression technology known as JPEG, also written “jpg” after its file extension “.jpg”, it is helpful to first decode the acronym itself. JPEG stands for Joint Photographic Experts Group, the name of the committee that created the standard. The committee works under the auspices of ISO and IEC. ISO is the International Organization for Standardization, a worldwide federation of national standards bodies from more than 100 countries. ISO is technically not an acronym, however, since the name is a reference to the Greek word isos, meaning equal. The second body is the International Electrotechnical Commission, an international standards organization covering all fields of electrotechnology.

The JPEG file format’s function
The function of JPEG was to compress images, mainly those of natural or real-world scenes (those that do not have sharp changes), and to reduce the file sizes of images in order to facilitate their storage and transmission. The JPEG file format established an international digital image compression standard for continuous-tone (multilevel) still images, both color and grayscale. It is also important to realize that JPEG is a “lossy” file format: compression discards some information, so the decompressed image is a slightly different “version” of the original, with some loss of quality. In comparison, file formats such as PNG and TIFF can store images losslessly.

Technological necessity
The typical compression ratio in JPEG is around 10:1, making it an ideal format for decreasing the size of images. In addition, JPEG images are full-color images that can store 24 bits per pixel, which corresponds to around 16.7 million colors. Younger readers might also want to keep in mind that during the early years of the Internet, dial-up modems were not exactly designed for graphics-heavy websites: it took around ten seconds to download a 50 kilobyte image with a 56k modem, and roughly half a minute with a 14.4k modem. A file format such as JPEG was therefore invaluable, especially since the floppy disk was the storage standard at the time of JPEG’s introduction. Nowadays, with broadband connections and high-capacity storage media, this might seem like less of an issue, yet JPEG is still quite useful because of its ubiquity in digital cameras, where you can shoot in RAW, in JPEG to fit more pictures into the available storage, or in some instances both. Furthermore, the JPEG file format is widely used on the Internet and in a myriad of image editing programs. In fact, part of the format’s popularity stems from the fact that editing software lets one specify the quality setting, and thus how heavily the image should be compressed. The downside is that compressing too much can produce unwanted artifacts, such as blocking and ringing, that show up in the image if it is printed. JPEG has therefore always been quite convenient for hobby photographers and for people who wish to share images quickly on the Internet, especially since images displayed on a screen have a lower resolution than printed ones, which makes the loss of quality difficult to discern. Because the information discarded during compression is lost for good, however, JPEG has often been viewed as an inferior file format in some professional circles: it leaves less room for editing than formats such as RAW or TIFF, the latter of which can be either lossless or lossy.
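
The figures in this section are easy to check with a little arithmetic. The Python sketch below is a rough, best-case calculation: it assumes the modem delivers its full nominal line rate with no protocol overhead, which real dial-up connections never did, so actual downloads were slower. The function name ideal_download_seconds and the 500 kilobyte example image are made up for illustration.

```python
def ideal_download_seconds(size_kilobytes: float, line_rate_kbit_per_s: float) -> float:
    """Best-case transfer time at the nominal line rate, ignoring overhead."""
    size_kilobits = size_kilobytes * 8  # one byte is eight bits
    return size_kilobits / line_rate_kbit_per_s

# 24 bits per pixel give 2 to the power of 24 distinct colors.
print(2 ** 24)  # 16777216

# A hypothetical 500 kilobyte image compressed at roughly 10:1.
uncompressed_kb = 500
compressed_kb = uncompressed_kb / 10

# Best-case download time of the compressed image on two modem speeds.
for label, rate in (("56k", 56.0), ("14.4k", 14.4)):
    seconds = ideal_download_seconds(compressed_kb, rate)
    print(f"{label} modem: {seconds:.1f} s")
# 56k modem: 7.1 s
# 14.4k modem: 27.8 s
```

Without the 10:1 compression the same image would have taken ten times as long, which is why the format mattered so much in the dial-up era.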

JPEG as the standard
However, despite being a lossy format, or rather because of it, JPEG manages to achieve greater compression than lossless image compression schemes; a notable example of the latter is the GIF file format, which uses 8 bits per pixel. This is achieved by exploiting the known limitations of human visual perception: small changes in color are not as noticeable as small changes in brightness. In other words, an algorithm examines the color and brightness within the image, and where areas have nearly uniform color it can compress them heavily, since many adjacent pixels carry fairly similar information. JPEG can thus exploit our perceptual bias and compress images that still retain their natural quality to our eyes despite the loss of information.
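
One common way this brightness-versus-color trick is put into practice is to separate each pixel into one brightness value (Y) and two color values (Cb and Cr), and then store the color values at a lower resolution. The Python sketch below illustrates the idea on a made-up 2x2 patch of pixels. The conversion constants are the ones commonly used with JPEG files, while the function names, the patch values and the simple averaging are assumptions chosen for illustration, not a description of any particular encoder.

```python
def rgb_to_ycbcr(r, g, b):
    """Split an 8-bit RGB pixel into brightness (Y) and two color values (Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_2x2(plane):
    """Average every 2x2 block of a plane given as a list of rows (even dimensions)."""
    out = []
    for row in range(0, len(plane), 2):
        out_row = []
        for col in range(0, len(plane[0]), 2):
            total = (plane[row][col] + plane[row][col + 1]
                     + plane[row + 1][col] + plane[row + 1][col + 1])
            out_row.append(total / 4)
        out.append(out_row)
    return out

# A hypothetical 2x2 patch of four slightly different reds.
patch = [[(200, 30, 30), (204, 28, 32)],
         [(198, 31, 29), (202, 29, 31)]]

ycbcr = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in patch]
luma = [[pixel[0] for pixel in row] for row in ycbcr]            # kept in full
cb = subsample_2x2([[pixel[1] for pixel in row] for row in ycbcr])  # 4 values -> 1
cr = subsample_2x2([[pixel[2] for pixel in row] for row in ycbcr])  # 4 values -> 1

samples_before = 4 * 3  # four pixels, three values each
samples_after = sum(len(row) for row in luma) + len(cb[0]) + len(cr[0])
print(samples_before, samples_after)  # 12 6
```

Half of the numbers are gone before any further compression takes place, yet all four brightness values survive, which is the part our eyes are most sensitive to.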

A caveat must be added, though, to explain the newer file standard JPEG 2000. A few years after the JPEG committee introduced the JPEG file format, it also introduced a lossless JPEG standard that was never widely adopted. JPEG 2000, however, is a newer standard that was introduced to succeed the JPEG file format. The main difference is that JPEG 2000 has superior compression technology: artifacts are less visible, and there is the option of either lossless or lossy compression. Furthermore, JPEG 2000 is based on the discrete wavelet transform, which performs better than the discrete cosine transform (DCT) on which the original JPEG algorithm is based. Nevertheless, JPEG 2000 has not caught on: it lacks wide browser support, and despite its higher performance, digital camera manufacturers and desktop software companies have had no incentive to switch. JPEG’s status as a popular file format is thus secure for the time being.

What is a JPG file?

JPG File Format

JPG files are a common file format used for digital photos and various other digital graphics. JPG is derived from ‘Joint Photographic Experts Group’, which is the name of the committee that developed this particular file type. Thanks to its effective compression technique, JPG is the most widely used file format for photos and other graphics used on websites.

The file extensions that are typically used for this format are .JPG, .JPEG, .JPE or .JFIF, although .JPG is the most commonly used extension on all platforms.

One of the reasons behind the huge popularity of the JPG image file format is that it offers an amazingly effective compression technique for color images. Unlike some other popular file types, which show considerable loss in image quality with the slightest reduction of file size, JPG allows for a significant degree of file size reduction without too much loss of image quality. Using JPG, images can be compressed to about 5% of their original size, which is impossible in most other formats.

As file sizes get very low, however, JPG images become blurry. This is the effect of “lossy” compression, in which image quality is lost as file size decreases. In lossy compression, the smaller sizes are essentially achieved by removing increasing amounts of color information from the pixels. The result is a noticeable pattern, known as JPG artifacts, that becomes more visible as the image quality setting is reduced.

When saving photos and other images as JPG files for the web, email or any other purpose, you will have to make a call on the tradeoff between file size and image quality. The good news is that the degree of compression can be adjusted, giving users a selectable tradeoff between image quality and storage size. On a quality scale running from 0 to 12, the highest setting of 12 applies the minimum compression and the loss of image detail is virtually indiscernible, but as the quality slides down the scale the loss of detail becomes increasingly noticeable. An image with a JPG quality of 6 will start to show a faint pattern of pixel blocks. An image with a JPG quality of 3 will show more noticeable blocking. When the JPG quality gets as low as 0, the pixels will begin to resemble the pattern of a parquet floor.
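
The tradeoff can be illustrated with a heavily simplified Python sketch. It transforms one row of eight made-up brightness values with a discrete cosine transform, rounds the result to multiples of a step size, and transforms back. A larger step plays the role of a lower quality setting. This is a one-dimensional stand-in under stated assumptions: actual JPEG encoders work on two-dimensional 8x8 blocks with a separate step for each frequency, and the way a quality number maps to step sizes differs between programs. The names dct, idct, lossy_round_trip and step are chosen for illustration.

```python
import math

def dct(samples):
    """Orthonormal one-dimensional discrete cosine transform (type II)."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(samples)))
    return out

def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1 if k == 0 else 2) / n) * c
                       * math.cos(math.pi * (i + 0.5) * k / n)
                       for k, c in enumerate(coeffs)))
    return out

def lossy_round_trip(samples, step):
    """Quantize the transformed row with one step size and reconstruct it."""
    quantized = [round(c / step) for c in dct(samples)]  # the lossy part
    restored = idct([q * step for q in quantized])
    kept = sum(1 for q in quantized if q != 0)  # values that still need storing
    return kept, restored

row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical brightness values
for step in (1, 10, 40):
    kept, restored = lossy_round_trip(row, step)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"step={step:>2}  nonzero values={kept}  worst error={worst:.1f}")
```

As the step grows, fewer nonzero values remain to be stored, which is where the smaller file comes from, while the reconstructed row drifts further from the original.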

While the JPG format is commonly used for storing and transmitting pictures and images on the World Wide Web, it is not the best choice of format for line drawings or textual graphics. This is because its compression technique does not perform as well on these types of images. When compressing such images, the sharp contrasts between adjoining pixels can cause distinctive, easily visible artifacts, which reduce image clarity. For saving these types of images, it is advisable to use one of the lossless graphics formats.

Another scenario in which JPG should not be used is where the exact reproduction of the data is important, such as in certain technical image processing work or in scientific or medical imaging applications. The lossy compression could jeopardize these kinds of applications by compromising image quality.

JPG is also not well suited to files that need to go through several edits before they are used. Every time the image is decompressed and recompressed, some degree of image quality will usually be lost. This can be particularly prominent if the image is shifted or cropped, or if the encoding parameters are changed. To prevent this from happening, when you receive a JPG file that is being modified or is likely to be modified in the future, the first thing to do is to save it in a lossless format such as TIFF with no compression, edit that copy, and export to JPG only once the edits are finished.
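
Why changed encoding parameters matter can be shown with a toy Python example. Rounding a list of made-up values to multiples of a step size stands in for one lossy save. This is a deliberate simplification of what a JPG encoder does, and the function name requantize and the sample values are assumptions chosen for illustration.

```python
def requantize(values, step):
    """Round every value to the nearest multiple of step (one simulated lossy save)."""
    return [round(v / step) * step for v in values]

original = [37, 52, 68, 91]               # hypothetical image data

first_save = requantize(original, 10)     # first lossy save
same_again = requantize(first_save, 10)   # saved again, same settings
changed = requantize(first_save, 7)       # saved again, different settings

print(first_save)   # [40, 50, 70, 90]
print(same_again)   # [40, 50, 70, 90]
print(changed)      # [42, 49, 70, 91]
```

In this toy model, saving again with identical settings changes nothing, but saving with different settings moves the values a second time. In a real workflow, cropping or shifting the image has a similar effect, because the pixel blocks no longer line up with those of the previous save.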

Advantages of JPG

The JPG file format has been around for a long time and is almost universal today, which means JPG files can be opened and viewed in almost all image viewing applications.

JPG is compatible with all printers, so you can print files directly from the viewing application without the need to make any changes to its format.

The JPG format is also compatible with almost all photo editing software, though the files often need to be saved to another format to save the alterations.

JPG is often set as the default file format for digital cameras to enable them to take pictures quickly.

Cameras and other devices store JPG images very quickly. This feature makes it possible for you to capture fast moving action clearly with a JPG image.

JPG files are compressed, which means a JPG image will be smaller than the same picture taken in another format, making it easier to store and more convenient to email.

All of these advantages have made JPG one of the preferred default file formats for images used on the internet.

