Clipboard API Improvements

The Clipboard API provides a mechanism for websites to access the system pasteboard (the pasteboard is the macOS and iOS counterpart to the clipboard on Windows and Linux). Copy and paste is one of the most basic interactions in modern operating systems. We use it for all sorts of purposes, from copying a hyperlink from one website to another to copying a blog post typed in a native word processing application into a blog platform on the web. For this reason, a compelling productivity application on the Web, such as a word processor or a presentation application, needs to interact with the system pasteboard just as much as native applications do.

Over the last couple of months, we have added support for new APIs for better interoperability with other browsers, and refined our implementation to allow more use cases in the macOS and iOS ports of WebKit. These changes are available for your review in the Safari 11.1 and iOS 11.3 beta programs.

First, we modernized our DataTransfer API. We added support for DataTransfer.items and fixed many bugs on macOS and iOS. Because most websites don’t support uploading TIFF files, WebKit now automatically converts TIFF images in the system pasteboard to PNG and exposes them to the page as PNG files.
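As a rough illustration of the items API, the sketch below collects image files from a paste. Given the TIFF-to-PNG conversion described above, a page filtering on image types should see image/png for images copied from native apps rather than image/tiff, although what a given pasteboard holds is outside the page’s control. The interfaces and the `imageFilesFrom` helper are hypothetical stand-ins (the real DOM types are `DataTransferItemList`, `DataTransferItem`, and `File`) so the logic can be exercised outside a browser.

```typescript
// Minimal structural stand-ins for File, DataTransferItem and DataTransferItemList.
// The real DOM objects satisfy these shapes, so event.clipboardData.items can be passed in.
interface FileLike {
  readonly name: string;
  readonly type: string;
}

interface ItemLike {
  readonly kind: string; // "string" or "file"
  readonly type: string; // MIME type, e.g. "image/png"
  getAsFile(): FileLike | null;
}

interface ItemListLike {
  readonly length: number;
  readonly [index: number]: ItemLike;
}

// Returns the image files exposed through DataTransfer.items, in order.
function imageFilesFrom(items: ItemListLike): FileLike[] {
  const images: FileLike[] = [];
  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    // Skip string items (text/plain, text/html, ...) and non-image files.
    if (item.kind !== "file" || !item.type.startsWith("image/")) continue;
    const file = item.getAsFile();
    if (file !== null) images.push(file);
  }
  return images;
}
```

In a page, the same function could be called from a paste listener with `event.clipboardData.items` after checking that `event.clipboardData` is non-null.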

Directory Upload

In r221177, we added support for uploading directories via DataTransferItem.webkitGetAsEntry() and input.webkitdirectory to be interoperable with other browsers such as Chrome, Firefox, and Edge, which had already implemented this WebKit-prefixed feature. This new API allows users to upload a whole directory to cloud storage and file sharing services such as iCloud and Dropbox. On iOS, directory upload is supported when dragging folders from the Files app and dropping them into web pages.
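A minimal sketch of walking a dropped directory, assuming the entry came from `DataTransferItem.webkitGetAsEntry()`. The structural interfaces mirror only the parts of `FileSystemEntry`, `FileSystemDirectoryEntry`, and `FileSystemDirectoryReader` that the sketch uses, and `listFilePaths` is a hypothetical helper. Because `readEntries` can return a directory’s children in several batches, it is called repeatedly until it reports an empty batch.

```typescript
// Structural stand-ins for FileSystemEntry, FileSystemDirectoryReader and
// FileSystemDirectoryEntry (only the members this sketch uses).
interface EntryLike {
  readonly isFile: boolean;
  readonly isDirectory: boolean;
  readonly fullPath: string;
}

interface DirectoryReaderLike {
  readEntries(onSuccess: (entries: EntryLike[]) => void, onError?: (error: unknown) => void): void;
}

interface DirectoryEntryLike extends EntryLike {
  createReader(): DirectoryReaderLike;
}

// Wraps the callback-based readEntries in a Promise.
function readBatch(reader: DirectoryReaderLike): Promise<EntryLike[]> {
  return new Promise((resolve, reject) => reader.readEntries(resolve, reject));
}

// Recursively lists the full paths of all files under an entry.
async function listFilePaths(entry: EntryLike): Promise<string[]> {
  if (entry.isFile) return [entry.fullPath];
  if (!entry.isDirectory) return [];
  const reader = (entry as DirectoryEntryLike).createReader();
  const paths: string[] = [];
  // readEntries may deliver children in several batches; an empty batch means done.
  for (let batch = await readBatch(reader); batch.length > 0; batch = await readBatch(reader)) {
    for (const child of batch) {
      paths.push(...(await listFilePaths(child)));
    }
  }
  return paths;
}
```

In a drop handler, `webkitGetAsEntry()` should be called on each item synchronously, before any `await`, because a DataTransfer’s items are only accessible while the event is being dispatched. With `<input type="file" webkitdirectory>`, the selected files instead arrive in `input.files`, each carrying a `webkitRelativePath`.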

Custom MIME Types

Because the system pasteboard is used by other native applications, exposing data to web content through the clipboard API has serious security and privacy implications. If a website could insert arbitrary content into the system pasteboard, it could exploit security bugs in any native application that reads the pasteboard content, for instance a utility application that shows the content put into the pasteboard. Similarly, if a website could read the system pasteboard at any given point in time, it could potentially steal sensitive information the user was copying, such as their real full name and mailing address.

For this reason, we previously didn’t allow reading anything but plain text and URLs from DataTransfer objects. We relaxed this restriction in r222595 by allowing reading and writing of arbitrary MIME types between web pages of the same origin. This change allows web applications from a single origin to seamlessly share information using their own MIME types, as well as MIME types WebKit doesn’t otherwise support, while still hiding privacy- and security-sensitive information that other native applications may put into the system pasteboard. Because custom MIME types used by websites are bundled together under a special MIME type that WebKit controls, web pages can’t place malicious payloads of arbitrary MIME types in the system pasteboard to exploit bugs in native applications.
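A sketch of sharing app-specific data through a custom MIME type, assuming both the copy and the paste happen on pages of the same origin. The MIME type `application/x-example-shape+json`, the `Shape` type, and the helper names are all hypothetical. A text/plain fallback is written as well, since pages on other origins and native applications won’t see the custom type.

```typescript
// Minimal stand-in for the parts of DataTransfer used here.
interface DataTransferLike {
  setData(format: string, data: string): void;
  getData(format: string): string;
  readonly types: readonly string[];
}

// Hypothetical application-specific MIME type.
const SHAPE_MIME = "application/x-example-shape+json";

interface Shape {
  kind: string;
  x: number;
  y: number;
}

// Called from a copy (or dragstart) handler.
function writeShape(dt: DataTransferLike, shape: Shape): void {
  dt.setData(SHAPE_MIME, JSON.stringify(shape));
  // Fallback for native apps and other origins, which won't see the custom type.
  dt.setData("text/plain", `${shape.kind} at (${shape.x}, ${shape.y})`);
}

// Called from a paste (or drop) handler; returns null if no shape data is available.
function readShape(dt: DataTransferLike): Shape | null {
  if (!dt.types.includes(SHAPE_MIME)) return null;
  const raw = dt.getData(SHAPE_MIME);
  return raw === "" ? null : (JSON.parse(raw) as Shape);
}
```

On a real page, `writeShape` would run inside a copy listener with `event.clipboardData`, followed by `event.preventDefault()` so the browser keeps the page’s data instead of copying the current selection.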

Getting and Setting Data

Apart from custom MIME types, web applications may now write text/html, text/plain, and text/uri-list to the system pasteboard using DataTransfer.setData or DataTransfer.items.add during a copy or dragstart event. This content is written with the appropriate UTI (Uniform Type Identifier) for macOS and iOS, so pasting into native applications that already handle HTML markup, plain text strings, or URLs works as expected.
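For example, a page might put a link on the pasteboard in all three formats, so that each native application can pick the richest one it understands. This is a sketch: `writeLink` and `escapeHtml` are hypothetical helpers, and `DataTransferLike` is a minimal stand-in for `DataTransfer`.

```typescript
interface DataTransferLike {
  setData(format: string, data: string): void;
}

// Escapes the characters that are special inside HTML text and quoted attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Writes a link in three formats so each native app can pick the one it supports.
function writeLink(dt: DataTransferLike, url: string, title: string): void {
  dt.setData("text/uri-list", url);
  dt.setData("text/plain", `${title} ${url}`);
  dt.setData("text/html", `<a href="${escapeHtml(url)}">${escapeHtml(title)}</a>`);
}
```

In a page, this would be called from a copy listener as `writeLink(event.clipboardData, location.href, document.title)` (after a null check), followed by `event.preventDefault()` so the written data replaces the default copy.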

On the reading side, web applications may now also use DataTransfer.getData and DataTransfer.items during a paste or drop event to read text/html, text/plain, and text/uri-list data from the system pasteboard. If any files were written to the pasteboard (for example, when copying a PDF file in Finder), they are accessible through DataTransfer.files and DataTransfer.items; for backwards compatibility, the “Files” type is also added to DataTransfer.types to indicate that file data may be requested by the page.
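On the paste side, a handler might gather every available format at once. In this sketch, `readPaste` is a hypothetical helper, the interfaces are stand-ins for `DataTransfer` and `File`, and the `"Files"` entry in `types` is used as the signal to look at `files`.

```typescript
interface FileLike {
  readonly name: string;
  readonly type: string;
}

// Minimal stand-in for the DataTransfer members read during paste or drop.
interface PasteSourceLike {
  readonly types: readonly string[];
  getData(format: string): string;
  readonly files: ArrayLike<FileLike>;
}

interface PastedContent {
  html: string;
  text: string;
  uriList: string;
  files: FileLike[];
}

function readPaste(dt: PasteSourceLike): PastedContent {
  // "Files" in types signals that file data can be requested through dt.files.
  const files = dt.types.includes("Files") ? Array.from(dt.files) : [];
  return {
    html: dt.getData("text/html"),
    text: dt.getData("text/plain"),
    uriList: dt.getData("text/uri-list"),
    files,
  };
}
```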

An important caveat is that native applications may write file paths to the pasteboard as URLs or plain text when copying files. This could cause users to unknowingly expose paths within their home directory and within the private containers of native applications. Thus, WebKit implements heuristics to suppress access to this data via the DataTransfer API in such cases. If the pasteboard contains at least one file and text/uri-list is requested, the URL’s scheme must be http, https, data, or blob for WebKit to expose it to the page; other schemes, such as file or ftp, result in an empty string. Likewise, requests for text/plain return an empty string when there are files on the pasteboard.
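To make the rule concrete, here is a simplified model of it. This is not WebKit’s implementation: `modelGetData` is a hypothetical function, it treats text/uri-list as a single URL rather than a full list, and formats other than text/plain and text/uri-list pass through unchanged because the rule above doesn’t cover them.

```typescript
// A simplified MODEL of the heuristic described above, not WebKit source code.
// Assumption: text/uri-list holds a single URL (real lists can hold several lines).
const EXPOSED_SCHEMES = new Set(["http:", "https:", "data:", "blob:"]);

// What a page would see from getData(format), given the raw pasteboard value.
function modelGetData(format: string, rawValue: string, pasteboardHasFiles: boolean): string {
  if (!pasteboardHasFiles) return rawValue;
  if (format === "text/plain") return ""; // could be a file path
  if (format === "text/uri-list") {
    try {
      return EXPOSED_SCHEMES.has(new URL(rawValue).protocol) ? rawValue : "";
    } catch {
      return ""; // not a parseable absolute URL
    }
  }
  return rawValue; // other formats are outside this model
}
```

The practical consequence for page code is that an empty string from getData does not mean the pasteboard is empty; checking `types` and `files` as well avoids mistaking a file paste for an empty one.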

Reading and Writing HTML Content

Among all MIME types, HTML is the most pervasive on the web. Unfortunately, letting arbitrary websites write HTML content into the system pasteboard is problematic because HTML can contain script tags and event handlers, which can end up executing malicious scripts in the application reading the content. Letting websites read arbitrary HTML content from the system pasteboard is also problematic because some word processor and spreadsheet applications put privacy-sensitive information, such as local file paths and user information, into the HTML they place in the system pasteboard. For example, if a user typed 12345 into an unsaved spreadsheet, then copied it and pasted it into a random website, the website might be able to learn the user’s local home directory path if we were to expose the raw HTML content that the native application placed in the pasteboard. For this reason, we previously didn’t allow reading or writing HTML content via DataTransfer objects. Instead, websites had to wait for WebKit’s native editing code to paste the content and then process it afterwards.

In r223440, we introduced a mechanism to sanitize HTML read from and written to the system pasteboard, allowing us to lift this restriction. When a website tries to write HTML to the pasteboard, we paste the HTML into a dummy document, re-serialize it to HTML, and then write the re-serialized HTML into the system pasteboard. This process ensures that script elements, event handlers, and other potentially dangerous content are stripped away. We also package all the necessary sub-resources in the HTML, such as images, into a WebArchive so that native applications that read the pasteboard content don’t have to re-fetch those resources upon paste. Similarly, when a website tries to read HTML content placed by other native applications, we run through the same steps of pasting the content into a dummy document and re-serializing it, stripping away any private information the user didn’t intend to include in the pasted content. Sanitization also happens when HTML content is copied and pasted across different origins, but not within web pages of the same origin. As a result, websites can write arbitrary HTML content via the clipboard API and read the exact same content back later within a single origin.
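With HTML now readable, a paste handler can prefer text/html and fall back to plain text. In this sketch, `markupForPaste` and `escapeHtml` are hypothetical helpers: the HTML is returned as-is, and plain text is converted into escaped paragraphs. The sketch does no filtering of its own; WebKit sanitizes cross-origin HTML as described above, but a page that also targets other browsers should not assume they do the same.

```typescript
interface DataTransferLike {
  getData(format: string): string;
}

// Escapes the characters that are special in HTML text content.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Markup to insert for a paste: HTML if present, otherwise plain text as escaped paragraphs.
function markupForPaste(dt: DataTransferLike): string {
  const html = dt.getData("text/html");
  if (html !== "") return html;
  return dt
    .getData("text/plain")
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => `<p>${escapeHtml(line)}</p>`)
    .join("");
}
```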

Pasting HTML Content with Images

We also made a major change in the way we handle local files included in pasted HTML content. Previously, sub-resources (such as image files in pasted content) used URLs of the form webkit-fake-url://<filename>, where <filename> is the filename of the sub-resource. Because this is not a standard protocol the website can access, the pasted images’ data was inaccessible to websites. Even though WebKit could load these images, there was no way for websites to save them either to their service or into the browser’s storage APIs. r223440 replaces these fake URLs with blob URLs so that the website can save the images. Since r222839, we also use blob URLs instead of fake URLs when pasting RTFD content.

This change provides a mechanism for Web applications to save images included in pasted content using the Blob API. For example, an online e-mail editor can now save images that a user copied from TextEdit or Microsoft Word on iOS and macOS and pasted into it. We’re pleased to be the first browser to provide this powerful platform integration capability to Web developers.
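A sketch of how an editor might persist pasted images: find the `blob:` image sources, read each one back into a Blob, and upload it. `persistPastedImages` is hypothetical, and both the fetch function and the upload function are injected; the upload function stands in for whatever your own service provides. It returns a map from each blob URL to the URL the upload produced.

```typescript
// Only the member of Response this sketch needs.
interface BlobResponseLike<B> {
  blob(): Promise<B>;
}

// Uploads each distinct blob: image source once and maps it to the URL returned by upload.
// fetchFn and upload are injected; upload is a hypothetical hook for your own service.
async function persistPastedImages<B>(
  srcs: readonly string[],
  fetchFn: (url: string) => Promise<BlobResponseLike<B>>,
  upload: (image: B, originalSrc: string) => Promise<string>,
): Promise<Map<string, string>> {
  const replacements = new Map<string, string>();
  for (const src of srcs) {
    // Skip ordinary URLs and blob URLs that were already handled.
    if (!src.startsWith("blob:") || replacements.has(src)) continue;
    const response = await fetchFn(src);
    const image = await response.blob();
    replacements.set(src, await upload(image, src));
  }
  return replacements;
}
```

In a browser, the sources could be gathered with `Array.from(editor.querySelectorAll("img"), (img) => img.src)`, `(url) => fetch(url)` passed as `fetchFn`, and each `img.src` then swapped for its entry in the returned map.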

Conclusion

We’re excited to empower productivity apps on the Web to integrate more seamlessly with native applications on macOS and iOS via the updated clipboard API. We’d also like to give special thanks to the developers of TinyMCE, who have tirelessly worked with us to resolve many bugs involving copying and pasting from Microsoft Word into high-profile websites that use TinyMCE.