Resource Digests for HTTP
draft-polli-resource-digests-http-latest

Abstract

This document defines the Digest and Want-Digest header fields for HTTP, allowing clients and servers to negotiate an integrity checksum of the exchanged resource representation.

This document obsoletes [RFC3230]. It replaces the term “instance” with “representation”, which makes it consistent with the HTTP semantics and content defined in [RFC7231].

Note to Readers

RFC EDITOR: please remove this section before publication

Discussion of this draft takes place on the HTTP working group mailing list (ietf-http-wg@w3.org), which is archived at https://lists.w3.org/Archives/Public/ietf-http-wg/.

The source code and issues list for this draft can be found at https://github.com/ioggstream/draft-polli-resource-digests-http.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 10, 2019.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

1.1. Brief history of integrity headers
1.2. This proposal
1.3. Goals
1.4. Notational Conventions

2. Resource representation and representation-data
3. Digest Algorithm values

3.1. Representation digest

3.1.1. digest-algorithm encoding examples

4. Header Specifications

4.1. Want-Digest
4.2. Digest

5. Deprecate Negotiation of Content-MD5
6. Examples

6.1. Unsolicited Digest response

6.1.1. Representation data is fully contained in the payload
6.1.2. Representation data is not contained in the payload
6.1.3. Representation data is partially contained in the payload i.e. range request
6.1.4. Digest in both Request and Response. Returned value depends on representation metadata

6.2. Want-Digest solicited digest responses

6.2.1. Client request data is fully contained in the payload
6.2.2. A client requests an unsupported Digest, the server MAY reply with an unsupported digest
6.2.3. A client requests an unsupported Digest, the server MAY reply with a 400

7. Security Considerations

7.1. Usage in signatures
7.2. Message Truncation
7.3. Algorithm Agility

8. IANA Considerations

8.1. The “id-sha-256” Digest Algorithm
8.2. The “id-sha-512” Digest Algorithm
8.3. Want-Digest Header Field Registration
8.4. Digest Header Field Registration

9. References

9.1. Normative References
9.2. Informative References

Appendix A. Acknowledgements
Appendix B. FAQ
Authors' Addresses

1. Introduction

Integrity protection for HTTP content is typically achieved via TCP or HTTPS [RFC2818]. However, additional integrity protection might be desirable for some use cases. It can guard against failures or attacks (see [SRI]), programming errors, or corruption of stored data. It is also needed when content must remain unmodified throughout multiple HTTPS-protected exchanges.

1.1. Brief history of integrity headers

The Content-MD5 header field was originally introduced to provide integrity, but HTTP/1.1 obsoleted it in Appendix B of [RFC7231]:

The Content-MD5 header field has been removed because it was inconsistently implemented with respect to partial responses.

[RFC3230] provided a more flexible solution by introducing the concept of “instance” and the Digest and Want-Digest header fields.

1.2. This proposal

The concept of selected representation defined in [RFC7231] made the definitions in [RFC3230] inconsistent with the current standard, so a refresh is required.

This document updates the Digest and Want-Digest header field definitions to align with [RFC7231] concepts.

This approach can easily be adapted to use cases where the transferred data requires some sort of manipulation to be considered a representation, or where it conveys a partial representation of a resource (e.g., Range Requests [RFC7233]).

Changes are semantically compatible with existing implementations and better cover both the request and response cases.

The value of Digest is calculated on the selected representation, which is tied to the values of any Content-Encoding or Content-Type header fields. Therefore, a given resource may have multiple different digest values.
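The following is a minimal Python sketch of that point. It assumes the JSON document `{"hello": "world"}` as the resource state and hashes the same data twice: once as an identity-coded representation and once gzip-encoded. The helper `sha256_b64` is a hypothetical name. `mtime=0` (Python 3.8 or later) only pins the gzip header timestamp so that runs are reproducible.

```python
import base64
import gzip
import hashlib


def sha256_b64(data: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of data."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Resource state used in the examples of this document.
content = b'{"hello": "world"}'

# Two selected representations that differ only in their content coding.
identity_repr = content                      # Content-Encoding: identity
gzip_repr = gzip.compress(content, mtime=0)  # Content-Encoding: gzip

digest_identity = sha256_b64(identity_repr)
digest_gzip = sha256_b64(gzip_repr)

print("identity: sha-256=" + digest_identity)
print("gzip:     sha-256=" + digest_gzip)
```

The gzip bytes differ from the identity bytes, so the two sha-256 values differ even though both describe the same resource.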

To allow both parties to exchange a Digest of a representation with no content codings, two more algorithms are added (id-sha-256 and id-sha-512).

1.3. Goals

The goals of this proposal are:

Digest coverage for either the resource’s representation data or selected representation data communicated via HTTP.
Support for multiple digest algorithms.
Negotiation of the use of digests.

The goals do not include:

Header integrity:: The digest mechanisms described here cover only representation and selected representation data, and do not protect the integrity of associated representation metadata headers or other message headers.
Authentication:: The digest mechanisms described here are not meant to support authentication of the source of a digest or of a message or anything else. These mechanisms, therefore, are not a sufficient defense against many kinds of malicious attacks.
Privacy:: Digest mechanisms do not provide message privacy.
Authorization:: The digest mechanisms described here are not meant to support authorization or other kinds of access controls.

1.4. Notational Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 ([RFC2119] and [RFC8174]) when, and only when, they appear in all capitals, as shown here.

The terms “representation”, “selected representation”, “representation data”, “representation metadata” and “payload body” in this document are to be interpreted as described in [RFC7230] and [RFC7231].

2. Resource representation and representation-data

To avoid inconsistencies, an integrity mechanism for HTTP messages should decouple the checksum calculation:

from the payload body, which may be altered by mechanisms like Range Requests [RFC7233] or by the method (e.g., HEAD);
and from the message body, which depends on Transfer-Encoding and on whatever transformations intermediaries may apply.

The following examples show how representation metadata, payload transformations, and the request method affect the message and payload body.

Here is a gzip-compressed JSON object. For readability, the binary payload body below is shown base64-encoded:

# base64.b64encode(gzip.compress(json.dumps({"a": "1"*100}).encode()))

Request:

    PUT /entries/1234 HTTP/1.1
    Content-Type: application/json
    Content-Encoding: gzip

    H4sIAItWyFwC/6tWSlSyUlAypANQqgUAREcqfG0AAAA=

Without the Content-Encoding header field, the same payload body conveys a malformed JSON object.

Request:

    PUT /entries/1234 HTTP/1.1
    Content-Type: application/json

    H4sIAItWyFwC/6tWSlSyUlAypANQqgUAREcqfG0AAAA=

A range request alters the payload body, conveying a partial representation.

Request:

    GET /entries/1234 HTTP/1.1
    Range: bytes=1-7

Response:

    HTTP/1.1 206 Partial Content
    Content-Encoding: gzip
    Content-Type: application/json
    Content-Range: bytes 1-7/*

    iwgAla3RXA==

The request method can alter the payload body too: a response to a HEAD request carries no payload body.

Request:

    HEAD /entries/1234 HTTP/1.1
    Accept: application/json
    Accept-Encoding: gzip

Response:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Content-Encoding: gzip

3. Digest Algorithm values

Digest algorithm values are used to indicate a specific digest computation. For some algorithms, one or more parameters may be supplied.

      digest-algorithm = token

The ABNF for “parameter” is as used in [RFC7230]. All digest-algorithm values are case-insensitive.

The Internet Assigned Numbers Authority (IANA) acts as a registry for digest-algorithm values. The registry contains the following tokens.

Note: this document updates [RFC5843], which remains the reference for all algorithm updates.

SHA-256:: The SHA-256 algorithm [FIPS180-3]. The output of this algorithm is encoded using the base64 encoding [RFC4648]. Reference: [FIPS180-3], [RFC4648], this document.
SHA-512:: The SHA-512 algorithm [FIPS180-3]. The output of this algorithm is encoded using the base64 encoding [RFC4648]. Reference: [FIPS180-3], [RFC4648], this document.
MD5:: The MD5 algorithm, as specified in [RFC1321]. The output of this algorithm is encoded using the base64 encoding [RFC4648].
SHA:: The SHA-1 algorithm [FIPS180-1]. The output of this algorithm is encoded using the base64 encoding [RFC4648].
UNIXsum:: The algorithm computed by the UNIX “sum” command, as defined by the Single UNIX Specification, Version 2 [UNIX]. The output of this algorithm is an ASCII decimal-digit string representing the 16-bit checksum, which is the first word of the output of the UNIX “sum” command.
UNIXcksum:: The algorithm computed by the UNIX “cksum” command, as defined by the Single UNIX Specification, Version 2 [UNIX]. The output of this algorithm is an ASCII digit string representing the 32-bit CRC, which is the first word of the output of the UNIX “cksum” command.
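The base64-encoded entries above can be sketched as a lookup table keyed by lower-cased token, which also reflects the case-insensitivity rule. This is an illustration using Python's `hashlib`, not a registry implementation. `DIGEST_ALGORITHMS` and `compute_digest` are hypothetical names, and UNIXsum and UNIXcksum are deliberately left out.

```python
import base64
import hashlib
from typing import Callable, Dict


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


# Hypothetical table of the base64-encoded algorithms listed above.
# Keys are lower-case because digest-algorithm tokens are case-insensitive.
DIGEST_ALGORITHMS: Dict[str, Callable[[bytes], str]] = {
    "sha-256": lambda data: _b64(hashlib.sha256(data).digest()),
    "sha-512": lambda data: _b64(hashlib.sha512(data).digest()),
    "md5": lambda data: _b64(hashlib.md5(data).digest()),  # NOT RECOMMENDED (Section 5)
    "sha": lambda data: _b64(hashlib.sha1(data).digest()),  # the "SHA" token is SHA-1
}


def compute_digest(algorithm: str, data: bytes) -> str:
    """Return the encoded digest output for a digest-algorithm token."""
    try:
        func = DIGEST_ALGORITHMS[algorithm.lower()]
    except KeyError:
        raise ValueError(f"unsupported digest-algorithm: {algorithm}") from None
    return func(data)
```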

To allow sender and recipient to provide a checksum that is independent of the content coding, the following additional algorithms are defined:

id-sha-512:: The SHA-512 digest of the representation data of the resource when no content coding is applied (e.g., Content-Encoding: identity).
id-sha-256:: The SHA-256 digest of the representation data of the resource when no content coding is applied (e.g., Content-Encoding: identity).
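A sender might compute id-sha-256 by first removing the content coding and then hashing, as in the sketch below. It makes two assumptions because the text above does not spell them out: the output is base64-encoded like sha-256, and only the `identity` and `gzip` codings are handled. The function name `id_sha_256` is hypothetical.

```python
import base64
import gzip
import hashlib


def id_sha_256(body: bytes, content_encoding: str = "identity") -> str:
    """SHA-256 of the representation data with no content coding applied.

    Sketch only: 'identity' and 'gzip' are handled, other codings are rejected.
    """
    coding = content_encoding.strip().lower()
    if coding == "gzip":
        body = gzip.decompress(body)  # undo the content coding before hashing
    elif coding != "identity":
        raise ValueError(f"unsupported content coding: {content_encoding}")
    # Assumption: base64 output encoding, as for sha-256.
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
```

With this approach, `id_sha_256(gzip.compress(data), "gzip")` and `id_sha_256(data)` yield the same value. That shared value is what makes the id-* algorithms independent of the content coding.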

If other digest-algorithm values are defined, the associated encoding MUST either be represented as a quoted string, or MUST NOT include “;” or “,” in the character sets used for the encoding.

3.1. Representation digest

A representation digest is the value of the output of a digest algorithm, together with an indication of the algorithm used (and any parameters).

    representation-data-digest = digest-algorithm "="
                            <encoded digest output>

As explained in Section 2, the digest is computed on the entire selected representation data of the resource defined in [RFC7231]:

  representation-data := Content-Encoding( Content-Type( bits ) )

The encoded digest output uses the encoding format defined for the specific digest-algorithm.

3.1.1. digest-algorithm encoding examples

The sha-256 digest-algorithm uses base64 encoding:

sha-256=......

The “UNIXsum” digest-algorithm uses an ASCII string of decimal digits:

UNIXsum=30637
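The examples above show one point to watch when parsing a representation-data-digest: base64 output may end in `=` padding. Only the first `=` therefore separates the algorithm from the value. A minimal sketch follows, in which `parse_representation_data_digest` is a hypothetical helper.

```python
from typing import Tuple


def parse_representation_data_digest(item: str) -> Tuple[str, str]:
    """Split 'digest-algorithm=<encoded digest output>' into its two parts."""
    # partition() splits on the FIRST '=', so base64 padding stays in the value.
    algorithm, sep, value = item.strip().partition("=")
    if not sep or not algorithm or not value:
        raise ValueError(f"malformed representation-data-digest: {item!r}")
    # digest-algorithm tokens are case-insensitive, so normalize them.
    return algorithm.lower(), value
```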

4. Header Specifications

The following header fields are defined:

4.1. Want-Digest

The Want-Digest message header field indicates the sender’s desire to receive a representation digest on messages associated with the Request-URI and representation metadata.

Want-Digest = "Want-Digest" ":"
                 #(digest-algorithm [ ";" "q" "=" qvalue])

If a digest-algorithm is not accompanied by a qvalue, it is treated as if its associated qvalue were 1.0.

The sender is willing to accept a digest-algorithm if and only if it is listed in a Want-Digest header field of a message, and its qvalue is non-zero.

If multiple acceptable digest-algorithm values are given, the sender’s preferred digest-algorithm is the one (or ones) with the highest qvalue.

Examples:

  Want-Digest: sha-256
  Want-Digest: SHA-256;q=0.3, sha;q=1
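The qvalue rules above can be sketched as follows. This is a simplified parser that assumes well-formed input. It does not handle quoted strings or other parameters, and `float()` raises `ValueError` on a malformed qvalue. The function names are hypothetical.

```python
from typing import Dict, List


def parse_want_digest(value: str) -> Dict[str, float]:
    """Map each digest-algorithm (lower-cased) to its qvalue."""
    prefs: Dict[str, float] = {}
    for element in value.split(","):
        element = element.strip()
        if not element:
            continue
        algorithm, _, params = element.partition(";")
        qvalue = 1.0  # no qvalue means q=1.0
        name, _, q = params.strip().partition("=")
        if name.strip().lower() == "q":
            qvalue = float(q.strip())
        prefs[algorithm.strip().lower()] = qvalue
    return prefs


def preferred_algorithms(value: str) -> List[str]:
    """Return the acceptable algorithms (q > 0) sharing the highest qvalue."""
    acceptable = {a: q for a, q in parse_want_digest(value).items() if q > 0}
    if not acceptable:
        return []
    best = max(acceptable.values())
    return sorted(a for a, q in acceptable.items() if q == best)
```

For the second example above, `preferred_algorithms("SHA-256;q=0.3, sha;q=1")` selects only `sha`.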

4.2. Digest

The Digest header field provides a digest of the representation data.

      Digest = "Digest" ":" #(representation-data-digest)

Representation data might be:

fully contained in the message body,
partially contained in the message body,
or not at all contained in the message body.

The resource is specified by the effective Request-URI and any cache-validator contained in the message.

For example, in a response to a HEAD request, the digest is calculated using the representation data that would have been enclosed in the payload body if the same request had been a GET.

Digest can be used in requests too. In both requests and responses, the value depends on the representation metadata header fields of the message.

A Digest header field MAY contain multiple representation-data-digest values. This could be useful for responses expected to reside in caches shared by users with different browsers, for example.

A recipient MAY ignore any or all of the representation-data-digests in a Digest header field.

A sender MAY send a representation-data-digest using a digest-algorithm without knowing whether the recipient supports the digest-algorithm, or even knowing that the recipient will ignore it.
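The sender and recipient rules above can be illustrated with a short sketch. The sender lists several digests, and the recipient checks those it supports and silently skips the rest. Splitting on `,` is safe here only because base64 output never contains `,` (see the encoding constraint in Section 3). `SUPPORTED`, `digest_b64`, `make_digest_header`, and `verify_digest_header` are hypothetical names.

```python
import base64
import hashlib
from typing import Dict, Iterable

# Hypothetical: the algorithms this particular recipient understands.
SUPPORTED = {"sha-256": hashlib.sha256, "sha-512": hashlib.sha512}


def digest_b64(algorithm: str, data: bytes) -> str:
    return base64.b64encode(SUPPORTED[algorithm](data).digest()).decode("ascii")


def make_digest_header(data: bytes, algorithms: Iterable[str] = ("sha-256", "sha-512")) -> str:
    """Build a Digest field value carrying several representation-data-digests."""
    return ", ".join(f"{alg}={digest_b64(alg, data)}" for alg in algorithms)


def verify_digest_header(header: str, data: bytes) -> Dict[str, bool]:
    """Check each digest with a supported algorithm; ignore all others."""
    results: Dict[str, bool] = {}
    for item in header.split(","):
        algorithm, _, value = item.strip().partition("=")
        algorithm = algorithm.lower()
        if algorithm not in SUPPORTED:
            continue  # a recipient MAY ignore any representation-data-digest
        results[algorithm] = digest_b64(algorithm, data) == value
    return results
```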

…

5. Deprecate Negotiation of Content-MD5

This RFC deprecates the negotiation of Content-MD5, as this header field has been obsoleted by [RFC7231].

The MD5 algorithm is NOT RECOMMENDED, as it is vulnerable to collision attacks [CMU-836068].

6. Examples

6.1. Unsolicited Digest response

6.1.1. Representation data is fully contained in the payload

Request:

  GET /items/123

Response:

  HTTP/1.1 200 Ok
  Content-Type: application/json
  Content-Encoding: identity
  Digest: sha-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE=

  {"hello": "world"}
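To check the example above, a recipient hashes the body exactly as received and compares the result with the Digest value. This sketch assumes the body is the 18 bytes `{"hello": "world"}` with no trailing newline.

```python
import base64
import hashlib

# Body and Digest field value copied from the example response above.
body = b'{"hello": "world"}'
digest_field = "sha-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE="

algorithm, _, expected = digest_field.partition("=")
computed = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")

print(algorithm, "matches" if computed == expected else "does NOT match")
```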

6.1.2. Representation data is not contained in the payload

Request:

  HEAD /items/123

Response:

  HTTP/1.1 200 Ok
  Content-Type: application/json
  Content-Encoding: identity
  Digest: sha-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE=

6.1.3. Representation data is partially contained in the payload i.e. range request

Request:

  GET /items/123
  Range: bytes=1-7

Response:

  HTTP/1.1 206 Partial Content
  Content-Type: application/json
  Content-Encoding: identity
  Content-Range: bytes 1-7/18
  Digest: sha-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE=

  "hello"

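The following sketch assumes the same 18-byte representation as in the previous examples. It shows that the 206 payload is only a slice of the representation, while the Digest value is still computed over the complete representation data. `byte_range` is a hypothetical helper.

```python
import base64
import hashlib

# Complete selected representation from the previous examples (18 bytes).
representation = b'{"hello": "world"}'


def byte_range(data: bytes, first: int, last: int) -> bytes:
    """Return the inclusive range first-last, as in 'Range: bytes=first-last'."""
    return data[first:last + 1]


payload = byte_range(representation, 1, 7)
content_range = f"bytes 1-7/{len(representation)}"

# Digest covers the whole representation, not the partial payload.
digest = base64.b64encode(hashlib.sha256(representation).digest()).decode("ascii")
payload_digest = base64.b64encode(hashlib.sha256(payload).digest()).decode("ascii")

print(content_range)             # bytes 1-7/18
print(payload)                   # b'"hello"'
print(digest != payload_digest)  # True
```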
6.1.4. Digest in both Request and Response. Returned value depends on representation metadata

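Digest can be used in requests too, and its value depends on the representation metadata header fields. In this example the request carries no content coding, while the response is brotli-encoded (Content-Encoding: br), so the two Digest values differ.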

Request:

  PUT /items/123
  Content-Type: application/json
  Content-Encoding: identity
  Accept-Encoding: br
  Digest: sha-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE=

  {"hello": "world"}

Response:

  HTTP/1.1 200 Ok
  Content-Type: application/json
  Content-Encoding: br
  Digest: sha-256=4REjxQ4yrqUVicfSKYNO/cF9zNj5ANbzgDZt3/h3Qxo=

  b'\x8b\x08\x80{"hello": "world"}\x03'

6.2. Want-Digest solicited digest responses

6.2.1. Client request data is fully contained in the payload

The client requests a digest, preferring sha. The server is free to reply with sha-256 anyway.

Request:

  GET /items/123
  Want-Digest: sha-256;q=0.3, sha;q=1

Response:

  HTTP/1.1 200 Ok
  Content-Type: application/json
  Content-Encoding: identity
  Digest: sha-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE=

  {"hello": "world"}

6.2.2. A client requests an unsupported Digest, the server MAY reply with an unsupported digest

The client requests a sha digest only. The server is currently free to reply with a Digest that uses an algorithm the client did not ask for.

Request:

  GET /items/123
  Want-Digest: sha;q=1

Response:

  HTTP/1.1 200 Ok
  Content-Type: application/json
  Content-Encoding: identity
  Digest: sha-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE=

  {"hello": "world"}

6.2.3. A client requests an unsupported Digest, the server MAY reply with a 400

The client requests a sha digest. The server replies with 400 and advertises the algorithms it supports, sha-256 and sha-512, in a Want-Digest header field.

Request:

  GET /items/123
  Want-Digest: sha;q=1

Response:

  HTTP/1.1 400 Bad Request
  Want-Digest: sha-256, sha-512
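The 400 behavior shown above is one permitted policy, and the previous example shows the other. Below is a minimal server-side sketch of the 400 policy. It assumes the server implements only sha-256 and sha-512. It picks the most preferred requested algorithm that it supports; otherwise it answers 400 and advertises its algorithms in Want-Digest. `SERVER_ALGORITHMS`, `parse_want_digest`, and `respond` are hypothetical, and the parser is deliberately minimal.

```python
import base64
import hashlib
from typing import Dict, Tuple

# Hypothetical server configuration: the algorithms this server implements.
SERVER_ALGORITHMS = {"sha-256": hashlib.sha256, "sha-512": hashlib.sha512}


def parse_want_digest(value: str) -> Dict[str, float]:
    """Minimal Want-Digest parser: algorithm -> qvalue (default 1.0)."""
    prefs: Dict[str, float] = {}
    for element in filter(None, (e.strip() for e in value.split(","))):
        algorithm, _, params = element.partition(";")
        name, _, q = params.partition("=")
        prefs[algorithm.strip().lower()] = float(q) if name.strip().lower() == "q" else 1.0
    return prefs


def respond(want_digest: str, body: bytes) -> Tuple[int, Dict[str, str]]:
    """Return (status, header fields) following the 400 policy shown above."""
    wanted = {a: q for a, q in parse_want_digest(want_digest).items() if q > 0}
    ranked = sorted(wanted, key=wanted.get, reverse=True)  # highest qvalue first
    usable = [a for a in ranked if a in SERVER_ALGORITHMS]
    if not usable:
        # Nothing acceptable: advertise what this server does support.
        return 400, {"Want-Digest": ", ".join(SERVER_ALGORITHMS)}
    algorithm = usable[0]
    value = base64.b64encode(SERVER_ALGORITHMS[algorithm](body).digest()).decode("ascii")
    return 200, {"Digest": f"{algorithm}={value}"}
```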

…

7. Security Considerations

7.1. Usage in signatures

Digital signatures are widely used together with checksums to reliably identify the origin of a message [NIST800-32].

It is important to note that the Digest header field is a hash of a resource representation. Signing only the Digest header field, without all the representation metadata (e.g., the values of Content-Type and Content-Encoding), may therefore expose the communication to tampering.
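To make the tampering risk concrete, the sketch below builds a signing input that covers Content-Type and Content-Encoding as well as Digest. HMAC-SHA-256 from Python's standard library stands in for a real digital signature. It is not one: it uses a shared key and does not identify the origin. The key and the helper `signing_input` are hypothetical.

```python
import hashlib
import hmac
from typing import Dict

# Field names covered by the (stand-in) signature, in a fixed order.
COVERED = ("content-type", "content-encoding", "digest")


def signing_input(headers: Dict[str, str]) -> bytes:
    """Serialize Digest together with the metadata it depends on."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return "\n".join(f"{name}: {lowered.get(name, '')}" for name in COVERED).encode("utf-8")


key = b"hypothetical-shared-key"  # placeholder, not a real key
headers = {
    "Content-Type": "application/json",
    "Content-Encoding": "identity",
    "Digest": "sha-256=X48E9qOokqqrvdts8nOJRJN3OWDUoyWxBf7kbu9DBPE=",
}
signature = hmac.new(key, signing_input(headers), hashlib.sha256).hexdigest()

# Changing the metadata changes the signing input, so the old signature no longer matches.
tampered = {**headers, "Content-Encoding": "gzip"}
tampered_signature = hmac.new(key, signing_input(tampered), hashlib.sha256).hexdigest()
print(signature != tampered_signature)
```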

7.2. Message Truncation

…

7.3. Algorithm Agility

…

8. IANA Considerations

8.1. The “id-sha-256” Digest Algorithm

This memo registers the “id-sha-256” digest algorithm in the HTTP Digest Algorithm Values registry:

Digest Algorithm: id-sha-256
Description: As specified in Section 3.

8.2. The “id-sha-512” Digest Algorithm

This memo registers the “id-sha-512” digest algorithm in the HTTP Digest Algorithm Values registry:

Digest Algorithm: id-sha-512
Description: As specified in Section 3.

8.3. Want-Digest Header Field Registration

This section registers the Want-Digest header field in the “Permanent Message Header Field Names” registry ([RFC3864]).

Header field name: Want-Digest

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document(s): Section 4.1 of this document

8.4. Digest Header Field Registration

This section registers the Digest header field in the “Permanent Message Header Field Names” registry ([RFC3864]).

Header field name: Digest

Applicable protocol: http

Status: standard

Author/Change controller: IETF

Specification document(s): Section 4.2 of this document

9. References

9.1. Normative References

[CMU-836068]	Carnegie Mellon University, Software Engineering Institute, "MD5 Vulnerable to collision attacks", December 2008.
[FIPS180-1]	Department of Commerce, National Institute of Standards and Technology, "NIST FIPS 180-1, Secure Hash Standard", April 1995.
[FIPS180-3]	Department of Commerce, National Institute of Standards and Technology, "NIST FIPS 180-3, Secure Hash Standard", October 2008.
[FIPS180-4]	Department of Commerce, National Institute of Standards and Technology, "NIST FIPS 180-4, Secure Hash Standard", March 2012.
[NIST800-32]	Department of Commerce, National Institute of Standards and Technology, "Introduction to Public Key Technology and the Federal PKI Infrastructure", February 2001.
[RFC1321]	Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, DOI 10.17487/RFC1321, April 1992.
[RFC2119]	Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
[RFC3230]	Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", RFC 3230, DOI 10.17487/RFC3230, January 2002.
[RFC3864]	Klyne, G., Nottingham, M. and J. Mogul, "Registration Procedures for Message Header Fields", BCP 90, RFC 3864, DOI 10.17487/RFC3864, September 2004.
[RFC4648]	Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.
[RFC5789]	Dusseault, L. and J. Snell, "PATCH Method for HTTP", RFC 5789, DOI 10.17487/RFC5789, March 2010.
[RFC5843]	Bryan, A., "Additional Hash Algorithms for HTTP Instance Digests", RFC 5843, DOI 10.17487/RFC5843, April 2010.
[RFC7230]	Fielding, R. and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing", RFC 7230, DOI 10.17487/RFC7230, June 2014.
[RFC7231]	Fielding, R. and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content", RFC 7231, DOI 10.17487/RFC7231, June 2014.
[RFC7233]	Fielding, R., Lafon, Y. and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Range Requests", RFC 7233, DOI 10.17487/RFC7233, June 2014.
[RFC8174]	Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.
[UNIX]	The Open Group, "The Single UNIX Specification, Version 2 - 6 Vol Set for UNIX 98", February 1997.

9.2. Informative References

[RFC2818]	Rescorla, E., "HTTP Over TLS", RFC 2818, DOI 10.17487/RFC2818, May 2000.
[RFC5788]	Melnikov, A. and D. Cridland, "IMAP4 Keyword Registry", RFC 5788, DOI 10.17487/RFC5788, March 2010.
[RFC6962]	Laurie, B., Langley, A. and E. Kasper, "Certificate Transparency", RFC 6962, DOI 10.17487/RFC6962, June 2013.
[RFC7396]	Hoffman, P. and J. Snell, "JSON Merge Patch", RFC 7396, DOI 10.17487/RFC7396, October 2014.
[SRI]	Akhawe, D., Braun, F., Marier, F. and J. Weinberger, "Subresource Integrity", n.d.

Appendix A. Acknowledgements

The vast majority of this document is inherited from [RFC3230], so thanks to J. Mogul and A. Van Hoff for their great work. The original idea of refreshing this document arose from an interesting discussion with M. Nottingham, J. Yasskin and M. Thomson when reviewing the MICE Content Encoding.

Appendix B. FAQ

Why remove all references to content-md5?

Those were unnecessary for understanding and using this spec.

Why remove references to instance manipulation?

They are also unnecessary for correctly using and applying the spec. An example with a range request is sufficient.

How to use Digest with PATCH method?

The PATCH method brings some complexities (e.g., about representation metadata headers, patch document format, …):
- PATCH entity-headers apply to the patch document and MUST NOT be applied to the target resource, see [RFC5789], Section 2.
- servers shouldn’t assume PATCH semantics for generic media types like “application/json”, but should instead use a proper content type, e.g., [RFC7396]
- a 200 OK response to a PATCH request would contain the digest of the patched item and the ETag of the new object. This behavior is tightly coupled to the application logic, so it gives the client little chance of inferring the actual outcome of the operation (e.g., concurrent changes, …)

Authors' Addresses

Roberto Polli
Team Digitale
EMail: robipolli@gmail.com

Lucas Pardue EMail: lucaspardue.24.7@gmail.com