[Caml-list] Covariant GADTs

Discussion:

(too old to reply)

Markus Mottl

2016-09-17 17:39:14 UTC

Hi,

GADTs currently do not allow for variance annotations so I wondered
whether there are any workarounds for what I want:

-----
type +_ t =
| Z : [ `z ] t
| S : [ `z | `s ] t -> [ `s ] t

let get_next (s : [ `s ] t) =
match s with
| S next -> next
-----

The above compiles without the variance annotation, but then you
cannot express "S Z", because there is no way to coerce Z to be of
type [`z | `s] t.

Another approach I tried is the following:

-----
type _ t =
| Z : [ `z ] t
| S : [< `z | `s ] t -> [ `s ] t

let get_next (s : [ `s ] t) : [< `z | `s ] t =
match s with
| S next -> next
-----

The above gives the confusing error:

-----
Error: This expression has type [< `s | `z ] t
but an expression was expected of type [< `s | `z ] t
The type constructor $S would escape its scope
-----

There are apparently two type variables associated with [< `s | `z ]
t: the locally existential one introduced by matching the GADT, and
the one in the type restriction of the function, but the compiler
cannot figure out that these can be safely unified. There is
currently no way of specifying locally abstract types that have type
constraints, which could possibly also help here.

Are there workarounds to achieve the above? Are there any plans to
add variance annotations for GADTs?

Regards,
Markus

--
Markus Mottl http://www.ocaml.info ***@gmail.com
--
Caml-list mailing list. Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Petter A. Urkedal

2016-09-18 08:17:52 UTC

Permalink

Hi Markus,

Post by Markus Mottl
Hi,
GADTs currently do not allow for variance annotations so I wondered
-----
type +_ t =
| Z : [ `z ] t
| S : [ `z | `s ] t -> [ `s ] t
let get_next (s : [ `s ] t) =
match s with
| S next -> next
-----
The above compiles without the variance annotation, but then you
cannot express "S Z", because there is no way to coerce Z to be of
type [`z | `s] t.
-----
type _ t =
| Z : [ `z ] t
| S : [< `z | `s ] t -> [ `s ] t
let get_next (s : [ `s ] t) : [< `z | `s ] t =
match s with
| S next -> next
-----
-----
Error: This expression has type [< `s | `z ] t
but an expression was expected of type [< `s | `z ] t
The type constructor $S would escape its scope
-----
There are apparently two type variables associated with [< `s | `z ]
t: the locally existential one introduced by matching the GADT, and
the one in the type restriction of the function, but the compiler
cannot figure out that these can be safely unified. There is
currently no way of specifying locally abstract types that have type
constraints, which could possibly also help here.

In this case, you can supply the needed type information to `get_next`
with structural type matching:

type _ t = Z : [`z] t | S : 'a t -> [`s of 'a] t
let get_next : [`s of 'a] t -> 'a t = function S next -> next

This gives you a richer type which also allows defining get_next_pair, etc.

Post by Markus Mottl
Are there workarounds to achieve the above? Are there any plans to
add variance annotations for GADTs?

I had a similar problem myself and found a paper [1] explaining the
issue and a possible solution. Haven't really studied it, but as I
understand the issue is that the current GADTs only introduce type
equality, which is too strict to verify variance. What is needed is
more like (my ASCIIfication)

type +'a t =
| Z when 'a >= [`z]
| S when 'a >= [`s] of [< `z | `s] t

allowing 'a in each case to be a supertype rather than equal to the
specified type. In that spirit, I tried

type +_ t =
| Z : [> `z] t
| S : [< `s | `z] t -> [> `s] t;;

but that does not check for variance either.

Best,
Petter

[1] https://arxiv.org/abs/1301.2903

--
Caml-list mailing list. Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Markus Mottl

2016-09-19 01:52:44 UTC

Permalink

Hi Petter,

thanks, the above approach obviously works with my previous example,
which I had sadly oversimplified. In my case I was actually dealing
with inlined, mutable records where the above won't work, because then
I can't overwrite fields. The somewhat more complete example also
contains strange type system behavior I don't understand that may even
be a bug.

The example below shows how GADTs + inline records + (if available)
covariance could help implement doubly linked lists much more
efficiently. The memory representation using mutable inline records
is essentially optimal. Taking an element link can be done without
having to introduce superfluous pattern matches, because the GADT
guarantees via the type system that a link is non-empty.

I had to use "Obj.magic" due to the lack of covariance support with
GADTs in the "coerce" function, which I believe to be sound.

The strange behavior is in the "insert_first" function and also the
"create" function: "Empty" is not of covariant type, but nevertheless
can be used for allocation in that way (e.g. in the "let res = ..."
part). But the commented out lines show that I cannot overwrite the
exact same field in the just allocated value with the exact same
value. I can understand the reason why the latter is not possible,
but why is allocation allowed that way? Maybe a type system guru can
explain what's going on.

----------
module Link = struct
type kind = [ `empty | `elt ]

type ('el, _) t =
| Empty : ('el, [ `empty ]) t
| Elt : {
mutable prev : ('el, [< kind ]) t;
el : 'el;
mutable next : ('el, [< kind ]) t;
} -> ('el, [ `elt ]) t

let coerce (t : ('el, [< kind ]) t) : ('el, [< kind ]) t = Obj.magic t

let get_opt_elt : type a. ('el, a) t -> ('el, [ `elt ]) t option = function
| Empty -> None
| Elt _ as elt -> Some elt

let cut : type a. ('el, a) t -> unit = function
| Empty -> ()
| Elt { prev = prev_elt; next = next_elt } ->
match prev_elt, next_elt with
| Empty, Empty -> ()
| Empty, Elt next -> next.prev <- coerce Empty
| Elt prev, Empty -> prev.next <- coerce Empty
| Elt prev, Elt next ->
prev.next <- coerce next_elt;
next.prev <- coerce prev_elt
end (* Link *)

module Doubly_linked : sig
type 'el t
type 'el elt

val create : unit -> 'el t
val first : 'el t -> 'el elt option
val insert_first : 'el t -> 'el -> unit
val remove : 'el elt -> unit
end = struct
open Link

type 'el t = Head : { mutable head : ('el, [< Link.kind ]) Link.t } -> 'el t
type 'el elt = ('el, [ `elt ]) Link.t

let create () = Head { head = Empty }

let first (Head h) = Link.get_opt_elt h.head

let insert_first (Head h) el =
h.head <-
match h.head with
| Empty ->
let res = Elt { prev = Empty; el; next = Empty } in
(* let Elt foo = res in *)
(* foo.prev <- Empty; *)
coerce res
| Elt _ as next -> coerce (Elt { prev = Empty; el; next })

let remove = Link.cut
end (* Doubly_linked *)
----------

Regards,
Markus

Post by Petter A. Urkedal
Hi Markus,

In this case, you can supply the needed type information to `get_next`
type _ t = Z : [`z] t | S : 'a t -> [`s of 'a] t
let get_next : [`s of 'a] t -> 'a t = function S next -> next
This gives you a richer type which also allows defining get_next_pair, etc.

Post by Markus Mottl
Are there workarounds to achieve the above? Are there any plans to
add variance annotations for GADTs?

I had a similar problem myself and found a paper [1] explaining the
issue and a possible solution. Haven't really studied it, but as I
understand the issue is that the current GADTs only introduce type
equality, which is too strict to verify variance. What is needed is
more like (my ASCIIfication)
type +'a t =
| Z when 'a >= [`z]
| S when 'a >= [`s] of [< `z | `s] t
allowing 'a in each case to be a supertype rather than equal to the
specified type. In that spirit, I tried
type +_ t =
| Z : [> `z] t
| S : [< `s | `z] t -> [> `s] t;;
but that does not check for variance either.
Best,
Petter
[1] https://arxiv.org/abs/1301.2903

octachron

2016-09-19 08:58:46 UTC

Permalink

Hi Markus,

Post by Markus Mottl
Taking an element link can be done without
having to introduce superfluous pattern matches, because the GADT
guarantees via the type system that a link is non-empty.

I fear that you are misinterpreting the guarantees giving to you by the
type ('el, 'kind) Link.t. This type can be rewritten as

type ('el, _) t =
| Empty : ('el, [ `empty ]) t
| Elt : {
mutable prev : ('el, [< kind ] as 'prev ) t;
el : 'el;
mutable next : ('el, [< kind ] as 'next ) t;
} -> ('el, [ `elt ]) t

Notice the existential types 'prev and 'next within the Elt constructor:
they mean that outside of the context of Elt, the type system simply
does not know the subtype of the prev and next fields. Even within the
context of the `Elt` constructor, the type system only knows that there
exists two types p and n such that Elt r is typed Elt
{prev:p;next:n;el:el}. But the information about which precise type was
present when constructing the value is lost.

Therefore, these fields are neither readable nor writable directly. A
direct manifestation of the problem is that, as you observed, you cannot
assign new values to either prev or next without use of `Obj.magic`. For
instance,

let set_next (Elt r) x = r.next <- x;;

fails with

Error: This expression has type 'a but an expression was expected of
type ('b, [< kind ]) t. The type constructor $Elt_'next would escape its
scope.

because the type checker tries to unify the type 'a of x with the
existential type $Elt_'next of `next`. Using a fantasy syntax, we have

let set_next (type a) (Elt (type p n) {next:n; prev:p; _} ) (x:a) =
r.next:p <- x:a;;

and the types `a` and `p` cannot be unified because `p` should not
escape the context of the constructor Elt.

Consequently writing `set_next` requires to use Obj.magic:

let set_next (Elt r) x = r.next <- Obj.magic x;;

This use of Obj.magic should be fine, since it is impossible to access
the field prev and next directly. Contrarily, your coerce function is
unsafe:

let current (Elt {el;_}) = el
let crash () = current (coerce @@ Empty)

On the other side of the problem, this is the reason why you need to
wrap access with an option in `get_opt_next`. Note that I am not certain
that this wrapping does not completely defeat your optimisation objective.

Regards − octachron.

Post by Markus Mottl
Hi Petter,
thanks, the above approach obviously works with my previous example,
which I had sadly oversimplified. In my case I was actually dealing
with inlined, mutable records where the above won't work, because then
I can't overwrite fields. The somewhat more complete example also
contains strange type system behavior I don't understand that may even
be a bug.
The example below shows how GADTs + inline records + (if available)
covariance could help implement doubly linked lists much more
efficiently. The memory representation using mutable inline records
is essentially optimal. Taking an element link can be done without
having to introduce superfluous pattern matches, because the GADT
guarantees via the type system that a link is non-empty.
I had to use "Obj.magic" due to the lack of covariance support with
GADTs in the "coerce" function, which I believe to be sound.
The strange behavior is in the "insert_first" function and also the
"create" function: "Empty" is not of covariant type, but nevertheless
can be used for allocation in that way (e.g. in the "let res = ..."
part). But the commented out lines show that I cannot overwrite the
exact same field in the just allocated value with the exact same
value. I can understand the reason why the latter is not possible,
but why is allocation allowed that way? Maybe a type system guru can
explain what's going on.
----------
module Link = struct
type kind = [ `empty | `elt ]
type ('el, _) t =
| Empty : ('el, [ `empty ]) t
| Elt : {
mutable prev : ('el, [< kind ]) t;
el : 'el;
mutable next : ('el, [< kind ]) t;
} -> ('el, [ `elt ]) t
let coerce (t : ('el, [< kind ]) t) : ('el, [< kind ]) t = Obj.magic t
let get_opt_elt : type a. ('el, a) t -> ('el, [ `elt ]) t option = function
| Empty -> None
| Elt _ as elt -> Some elt
let cut : type a. ('el, a) t -> unit = function
| Empty -> ()
| Elt { prev = prev_elt; next = next_elt } ->
match prev_elt, next_elt with
| Empty, Empty -> ()
| Empty, Elt next -> next.prev <- coerce Empty
| Elt prev, Empty -> prev.next <- coerce Empty
| Elt prev, Elt next ->
prev.next <- coerce next_elt;
next.prev <- coerce prev_elt
end (* Link *)
module Doubly_linked : sig
type 'el t
type 'el elt
val create : unit -> 'el t
val first : 'el t -> 'el elt option
val insert_first : 'el t -> 'el -> unit
val remove : 'el elt -> unit
end = struct
open Link
type 'el t = Head : { mutable head : ('el, [< Link.kind ]) Link.t } -> 'el t
type 'el elt = ('el, [ `elt ]) Link.t
let create () = Head { head = Empty }
let first (Head h) = Link.get_opt_elt h.head
let insert_first (Head h) el =
h.head <-
match h.head with
| Empty ->
let res = Elt { prev = Empty; el; next = Empty } in
(* let Elt foo = res in *)
(* foo.prev <- Empty; *)
coerce res
| Elt _ as next -> coerce (Elt { prev = Empty; el; next })
let remove = Link.cut
end (* Doubly_linked *)
----------
Regards,
Markus

Mikhail Mandrykin

2016-09-19 10:18:36 UTC

Permalink

Hello,

Post by Petter A. Urkedal
Hi Markus,
Therefore, these fields are neither readable nor writable directly. A
direct manifestation of the problem is that, as you observed, you cannot
assign new values to either prev or next without use of `Obj.magic`. For
instance,

As far as I know quite common approach in this case is introduction of one-
constructor wrapper types to hide the existential variable and allow mutability
e.g.

type ('el, _) t =
| Empty : ('el, [ `empty ]) t
| Elt : {
mutable prev : 'el link;
el : 'el;
mutable next : 'el link;
} -> ('el, [ `elt ]) t
and 'el link = Link : ('el, _) t -> 'el link;;

So the link type wraps the type parameter of the next element and thus allows
safe mutation, otherwise it's only possible to update the field with the element of
exactly same type that doesn't allow e.g. deleting an element at the end of the list
without reallocating the corresponding record of the previous element (and if one
decides to keep more precise information e.g. about the number of elements, the
whole list needs to be re-allocated). With the link wrapper as above it's possible to
define add, remove and also a get operation without and extra pattern matching:

let add : type a. _ -> (_, a) t -> (_, [`elt]) t = fun el ->
function
| Empty -> Elt { el; prev = Link Empty; next = Link Empty }
| Elt _ as n -> Elt { el; prev = Link Empty; next = Link n };;

let remove : type a. ('el, a) t -> 'el link =
function
| Empty -> Link Empty
| Elt { prev = Link p as prev; next = Link n as next} ->
(match p with Empty -> () | Elt p -> p.next <- next);
(match n with Empty -> () | Elt n -> n.prev <- prev);
next;;

let get : (_, [`elt]) t -> _ = function Elt { el; _ } -> el

Also note the GPR#606(https://github.com/ocaml/ocaml/pull/606[1] ) that should
allow constructing and deconstructing links (Link l) without overhead.

Regards, Mikhail
--
Mikhail Mandrykin
Linux Verification Center, ISPRAS
web: http://linuxtesting.org
e-mail: ***@ispras.ru

--------
[1] https://github.com/ocaml/ocaml/pull/606

Mikhail Mandrykin

2016-09-19 13:37:43 UTC

Permalink

Post by Mikhail Mandrykin
Hello,

As far as I know quite common approach in this case is introduction of one-
constructor wrapper types to hide the existential variable and allow
mutability e.g.
type ('el, _) t =
| Empty : ('el, [ `empty ]) t
| Elt : {
mutable prev : 'el link;
el : 'el;
mutable next : 'el link;
} -> ('el, [ `elt ]) t
and 'el link = Link : ('el, _) t -> 'el link;;
So the link type wraps the type parameter of the next element and thus
allows safe mutation, otherwise it's only possible to update the field with
the element of exactly same type that doesn't allow e.g. deleting an
element at the end of the list without reallocating the corresponding
record of the previous element (and if one decides to keep more precise
information e.g. about the number of elements, the whole list needs to be
re-allocated). With the link wrapper as above it's possible to define add,
let add : type a. _ -> (_, a) t -> (_, [`elt]) t = fun el ->
function
| Empty -> Elt { el; prev = Link Empty; next = Link Empty }
| Elt _ as n -> Elt { el; prev = Link Empty; next = Link n };;
let remove : type a. ('el, a) t -> 'el link =
function
| Empty -> Link Empty
| Elt { prev = Link p as prev; next = Link n as next} ->
(match p with Empty -> () | Elt p -> p.next <- next);
(match n with Empty -> () | Elt n -> n.prev <- prev);
next;;

This is actually buggy (although typesafe), more like this:

let add : type a. _ -> (_, a) t -> (_, [`elt]) t = fun el ->
function
| Empty -> Elt { el; prev = Link Empty; next = Link Empty }
| Elt ({ prev = Link p as prev; _ } as n) as next ->
let r = Elt { el; prev; next = Link next } in
(match p with Empty -> () | Elt p -> p.next <- Link r);
n.prev <- Link r;
r;;

let remove : type a. (_, a) t -> (_, a) t * _ =
function
| Empty -> Empty, Link Empty
| Elt ({ prev = Link p as prev; next = Link n as next} as e) as elt ->
(match p with Empty -> () | Elt p -> p.next <- next);
(match n with Empty -> () | Elt n -> n.prev <- prev);
e.next <- Link Empty;
e.prev <- Link Empty;
elt, next;;

Regards, Mikhail
--
Mikhail Mandrykin
Linux Verification Center, ISPRAS
web: http://linuxtesting.org
e-mail: ***@ispras.ru

Markus Mottl

2016-09-19 14:46:31 UTC

Permalink

Thanks, Mikhail, that's the correct way to solve this problem from a
typing perspective. Sadly, this encoding using a separate GADT
containing a "Link" tag defeats the purpose of the idea, which was to
save indirections and the associated memory overhead. I wish it was
possible to introduce existentially quantified variables within
records without having to go through another GADT.

Regards,
Markus

Post by Mikhail Mandrykin
Hello,

As far as I know quite common approach in this case is introduction of
one-constructor wrapper types to hide the existential variable and allow
mutability e.g.
type ('el, _) t =
| Empty : ('el, [ `empty ]) t
| Elt : {
mutable prev : 'el link;
el : 'el;
mutable next : 'el link;
} -> ('el, [ `elt ]) t
and 'el link = Link : ('el, _) t -> 'el link;;
So the link type wraps the type parameter of the next element and thus
allows safe mutation, otherwise it's only possible to update the field with
the element of exactly same type that doesn't allow e.g. deleting an element
at the end of the list without reallocating the corresponding record of the
previous element (and if one decides to keep more precise information e.g.
about the number of elements, the whole list needs to be re-allocated). With
the link wrapper as above it's possible to define add, remove and also a get
let add : type a. _ -> (_, a) t -> (_, [`elt]) t = fun el ->
function
| Empty -> Elt { el; prev = Link Empty; next = Link Empty }
| Elt _ as n -> Elt { el; prev = Link Empty; next = Link n };;
let remove : type a. ('el, a) t -> 'el link =
function
| Empty -> Link Empty
| Elt { prev = Link p as prev; next = Link n as next} ->
(match p with Empty -> () | Elt p -> p.next <- next);
(match n with Empty -> () | Elt n -> n.prev <- prev);
next;;
let get : (_, [`elt]) t -> _ = function Elt { el; _ } -> el
Also note the GPR#606(https://github.com/ocaml/ocaml/pull/606 ) that should
allow constructing and deconstructing links (Link l) without overhead.
Regards, Mikhail
--
Mikhail Mandrykin
Linux Verification Center, ISPRAS
web: http://linuxtesting.org

Mikhail Mandrykin

2016-09-19 14:54:09 UTC

Permalink

Post by Markus Mottl
Thanks, Mikhail, that's the correct way to solve this problem from a
typing perspective. Sadly, this encoding using a separate GADT
containing a "Link" tag defeats the purpose of the idea, which was to
save indirections and the associated memory overhead. I wish it was
possible to introduce existentially quantified variables within
records without having to go through another GADT.

In fact the purpose of GPR#606 (https://github.com/ocaml/ocaml/pull/606[1]) is to
avoid the indirection e.g.
type t = A of string [@@unboxed]
let x = A "toto"
assert (Obj.repr x == Obj.repr (match x with A s -> s))
It is also said in the comment that:

This is useful (for example):

--...
-- when using a single-constructor, single-field GADT to introduce an existential
type

This is merged into trunk and should appear in 4.04.0: (from CHANGES)
- GPR#606: optimized representation for immutable records with a single
field, and concrete types with a single constructor with a single argument.
This is triggered with a [@@unboxed] attribute on the type definition.
(Damien Doligez)

Regards, Mikhail

Post by Markus Mottl
Regards,
Markus

Post by Mikhail Mandrykin
Hello,

As far as I know quite common approach in this case is introduction of
one-constructor wrapper types to hide the existential variable and allow
mutability e.g.
type ('el, _) t =
| Empty : ('el, [ `empty ]) t
|
| Elt : {
mutable prev : 'el link;
el : 'el;
mutable next : 'el link;
} -> ('el, [ `elt ]) t
and 'el link = Link : ('el, _) t -> 'el link;;
So the link type wraps the type parameter of the next element and thus
allows safe mutation, otherwise it's only possible to update the field with
the element of exactly same type that doesn't allow e.g. deleting an
element at the end of the list without reallocating the corresponding
record of the previous element (and if one decides to keep more precise
information e.g. about the number of elements, the whole list needs to be
re-allocated). With the link wrapper as above it's possible to define
let add : type a. _ -> (_, a) t -> (_, [`elt]) t = fun el ->
function
| Empty -> Elt { el; prev = Link Empty; next = Link Empty }
|
| Elt _ as n -> Elt { el; prev = Link Empty; next = Link n };;
let remove : type a. ('el, a) t -> 'el link =
function
| Empty -> Link Empty
|
| Elt { prev = Link p as prev; next = Link n as next} ->
(match p with Empty -> () | Elt p -> p.next <- next);
(match n with Empty -> () | Elt n -> n.prev <- prev);

Markus Mottl

2016-09-19 15:03:27 UTC

Permalink

Ah, thanks a lot, I totally missed following the link. Yes, this
OCaml feature would solve this problem efficiently, too. I guess an
existentially quantified record field would look neater, but I'd be
happy enough with GPR#606 getting into the next release.

Regards,
Markus

In fact the purpose of GPR#606 (https://github.com/ocaml/ocaml/pull/606) is
to avoid the indirection e.g.
let x = A "toto"
assert (Obj.repr x == Obj.repr (match x with A s -> s))
--...
-- when using a single-constructor, single-field GADT to introduce an
existential type
This is merged into trunk and should appear in 4.04.0: (from CHANGES)
- GPR#606: optimized representation for immutable records with a single
field, and concrete types with a single constructor with a single argument.
(Damien Doligez)
Regards, Mikhail

Post by Markus Mottl
Regards,
Markus

Post by Mikhail Mandrykin
Hello,

As far as I know quite common approach in this case is introduction of
one-constructor wrapper types to hide the existential variable and allow
mutability e.g.
type ('el, _) t =
| Empty : ('el, [ `empty ]) t
|
| Elt : {
mutable prev : 'el link;
el : 'el;
mutable next : 'el link;
} -> ('el, [ `elt ]) t
and 'el link = Link : ('el, _) t -> 'el link;;
So the link type wraps the type parameter of the next element and thus
allows safe mutation, otherwise it's only possible to update the field
with
the element of exactly same type that doesn't allow e.g. deleting an
element at the end of the list without reallocating the corresponding
record of the previous element (and if one decides to keep more precise
information e.g. about the number of elements, the whole list needs to be
re-allocated). With the link wrapper as above it's possible to define
let add : type a. _ -> (_, a) t -> (_, [`elt]) t = fun el ->
function
| Empty -> Elt { el; prev = Link Empty; next = Link Empty }
|
| Elt _ as n -> Elt { el; prev = Link Empty; next = Link n };;
let remove : type a. ('el, a) t -> 'el link =
function
| Empty -> Link Empty
|
| Elt { prev = Link p as prev; next = Link n as next} ->
(match p with Empty -> () | Elt p -> p.next <- next);
(match n with Empty -> () | Elt n -> n.prev <- prev);
next;;
let get : (_, [`elt]) t -> _ = function Elt { el; _ } -> el
Also note the GPR#606(https://github.com/ocaml/ocaml/pull/606 ) that
should
allow constructing and deconstructing links (Link l) without overhead.
Regards, Mikhail
--
Mikhail Mandrykin
Linux Verification Center, ISPRAS
web: http://linuxtesting.org

--
Mikhail Mandrykin
Linux Verification Center, ISPRAS
web: http://linuxtesting.org

Markus Mottl

2016-09-20 21:08:17 UTC

Permalink

Has the OCaml team ever considered implementing existentially
quantified type variables for record fields? Having given it some
superficial thought, I didn't see any obvious reason why it would be
impossible, though it would be admittedly more difficult than the
usual (universally quantified) polymorphic record fields.

Existentially quantified type variables need a well-defined scope.
It's easy to define this scope with GADTs and first class modules: it
starts with a pattern match or when unpacking the module. But
accessing a record field as such doesn't create a binding. This would
require some care when establishing the scope. Maybe one could define
the start as the topmost binding whose type depends on a given
existential type as obtained through field accesses.

Another issue: when accessing an immutable field twice, one could
assign the same existential type to bindings of their values. But
accessing a mutable field twice would require two distinct existential
types, because intermittent changes to the field could substitute
values of incompatible types. Maybe there are even more awkward
things that I haven't thought about.

Any thoughts?

Regards,
Markus

Post by Markus Mottl
Ah, thanks a lot, I totally missed following the link. Yes, this
OCaml feature would solve this problem efficiently, too. I guess an
existentially quantified record field would look neater, but I'd be
happy enough with GPR#606 getting into the next release.
Regards,
Markus

Post by Markus Mottl
Regards,
Markus

Post by Mikhail Mandrykin
Hello,

As far as I know quite common approach in this case is introduction of
one-constructor wrapper types to hide the existential variable and allow
mutability e.g.
type ('el, _) t =
| Empty : ('el, [ `empty ]) t
|
| Elt : {
mutable prev : 'el link;
el : 'el;
mutable next : 'el link;
} -> ('el, [ `elt ]) t
and 'el link = Link : ('el, _) t -> 'el link;;
So the link type wraps the type parameter of the next element and thus
allows safe mutation, otherwise it's only possible to update the field
with
the element of exactly same type that doesn't allow e.g. deleting an
element at the end of the list without reallocating the corresponding
record of the previous element (and if one decides to keep more precise
information e.g. about the number of elements, the whole list needs to be
re-allocated). With the link wrapper as above it's possible to define
let add : type a. _ -> (_, a) t -> (_, [`elt]) t = fun el ->
function
| Empty -> Elt { el; prev = Link Empty; next = Link Empty }
|
| Elt _ as n -> Elt { el; prev = Link Empty; next = Link n };;
let remove : type a. ('el, a) t -> 'el link =
function
| Empty -> Link Empty
|
| Elt { prev = Link p as prev; next = Link n as next} ->
(match p with Empty -> () | Elt p -> p.next <- next);
(match n with Empty -> () | Elt n -> n.prev <- prev);
next;;
let get : (_, [`elt]) t -> _ = function Elt { el; _ } -> el
Also note the GPR#606(https://github.com/ocaml/ocaml/pull/606 ) that
should
allow constructing and deconstructing links (Link l) without overhead.
Regards, Mikhail
--
Mikhail Mandrykin
Linux Verification Center, ISPRAS
web: http://linuxtesting.org

--
Mikhail Mandrykin
Linux Verification Center, ISPRAS
web: http://linuxtesting.org

Lukasz Stafiniak

2016-09-21 10:11:44 UTC

Permalink

Post by Markus Mottl
Existentially quantified type variables need a well-defined scope.
It's easy to define this scope with GADTs and first class modules: it
starts with a pattern match or when unpacking the module. But
accessing a record field as such doesn't create a binding. This would
require some care when establishing the scope. Maybe one could define
the start as the topmost binding whose type depends on a given
existential type as obtained through field accesses.
Another issue: when accessing an immutable field twice, one could
assign the same existential type to bindings of their values. But
accessing a mutable field twice would require two distinct existential
types, because intermittent changes to the field could substitute
values of incompatible types. Maybe there are even more awkward
things that I haven't thought about.
Any thoughts?

A simple solution would be to "A-transform" (IIRC the term) accesses
to fields with existential type variables. This would give a more
narrow scope on the expression level than you suggest, but a
well-defined one prior to type inference. To broaden the scope you
would need to let-bind the field access yourself at the appropriate
level.

Lukasz Stafiniak

2016-09-21 10:15:05 UTC

Permalink

Post by Lukasz Stafiniak
A simple solution would be to "A-transform" (IIRC the term) accesses

Sorry, I forgot to define this. I mean rewrite rules like:
[f r.x] ==> [let x = r.x in f x]
where subsequently the existential variable is introduced (unpacked)
at the let-binding level. This corresponds to a single-variant GADT
pattern match.

Post by Lukasz Stafiniak
to fields with existential type variables. This would give a more
narrow scope on the expression level than you suggest, but a
well-defined one prior to type inference. To broaden the scope you
would need to let-bind the field access yourself at the appropriate
level.

Markus Mottl

2016-09-21 17:04:23 UTC

Permalink

Here is a complete working example of the advantages of using GADTs
with inline records. It also uses the [@@unboxed] feature now
available with OCaml 4.04 as discussed before here, though it required
a little workaround due to an apparent bug in the current beta.

The below implementation of the union-find algorithm is considerably
more efficient (with the 4.04 beta only) than the Union_find
implementation in the Jane Street Core kernel. The problem admittedly
lends itself to the GADT + inline record trick.

There is actually one advantage to using an intermediate, unboxed GADT
tag compared to records with existentially quantified fields (if they
were available): functions matching the tag don't require those
horrible type annotations for locally abstract types, because the
match automatically sets up the scope for you. Having to write "Node
foo" instead of just "foo" in some places isn't too bad. Not sure
it's possible to have the best of both worlds.

----------
module Union_find = struct
(* This does not work yet due to an OCaml 4.04 beta bug
type ('a, 'kind) tree =
| Root : { mutable value : 'a; mutable rank : int } -> ('a, [ `root ]) tree
| Inner : { mutable parent : 'a node } -> ('a, [ `inner ]) tree

and 'a node = Node : ('a, _) tree -> 'a node [@@ocaml.unboxed]

type 'a t = ('a, [ `inner ]) tree
*)

type ('a, 'kind, 'parent) tree =
| Root : { mutable value : 'a; mutable rank : int } ->
('a, [ `root ], 'parent) tree
| Inner : { mutable parent : 'parent } -> ('a, [ `inner ], 'parent) tree

type 'a node = Node : ('a, _, 'a node) tree -> 'a node [@@ocaml.unboxed]

type 'a t = ('a, [ `inner ], 'a node) tree

let create v = Inner { parent = Node (Root { value = v; rank = 0 }) }

let rec compress ~repr:(Inner inner as repr) = function
| Node (Root _ as root) -> repr, root
| Node (Inner next_inner as repr) ->
let repr, _ as res = compress ~repr next_inner.parent in
inner.parent <- Node repr;
res

let compress_inner (Inner inner as repr) = compress ~repr inner.parent

let get_root (Inner inner) =
match inner.parent with
| Node (Root _ as root) -> root (* Avoids compression call *)
| Node (Inner _ as repr) ->
let repr, root = compress_inner repr in
inner.parent <- Node repr;
root

let get t = let Root r = get_root t in r.value

let set t x = let Root r = get_root t in r.value <- x

let same_class t1 t2 = get_root t1 == get_root t2

let union t1 t2 =
let Inner inner1 as repr1, (Root r1 as root1) = compress_inner t1 in
let Inner inner2 as repr2, (Root r2 as root2) = compress_inner t2 in
if root1 == root2 then ()
else
let n1 = r1.rank in
let n2 = r2.rank in
if n1 < n2 then inner1.parent <- Node repr2
else begin
inner2.parent <- Node repr1;
if n1 = n2 then r1.rank <- r1.rank + 1
end
end (* Union_find *)
----------

Regards,
Markus

Post by Lukasz Stafiniak

Post by Lukasz Stafiniak
A simple solution would be to "A-transform" (IIRC the term) accesses

[f r.x] ==> [let x = r.x in f x]
where subsequently the existential variable is introduced (unpacked)
at the let-binding level. This corresponds to a single-variant GADT
pattern match.

Gabriel Scherer

2016-09-21 21:43:13 UTC

Permalink

Very nice. Would you have more precise numbers for the "considerably more
efficient" part? It's not always easy to find clear benefits to inline
records on representative macro-benchmarks.

Post by Markus Mottl
Here is a complete working example of the advantages of using GADTs
available with OCaml 4.04 as discussed before here, though it required
a little workaround due to an apparent bug in the current beta.
The below implementation of the union-find algorithm is considerably
more efficient (with the 4.04 beta only) than the Union_find
implementation in the Jane Street Core kernel. The problem admittedly
lends itself to the GADT + inline record trick.
There is actually one advantage to using an intermediate, unboxed GADT
tag compared to records with existentially quantified fields (if they
were available): functions matching the tag don't require those
horrible type annotations for locally abstract types, because the
match automatically sets up the scope for you. Having to write "Node
foo" instead of just "foo" in some places isn't too bad. Not sure
it's possible to have the best of both worlds.
----------
module Union_find = struct
(* This does not work yet due to an OCaml 4.04 beta bug
type ('a, 'kind) tree =
| Root : { mutable value : 'a; mutable rank : int } -> ('a, [ `root ]) tree
| Inner : { mutable parent : 'a node } -> ('a, [ `inner ]) tree
type 'a t = ('a, [ `inner ]) tree
*)
type ('a, 'kind, 'parent) tree =
| Root : { mutable value : 'a; mutable rank : int } ->
('a, [ `root ], 'parent) tree
| Inner : { mutable parent : 'parent } -> ('a, [ `inner ], 'parent) tree
type 'a t = ('a, [ `inner ], 'a node) tree
let create v = Inner { parent = Node (Root { value = v; rank = 0 }) }
let rec compress ~repr:(Inner inner as repr) = function
| Node (Root _ as root) -> repr, root
| Node (Inner next_inner as repr) ->
let repr, _ as res = compress ~repr next_inner.parent in
inner.parent <- Node repr;
res
let compress_inner (Inner inner as repr) = compress ~repr inner.parent
let get_root (Inner inner) =
match inner.parent with
| Node (Root _ as root) -> root (* Avoids compression call *)
| Node (Inner _ as repr) ->
let repr, root = compress_inner repr in
inner.parent <- Node repr;
root
let get t = let Root r = get_root t in r.value
let set t x = let Root r = get_root t in r.value <- x
let same_class t1 t2 = get_root t1 == get_root t2
let union t1 t2 =
let Inner inner1 as repr1, (Root r1 as root1) = compress_inner t1 in
let Inner inner2 as repr2, (Root r2 as root2) = compress_inner t2 in
if root1 == root2 then ()
else
let n1 = r1.rank in
let n2 = r2.rank in
if n1 < n2 then inner1.parent <- Node repr2
else begin
inner2.parent <- Node repr1;
if n1 = n2 then r1.rank <- r1.rank + 1
end
end (* Union_find *)
----------
Regards,
Markus

Post by Lukasz Stafiniak

Post by Lukasz Stafiniak
A simple solution would be to "A-transform" (IIRC the term) accesses

[f r.x] ==> [let x = r.x in f x]
where subsequently the existential variable is introduced (unpacked)
at the let-binding level. This corresponds to a single-variant GADT
pattern match.

--
--
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Markus Mottl

2016-09-22 00:39:32 UTC

Permalink

The direct comparison with the Jane Street implementation showed a 40%
speed increase for some random things I tried, but that's not a fair
comparison. If I improve the JS code, e.g. to avoid allocations, the
performance improvement due to the GADT + inlined records drops to
only about 10%.

In terms of memory, a freshly created set costs 7 machine words in the
original code vs. 5 for the GADT. Adding one rank costs 4 machine
words in the standard implementation vs. only 2 for GADTs. That's a
pretty significant size reduction. The GADT representation would
surely help in programs that allocate a lot of these values, but the
values don't tend to grow much internally due to the tree compression
algorithm. I'm sure there are better examples where a program would
typically allocate GADT-based data structures of more significant
size.

Regards,
Markus

On Wed, Sep 21, 2016 at 5:40 PM, Gabriel Scherer

Post by Gabriel Scherer
Very nice. Would you have more precise numbers for the "considerably more
efficient" part? It's not always easy to find clear benefits to inline
records on representative macro-benchmarks.

Post by Markus Mottl
Here is a complete working example of the advantages of using GADTs
available with OCaml 4.04 as discussed before here, though it required
a little workaround due to an apparent bug in the current beta.
The below implementation of the union-find algorithm is considerably
more efficient (with the 4.04 beta only) than the Union_find
implementation in the Jane Street Core kernel. The problem admittedly
lends itself to the GADT + inline record trick.
There is actually one advantage to using an intermediate, unboxed GADT
tag compared to records with existentially quantified fields (if they
were available): functions matching the tag don't require those
horrible type annotations for locally abstract types, because the
match automatically sets up the scope for you. Having to write "Node
foo" instead of just "foo" in some places isn't too bad. Not sure
it's possible to have the best of both worlds.
----------
module Union_find = struct
(* This does not work yet due to an OCaml 4.04 beta bug
type ('a, 'kind) tree =
| Root : { mutable value : 'a; mutable rank : int } -> ('a, [ `root ]) tree
| Inner : { mutable parent : 'a node } -> ('a, [ `inner ]) tree
type 'a t = ('a, [ `inner ]) tree
*)
type ('a, 'kind, 'parent) tree =
| Root : { mutable value : 'a; mutable rank : int } ->
('a, [ `root ], 'parent) tree
| Inner : { mutable parent : 'parent } -> ('a, [ `inner ], 'parent) tree
type 'a node = Node : ('a, _, 'a node) tree -> 'a node
type 'a t = ('a, [ `inner ], 'a node) tree
let create v = Inner { parent = Node (Root { value = v; rank = 0 }) }
let rec compress ~repr:(Inner inner as repr) = function
| Node (Root _ as root) -> repr, root
| Node (Inner next_inner as repr) ->
let repr, _ as res = compress ~repr next_inner.parent in
inner.parent <- Node repr;
res
let compress_inner (Inner inner as repr) = compress ~repr inner.parent
let get_root (Inner inner) =
match inner.parent with
| Node (Root _ as root) -> root (* Avoids compression call *)
| Node (Inner _ as repr) ->
let repr, root = compress_inner repr in
inner.parent <- Node repr;
root
let get t = let Root r = get_root t in r.value
let set t x = let Root r = get_root t in r.value <- x
let same_class t1 t2 = get_root t1 == get_root t2
let union t1 t2 =
let Inner inner1 as repr1, (Root r1 as root1) = compress_inner t1 in
let Inner inner2 as repr2, (Root r2 as root2) = compress_inner t2 in
if root1 == root2 then ()
else
let n1 = r1.rank in
let n2 = r2.rank in
if n1 < n2 then inner1.parent <- Node repr2
else begin
inner2.parent <- Node repr1;
if n1 = n2 then r1.rank <- r1.rank + 1
end
end (* Union_find *)
----------
Regards,
Markus

Post by Lukasz Stafiniak

Post by Lukasz Stafiniak
A simple solution would be to "A-transform" (IIRC the term) accesses

[f r.x] ==> [let x = r.x in f x]
where subsequently the existential variable is introduced (unpacked)
at the let-binding level. This corresponds to a single-variant GADT
pattern match.

--
--
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Jacques Garrigue

2016-10-04 10:34:41 UTC

Permalink

Post by Markus Mottl
Has the OCaml team ever considered implementing existentially
quantified type variables for record fields? Having given it some
superficial thought, I didn't see any obvious reason why it would be
impossible, though it would be admittedly more difficult than the
usual (universally quantified) polymorphic record fields.

There is no such thing for individual record fields, but note that
inline records already give you that for type variable scoping the whole
record.

type t = T : {x: 'a; f: 'a -> int} -> t
let g (T r) = r.f r.x

or for your example:

| Elt : {mutable prev : (‘el, ‘kind) t} -> (‘el, [`elt]) t

Non-quantified type variables in a GADT constructor are existential, and you can
still use explicit quantification for universals.

Jacques

Markus Mottl

2016-09-19 14:39:59 UTC

Permalink

Hi Octachron,

yes, indeed, that was spot on. I somehow got sidetracked by the lack
of covariance annotations for GADTs, not realizing that the implicitly
defined type variables 'prev and 'next are existential within the
scope of Elt only. Anything outside can't be a witness so I can't
write, and anything inside can't get out without escaping scope.

I guess what I would need to solve my problem efficiently are record
fields that support existentially rather than only universally
quantified variables. Something along the lines of:

| Elt : { mutable prev : $'kind. ('el, 'kind) t }

Would that make sense?

Btw., the "option" type was only introduced so as not having to reveal
the type representation to users of the API. The "Some" tag would
typically be dropped right after the user pattern matches on the
option type. They would then still benefit from the efficient
representation of the link, and the GADT + inlined record would still
be able to save the memory overhead associated with the indirection to
a non-inlined record.

Regards,
Markus

Post by Petter A. Urkedal
Hi Markus,

Post by Markus Mottl
Taking an element link can be done without
having to introduce superfluous pattern matches, because the GADT
guarantees via the type system that a link is non-empty.

I fear that you are misinterpreting the guarantees giving to you by the type
('el, 'kind) Link.t. This type can be rewritten as
type ('el, _) t =
| Empty : ('el, [ `empty ]) t
| Elt : {
mutable prev : ('el, [< kind ] as 'prev ) t;
el : 'el;
mutable next : ('el, [< kind ] as 'next ) t;
} -> ('el, [ `elt ]) t
they mean that outside of the context of Elt, the type system simply does
not know the subtype of the prev and next fields. Even within the context of
the `Elt` constructor, the type system only knows that there exists two
types p and n such that Elt r is typed Elt {prev:p;next:n;el:el}. But the
information about which precise type was present when constructing the value
is lost.
Therefore, these fields are neither readable nor writable directly. A direct
manifestation of the problem is that, as you observed, you cannot assign new
values to either prev or next without use of `Obj.magic`. For instance,
let set_next (Elt r) x = r.next <- x;;
fails with
Error: This expression has type 'a but an expression was expected of type
('b, [< kind ]) t. The type constructor $Elt_'next would escape its scope.
because the type checker tries to unify the type 'a of x with the
existential type $Elt_'next of `next`. Using a fantasy syntax, we have
let set_next (type a) (Elt (type p n) {next:n; prev:p; _} ) (x:a) = r.next:p
<- x:a;;
and the types `a` and `p` cannot be unified because `p` should not escape
the context of the constructor Elt.
let set_next (Elt r) x = r.next <- Obj.magic x;;
This use of Obj.magic should be fine, since it is impossible to access the
let current (Elt {el;_}) = el
On the other side of the problem, this is the reason why you need to wrap
access with an option in `get_opt_next`. Note that I am not certain that
this wrapping does not completely defeat your optimisation objective.
Regards − octachron.

--
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Goswin von Brederlow

2016-09-19 10:06:34 UTC

Permalink

I think here you are trying to encode into the type system if the
next/prev point to Empty or Elt. But you don't encode this information
into the resulting type at all. How will you know the type of prev/next
from an Elt?

Problem is that you then need to do that recursively. You not only
need to know if next is Empty or Elt but how many Elt there are before
you hit Empty and in both directions. E.g.

('el, [`elt] -> [`empty], [`elt] -> [`elt] -> [`empty]) t

Also inserting into the list would change the type of every item in
the list. Since some other item could be stored somewhere it's type
would change and we can't have that. That would rule out using
mutables and you would have to copy the whole list on every change.

Seems to me like you are on a dead end here.

Post by Markus Mottl
let coerce (t : ('el, [< kind ]) t) : ('el, [< kind ]) t = Obj.magic t
let get_opt_elt : type a. ('el, a) t -> ('el, [ `elt ]) t option = function
| Empty -> None
| Elt _ as elt -> Some elt
let cut : type a. ('el, a) t -> unit = function
| Empty -> ()
| Elt { prev = prev_elt; next = next_elt } ->
match prev_elt, next_elt with
| Empty, Empty -> ()
| Empty, Elt next -> next.prev <- coerce Empty
| Elt prev, Empty -> prev.next <- coerce Empty
| Elt prev, Elt next ->
prev.next <- coerce next_elt;
next.prev <- coerce prev_elt
end (* Link *)
module Doubly_linked : sig
type 'el t
type 'el elt
val create : unit -> 'el t
val first : 'el t -> 'el elt option
val insert_first : 'el t -> 'el -> unit
val remove : 'el elt -> unit
end = struct
open Link
type 'el t = Head : { mutable head : ('el, [< Link.kind ]) Link.t } -> 'el t
type 'el elt = ('el, [ `elt ]) Link.t
let create () = Head { head = Empty }
let first (Head h) = Link.get_opt_elt h.head
let insert_first (Head h) el =
h.head <-
match h.head with
| Empty ->
let res = Elt { prev = Empty; el; next = Empty } in
(* let Elt foo = res in *)
(* foo.prev <- Empty; *)
coerce res
| Elt _ as next -> coerce (Elt { prev = Empty; el; next })
let remove = Link.cut
end (* Doubly_linked *)
----------
Regards,
Markus

MfG
Goswin