tl;dr: The new Haddocks are available at: http://www.snoyman.com/haddocks/conduit-0.2.0/index.html
Even though it's relatively young, conduit has gotten a lot of real-world usage, and a fair bit of scrutiny. I think we achieved all of our main objectives with the first release, but that doesn't mean there's no room for improvement. I asked the community for feedback, and here are the main criticisms I've heard:
- BufferedSource doesn't feel quite right. One complaint was the name bsourceUnpull, but overall people thought it didn't fit in well with the rest of the package.
- Usage of mutable variables for storing state is suboptimal.
- The split between Source and PreparedSource isn't very nice.
While I won't call the first issue fully resolved, I would say that conduit 0.1 was a big step in the right direction. Instead of exposing all the internals of BufferedSource, it's now an abstract type. (This does solve the bsourceUnpull name dislike, though that's obviously a minor point.) Overall, dependent packages have moved away from using BufferedSource in their external APIs. In other words, BufferedSource is intended purely as an internal tool. For example, Warp uses BufferedSource to parse the request headers, but then converts it back to a Source to pass to the application for reading the request body.
I've been opposed to making any changes regarding the second issue (mutable variables). My belief was that one of the sources of conduit's simplicity relative to enumerators was its use of mutable state. And in general, I don't believe in changing something until there's hard evidence that it's actually causing problems.
Last week, however, Felipe Lessa found one such concrete problem: using SequencedSink was very slow. Upon investigation, I determined that the problem came from Sink's monadic bind implementation. For each bind, a new mutable variable was allocated, and it had to be checked to determine its state. Unfortunately, a long chain of N binds therefore resulted in quadratic complexity, since each action had to check N variables. This clearly needed to be fixed, but there was no way to do so (that I could see) with the previous types.
So I was presented with a dilemma: either continue down the mutable-variable path and try to solve the problem, or go in the pure/CPS direction, where I knew a simpler solution existed. The choice was actually pretty easy: go for the pure approach. I had the following reasons:
- The main motivation to avoid the change to CPS was to keep the simplicity of the current approach. However, I was about to lose that simplicity anyway.
- Like most Haskellers, I do have an innate dislike for mutable variables.
- After more work comparing conduits to enumerators, I've come to believe that the main source of confusion in enumerators is that the data producer (Enumerator) is just a consumer-transformer. Since the essence of Source would stay the same in CPS, I think that this change does not hinder our simplicity.
- There was strong reason to believe that GHC would be able to optimize CPS code better than mutable-variable code.
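To make the CPS argument concrete, here is a minimal sketch of a continuation-based Sink whose monadic bind never allocates or inspects a mutable variable. It assumes a simplified, standalone type: the names Sink, Done, Processing, finish, await1, sumN and runSink are hypothetical and are not conduit 0.2's actual API, which also deals with leftovers and resources.

```haskell
import Control.Monad (ap, liftM)

-- Hypothetical, simplified Sink model (not conduit's real type):
-- no leftovers, no resource handling, just push/close continuations.
data Sink i m r
  = Done r                              -- finished with a result
  | Processing (i -> m (Sink i m r))    -- push: consume one input, return the next Sink
               (m r)                    -- close: input ended, produce the result

instance Monad m => Functor (Sink i m) where
  fmap = liftM

instance Monad m => Applicative (Sink i m) where
  pure  = Done
  (<*>) = ap

-- Bind only threads continuations: nothing mutable is allocated or checked.
instance Monad m => Monad (Sink i m) where
  Done x >>= f = f x
  Processing push close >>= f =
    Processing (\i -> fmap (>>= f) (push i))  -- keep feeding the first sink
               (close >>= finish . f)         -- on close, finish both sinks in order

-- Produce a result from a Sink that will receive no more input.
finish :: Monad m => Sink i m r -> m r
finish (Done r)             = return r
finish (Processing _ close) = close

-- Read one input, or Nothing at end of stream.
await1 :: Monad m => Sink i m (Maybe i)
await1 = Processing (return . Done . Just) (return Nothing)

-- Sum up to n inputs, using one bind per input.
sumN :: Monad m => Int -> Sink Int m Int
sumN = go 0
  where
    go acc 0 = return acc
    go acc n = await1 >>= maybe (return acc) (\x -> go (acc + x) (n - 1))

-- Feed a list to a Sink; inputs left over after Done are dropped in this sketch.
runSink :: Monad m => [i] -> Sink i m r -> m r
runSink _      (Done r)             = return r
runSink []     (Processing _ close) = close
runSink (x:xs) (Processing push _)  = push x >>= runSink xs
```

Sequencing sinks works as expected: running do { x <- sumN 2; y <- sumN 3; return (x, y) } over [1 .. 10] gives (3, 12), since the first sink consumes 1 and 2 and the second consumes 3, 4 and 5.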
So I took the plunge and tried out CPS... and I really like the result! The first change is to SourceResult's Open constructor: instead of just returning a new value, it returns a new value and a new Source. This allows us to pass our state in that new Source. There are similar changes to SinkResult and ConduitResult. After this, I benchmarked the old and new versions, comparing both a monadic-bind-intensive Sink and a Sink without any binds. The former had a ten-fold speedup (not surprising given the decrease in algorithmic complexity), and the latter had a 20% speedup.
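A minimal sketch of what the changed Open constructor buys us, assuming simplified standalone definitions (these SourceResult and Source types, sourceList and sourceToList are illustrative and may differ in detail from conduit 0.2's own definitions): the source's state, here the remaining list, lives in the Source returned by each pull rather than in a mutable variable.

```haskell
-- Simplified, hypothetical model of the CPS Source types.
data SourceResult m a
  = Open (Source m a) a   -- a value, plus the Source to use for the next pull
  | Closed                -- no more data; there is no new Source

data Source m a = Source
  { sourcePull  :: m (SourceResult m a)
  , sourceClose :: m ()
  }

-- The remaining list *is* the state; it travels inside the returned Source.
sourceList :: Monad m => [a] -> Source m a
sourceList xs = Source
  { sourcePull = return $ case xs of
      []       -> Closed
      (y : ys) -> Open (sourceList ys) y
  , sourceClose = return ()
  }

-- Drain a Source, always continuing with the Source returned by each pull.
sourceToList :: Monad m => Source m a -> m [a]
sourceToList src = do
  res <- sourcePull src
  case res of
    Closed      -> return []
    Open next x -> (x :) <$> sourceToList next
```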
But that wasn't the end of it. This new approach allows us to get rid of the Prepared family of types. Take as an example the sourceFile function, which opens a Handle and reads data from a file. In the old approach, the PreparedSource needed to be given the Handle in order to read from it, so we had a Source which opened the Handle and passed it to the PreparedSource. In the new approach, we have a Source that opens the Handle, reads some data, and returns a new Source that reads from that Handle.
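The sourceFile situation can be sketched the same way. This assumes an IO-only version of the simplified Source type, and sourceFileLines, fromHandle and pullFrom are hypothetical names; the sketch reads lines rather than chunks and, unlike a real library, makes no attempt at exception-safe cleanup. The initial Source opens the Handle on its first pull, and every Source returned afterwards simply closes over that Handle, so no separate prepared type is needed.

```haskell
import System.IO

-- Hypothetical IO-only model of a CPS Source (not conduit's real API).
data SourceResult a = Open (Source a) a | Closed

data Source a = Source
  { sourcePull  :: IO (SourceResult a)
  , sourceClose :: IO ()
  }

-- Before the first pull nothing is open, so closing is a no-op.
-- The first pull opens the Handle and reads a line from it.
sourceFileLines :: FilePath -> Source String
sourceFileLines path = Source
  { sourcePull  = openFile path ReadMode >>= pullFrom
  , sourceClose = return ()
  }

-- Every later Source closes over the already-open Handle.
fromHandle :: Handle -> Source String
fromHandle h = Source
  { sourcePull  = pullFrom h
  , sourceClose = hClose h
  }

pullFrom :: Handle -> IO (SourceResult String)
pullFrom h = do
  eof <- hIsEOF h
  if eof
    then hClose h >> return Closed          -- stream ended: release the Handle
    else do
      line <- hGetLine h
      return (Open (fromHandle h) line)     -- new Source that keeps reading from h

sourceToList :: Source a -> IO [a]
sourceToList src = do
  res <- sourcePull src
  case res of
    Closed      -> return []
    Open next x -> (x :) <$> sourceToList next
```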
So, contrary to my original belief, I think the move to CPS actually simplifies conduit greatly.
Another, orthogonal change I made was to use better data types in a few places. Previously, if you wanted to use the sourceState function with a pull function that returned Closed, you needed to provide a dummy state value. (If you look through current conduit code bases, you'll see a lot of error calls standing in for that value.) Instead, we now have a specialized data type (ConduitStateResult, name suggestions welcome) that avoids this need. Internally, I also cleaned up a number of the types to enforce invariants at the type level.
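A sketch of why a dedicated result type removes the dummy state, assuming the simplified CPS Source model; StateResult, StateOpen, StateClosed, this sourceState and powersOfTwo are hypothetical stand-ins (conduit's real type is the ConduitStateResult mentioned above, and its exact shape may differ). Only StateOpen demands a new state, so a pull function that ends the stream has nothing to make up.

```haskell
data SourceResult m a = Open (Source m a) a | Closed

data Source m a = Source
  { sourcePull  :: m (SourceResult m a)
  , sourceClose :: m ()
  }

-- Hypothetical state result: a new state is required only while the
-- source stays open, so closing needs no dummy (or error) state value.
data StateResult s a
  = StateOpen s a
  | StateClosed

-- Sketch of a sourceState-style helper built on the CPS Source model.
sourceState :: Monad m => s -> (s -> m (StateResult s a)) -> Source m a
sourceState s0 pull = go s0
  where
    go s = Source
      { sourcePull = do
          r <- pull s
          return $ case r of
            StateOpen s' x -> Open (go s') x   -- next state rides in the new Source
            StateClosed    -> Closed
      , sourceClose = return ()
      }

-- Powers of two up to 100; StateClosed carries no state at all.
powersOfTwo :: Monad m => Source m Int
powersOfTwo = sourceState 1 $ \n ->
  return (if n > 100 then StateClosed else StateOpen (n * 2) n)

sourceToList :: Monad m => Source m a -> m [a]
sourceToList src = do
  res <- sourcePull src
  case res of
    Closed      -> return []
    Open next x -> (x :) <$> sourceToList next
```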
Speaking of invariants, the final simplification is that we now have just one invariant ruling over the whole package: never reuse a Source, Sink, or Conduit. After you pull from a Source, it will give you a new Source; do not reuse the original one. If you get a Closed result, there is no new Source, and therefore you cannot pull again or close the Source.
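Under the same simplified, hypothetical Source model, code that respects this invariant always continues with the Source returned by the latest pull, and only closes a Source that is still open. The helpers countFrom, pullN and takeAndClose below are illustrative names, not part of conduit.

```haskell
data SourceResult m a = Open (Source m a) a | Closed

data Source m a = Source
  { sourcePull  :: m (SourceResult m a)
  , sourceClose :: m ()
  }

-- Example source yielding lo, lo+1, ..., hi.
countFrom :: Monad m => Int -> Int -> Source m Int
countFrom lo hi = Source
  { sourcePull  = return (if lo > hi then Closed else Open (countFrom (lo + 1) hi) lo)
  , sourceClose = return ()
  }

-- Pull at most n values. Returns the values and, if the stream is still
-- open, the *latest* Source, which is the only one that may be used again.
pullN :: Monad m => Int -> Source m a -> m ([a], Maybe (Source m a))
pullN n src
  | n <= 0    = return ([], Just src)          -- src was never pulled, so it is still valid
  | otherwise = do
      res <- sourcePull src
      case res of
        Closed      -> return ([], Nothing)    -- no new Source: nothing to pull or close
        Open next x -> do
          (xs, rest) <- pullN (n - 1) next     -- continue with next, never with src
          return (x : xs, rest)

-- Take up to n values, then close the Source only if it is still open.
takeAndClose :: Monad m => Int -> Source m a -> m [a]
takeAndClose n src = do
  (xs, rest) <- pullN n src
  maybe (return ()) sourceClose rest
  return xs
```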
I encourage everyone to have a look at the Haddocks and give me your feedback.
When will this be released?
Likely some time this week. I don't have any specific changes in mind right now, outside of name adjustments suggested by the community.
How this affects users
Anyone programming exclusively against the high-level conduit API will see no breakage. If you're using functions like sourceIO or sinkState, you'll need only minimal changes to use the modified datatypes (essentially changing a few constructors and reordering your arguments). If you're coding directly against the low-level types, you'll need to restructure things a bit to pass around continuations.
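For code written against the low-level types, the restructuring often looks like the following before/after sketch. Both halves use hypothetical, simplified types (OldSource, Source and the countdown functions are illustrative only, not conduit's real definitions): the old style keeps its counter in an IORef and pulls from the same Source repeatedly, while the new style passes the counter as an ordinary argument and returns the next Source from each pull.

```haskell
import Data.IORef

-- "Before" (hypothetical mutable-variable style): the same Source value is
-- pulled repeatedly, and its state lives in an IORef.
data OldResult a = OldOpen a | OldClosed

data OldSource a = OldSource
  { oldPull  :: IO (OldResult a)
  , oldClose :: IO ()
  }

oldCountdown :: Int -> IO (OldSource Int)
oldCountdown start = do
  ref <- newIORef start
  return OldSource
    { oldPull = do
        n <- readIORef ref
        if n <= 0
          then return OldClosed
          else writeIORef ref (n - 1) >> return (OldOpen n)
    , oldClose = return ()
    }

oldToList :: OldSource a -> IO [a]
oldToList src = do
  res <- oldPull src
  case res of
    OldClosed -> return []
    OldOpen x -> (x :) <$> oldToList src      -- reuses the same src

-- "After" (continuation style): the counter is a plain argument, and each
-- pull hands back the Source for the next state.
data SourceResult a = Open (Source a) a | Closed

data Source a = Source
  { sourcePull  :: IO (SourceResult a)
  , sourceClose :: IO ()
  }

newCountdown :: Int -> Source Int
newCountdown n = Source
  { sourcePull  = return (if n <= 0 then Closed else Open (newCountdown (n - 1)) n)
  , sourceClose = return ()
  }

newToList :: Source a -> IO [a]
newToList src = do
  res <- sourcePull src
  case res of
    Closed      -> return []
    Open next x -> (x :) <$> newToList next   -- continues with next, never src
```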
Please email me (or, preferably, the Haskell-Cafe mailing list) if you want help converting old conduit code to this new set of types. For the most part, it's a mechanical process, and I can give lots of examples from the code I've already migrated.
How this affects Yesod
Yesod 0.10 will be built on this new-and-improved conduit. In fact, the code has already been updated for it. This likely means that the Yesod release will be about a week later than originally anticipated, maybe in the second week of February.