5.4 KiB
Payment processor
Implementation details
Strong typing
The crate::core
module contain three new-type-style tuple-struct definitions:
ClientId
, TxId
, and TxAmount
. The reason for new-typing instead of using
bare u16
or u32
values is to avoid potentially mixing up different kinds of
ids.
Monetary values
The implementation for TxAmount
makes use of a fpdec::Dec
instead of
a floating point value: using floats for financial values is a terrible idea,
instead we should use integers to represent exact decimal fractions.
fpdec::Dec
was used instead of a custom implementation for convenience. It
does allow storing values with a precision that is way higher than is needed in
our case (input/output should be limited 4 digits after the decimal point) but
this does not affect us negatively. It would be easy to swap in a custom numeric
type if it turns out to be more appropriate for our needs down the line.
Serialization/deserialization
Unfortunately, most of the code relating to (de)serialization is less than
ideal, from a code aesthetic view-point, under-using the idiomatic serde
library. Since CSV as an input/output format is very limited when paired with
serde
, either some hacks were needed to massage the input into the correct
format for consumption by the payment processor (see
crate::transaction::TransactionRecord
) or for output (see
crate::Ledger::dump_csv
). The reasons are outlined in the related commits for
these features (see 1, 2, 3 for upstream issues).
Another thing to note: Ledger::dump_csv
outputs the accounts in order (even
though they are stored unordered), to simplify diff-ing and testing.
Account information
Accounts are represented as a map from ClientId
to AccountInfo
. The funds
are split between held_funds
and available_funds
(relating to disputes). The
total funds are computed when needed but otherwise not stored to avoid
dis-synchronization between a potential total_funds
and its `held_funds
- available_funds` equivalent.
Once a client account has been frozen (after a Chargeback
transaction) then
any attempt to modify the balance of this account will result in an error.
Transaction log
It is assumed that each transaction id is unique, however the key to map into
transaction-related data is (ClientId, TxId)
. This is done to simplify the
error-handling in case a valid transaction is used with an invalid user and
vice-versa. Since all of dispute
, resolve
, and chargeback
reference both
ids together, we should check that both of them are correct before further
processing.
Testing
All behaviour testing was done using unit tests runnable with cargo test
.
expect-test
(of rust-analyzer
fame) was used to easily compare between the
final state of a ledger and its expected value, and for the ease of writing new
tests (or updating them, should a bug be found and squashed).
Error handling
All errors that are raised from Ledger::process
are non-fatal. They are the
result of invalid input, and mean that the transaction's processing was stopped,
and it is ignored.
The few errors states that could arise, from internal invariants not being
upheld, are instead surfaced through panics as further processing cannot go on
once an invariant is broken. The few expect
calls inside
Ledger::get_past_transaction_info
are used for this reason.
Parallelisation
Currently, the code is single threaded, reading the input CSV in a streaming
fashion and processing transactions one-at-a-time. If we wanted to expose the
ledger in a way that allows multiple transactions to transpire at once (e.g:
a REST API), we would need to guard data accesses to ensure that no invariants
are broken. Similarly, TOCTOU issues would be a big deal (for example, trying to
chargeback
the same dispute
d transaction twice at the same time: the first
one could finish processing after the second one already checked that it was
initially in dispute
state but before applying the chargeback, resulting in
a double charge). To deal with that we could serialize all accesses with a big
Mutex
, but this would be slow and no different from handling each transaction
serially. Or we could instead have a lock per account affected by a transaction
which would be obtained at the very beginning of that transaction's processing
before any other action is taken. This would allow transactions affecting
different accounts to be processed in parallel, and would not impact the
correctness of the ledger or affect its internal invariants. This would be done
thanks to a concurrent hash-map with fine-grained locking (either bucket-level
or entry-level).
Disputes
It is unclear whether dispute
can be applied to both deposit
and withdraw
transactions, or only the later. I took the more general position that it should
be possible to do both, but the point could be made that, rationally, who in
their right minds would dispute against some unexpected funds showing up in
their accounts (it's free money!).
Handling disputes is otherwise a very simple state machine, so I implemented it
using the TxState
enum. This meant that any of dispute
, resolve
, and
chargeback
are handled very similarly, and resulted in (IMHO) quite elegant
code to guard against illegal states (like trying to dispute a transaction that
was already disputed).