doc: add 'README.md'

This commit is contained in:
Bruno BELANYI 2022-08-23 16:50:58 +02:00
parent e66f4f687d
commit 8d89bdf31d

111
README.md Normal file
View file

@ -0,0 +1,111 @@
# Payment processor
## Implementation details
### Strong typing
The `crate::core` module contain three new-type-style tuple-struct definitions:
`ClientId`, `TxId`, and `TxAmount`. The reason for new-typing instead of using
bare `u16` or `u32` values is to avoid potentially mixing up different kinds of
ids.
### Monetary values
The implementation for `TxAmount` makes use of a `fpdec::Dec` instead of
a floating point value: using floats for financial values is a terrible idea,
instead we should use integers to represent exact decimal fractions.
`fpdec::Dec` was used instead of a custom implementation for convenience. It
does allow storing values with a precision that is way higher than is needed in
our case (input/output should be limited 4 digits after the decimal point) but
this does not affect us negatively. It would be easy to swap in a custom numeric
type if it turns out to be more appropriate for our needs down the line.
### Serialization/deserialization
Unfortunately, most of the code relating to (de)serialization is less than
ideal, from a code aesthetic view-point, under-using the idiomatic `serde`
library. Since CSV as an input/output format is very limited when paired with
`serde`, either some hacks were needed to massage the input into the correct
format for consumption by the payment processor (see
`crate::transaction::TransactionRecord`) or for output (see
`crate::Ledger::dump_csv`). The reasons are outlined in the related commits for
these features (see [1], [2], [3] for upstream issues).
Another thing to note: `Ledger::dump_csv` outputs the accounts in order (even
though they are stored unordered), to simplify diff-ing and testing.
[1]: https://github.com/BurntSushi/rust-csv/issues/211
[2]: https://github.com/BurntSushi/rust-csv/issues/172
[3]: https://github.com/BurntSushi/rust-csv/issues/98
### Account information
Accounts are represented as a map from `ClientId` to `AccountInfo`. The funds
are split between `held_funds` and `available_funds` (relating to disputes). The
total funds are computed when needed but otherwise not stored to avoid
dis-synchronization between a potential `total_funds` and its `held_funds
+ available_funds` equivalent.
Once a client account has been frozen (after a `Chargeback` transaction) then
any attempt to modify the balance of this account will result in an error.
### Transaction log
It is assumed that each transaction id is unique, however the key to map into
transaction-related data is `(ClientId, TxId)`. This is done to simplify the
error-handling in case a valid transaction is used with an invalid user and
vice-versa. Since all of `dispute`, `resolve`, and `chargeback` reference both
ids together, we should check that both of them are correct before further
processing.
### Testing
All behaviour testing was done using unit tests runnable with `cargo test`.
`expect-test` (of `rust-analyzer` fame) was used to easily compare between the
final state of a ledger and its expected value, and for the ease of writing new
tests (or updating them, should a bug be found and squashed).
### Error handling
All errors that are raised from `Ledger::process` are non-fatal. They are the
result of invalid input, and mean that the transaction's processing was stopped,
and it is ignored.
The few errors states that could arise, from internal invariants not being
upheld, are instead surfaced through panics as further processing cannot go on
once an invariant is broken. The few `expect` calls inside
`Ledger::get_past_transaction_info` are used for this reason.
### Parallelisation
Currently, the code is single threaded, reading the input CSV in a streaming
fashion and processing transactions one-at-a-time. If we wanted to expose the
ledger in a way that allows multiple transactions to transpire at once (e.g:
a REST API), we would need to guard data accesses to ensure that no invariants
are broken. Similarly, TOCTOU issues would be a big deal (for example, trying to
`chargeback` the same `dispute`d transaction twice at the same time: the first
one could finish processing after the second one already checked that it was
initially in `dispute` state but before applying the chargeback, resulting in
a double charge). To deal with that we could serialize all accesses with a big
`Mutex`, but this would be slow and no different from handling each transaction
serially. Or we could instead have a lock per account affected by a transaction
which would be obtained at the very beginning of that transaction's processing
before any other action is taken. This would allow transactions affecting
different accounts to be processed in parallel, and would not impact the
correctness of the ledger or affect its internal invariants. This would be done
thanks to a concurrent hash-map with fine-grained locking (either bucket-level
or entry-level).
### Disputes
It is unclear whether `dispute` can be applied to both `deposit` and `withdraw`
transactions, or only the later. I took the more general position that it should
be possible to do both, but the point could be made that, rationally, who in
their right minds would dispute against some unexpected funds showing up in
their accounts (it's free money!).
Handling disputes is otherwise a very simple state machine, so I implemented it
using the `TxState` enum. This meant that any of `dispute`, `resolve`, and
`chargeback` are handled very similarly, and resulted in (IMHO) quite elegant
code to guard against illegal states (like trying to dispute a transaction that
was already disputed).