doc: add 'README.md'

2022-08-23 16:50:58 +02:00 · 2022-08-23 16:50:58 +02:00 · 8d89bdf31d
parent e66f4f687d
commit 8d89bdf31d
1 changed files with 111 additions and 0 deletions
--- a/README.md
+++ b/README.md
@ -0,0 +1,111 @@
+# Payment processor
+
+## Implementation details
+
+### Strong typing
+
+The `crate::core` module contain three new-type-style tuple-struct definitions:
+`ClientId`, `TxId`, and `TxAmount`. The reason for new-typing instead of using
+bare `u16` or `u32` values is to avoid potentially mixing up different kinds of
+ids.
+
+### Monetary values
+
+The implementation for `TxAmount` makes use of a `fpdec::Dec` instead of
+a floating point value: using floats for financial values is a terrible idea,
+instead we should use integers to represent exact decimal fractions.
+`fpdec::Dec` was used instead of a custom implementation for convenience. It
+does allow storing values with a precision that is way higher than is needed in
+our case (input/output should be limited 4 digits after the decimal point) but
+this does not affect us negatively. It would be easy to swap in a custom numeric
+type if it turns out to be more appropriate for our needs down the line.
+
+### Serialization/deserialization
+
+Unfortunately, most of the code relating to (de)serialization is less than
+ideal, from a code aesthetic view-point, under-using the idiomatic `serde`
+library. Since CSV as an input/output format is very limited when paired with
+`serde`, either some hacks were needed to massage the input into the correct
+format for consumption by the payment processor (see
+`crate::transaction::TransactionRecord`) or for output (see
+`crate::Ledger::dump_csv`). The reasons are outlined in the related commits for
+these features (see [1], [2], [3] for upstream issues).
+
+Another thing to note: `Ledger::dump_csv` outputs the accounts in order (even
+though they are stored unordered), to simplify diff-ing and testing.
+
+[1]: https://github.com/BurntSushi/rust-csv/issues/211
+[2]: https://github.com/BurntSushi/rust-csv/issues/172
+[3]: https://github.com/BurntSushi/rust-csv/issues/98
+
+### Account information
+
+Accounts are represented as a map from `ClientId` to `AccountInfo`. The funds
+are split between `held_funds` and `available_funds` (relating to disputes). The
+total funds are computed when needed but otherwise not stored to avoid
+dis-synchronization between a potential `total_funds` and its `held_funds
+ available_funds` equivalent.
+
+Once a client account has been frozen (after a `Chargeback` transaction) then
+any attempt to modify the balance of this account will result in an error.
+
+### Transaction log
+
+It is assumed that each transaction id is unique, however the key to map into
+transaction-related data is `(ClientId, TxId)`. This is done to simplify the
+error-handling in case a valid transaction is used with an invalid user and
+vice-versa. Since all of `dispute`, `resolve`, and `chargeback` reference both
+ids together, we should check that both of them are correct before further
+processing.
+
+### Testing
+
+All behaviour testing was done using unit tests runnable with `cargo test`.
+`expect-test` (of `rust-analyzer` fame) was used to easily compare between the
+final state of a ledger and its expected value, and for the ease of writing new
+tests (or updating them, should a bug be found and squashed).
+
+### Error handling
+
+All errors that are raised from `Ledger::process` are non-fatal. They are the
+result of invalid input, and mean that the transaction's processing was stopped,
+and it is ignored.
+
+The few errors states that could arise, from internal invariants not being
+upheld, are instead surfaced through panics as further processing cannot go on
+once an invariant is broken. The few `expect` calls inside
+`Ledger::get_past_transaction_info` are used for this reason.
+
+### Parallelisation
+
+Currently, the code is single threaded, reading the input CSV in a streaming
+fashion and processing transactions one-at-a-time. If we wanted to expose the
+ledger in a way that allows multiple transactions to transpire at once (e.g:
+a REST API), we would need to guard data accesses to ensure that no invariants
+are broken. Similarly, TOCTOU issues would be a big deal (for example, trying to
+`chargeback` the same `dispute`d transaction twice at the same time: the first
+one could finish processing after the second one already checked that it was
+initially in `dispute` state but before applying the chargeback, resulting in
+a double charge). To deal with that we could serialize all accesses with a big
+`Mutex`, but this would be slow and no different from handling each transaction
+serially. Or we could instead have a lock per account affected by a transaction
+which would be obtained at the very beginning of that transaction's processing
+before any other action is taken. This would allow transactions affecting
+different accounts to be processed in parallel, and would not impact the
+correctness of the ledger or affect its internal invariants. This would be done
+thanks to a concurrent hash-map with fine-grained locking (either bucket-level
+or entry-level).
+
+### Disputes
+
+It is unclear whether `dispute` can be applied to both `deposit` and `withdraw`
+transactions, or only the later. I took the more general position that it should
+be possible to do both, but the point could be made that, rationally, who in
+their right minds would dispute against some unexpected funds showing up in
+their accounts (it's free money!).
+
+Handling disputes is otherwise a very simple state machine, so I implemented it
+using the `TxState` enum. This meant that any of `dispute`, `resolve`, and
+`chargeback` are handled very similarly, and resulted in (IMHO) quite elegant
+code to guard against illegal states (like trying to dispute a transaction that
+was already disputed).