processor/README.md

5.4 KiB

Payment processor

Implementation details

Strong typing

The crate::core module contain three new-type-style tuple-struct definitions: ClientId, TxId, and TxAmount. The reason for new-typing instead of using bare u16 or u32 values is to avoid potentially mixing up different kinds of ids.

Monetary values

The implementation for TxAmount makes use of a fpdec::Dec instead of a floating point value: using floats for financial values is a terrible idea, instead we should use integers to represent exact decimal fractions. fpdec::Dec was used instead of a custom implementation for convenience. It does allow storing values with a precision that is way higher than is needed in our case (input/output should be limited 4 digits after the decimal point) but this does not affect us negatively. It would be easy to swap in a custom numeric type if it turns out to be more appropriate for our needs down the line.

Serialization/deserialization

Unfortunately, most of the code relating to (de)serialization is less than ideal, from a code aesthetic view-point, under-using the idiomatic serde library. Since CSV as an input/output format is very limited when paired with serde, either some hacks were needed to massage the input into the correct format for consumption by the payment processor (see crate::transaction::TransactionRecord) or for output (see crate::Ledger::dump_csv). The reasons are outlined in the related commits for these features (see 1, 2, 3 for upstream issues).

Another thing to note: Ledger::dump_csv outputs the accounts in order (even though they are stored unordered), to simplify diff-ing and testing.

Account information

Accounts are represented as a map from ClientId to AccountInfo. The funds are split between held_funds and available_funds (relating to disputes). The total funds are computed when needed but otherwise not stored to avoid dis-synchronization between a potential total_funds and its `held_funds

  • available_funds` equivalent.

Once a client account has been frozen (after a Chargeback transaction) then any attempt to modify the balance of this account will result in an error.

Transaction log

It is assumed that each transaction id is unique, however the key to map into transaction-related data is (ClientId, TxId). This is done to simplify the error-handling in case a valid transaction is used with an invalid user and vice-versa. Since all of dispute, resolve, and chargeback reference both ids together, we should check that both of them are correct before further processing.

Testing

All behaviour testing was done using unit tests runnable with cargo test. expect-test (of rust-analyzer fame) was used to easily compare between the final state of a ledger and its expected value, and for the ease of writing new tests (or updating them, should a bug be found and squashed).

Error handling

All errors that are raised from Ledger::process are non-fatal. They are the result of invalid input, and mean that the transaction's processing was stopped, and it is ignored.

The few errors states that could arise, from internal invariants not being upheld, are instead surfaced through panics as further processing cannot go on once an invariant is broken. The few expect calls inside Ledger::get_past_transaction_info are used for this reason.

Parallelisation

Currently, the code is single threaded, reading the input CSV in a streaming fashion and processing transactions one-at-a-time. If we wanted to expose the ledger in a way that allows multiple transactions to transpire at once (e.g: a REST API), we would need to guard data accesses to ensure that no invariants are broken. Similarly, TOCTOU issues would be a big deal (for example, trying to chargeback the same disputed transaction twice at the same time: the first one could finish processing after the second one already checked that it was initially in dispute state but before applying the chargeback, resulting in a double charge). To deal with that we could serialize all accesses with a big Mutex, but this would be slow and no different from handling each transaction serially. Or we could instead have a lock per account affected by a transaction which would be obtained at the very beginning of that transaction's processing before any other action is taken. This would allow transactions affecting different accounts to be processed in parallel, and would not impact the correctness of the ledger or affect its internal invariants. This would be done thanks to a concurrent hash-map with fine-grained locking (either bucket-level or entry-level).

Disputes

It is unclear whether dispute can be applied to both deposit and withdraw transactions, or only the later. I took the more general position that it should be possible to do both, but the point could be made that, rationally, who in their right minds would dispute against some unexpected funds showing up in their accounts (it's free money!).

Handling disputes is otherwise a very simple state machine, so I implemented it using the TxState enum. This meant that any of dispute, resolve, and chargeback are handled very similarly, and resulted in (IMHO) quite elegant code to guard against illegal states (like trying to dispute a transaction that was already disputed).