developer/swarm.md: rewording

Change-Id: I72c5b0f7963554c981b267f4c7a962df88f41257
Fadi Shehadeh 2025-04-22 15:45:14 -04:00 committed by Adrien Béraud
parent 76dfb31ec4
commit 41880ef0ea
1 changed files with 96 additions and 97 deletions


@ -1,30 +1,22 @@
# Swarm
```{important}
Jami source code tends to use the terms **(un)ban**, while the user interface uses the terms **(un)block**.
```
A *swarm* (group chat) is a set of participants capable of resilient, decentralized communication.
For example, if two participants lose connectivity with the rest of the group (e.g., during an Internet outage) but can still reach each other over a LAN or subnetwork, they can exchange messages locally and then synchronize with the rest of the group once connectivity is restored.
## Synospis
The goal of this document is to describe how group chats (a.k.a. **swarm chat**) will be implemented in Jami.
A *swarm* is a group able to discuss without any central authority in a resilient way.
Indeed, if two person doesn't have any connectivity with the rest of the group (ie Internet outage) but they can contact each other (in a LAN for example or in a subnetwork), they will be able to send messages to each other and then, will be able to sync with the rest of the group when it's possible.
So, the *swarm* is defined by:
1. Ability to split and merge following the connectivity.
2. Syncing of the history. Anyone must be able to send a message to the whole group.
A *swarm* is defined by the following properties:
1. Ability to split and merge based on network connectivity.
2. History synchronization. Every participant must be able to send a message to the entire group.
3. No central authority. Can not rely on any server.
4. Non-repudiation. Devices must be able to verify old messages' validity and to replay the whole history.
5. PFS on the transport. Storage is managed by the device.
4. Non-repudiation. Devices must be able to verify past messages' validity and to replay the entire history.
5. Perfect Forward Secrecy (PFS) is provided on the transport channels. Storage is handled by each device.
The main idea is to get a synchronized Merkle tree with the participants.
We identified four modes for swarm chat that we want to implement:
* **ONE_TO_ONE**, basically the case we have today when you discuss to a friend
* **ADMIN_INVITES_ONLY** generally a class where the teacher can invite people, but not students
* **INVITES_ONLY** a private group of friends
* **PUBLIC** basically an opened forum
We identified four modes for swarms that we want to implement:
* **ONE_TO_ONE**: A private conversation between two endpoints—either between two users or with yourself.
* **ADMIN_INVITES_ONLY**: A swarm in which only the administrator can invite members (for example, a teacher-managed classroom).
* **INVITES_ONLY**: A closed swarm that admits members strictly by invitation; no one may join without explicit approval.
* **PUBLIC**: A public swarm that anyone can join without prior invitation (for example, a forum).
## Scenarios
@ -38,55 +30,59 @@ We identified four modes for swarm chat that we want to implement:
* His device certificate in `/devices`
* His CRL in `/crls`
3. The hash of the first commit becomes the **ID** of the conversation
4. Bob announces to his other devices that he creates a new conversation. This is done via an invite to join the swarm sent through the DHT to other devices linked to that account.
4. *Bob* announces to his other devices that he created a new conversation. This is done via an invite to join the group sent through the DHT to other devices linked to that account.
### Adding someone
*Alice adds Bob*
*Bob adds Alice*
1. Alice adds Bob to the repo:
1. *Bob* adds Alice to the repo:
* Adds the invited URI in `/invited`
* Adds the CRL into `/crls`
2. Alice sends a request on the DHT
2. *Bob* sends a request on the DHT.
### Receiving an invite
*Alice gets the invite to join the previously create swarm*
*Alice gets the invite to join the previously created swarm*
1. She accepts the invite (if decline, do nothing, it will just stay into invited and Alice will never receive any message)
2. A peer-to-peer connection between Alice and Bob is done.
3. Alice pull the Git repo of Bob. **WARNING this means that messages need a connection, not from the DHT like today.**
4. Alice validates commits from Bob
5. To validate that Alice is a member, she removes the invite from `/invited` directory, then adds her certificate into the `/members` directory
6. Once all commits validated and on her device, other members of the group are discovered by Alice. with these peers, she will construct the **DRT** (explained below) with Bob as a bootstrap.
1. *Alice* accepts the invite (if she declines, nothing happens; she will remain in the "invited" list, and will never receive any messages).
2. A peer-to-peer connection is established between *Alice* and *Bob*.
3. *Alice* pulls the Git repository from *Bob*. **WARNING: this means that messages require a connection and are no longer delivered through the DHT as they are today.**
4. *Alice* validates the commits from *Bob*.
5. To validate that *Alice* is a member, she removes the invite from the `/invited` directory, then adds her certificate to the `/members` directory.
6. Once all commits are validated and synchronized to her device, *Alice* discovers other members of the group. With these peers, she will then construct the **DRT** with *Bob* as a bootstrap.
### Sending a message
*Alice sends a message*
*Alice sends a message to Bob*
Sending a message is pretty simple. Alice writes a commit-message in the following format:
1. *Alice* creates a commit message. She constructs a JSON payload containing the MIME type and message body. For example:
```json
{
"type": "text/plain",
"body": "coucou"
"body": "hello"
}
```
2. *Alice* ensures her device credentials are present. If *Alice*'s device certificate or its associated CRL isn't already stored in the repository, she adds them so that other participants can verify the commit.
and adds her device and CRL to the repository if missing (others must be able to verify the commit).
Merge conflicts are avoided because we are mostly based on commit messages, not files (unless CRLS + certificates but they are located).
Then she announces the new commit via the **DRT** with a service message (explained later) and pings the DHT for mobile devices (they must receive a push notification).
3. *Alice* commits to the repository. Because Jami relies primarily on commit-message metadata rather than file contents, merge conflicts are rare; the only potential conflicts would involve CRLs or certificates, which are versioned in a dedicated location.
For pinging other devices, the sender sends to other members a SIP message with mimetype = "application/im-gitmessage-id" containing a JSON with the "deviceId" which sends the message, the "id" of the conversation related, and the "commit"
4. *Alice* announces the commit via the **DRT** with a service message and pings the DHT for mobile devices (they must receive a push notification).
```{note}
To notify other devices, the sender transmits a SIP message with the MIME type `application/im-gitmessage-id`.
The JSON payload includes the deviceId (the sender's), the conversationId and the reference (hash) of the new commit.
```
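The commit message and the notification payload described above can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the payload keys (`deviceId`, `id`, `commit`) follow the earlier wording of this section rather than a confirmed wire format.

```python
import json

# MIME type of the SIP notification, as stated in the note above.
NOTIFICATION_MIME = "application/im-gitmessage-id"


def build_commit_message(body: str, mime_type: str = "text/plain") -> str:
    """Serialize a commit message as JSON with a MIME type and a body."""
    return json.dumps({"type": mime_type, "body": body})


def build_notification(device_id: str, conversation_id: str, commit_id: str) -> str:
    """Serialize the notification payload announcing a new commit.

    Key names are an assumption based on the earlier text of this section.
    """
    return json.dumps(
        {"deviceId": device_id, "id": conversation_id, "commit": commit_id}
    )
```

A sender would store the result of `build_commit_message("hello")` as the Git commit message, then send the output of `build_notification(...)` in a SIP message of type `NOTIFICATION_MIME`.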
### Receiving a message
*Bob receives the message from Alice*
*Bob receives a message from Alice*
1. *Bob* do a Git pull on *Alice*
2. Commits MUST be verified via a hook
3. If all commits are valid, commits are stored and displayed. Then *Bob* announces the message via the DRT for other devices.
4. If all commits are not valid, pull is canceled. *Alice* must reestablish her state to a correct state.
1. *Bob* performs a Git pull on *Alice*'s repository.
2. All incoming commits MUST be verified by a hook.
3. If all commits are valid, commits are stored and displayed. *Bob* then announces the message via the DRT for other devices.
4. If any commit is invalid, the pull is aborted. *Alice* must restore her repository to a correct state before retrying.
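The all-or-nothing behaviour of the steps above can be sketched like this. It is a minimal illustration, not the Jami implementation: `accept_pull` and the `is_valid` callback are hypothetical names, and commits are represented as plain strings.

```python
from typing import Callable, Iterable, List, Optional


def accept_pull(
    incoming: Iterable[str], is_valid: Callable[[str], bool]
) -> Optional[List[str]]:
    """Return the commits to store, or None if the pull must be aborted.

    Every incoming commit goes through the validation hook; a single
    invalid commit rejects the whole pull.
    """
    commits = list(incoming)
    if all(is_valid(commit) for commit in commits):
        return commits
    return None
```

Returning `None` rather than an empty list keeps an aborted pull distinguishable from a pull that simply contained no new commits.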
### Validating a commit
@ -97,60 +93,66 @@ To avoid users pushing some unwanted commits (with conflicts, false messages, et
2. If a fetch is too big, it's not merged.
```
+ For each commits, check that the device that tries to send the commit is authorized at this moment and that the certificates are present (in /devices for the device, and in /members or /admins for the issuer).
+ 3 cases. The commit has 2 parents, so it's a merge, nothing more to validate here
+ The commit has 0 parents, it's the initial commit:
+ Check that admin cert is added
+ Check that device cert is added
+ Check CRLs added
+ Check that no other file is added
+ The commit has 1 parent, commit message is a JSON with a type:
+ If text (or other mime-type that doesn't change files)
+ Check signature from certificate in the repo
+ Check that no weird file is added outside device cert nor removed
+ If vote
+ Check that voteType is supported (ban, unban)
+ Check that vote is for the user that signs the commit
+ Check that vote is from an admin and device present and not banned
+ Check that no weird file is added nor removed
+ If member
+ If adds
+ Check that the commit is correctly signed
+ Check that certificate is added in /invited
+ Check that no weird file is added nor removed
+ If ONE_TO_ONE, check that we only have one admin, one member
+ If ADMIN_INVITES_ONLY, check that invite is from an admin
+ If joins
+ Check that the commit is correctly signed
+ Check that device is added
+ Check that invitation is moved to members
+ Check that no weird file is added nor removed
+ If banned
+ Check that vote is valid
+ Check that the user is ban via an admin
+ Check that member or device certificate is moved to banned/
+ Check that only files related to the vote are removed
+ Check that no weird file is added nor removed
+ else fail. Notify the user that they may be with an old version or that peer tried to submit unwanted commits
+ For each incoming commit, ensure that the sending device is currently authorized and that the issuer's certificate exists under /members or /admins, and the device's certificate under /devices.
+ Then handle one of three cases, based on the commit's parent count:
+ Merge Commit (2 parents). No further validation is required; merges are always accepted.
+ Initial Commit (0 parents). Validate that this is the very first repository snapshot:
+ Admin certificate is added.
+ Device certificate is added.
+ CRLs (Certificate Revocation Lists) are added.
+ No other files are present.
+ Ordinary Commit (1 parent). The commit message must be JSON with a top-level `type` field. Handle each `type` as follows:
+ If `text` (or any non-file-modifying MIME type)
+ Signature is valid against the author's certificate in the repo.
+ No unexpected files are added or removed.
+ If `vote`
+ `voteType` is one of the supported values (e.g. "ban", "unban").
+ The vote matches the signing user.
+ The signer is an admin, their device is present, and they are not banned.
+ No unexpected files are added or removed.
+ If `member`
+ If `adds`
+ Properly signed by the inviter.
+ New member's URI appears under `/invited`.
+ No unexpected files are added or removed.
+ If ONE_TO_ONE, ensure exactly one admin and one member.
+ If ADMIN_INVITES_ONLY, the inviter must be an admin.
+ If `joins`
+ Properly signed by the joining device.
+ Device certificate added under `/devices`.
+ Corresponding invite removed from `/invited` and certificate added to `/members`.
+ No unexpected files are added or removed.
+ If `banned`
+ Vote is valid per the `vote` rules above.
+ Ban is issued by an admin.
+ Target's certificate moved to `/banned`.
+ Only files related to the ban vote are removed.
+ No unexpected files are added or removed.
+ Fallback. If the commit's type or structure is **unrecognized**, reject it and notify the peer (or user) that they may be running an outdated version or attempting unauthorized changes.
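The top-level dispatch of these rules (by parent count, then by `type`) can be sketched as follows. This is a hedged illustration only: the `Commit` class and `classify` function are hypothetical, the per-type file and signature checks are not implemented, and treating any `type` containing a `/` as a MIME-typed message is an assumption of this sketch.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Commit:
    parents: Tuple[str, ...]  # hashes of the parent commits
    message: str              # raw commit message


def classify(commit: Commit) -> str:
    """Return which validation branch applies to a commit."""
    count = len(commit.parents)
    if count == 2:
        return "merge"        # nothing more to validate
    if count == 0:
        return "initial"      # admin cert, device cert and CRLs must be checked
    if count != 1:
        return "reject"
    try:
        payload = json.loads(commit.message)
    except json.JSONDecodeError:
        return "reject"       # ordinary commits must carry a JSON message
    if not isinstance(payload, dict):
        return "reject"
    kind = payload.get("type")
    if kind in ("vote", "member"):
        return kind
    if isinstance(kind, str) and "/" in kind:
        return "message"      # assumption: MIME types such as text/plain
    return "reject"           # fallback: unrecognized type or structure
```

Each returned label would then select the detailed checks listed above; anything that falls through to `"reject"` corresponds to the fallback rule.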
### Ban a device
*Alice, Bob, Carla, Denys are in a swarm. Alice bans Denys*
```{important}
Jami source code tends to use the terms **(un)ban**, while the user interface uses the terms **(un)block**.
```
This is one of the most difficult scenarios in our context.
Without central authority we can not trust:
*Alice, Bob, Carla, Denys are in a swarm. Alice issues a ban against Denys.*
1. Timestamps of generated commits
2. Conflicts with banned devices.
If multiple admin devices are present and if Alice can speak with Bob but not Denys and Carla; Carla can speak with Denys; Denys bans Alice, Alice bans Denys, what will be the state when the 4 members will merge the conversations.
4. A device can be compromised, stolen or its certificate can expire. We should be able to ban a device and avoid that it lies about its expiration or send messages in the past (by changing its certificate or the timestamp of its commit).
In a fully peer-to-peer system with no central authority, this simple action exposes three core challenges:
1. Untrusted Timestamps: Commit timestamps cannot be relied upon for ordering ban events, as any device can forge or replay commits with arbitrary dates.
2. Conflicting bans: In cases where multiple admin devices exist, network partitions can result in conflicting ban decisions. For instance, if Alice can communicate with Bob but not with Denys and Carla, while Carla can communicate with Denys, conflicting bans may occur. If Denys bans Alice while Alice bans Denys, the group's state becomes unclear when all members eventually reconnect and merge their conversation histories.
3. Compromised or expired devices: Devices can be compromised, stolen, or have their certificates expire. The system must allow banning such devices and ensure they cannot manipulate their certificate or commit timestamps to send unauthorized messages or falsify their expiration status.
Few comparable systems (with distributed groups) exist, but here are some examples:
+ [mpOTR doesn't define how to ban someone](https://www.cypherpunks.ca/~iang/pubs/mpotr.pdf)
+ Signal, without any central server for group chat (EDIT: they recently changed that point), doesn't give the ability to ban someone from a group.
This voting system needs a human action to ban someone or must be based on the CRLs info from the repository (because we can not trust external CRLs)
This voting system needs a human action to ban someone or must be based on the CRLs info from the repository (because we can not trust external CRLs).
### Remove a device from a conversation
@ -207,9 +209,9 @@ The commit message will be the following:
For now, "mode" accepts values 0 (ONE_TO_ONE), 1 (ADMIN_INVITES_ONLY), 2 (INVITES_ONLY), 3 (PUBLIC)
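As an illustration, the four `mode` values map naturally onto an integer enumeration. The class and function names below are hypothetical; only the numeric values and mode names come from the text above.

```python
from enum import IntEnum


class ConversationMode(IntEnum):
    ONE_TO_ONE = 0
    ADMIN_INVITES_ONLY = 1
    INVITES_ONLY = 2
    PUBLIC = 3


def parse_mode(value: int) -> ConversationMode:
    """Convert the "mode" field of the initial commit; raises ValueError if unknown."""
    return ConversationMode(value)
```

Rejecting unknown values with an exception keeps a client from silently treating a future or malformed mode as one of the four known ones.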
### Processes for 1:1 swarms
### Processes for 1:1 chats
The goal here is to keep the old API (addContact/removeContact, sendTrustRequest/acceptTrustRequest/discardTrustRequest) to generate swarm with a peer and its contact.
The goal here is to keep the old API (addContact/removeContact, sendTrustRequest/acceptTrustRequest/discardTrustRequest) to create a chat with a peer and its contact.
This still implies some changes that we cannot ignore:
The process is still the same, an account can add a contact via addContact, then send a TrustRequest via the DHT.
@ -240,7 +242,7 @@ or
In this case, two conversations are generated.
We don't want to remove messages from users or choose one conversation here.
So, sometimes two 1:1 swarm between the same members will be shown.
So, sometimes two conversations between the same members will be shown.
This will cause some bugs during the transition period: because we don't want to break the API, the inferred conversation will be one of the two shown conversations. For now this is "ok-ish", and it will be fixed once clients fully handle conversationId for all APIs (calls, file transfer, etc.).
```{important}
@ -448,10 +450,7 @@ However, non-permanent messages (like messages readable only for some minutes) c
### File transfer
Swarm massively changes file transfer.
Now, all the history is syncing, allowing all devices in the conversation to easily retrieve old files.
This changes allow us to move from a logic where the sender pushed the file on other devices, via trying to connect to their devices (This was bad because not really resistant to connections changes/failures and needed a manual retry) to a logic where the sender allow other devices to download.
Moreover, any device having the file can be the host for other devices, allowing to retrieve files even if the sender is not there.
This new system overhauls file sharing: the entire history is now kept in sync, so any device in the conversation can instantly access past files. Rather than forcing the sender to push files directly—an approach that was fragile in the face of connection drops and often required manual retries—devices simply download files when they need them. Moreover, once one device has downloaded a file, it can act as a host for others, ensuring files remain available even if the original sender goes offline.
#### Protocol
@ -477,7 +476,7 @@ If valid, the file will be removed from the waiting.
In case of failure, when a device in the conversation comes back online, we will request all waiting files in the same way.
### Call in swarm
### Call in Swarm
#### Idea
@ -491,7 +490,7 @@ The host can be determined via two ways:
+ In the swarm metadata, where it is stored like the title/desc/avatar of the room
+ Or the initial caller.
When starting a call, the host will add a new commit to the swarm, with the URI to join (accountUri/deviceId/conversationId/confId).
When starting a call, the host will add a new commit to the repository, with the URI to join (accountUri/deviceId/conversationId/confId).
This will be valid until the end of the call (announced by a commit with the duration to show).
So every participant will be informed that a call has started and will be able to join it by calling that URI.
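The join URI described above has four slash-separated components. A minimal sketch of building and parsing it is shown below; the type and function names are hypothetical, and the sketch assumes that none of the components contains a `/`.

```python
from typing import NamedTuple


class CallUri(NamedTuple):
    account_uri: str
    device_id: str
    conversation_id: str
    conf_id: str


def format_call_uri(uri: CallUri) -> str:
    """Join the components as accountUri/deviceId/conversationId/confId."""
    return "/".join(uri)


def parse_call_uri(text: str) -> CallUri:
    """Split a join URI into its four components, rejecting malformed input."""
    parts = text.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty components, got {text!r}")
    return CallUri(*parts)
```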
@ -731,9 +730,9 @@ Like a DHT with a superuser. (Not convinced)
Currently, the file transfer algorithm is based on a TURN connection (See {doc}`file-transfer`). In the case of a big group, this will be bad. We first need a p2p implementation for file transfer, i.e. to implement the RFC for p2p transfer.
Other problem: currently there is no implementation for TCP support for ICE in PJSIP. This is mandatory for this point (in pjsip or homemade)
Other problem: currently there is no implementation for TCP support for ICE in PJSIP. This is mandatory for this point (in PJSIP or homemade)
## Resources
+ https://eprint.iacr.org/2017/666.pdf
+ <https://eprint.iacr.org/2017/666.pdf>
+ Robust distributed synchronization of networked linear systems with intermittent information (Sean Phillips and Ricardo G. Sanfelice)