Scalability of Cacophony's data model
Posted by Jeff Disher
An interesting thing about the introduction of the replyTo mechanism is that it will definitely dump more data, and more meta-data, into the main record stream of the on-IPFS data structure.

In theory, this should be just fine. In fact, the data is ordered specifically to allow partial de-duplication of the CacophonyRecords structure if it were to grow to several MiB in size.
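As a rough illustration of why ordering matters for de-duplication, here is a minimal sketch. It assumes a fixed-size chunker with 256 KiB blocks, hypothetical 64-byte records, and that new records land at the end of the stream; none of these details are taken from Cacophony's actual encoding. The helper names (chunk_hashes, shared_chunks, make_records) are made up for this example.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed fixed block size, not taken from Cacophony


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks and return one SHA-256 digest per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def shared_chunks(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> int:
    """Count the chunks of `new` whose hash already exists among the chunks of `old`."""
    known = set(chunk_hashes(old, chunk_size))
    return sum(1 for digest in chunk_hashes(new, chunk_size) if digest in known)


def make_records(start: int, count: int) -> bytes:
    """Build `count` hypothetical 64-byte records numbered from `start`."""
    return b"".join(f"record-{i:056d}\n".encode() for i in range(start, start + count))


old = make_records(0, 20000)        # about 1.2 MiB, which is 5 chunks
extra = make_records(20000, 100)    # 100 new records

appended = old + extra              # new records at the end of the stream
prepended = extra + old             # new records at the start of the stream

print(len(chunk_hashes(appended)))      # 5
print(shared_chunks(old, appended))     # 4: only the final chunk changed
print(shared_chunks(old, prepended))    # 0: every chunk boundary shifted
```

Under these assumptions, appending leaves every full chunk before the tail byte-for-byte identical, so those blocks would not need to be stored or fetched again, while inserting at the front changes every chunk.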

The actual limits we are likely to encounter relate to how the system starts up: it loads this data from the index before starting. If the index has tens of thousands of elements, start-up might begin to take a noticeable amount of time. It also might require that the system use more than a 32 MiB heap to store the entire index in-memory. Of course, the local data store could work differently and the data doesn't need to be in-memory, but it seems like it will be a very long time before that matters. If we ever get to the point where hundreds of interconnected users with several thousand posts each is typical, I will happily completely redesign the local data store and local indexing strategy. None of this would require on-IPFS changes or any network disruption, though.
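For a sense of scale, here is a back-of-the-envelope sketch of the in-memory index size. The 256 bytes per entry is purely an assumed figure for illustration (the real per-element cost is not stated here), and the function names are hypothetical.

```python
MIB = 1024 * 1024

# Assumed average in-memory cost of one index entry; a guess, not a measurement.
ASSUMED_BYTES_PER_ENTRY = 256


def index_bytes(users: int, posts_per_user: int,
                bytes_per_entry: int = ASSUMED_BYTES_PER_ENTRY) -> int:
    """Estimate the bytes needed to hold one index entry per post in memory."""
    return users * posts_per_user * bytes_per_entry


def max_entries(heap_bytes: int,
                bytes_per_entry: int = ASSUMED_BYTES_PER_ENTRY) -> int:
    """Upper bound on entries if the whole heap were spent on the index."""
    return heap_bytes // bytes_per_entry


print(max_entries(32 * MIB))            # 131072
print(index_bytes(300, 3000) / MIB)     # roughly 219.7 (MiB)
```

Since a real heap is shared with everything else the system does, the practical limit would be lower than this upper bound. Even so, at the assumed entry size, 300 users with 3000 posts each would need several times a 32 MiB heap, which is the point where a different local store would be warranted.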

Another limit is likely to be that each user needs to host all of their own data. While this shouldn't be a problem (it is common for people to have hundreds of GiB free these days), it is something they might find a bit odd over time. They probably don't need to worry about it, but they would need to be aware of it.

Realistically, where scale will initially matter will likely be in the UI itself, specifically in the UI paradigms for navigating the data. This may eventually require the introduction of local data tagging (maybe even on-IPFS data tagging in a model v3) and some kind of text search index of locally-indexed data. Even if they don't arrive as part of a complete local store redesign, these mechanisms can probably be bolted on to the side, so I am not all too worried.
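To show how small a bolted-on text search could start out, here is a minimal inverted-index sketch. The post IDs, the tokenizer, and the function names are all hypothetical; this is one simple way to do it, not a description of anything in Cacophony.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set:
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(posts: dict) -> dict:
    """Map each word to the set of post IDs whose text contains it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            index[word].add(post_id)
    return dict(index)


def search(index: dict, query: str) -> set:
    """Return the IDs of posts containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


# Hypothetical locally-cached posts, keyed by a made-up ID.
posts = {
    "post-1": "Scalability of the on-IPFS data model",
    "post-2": "Notes on the local data store",
}
index = build_index(posts)
print(sorted(search(index, "data")))        # ['post-1', 'post-2']
print(sorted(search(index, "local data")))  # ['post-2']
```

Because the index is derived entirely from data that is already stored locally, it could be rebuilt at any time and would not touch the on-IPFS structures.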

Anyway, we will see. The main problem usually isn't the technical problem, but the cultural problem of making sure people find it useful or interesting.

Jeff.