aschrijver:
in AS/AP land by default the urge is to try to hammer every information model in the limited set of objects and activities that ActivityStreams provides, and not touch the whole extensibility mechanism
This is part of the problem but the other problem is trying to hammer it at all. It should fit perfectly without any hammering.
Set theory and logical inferencing
One of the core things about information modeling is that when trying to logically reason about things, you have to realize that those things can belong to multiple sets/classes. Some of those sets may be subsets or supersets. For example, you can use Schema Dot Org to say that all BlogPostings are Articles, but not all Articles are BlogPostings. Using logic, we can infer statements that were never explicitly made, but are still accepted to be true.
For the application developer who wants to handle all manner of Articles, they might encounter something which is an Article but doesn't say that it is an Article. If there is a statement like @type: http://schema.org/BlogPosting, you can use an RDF Schema or OWL ontology (not to be confused with Schema Dot Org) to fully infer the following knowledge:
    PREFIX schema: <http://schema.org/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    # Given the following statement...
    <#something> a schema:BlogPosting.
    # And combining it with the following class relations...
    schema:BlogPosting rdfs:subClassOf schema:SocialMediaPosting.
    schema:SocialMediaPosting rdfs:subClassOf schema:Article.
    # We can infer the following without being explicitly told it:
    <#something> a schema:SocialMediaPosting.
    <#something> a schema:Article.
    schema:BlogPosting rdfs:subClassOf schema:Article.
If you're not evaluating this kind of inferencing as runtime code, you're evaluating it in your head and pre-baking it into your application based on your foreknowledge.
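To make "inferencing as runtime code" concrete, here is a minimal sketch of the subclass reasoning in plain Python. It is not a real RDFS reasoner: the `SUBCLASS_OF` table and the `infer_types` helper are hypothetical names introduced only for illustration, and prefixed names are treated as opaque strings.

```python
# Class relations copied from the rdfs:subClassOf statements above.
# Prefixed names are treated as plain strings (an assumption of this sketch).
SUBCLASS_OF = {
    "schema:BlogPosting": {"schema:SocialMediaPosting"},
    "schema:SocialMediaPosting": {"schema:Article"},
}


def infer_types(declared_types, subclass_of=SUBCLASS_OF):
    """Return the declared types plus every superclass reachable via subClassOf."""
    inferred = set(declared_types)
    pending = list(declared_types)
    while pending:
        current = pending.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in inferred:
                inferred.add(parent)
                pending.append(parent)
    return inferred


# Something declared only as a BlogPosting is also recognized as an Article.
print(sorted(infer_types(["schema:BlogPosting"])))
```

A consumer that "handles Articles" could then check `"schema:Article" in infer_types(declared)` instead of comparing against the declared type alone.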
We can also infer classes/types based on the domains and ranges of certain properties. Rather than claiming "All Articles have an articleBody", we can claim "If it has articleBody, it's an Article". Here, we are saying that the domain of "articleBody" is "Article":
schema:articleBody rdfs:domain schema:Article.
Now, we can similarly infer that something is an Article simply by virtue of it having an articleBody, like duck typing:
    # Given the following statement:
    <#something> schema:articleBody "test post please ignore".
    # Combined with the following RDF Schema information:
    schema:articleBody rdfs:domain schema:Article.
    # We can infer this without being explicitly told it:
    <#something> a schema:Article.
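The domain rule can be sketched the same way. This is only an illustration under simplifying assumptions: a resource is modeled as a dict of property to value, and `DOMAIN_OF` and `infer_types_from_properties` are hypothetical names, not part of any library.

```python
# rdfs:domain statements: using the property implies the subject is in the class.
DOMAIN_OF = {"schema:articleBody": "schema:Article"}


def infer_types_from_properties(resource, domain_of=DOMAIN_OF):
    """Infer classes for a resource (dict of property -> value) from rdfs:domain."""
    return {domain_of[prop] for prop in resource if prop in domain_of}


something = {"schema:articleBody": "test post please ignore"}
print(infer_types_from_properties(something))
```

Note that nothing here can ever yield `schema:BlogPosting`, since no property in the table has that class as its domain.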
However, we lack enough information to say that it is specifically a BlogPosting. There are no properties that have a domain of BlogPosting. If we want people to know that it's specifically a BlogPosting, then we need to explicitly declare this. But we shouldn't need to explicitly declare every single superclass for compatibility reasons! It would be incorrect to assume that all Articles must be explicitly declared as Articles. In much the same way, we should be able to know that every as:Object is an as:Object without explicitly stating that it has @type: https://www.w3.org/ns/activitystreams#Object. Imagine if you came across an Image that looked like this:
    {
      "@context": "https://www.w3.org/ns/activitystreams",
      "type": ["Object", "Document", "Image"]
    }
This is not false, but it is unnecessarily verbose if you already know that all Images are Documents and that all Documents are Objects:
    PREFIX as: <https://www.w3.org/ns/activitystreams#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    as:Image rdfs:subClassOf as:Document.
    as:Document rdfs:subClassOf as:Object.
Usually, this knowledge is pre-baked into applications because people who read the spec are told that these class relations exist, so if they write good code, then that code should be able to recognize that an Image is also a Document. (If only it were that easy to write good code!)
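As a rough sketch of what "recognizing that an Image is also a Document" might look like when it is not pre-baked, the following Python checks class membership through the subclass relations and strips redundant superclasses from a type array. The helper names (`superclasses`, `is_a`, `minimal_types`) are hypothetical and the class table only covers the two relations stated above.

```python
# Class relations from the AS2 statements above, as plain strings.
SUBCLASS_OF = {"as:Image": {"as:Document"}, "as:Document": {"as:Object"}}


def superclasses(cls, subclass_of=SUBCLASS_OF):
    """All strict superclasses of cls, following subClassOf transitively."""
    found, pending = set(), [cls]
    while pending:
        for parent in subclass_of.get(pending.pop(), ()):
            if parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


def is_a(declared_types, wanted):
    """True if any declared type is `wanted` or a subclass of it."""
    return any(t == wanted or wanted in superclasses(t) for t in declared_types)


def minimal_types(declared_types):
    """Drop any declared type that is already implied by another declared type."""
    implied = set().union(*(superclasses(t) for t in declared_types))
    return [t for t in declared_types if t not in implied]


print(minimal_types(["as:Object", "as:Document", "as:Image"]))  # ['as:Image']
print(is_a(["as:Image"], "as:Document"))  # True
```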
Information modeling and applying multiple models
So when it comes to modeling what exactly something is, I think that we should find commonalities between different classes of things and distill those into specific properties. ActivityStreams has a decent model for modeling Activity as "something that happened", but it's all the other parts that aren't as coherently modeled -- the Note/Article/Document/Link/Collection stuff could use more thought.
If the aim is to model communities of people online who have discussions, then something like SIOC provides a better content model. Imagine if we didn't have to worry about what was a Note and what was an Article, and we just called it a Post. Does the distinction between a Note and an Article matter? It's highly debatable. Even staying within ActivityStreams, you could consider Note and Article to both be subclasses of some kind of ContentfulObject class, where the domain of as:content is ContentfulObject. (Currently, the domain of as:content is as:Object, which is very broad.)
Put another way, the information model should match the application domain. Mastodon has a mostly microblogging-based approach, but its "statuses" actually fit multiple information models. We don't have to limit ourselves to fitting into one box!
Imagine a Mastodon API response for a Status. Why not expose this in the exact same way that the AS2 response is exposed? We can apply multiple profiles to the same resource. Here, we can use properties from AS2, Mastodon, and SIOC in equal measure. Let's take this abbreviated API response from Mastodon:
    {
      "uri": "https://mastodon.social/users/trwnh/statuses/114618667090037785",
      "url": "https://mastodon.social/@trwnh/114618667090037785",
      "id": "114618667090037785",
      "created_at": "2025-06-03T09:14:23.752Z",
      "sensitive": false,
      "content": "test post please ignore",
      "account": {
        "uri": "https://mastodon.social/users/trwnh",
        "url": "https://mastodon.social/@trwnh",
        "id": "14715"
      }
    }
Now, let's give it a basic JSON-LD context to turn it into Linked Data. Either we can use @base with the id property...
    {
      "@context": {
        "@vocab": "http://joinmastodon.org/ns/api#",
        "@base": "https://mastodon.social/api/v1/statuses/",
        "id": "@id"
      },
      "uri": "https://mastodon.social/users/trwnh/statuses/114618667090037785",
      "url": "https://mastodon.social/@trwnh/114618667090037785",
      "id": "114618667090037785", // expands with @base
      "created_at": "2025-06-03T09:14:23.752Z",
      "sensitive": false,
      "content": "test post please ignore",
      "account": {
        "@context": {
          "@base": "https://mastodon.social/api/v1/accounts/" // override the @base
        },
        "uri": "https://mastodon.social/users/trwnh",
        "url": "https://mastodon.social/@trwnh",
        "id": "14715" // expands against more recently defined @base
      }
    }
Or we can use uri directly instead...
    {
      "@context": {
        "@vocab": "http://joinmastodon.org/ns/api#",
        "uri": "@id"
      },
      "uri": "https://mastodon.social/users/trwnh/statuses/114618667090037785", // our @id
      "url": "https://mastodon.social/@trwnh/114618667090037785",
      "id": "114618667090037785",
      "created_at": "2025-06-03T09:14:23.752Z",
      "sensitive": false,
      "content": "test post please ignore",
      "account": {
        "uri": "https://mastodon.social/users/trwnh", // our @id
        "url": "https://mastodon.social/@trwnh",
        "id": "14715"
      }
    }
We can also make statements to let us infer things:
    PREFIX as: <https://www.w3.org/ns/activitystreams#>
    PREFIX sioc: <http://rdfs.org/sioc/ns#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX : <http://joinmastodon.org/ns/api#>
    # Equivalences between Mastodon API and ActivityStreams
    :url owl:equivalentProperty as:url.
    :created_at owl:equivalentProperty as:published.
    :sensitive owl:equivalentProperty as:sensitive.
    :content owl:equivalentProperty as:content.
    :account rdfs:subPropertyOf as:attributedTo.
    # Equivalences between Mastodon API and SIOC
    :created_at rdfs:subPropertyOf dcterms:created.
    :content rdfs:subPropertyOf sioc:content.
    :account rdfs:subPropertyOf sioc:has_creator.
Note that the difference between "subproperty" and "equivalent property" is that:
- if A is a "subproperty of" B, then values of A are also values of B, but values of B are not necessarily values of A. Seeing a Mastodon API url lets you infer an as:url, but seeing an as:url doesn't let you infer a Mastodon API url.
- if A is an "equivalent property" to B, then values of A are also values of B, and values of B are also values of A. Seeing a Mastodon API url lets you infer an as:url, and likewise seeing an as:url lets you infer a Mastodon API url.
(Basically, I am saying above that seeing as:attributedTo or sioc:has_creator does not immediately imply that it is a Mastodon account.)
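The one-way versus two-way distinction can be sketched in Python as follows. This is an illustrative approximation, not an OWL reasoner: the rule tables and the `build_rules` / `infer_properties` helpers are hypothetical names, a resource is modeled as a flat dict, and the relation between `:created_at` and `dcterms:created` is treated as a subproperty relation.

```python
# Two-way rules (owl:equivalentProperty) and one-way rules (rdfs:subPropertyOf).
# ":" stands for the Mastodon API namespace, as in the statements above.
EQUIVALENT = [(":url", "as:url"), (":created_at", "as:published"),
              (":sensitive", "as:sensitive"), (":content", "as:content")]
SUB_PROPERTY_OF = [(":account", "as:attributedTo"), (":created_at", "dcterms:created"),
                   (":content", "sioc:content"), (":account", "sioc:has_creator")]


def build_rules():
    """Map each property to the set of properties its values also belong to."""
    rules = {}
    for sub, sup in SUB_PROPERTY_OF:
        rules.setdefault(sub, set()).add(sup)      # one direction only
    for a, b in EQUIVALENT:
        rules.setdefault(a, set()).add(b)          # both directions
        rules.setdefault(b, set()).add(a)
    return rules


def infer_properties(resource, rules=None):
    """Copy each value onto every property implied by the rules, transitively."""
    rules = build_rules() if rules is None else rules
    result = dict(resource)
    pending = list(resource)
    while pending:
        prop = pending.pop()
        for implied in rules.get(prop, ()):
            if implied not in result:
                result[implied] = result[prop]
                pending.append(implied)
    return result


# A Mastodon account implies as:attributedTo, but not the other way around.
print(sorted(infer_properties({":account": "https://mastodon.social/users/trwnh"})))
print(sorted(infer_properties({"as:attributedTo": "https://mastodon.social/users/trwnh"})))
```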
Using our inferencing abilities, we can present this same Status in three different ways:
    {
      "@context": {
        "@vocab": "http://joinmastodon.org/ns/api#"
      },
      "content": "test post please ignore",
      // ...
    }
    {
      "@context": {
        "@vocab": "https://www.w3.org/ns/activitystreams#"
      },
      "content": "test post please ignore",
      // ...
    }
    {
      "@context": {
        "@vocab": "http://rdfs.org/sioc/ns#"
      },
      "content": "test post please ignore",
      // ...
    }
Or we can present this Status in a single combined resource. The challenge is in negotiating with the requester which specific representation or profile they wish to consume. But in a generic JSON-LD sense, this would work:
    {
      "https://www.w3.org/ns/activitystreams#content": "test post please ignore",
      "http://rdfs.org/sioc/ns#content": "test post please ignore",
      "http://joinmastodon.org/ns/api#content": "test post please ignore"
    }
Yes, this is duplicating the information, but we have tools to prevent that, like inferencing and content negotiation. The duplication arises because most/all consumers do not do any inferencing at all. This means that the publisher needs to do the consumer's inferencing for them, ahead-of-time... or otherwise define something like https://w3c.github.io/dx-connegp/connegp/
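One way a publisher might avoid shipping all three copies is to keep the combined resource internally and emit only the vocabulary a consumer asked for. The sketch below is hypothetical: `present_as` is an invented helper, it does a naive prefix filter rather than real JSON-LD compaction, and how the requested vocabulary is actually negotiated (for example via the profile negotiation draft linked above) is left out entirely.

```python
AS = "https://www.w3.org/ns/activitystreams#"
SIOC = "http://rdfs.org/sioc/ns#"
MASTODON = "http://joinmastodon.org/ns/api#"

# The combined resource, keyed by full property IRIs.
combined = {
    AS + "content": "test post please ignore",
    SIOC + "content": "test post please ignore",
    MASTODON + "content": "test post please ignore",
}


def present_as(expanded, vocab):
    """Keep only properties in the `vocab` namespace and shorten them via @vocab."""
    doc = {"@context": {"@vocab": vocab}}
    for iri, value in expanded.items():
        if iri.startswith(vocab):
            doc[iri[len(vocab):]] = value
    return doc


print(present_as(combined, SIOC))
```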
Tying it back to the FEP and "longform text" as an application domain
Ultimately, I don't think the divide between Article and Note should be given as much prominence as it currently is. If the difference actually matters, then they should have different information models. For example, if "longform text" meant something more like "the content is split into one or more sections", then this should be reflected in the information model, not stuffed into a single content property that equally serves both "shortform" and "longform" applications.
In other words, it's a bad idea to interpret a property differently depending on which type is declared. This whole mess started because Mastodon discriminated against Article resources by not rendering the content directly, which some people don't like. I would argue that the root of this decision is that there is some indelible difference between "Note content" and "Article content" that is not being captured, insofar as you accept "Note" to mean "shortform" and "Article" to mean "longform".
The other thing at play here is that HTML is mostly unstructured data, or rather, the structure of an "article" is arbitrary. The <article> tag can contain basically anything. If you want something more structured, then you should actually implement that structure instead of just dumping HTML into content.
Take a look at something like https://csarven.ca/linked-data-notifications when parsed as RDF sometime, and you'll see some really interesting things:
- The resource is declared to be a bibo:Document, sioc:Post, schema:ScholarlyArticle, prov:Entity, foaf:Document, as:Article in equal measure. How many fediverse applications do you think would be able to recognize that this is an Article and properly handle it as such?
- The resource declares a schema:hasPart whose value is an RDF List (:introduction, :related-work, :requirements-and-design-considerations, :protocol, :implementations, :analysis-and-evaluation, :conclusions, :acknowledgements). Each of these sections is independently addressable and described.
- The HTML content of each section is available as the schema:description, and subsections are likewise made available via schema:hasPart again.
- Comments are also included and similarly extensively described in multiple vocabularies -- AS2, SIOC, Schema Dot Org, Web Annotations, as appropriate.
So how much of this can we say is necessary for "longform text"? Granted, the level of detail in this scholarly article is probably far beyond what most people care to describe for a personal blog or social media account. But it's worth considering which parts make up which application domains, and therefore which information models should include which parts.
Maybe the baseline needs to change such that an "Article" is no longer just a blob of HTML content, indistinguishable from a "Note" except by what the publisher chooses to declare. If that structure is necessary, then it should be accounted for.
Or maybe it's fine that an Article is just a stub converted from the name and summary and url. Maybe we care more about the metadata than the actual content, like how as:inReplyTo or sioc:reply_of/sioc:has_reply are used to indicate that one thing is a response to another thing, or sioc:has_container can be used to link a Post to a Thread or Forum.
I still think that some of the things in this FEP are doubling down on the problem rather than making it better. Mainly, I still have concerns about the proposed use of preview to essentially serve as an "alternate" instead, and I am concerned that the preview being a different object will open the door to people replying to the preview when they meant to reply to the article. I also have more general concerns about the specificity of AS2-Vocab and its content model, but that's broader than just this FEP, and I don't really have a better answer at this time. I am worried that publishing AS2 documents is going to become a highly idiosyncratic thing where you have to deal with so many "fediverse" consumer quirks that you can't express yourself as you intended. By dint of having only one content property, you are already stuck in lowest-common-denominator form. In that regard, I'm not sure how much this FEP actually "matters" for "long-form text", since many of its recommendations regarding properties apply just as well to things that aren't "long-form text". It might be worth considering how much of this "long-form text" stuff would overlap with a more general/broad "social media" FEP. But again, I don't think AS2-Vocab or AP are equipped to make this distinction... nor am I sure how much it makes sense to make this distinction.