What should be the interpretation of the cardinality of a reverse attribute? Does the cardinality apply to the source or the destination of the relationship? Just to clarify!
8 Comments
Linda Bird
Interesting question Daniel!
If we consider an example:
< 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = *
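Then I assume your question is, should this be read as:
1. Descendants of Substance which are the active ingredient of exactly 3 products, OR
2. Descendants of Substance which are the active ingredient of a product containing exactly 3 active ingredients
So if executed over a substrate of:
X has active ingredient S1
X has active ingredient S2
X has active ingredient S3
Y has active ingredient S1
Z has active ingredient S1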
Approach 1 would result in the set {S1} and approach 2 would result in the set {S1, S2, S3}.
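To me it would make sense to apply the 'R' first, before the cardinality ... which means you would be applying the cardinality of [3..3] to the substrate:
S1 is active ingredient of X
S2 is active ingredient of X
S3 is active ingredient of X
S1 is active ingredient of Y
S1 is active ingredient of Z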
In which case, the answer would be {S1} - interpretation 1 - and in terms of your original question, the cardinality would apply to the source of the relationship (for each selected destination).
However, I would be interested to know if others agree or disagree with this. This certainly looks like an area in which we should improve the documentation.
P.S. - Interestingly, (based on a very quick analysis) I think that SnoQuery may use interpretation 2 and Ontoserver may use interpretation 1 (as did I).
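To make the difference between the two readings concrete, here is a minimal Python sketch that evaluates both over the example substrate (X has S1, S2 and S3; Y and Z have S1 only). The function names and the tuple representation are illustrative assumptions, not part of any ECL implementation, and every substance is assumed to be a descendant of 105590001 |Substance|.

```python
from collections import defaultdict

# Example substrate: (product, substance) pairs for "has active ingredient".
RELATIONSHIPS = [
    ("X", "S1"), ("X", "S2"), ("X", "S3"),
    ("Y", "S1"),
    ("Z", "S1"),
]

def interpretation_1(pairs, n):
    """Substances that are the active ingredient of exactly n products."""
    products_by_substance = defaultdict(set)
    for product, substance in pairs:
        products_by_substance[substance].add(product)
    return {s for s, products in products_by_substance.items() if len(products) == n}

def interpretation_2(pairs, n):
    """Substances found in any product that has exactly n active ingredients."""
    substances_by_product = defaultdict(set)
    for product, substance in pairs:
        substances_by_product[product].add(substance)
    return {
        s
        for substances in substances_by_product.values()
        if len(substances) == n
        for s in substances
    }

result_1 = interpretation_1(RELATIONSHIPS, 3)
result_2 = interpretation_2(RELATIONSHIPS, 3)
print(sorted(result_1))  # ['S1']
print(sorted(result_2))  # ['S1', 'S2', 'S3']
```

The only difference is which end of the relationship is used as the grouping key before counting: the destination (substance) for interpretation 1, the source (product) for interpretation 2.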
Brandon Ulrich
For what it's worth, Snow Owl/the IHTSDO terminology server are using interpretation 1.
Linda Bird
Thanks Brandon! Perfect!
Daniel Karlsson
This is a SQL interpretation of Approach 1 applied to < 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = *
SELECT DISTINCT relationships.destinationId
FROM relationships
WHERE relationships.active = 1
  AND relationships.destinationId IN (
        SELECT SubtypeId
        FROM transitiveclosure
        WHERE SupertypeId = 105590001 AND PathLength > 0)  -- substance
  AND relationships.typeId = 127489000                     -- active ingredient
GROUP BY relationships.destinationId
HAVING count(relationships.Id) = 3
Can we get agreement on this interpretation?
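As a way to check the query against the example substrate, the following sketch runs it in an in-memory SQLite database from Python. The table layouts are guessed from the column names used in the query, and the numeric identifiers for S1..S3 and X..Z are hypothetical stand-ins; only 105590001 and 127489000 come from the thread.

```python
import sqlite3

# Real SNOMED CT ids from the thread.
SUBSTANCE, HAS_ACTIVE_INGREDIENT = 105590001, 127489000
# Hypothetical stand-in ids for the example substances and products.
S1, S2, S3, X, Y, Z = 1, 2, 3, 101, 102, 103

conn = sqlite3.connect(":memory:")
# Assumed minimal schemas, inferred from the columns the query touches.
conn.executescript("""
    CREATE TABLE relationships (
        Id INTEGER PRIMARY KEY, sourceId INTEGER, destinationId INTEGER,
        typeId INTEGER, active INTEGER);
    CREATE TABLE transitiveclosure (
        SubtypeId INTEGER, SupertypeId INTEGER, PathLength INTEGER);
""")
substrate = [(X, S1), (X, S2), (X, S3), (Y, S1), (Z, S1)]
conn.executemany(
    "INSERT INTO relationships (sourceId, destinationId, typeId, active) "
    "VALUES (?, ?, ?, 1)",
    [(src, dst, HAS_ACTIVE_INGREDIENT) for src, dst in substrate],
)
conn.executemany(
    "INSERT INTO transitiveclosure VALUES (?, ?, 1)",
    [(s, SUBSTANCE) for s in (S1, S2, S3)],
)

QUERY = """
    SELECT DISTINCT relationships.destinationId
    FROM relationships
    WHERE relationships.active = 1
      AND relationships.destinationId IN (
            SELECT SubtypeId FROM transitiveclosure
            WHERE SupertypeId = ? AND PathLength > 0)
      AND relationships.typeId = ?
    GROUP BY relationships.destinationId
    HAVING count(relationships.Id) = 3
"""
matches = [row[0] for row in conn.execute(QUERY, (SUBSTANCE, HAS_ACTIVE_INGREDIENT))]
print(matches)  # [1], i.e. only S1
```

Grouping by destinationId and counting rows is what makes this Approach 1: the count is taken per substance, over the products that point at it.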
Jeremy Rogers
Personally, I think interpretation #2 might actually be the more correct, though perhaps less obviously useful, interpretation in the specific domain of drug ingredients. Given that we don't actually have inverse attributes, #2 arguably fits better with the original idea of the R operator and notation as I understood it, of needing to be able to ask:
For the set of all * that satisfies:
*:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|
...what is the non-redundant set of values that we encounter in the 105590001|Substance| slot?
I.e., for the set of all drugs with exactly three ingredients, what is the set of substances found?
Though it does of course then raise the question of what notation is equivalent to interpretation #1, since this is itself also a perfectly valid question to ask!
Daniel Karlsson
So, do we need both versions?
< 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = * - Approach #1
< 105590001 |Substance|: R [3..3] 127489000 |Has active ingredient| = * - Approach #2
Not super clear, and what would be the corresponding dot notation?
Daniel Karlsson
Approach #2 could be written as:
<105590001|Substance|: R 127489000 |Has active ingredient|= (*:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|)
Then approach #1 would be the preferred interpretation, wouldn't it?
Interestingly, in SnoQuery, this and the original query give the same results (at least the same number and, by manual inspection, the same concepts).
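The nested expression can be read as two steps: first select the products with exactly three active ingredients, then collect the substances those products point at. A rough Python sketch of that reading follows; the helper names are made up for illustration, and product W with substances S4 and S5 is a hypothetical addition to the example substrate, included to show that the inner filter excludes something.

```python
from collections import defaultdict

# Example substrate from the thread, plus a hypothetical two-ingredient product W.
PAIRS = [
    ("X", "S1"), ("X", "S2"), ("X", "S3"),
    ("Y", "S1"),
    ("Z", "S1"),
    ("W", "S4"), ("W", "S5"),
]

def products_with_n_ingredients(pairs, n):
    """Inner constraint: products with exactly n active ingredients."""
    ingredients = defaultdict(set)
    for product, substance in pairs:
        ingredients[product].add(substance)
    return {p for p, subs in ingredients.items() if len(subs) == n}

def ingredients_of(pairs, products):
    """Outer reversed refinement: substances that are an ingredient of any given product."""
    return {substance for product, substance in pairs if product in products}

three_ingredient_products = products_with_n_ingredients(PAIRS, 3)
substances = ingredients_of(PAIRS, three_ingredient_products)
print(sorted(three_ingredient_products))  # ['X']
print(sorted(substances))                 # ['S1', 'S2', 'S3']
```

On this data the result matches approach #2, which would be consistent with an engine that applies interpretation 2 returning the same concepts for both queries.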
Linda Bird
I agree that both approaches could be useful, and as you suggest Daniel:
It's interesting to consider that approach 2 can also be written using dot notation as:
However, I can't think of an even mildly intuitive way of representing approach 1 using dot notation. Do we need one?
Kind regards,
Linda.