Interpretation of cardinality of reverse attributes

Created by Daniel Karlsson, last modified on 2017-May-15

What should be the interpretation of the cardinality of a reverse attribute? Does the cardinality apply to the source or the destination of the relationship? Just to clarify!

8 Comments

user-ec161
Interesting question Daniel!
If we consider an example:
< 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = *
Then I assume your question is, should this be read as:
1. Descendants of Substance which are the active ingredient of exactly 3 products, OR
2. Descendants of Substance which are the active ingredient of a product containing exactly 3 active ingredients
So if executed over a substrate of:
- X has active ingredient S1
- X has active ingredient S2
- X has active ingredient S3
- Y has active ingredient S1
- Z has active ingredient S1
Approach 1 would result in the set {S1} and approach 2 would result in the set {S1, S2, S3}.
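The two readings can be checked against this substrate with a small sketch. It assumes that S1, S2 and S3 are all descendants of |Substance| and that each listed line is one active relationship; the function names are made up for illustration and do not come from any ECL implementation.

```python
from collections import defaultdict

# Substrate from the example: (product, substance) pairs for |Has active ingredient|.
TRIPLES = [("X", "S1"), ("X", "S2"), ("X", "S3"), ("Y", "S1"), ("Z", "S1")]


def interpretation_1(triples, n):
    """Substances that are the active ingredient of exactly n products."""
    products_by_substance = defaultdict(set)
    for product, substance in triples:
        products_by_substance[substance].add(product)
    return {s for s, ps in products_by_substance.items() if len(ps) == n}


def interpretation_2(triples, n):
    """Substances found in any product that has exactly n active ingredients."""
    substances_by_product = defaultdict(set)
    for product, substance in triples:
        substances_by_product[product].add(substance)
    return {s for ss in substances_by_product.values() if len(ss) == n for s in ss}


print(sorted(interpretation_1(TRIPLES, 3)))  # ['S1']
print(sorted(interpretation_2(TRIPLES, 3)))  # ['S1', 'S2', 'S3']
```

The only difference between the two functions is which end of the relationship the rows are grouped by before counting, which is exactly the source-versus-destination question.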
To me it would make sense to apply the 'R' first, before the cardinality ... which means you would be applying the cardinality of [3..3] to the substrate:
- S1 is active ingredient of X
- S2 is active ingredient of X
- S3 is active ingredient of X
- S1 is active ingredient of Y
- S1 is active ingredient of Z
In which case, the answer would be {S1} - interpretation 1 - and in terms of your original question, the cardinality would apply to the source of the relationship (for each selected destination).
However, I would be interested to know if others agree or disagree with this. This certainly looks like an area in which we should improve the documentation.
P.S. - Interestingly, (based on a very quick analysis) I think that SnoQuery may use interpretation 2 and Ontoserver may use interpretation 1 (as did I).
- 2017-May-16
1. Brandon Ulrich
  For what it's worth, Snow Owl/the IHTSDO terminology server are using interpretation 1.
  2017-May-16
  1. user-ec161
    Thanks Brandon! Perfect!
    
    2017-May-17
Daniel Karlsson
This is a SQL interpretation of Approach 1 applied to < 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = *
SELECT DISTINCT relationships.destinationId
FROM relationships
WHERE relationships.active = 1
  AND relationships.destinationId IN (
        SELECT SubtypeId FROM transitiveclosure
        WHERE SupertypeId = 105590001 AND PathLength > 0)  -- descendants of |Substance|
  AND relationships.typeId = 127489000                     -- |Has active ingredient|
GROUP BY relationships.destinationId
HAVING COUNT(relationships.Id) = 3
Can we get agreement on this interpretation?
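As a rough check, the query can be run against the example substrate from the first comment using Python's built-in sqlite3 module. The table layouts below are assumptions inferred from the column names in the query, and the numeric ids for X, Y, Z and S1 to S3 are hypothetical placeholders, not real SNOMED CT identifiers.

```python
import sqlite3

SUBSTANCE = 105590001
HAS_ACTIVE_INGREDIENT = 127489000
# Hypothetical placeholder ids for the example products and substances.
X, Y, Z = 1, 2, 3
S1, S2, S3 = 11, 12, 13

conn = sqlite3.connect(":memory:")
# Assumed minimal schema, inferred from the columns the query uses.
conn.executescript("""
CREATE TABLE relationships (
    Id INTEGER PRIMARY KEY, active INTEGER,
    sourceId INTEGER, typeId INTEGER, destinationId INTEGER);
CREATE TABLE transitiveclosure (
    SubtypeId INTEGER, SupertypeId INTEGER, PathLength INTEGER);
""")
conn.executemany(
    "INSERT INTO relationships (active, sourceId, typeId, destinationId) "
    "VALUES (1, ?, ?, ?)",
    [(p, HAS_ACTIVE_INGREDIENT, s)
     for p, s in [(X, S1), (X, S2), (X, S3), (Y, S1), (Z, S1)]],
)
conn.executemany(
    "INSERT INTO transitiveclosure VALUES (?, ?, 1)",
    [(s, SUBSTANCE) for s in (S1, S2, S3)],
)

# Same query as in the comment, with the two concept ids passed as parameters.
QUERY = """
SELECT DISTINCT relationships.destinationId
FROM relationships
WHERE relationships.active = 1
  AND relationships.destinationId IN (
        SELECT SubtypeId FROM transitiveclosure
        WHERE SupertypeId = ? AND PathLength > 0)
  AND relationships.typeId = ?
GROUP BY relationships.destinationId
HAVING COUNT(relationships.Id) = 3
"""
result = [row[0] for row in conn.execute(QUERY, (SUBSTANCE, HAS_ACTIVE_INGREDIENT))]
print(result)  # [11], i.e. only S1
```

Note that COUNT(relationships.Id) counts relationship rows rather than distinct products. If one product could carry the same ingredient in more than one active relationship, counting distinct sourceId values might be closer to the intended reading, but that is a guess about the intent rather than something stated in the thread.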
- 2017-May-16
Jeremy Rogers
Personally, I think interpretation #2 might actually be the more correct, though perhaps less obviously useful, interpretation in the specific domain of drug ingredients. Given that we don't actually have inverse attributes, #2 arguably fits better with the original idea of the R operator and notation as I understood it, of needing to be able to ask:
For the set of all * that satisfies:
*:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|
...what is the non-redundant set of values that we encounter in the 105590001|Substance| slot?
I.e., for the set of all drugs with exactly three ingredients, what is the set of substances found?
Though it does of course then raise the question of what notation is equivalent to interpretation #1, since this is itself also a perfectly valid question to ask!
- 2017-May-16
Daniel Karlsson
So, do we need both versions?
< 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = * - Approach #1
< 105590001 |Substance|: R [3..3] 127489000 |Has active ingredient| = * - Approach #2
Not super clear, and what would be the corresponding dot notation?
- 2017-May-16
Daniel Karlsson
Approach #2 could be written as:
<105590001|Substance|: R 127489000 |Has active ingredient|= (*:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|)
In that case, would approach #1 be the preferred interpretation of the original expression?
Interestingly, in SnQuery, this and the original query give the same results (at least the same number of concepts and, by manual inspection, the same concepts).
- 2017-May-16
user-ec161
I agree that both approaches could be useful, and as you suggest Daniel:
- Approach #1 could be written as:
  < 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = *
- Approach #2 could be written as:
  <105590001|Substance|: R 127489000 |Has active ingredient|= (*:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|)
It's interesting to consider that approach 2 can also be written using dot notation as:
- (*:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|).127489000 |Has active ingredient|
However, I can't think of an even mildly intuitive way of representing approach 1 using dot notation. Do we need one?
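To illustrate why the nested reverse form and the dot notation form of approach 2 should select the same concepts, here is a minimal sketch over the example substrate from the first comment. It assumes S1, S2 and S3 are the only descendants of |Substance| in play; all function names are hypothetical and this is not how any particular ECL engine evaluates expressions.

```python
from collections import defaultdict

# (product, substance) pairs for |Has active ingredient|, from the earlier example.
TRIPLES = [("X", "S1"), ("X", "S2"), ("X", "S3"), ("Y", "S1"), ("Z", "S1")]
SUBSTANCES = {"S1", "S2", "S3"}  # assumed descendants of |Substance|


def products_with_n_ingredients(triples, n):
    """Inner expression: *: [n..n] |Has active ingredient| = < |Substance|."""
    by_product = defaultdict(set)
    for product, substance in triples:
        if substance in SUBSTANCES:
            by_product[product].add(substance)
    return {p for p, ss in by_product.items() if len(ss) == n}


def reverse_form(triples, n):
    """< |Substance|: R |Has active ingredient| = (inner expression)."""
    inner = products_with_n_ingredients(triples, n)
    return {s for p, s in triples if s in SUBSTANCES and p in inner}


def dot_form(triples, n):
    """(inner expression) . |Has active ingredient|"""
    inner = products_with_n_ingredients(triples, n)
    return {s for p, s in triples if p in inner}


print(sorted(reverse_form(TRIPLES, 3)))  # ['S1', 'S2', 'S3']
print(sorted(dot_form(TRIPLES, 3)))      # ['S1', 'S2', 'S3']
```

In this sketch the two forms could only differ if a selected product had an ingredient value outside |Substance|, because the reverse form restricts the focus concepts to substances while the dot form returns every attribute value.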
Kind regards,
Linda.
- 2017-May-17
