Protein Annotations
anno.Rd
A group of functions that allow for the simple search and access of protein annotation information based on an internal lookup table.
Value
get_anno()
: a tibble
of protein annotation information.
grep_anno()
: a tibble
of a subset of
protein information (annotations) corresponding
to matching rows of in pattern
.
seq_lookup()
: a tibble
, a subset of tbl
,
corresponding to the rows whose SeqIds
match the values in seq
.
seqify()
: a wranglr_seq
class object.
Details
Historical features may have been dropped from the menu
may be absent from the internal lookup table,
resulting in NA
for that row. In such cases,
they can be retrieved by explicitly passing an annotations
table corresponding to the original data prior to
the menu change. The easiest way to generate a
"time-capsuled" annotations table:
1. via attr(data, "Col.Meta")
Functions
get_anno()
: a convenient wrapper to easily retrieve the lookup table (tibble
) of proteomic annotations keyed onSeqId
.grep_anno()
: pattern matches via regular expression of the internal "annotations" table (tibble
). Matches are performed on any (regexpr = "|"
) of the following:EntrezGeneSymbol
TargetFullName
Target
List
SeqId
seq_lookup()
: conveniently looks up annotation keyed on a vector ofSeqId
s. The internal lookup dictionary is matched on values in theSeqId
column of annotations, which accesses a static internal object. Alternatively, the user can explicitly pass an annotations table of of their choice.seqify()
: converts towranglr_seq
object, primarily to dispatch the S3 print method during interactive use.
Examples
get_anno()
#> # A tibble: 11,083 × 44
#> SeqId TargetFullName EntrezGeneSymbol ApparentKdM UniProt SomaId
#> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 10000-28 Beta-crystallin B2 CRYBB2 1.24e-10 P43320 SL019…
#> 2 10001-7 RAF proto-oncogen… RAF1 1.18e-11 P04049 SL002…
#> 3 10003-15 Zinc finger prote… ZNF41 4.33e-11 P51814 SL019…
#> 4 10006-25 ETS domain-contai… ELK1 3.37e-12 P19419 SL019…
#> 5 10008-43 Guanylyl cyclase-… GUCA1A 5.26e-11 P43080 SL019…
#> 6 10010-10 Beclin-1 BECN1 3.94e-10 Q14457 SL014…
#> 7 10011-65 Inositol polyphos… OCRL 7.25e-11 Q01968 SL019…
#> 8 10012-5 SAM pointed domai… SPDEF 1.35e-10 O95238 SL014…
#> 9 10013-34 Internal Use Only… Igh NA Q99LC4 SL025…
#> 10 10014-31 Zinc finger prote… SNAI2 1.84e-11 O43623 SL007…
#> # ℹ 11,073 more rows
#> # ℹ 38 more variables: Organism <chr>, Dilution <chr>, Target <chr>,
#> # Type <chr>, List <chr>, Reason <chr>, `LoDB (RFU)` <dbl>,
#> # `Signal To Noise Plasma` <dbl>, `Intra-Plate CV Plasma` <dbl>,
#> # `Inter-Plate CV Plasma` <dbl>, `Total CV Plasma` <dbl>,
#> # `Correlation Plasma` <dbl>, `F-Statistic Plasma` <dbl>,
#> # `Lower 95% Normal Plasma (RFU)` <dbl>, …
grep_anno("^MMP") # match EntrezGeneSymbols
#> # A tibble: 24 × 44
#> SeqId TargetFullName EntrezGeneSymbol ApparentKdM UniProt SomaId
#> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 10479-18 Stromelysin-2 MMP10 2.31e-11 P09238 SL000…
#> 2 15419-15 Matrix metallopro… MMP20 1.36e-11 O60882 SL012…
#> 3 2579-17 Matrix metallopro… MMP9 2.89e-11 P14780 SL000…
#> 4 25932-61 Matrix metallopro… MMP23B 2.24e-11 O75900 SL024…
#> 5 2788-55 Stromelysin-1 MMP3 4.53e-11 P08254 SL000…
#> 6 2789-26 Matrilysin MMP7 7.89e-12 P09237 SL000…
#> 7 2838-53 Matrix metallopro… MMP17 1.88e-10 Q9ULZ9 SL003…
#> 8 2954-56 Neutrophil collag… MMP8 1.36e-11 P22894 SL000…
#> 9 31177-67 Matrix metallopro… MMP15 3.18e-11 P51511 SL003…
#> 10 33871-7 Matrix metallopro… MMP15 1.66e-11 P51511 SL003…
#> # ℹ 14 more rows
#> # ℹ 38 more variables: Organism <chr>, Dilution <chr>, Target <chr>,
#> # Type <chr>, List <chr>, Reason <chr>, `LoDB (RFU)` <dbl>,
#> # `Signal To Noise Plasma` <dbl>, `Intra-Plate CV Plasma` <dbl>,
#> # `Inter-Plate CV Plasma` <dbl>, `Total CV Plasma` <dbl>,
#> # `Correlation Plasma` <dbl>, `F-Statistic Plasma` <dbl>,
#> # `Lower 95% Normal Plasma (RFU)` <dbl>, …
grep_anno("^29") # match SeqId
#> # A tibble: 476 × 44
#> SeqId TargetFullName EntrezGeneSymbol ApparentKdM UniProt SomaId
#> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 2900-53 C-C motif chemok… CCL14 6.07e-11 Q16627 SL003…
#> 2 2906-55 Interleukin-4 IL4 1.15e-12 P05112 SL000…
#> 3 29065-61 Cyclin-dependent… CDK19 1.51e-10 Q9BWU1 SL021…
#> 4 29066-34 Myristoylated al… MARCKS 1.42e-11 P29966 SL008…
#> 5 29067-43 Protein MICAL-2 MICAL2 3.63e-12 O94851 SL014…
#> 6 29068-2 Melanoma-associa… MAGEB6 4.55e-11 Q8N7X4 SL013…
#> 7 29069-5 Tubulin beta-4 c… TUBB4A 4.69e-11 P04350 SL007…
#> 8 29070-6 TFAR15 PDCD10 1.18e-11 Q9BUL8 SL003…
#> 9 29071-219 UGT 1A10 UGT1A10 3.19e-10 Q9HAW8 SL000…
#> 10 29072-84 Calcium/calmodul… CAMK1G 1.81e-10 Q96NX5 SL021…
#> # ℹ 466 more rows
#> # ℹ 38 more variables: Organism <chr>, Dilution <chr>, Target <chr>,
#> # Type <chr>, List <chr>, Reason <chr>, `LoDB (RFU)` <dbl>,
#> # `Signal To Noise Plasma` <dbl>, `Intra-Plate CV Plasma` <dbl>,
#> # `Inter-Plate CV Plasma` <dbl>, `Total CV Plasma` <dbl>,
#> # `Correlation Plasma` <dbl>, `F-Statistic Plasma` <dbl>,
#> # `Lower 95% Normal Plasma (RFU)` <dbl>, …
grep_anno("^black") # Blacklisted
#> # A tibble: 363 × 44
#> SeqId TargetFullName EntrezGeneSymbol ApparentKdM UniProt SomaId
#> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 10013-34 Internal Use Only… Igh NA Q99LC4 SL025…
#> 2 10021-1 Internal Use Only… Igh NA Q99LC4 SL025…
#> 3 10034-16 Internal Use Only… Igh NA Q99LC4 SL025…
#> 4 10416-79 Internal Use Only… Igh NA Q99LC4 SL025…
#> 5 10427-2 Internal Use Only… Igh NA Q99LC4 SL025…
#> 6 10452-24 Internal Use Only… Igh NA Q99LC4 SL025…
#> 7 10461-57 Internal Use Only… Igh NA Q99LC4 SL025…
#> 8 10467-58 Internal Use Only… Igh NA Q99LC4 SL025…
#> 9 10471-25 Internal Use Only… Igh NA Q99LC4 SL025…
#> 10 10476-23 Internal Use Only… Igh NA Q99LC4 SL025…
#> # ℹ 353 more rows
#> # ℹ 38 more variables: Organism <chr>, Dilution <chr>, Target <chr>,
#> # Type <chr>, List <chr>, Reason <chr>, `LoDB (RFU)` <dbl>,
#> # `Signal To Noise Plasma` <dbl>, `Intra-Plate CV Plasma` <dbl>,
#> # `Inter-Plate CV Plasma` <dbl>, `Total CV Plasma` <dbl>,
#> # `Correlation Plasma` <dbl>, `F-Statistic Plasma` <dbl>,
#> # `Lower 95% Normal Plasma (RFU)` <dbl>, …
svec <- c("seq.2981.9", "seq.5073.30", "seq.4429.51", "seq.2447.7")
svec
#> [1] "seq.2981.9" "seq.5073.30" "seq.4429.51" "seq.2447.7"
seq_lookup(svec)
#> # A tibble: 4 × 9
#> seq SeqId EntrezGeneSymbol Target TargetFullName Dilution UniProt
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 seq.2981.9 2981… ESAM ESAM Endothelial c… 0.5% Q96AP7
#> 2 seq.5073.… 5073… NA NA NA NA NA
#> 3 seq.4429.… 4429… CHST6 CHST6 Carbohydrate … 20% Q9GZX3
#> 4 seq.2447.7 2447… PLA2G2E GIIE Group IIE sec… 20% Q9NZK7
#> # ℹ 2 more variables: List <chr>, Reason <chr>
# print method for class `wranglr_seq`
seqify(svec) |> class()
#> [1] "wranglr_seq" "character"
# works with seq.xxxx.xx format
seqify(svec)
#> ══ SeqId Lookup ══════════════════════════════════════════════════════════
#> SeqId-Feature GeneID Target List Reason
#> ──────────────────────────────────────────────────────────────────────────
#> ▶ seq.2981.9 ❯ ESAM ❯ Endothelial cell-selective adhesion molecule ❯ NA ❯ NA
#> ▶ seq.5073.30 ❯ NA ❯ NA ❯ NA ❯ NA
#> ▶ seq.4429.51 ❯ CHST6 ❯ Carbohydrate sulfotransferase 6 ❯ ❯
#> ▶ seq.2447.7 ❯ PLA2G2E ❯ Group IIE secretory phospholipase A2 ❯ ❯
# also works with pure SeqIds
vec <- sub("\\.", "-", sub("^seq\\.", "", svec))
vec
#> [1] "2981-9" "5073-30" "4429-51" "2447-7"
seqify(vec)
#> ══ SeqId Lookup ══════════════════════════════════════════════════════════
#> SeqId-Feature GeneID Target List Reason
#> ──────────────────────────────────────────────────────────────────────────
#> ▶ 2981-9 ❯ ESAM ❯ Endothelial cell-selective adhesion molecule ❯ NA ❯ NA
#> ▶ 5073-30 ❯ NA ❯ NA ❯ NA ❯ NA
#> ▶ 4429-51 ❯ CHST6 ❯ Carbohydrate sulfotransferase 6 ❯ ❯
#> ▶ 2447-7 ❯ PLA2G2E ❯ Group IIE secretory phospholipase A2 ❯ ❯