Step 3: Post-Processing

The post-processing function:

This vignette covers what happens after the SCONE output described in TheSconeWorkflow.Rmd has been generated. The first step is to merge the SCONE-generated columns into the original input data. The user has the option of log (base 10) transforming the q values, which makes them easier to visualize. The user also has the option of running t-SNE on the data, so that the resulting maps can be colored by SCONE-generated values. Here, t-SNE is run with the Rtsne package, using the same markers that were used as input for the KNN generation.

library(Sconify)
wand.final <- PostProcessing(scone.output = wand.scone,
                             cell.data = wand.combined,
                             input = input.markers)

## Read the 1000 x 27 data matrix successfully!
## OpenMP is working. 1 threads.
## Using no_dims = 2, perplexity = 30.000000, and theta = 0.500000
## Computing input similarities...
## Building tree...
## Done in 0.11 seconds (sparsity = 0.125754)!
## Learning embedding...
## Iteration 50: error is 64.411918 (50 iterations in 0.20 seconds)
## Iteration 100: error is 60.664047 (50 iterations in 0.16 seconds)
## Iteration 150: error is 60.333087 (50 iterations in 0.17 seconds)
## Iteration 200: error is 60.215845 (50 iterations in 0.18 seconds)
## Iteration 250: error is 60.168929 (50 iterations in 0.19 seconds)
## Iteration 300: error is 1.356391 (50 iterations in 0.15 seconds)
## Iteration 350: error is 1.238554 (50 iterations in 0.13 seconds)
## Iteration 400: error is 1.205384 (50 iterations in 0.14 seconds)
## Iteration 450: error is 1.189444 (50 iterations in 0.14 seconds)
## Iteration 500: error is 1.175120 (50 iterations in 0.14 seconds)
## Iteration 550: error is 1.171034 (50 iterations in 0.13 seconds)
## Iteration 600: error is 1.168270 (50 iterations in 0.13 seconds)
## Iteration 650: error is 1.164415 (50 iterations in 0.13 seconds)
## Iteration 700: error is 1.160873 (50 iterations in 0.13 seconds)
## Iteration 750: error is 1.158613 (50 iterations in 0.13 seconds)
## Iteration 800: error is 1.155815 (50 iterations in 0.14 seconds)
## Iteration 850: error is 1.153598 (50 iterations in 0.14 seconds)
## Iteration 900: error is 1.152149 (50 iterations in 0.14 seconds)
## Iteration 950: error is 1.149654 (50 iterations in 0.14 seconds)
## Iteration 1000: error is 1.148313 (50 iterations in 0.14 seconds)
## Fitting performed in 2.96 seconds.

wand.combined # input data

## # A tibble: 1,000 × 51
##    `CD3(Cd110)Di` `CD3(Cd111)Di` `CD3(Cd112)Di` `CD235-61-7-15(In113)Di`
##             <dbl>          <dbl>          <dbl>                    <dbl>
##  1       -0.0291         -0.152          0.0859                   -0.389
##  2       -0.0291         -0.210         -0.490                    -0.369
##  3       -0.241          -0.134         -0.256                     0.913
##  4       -0.158          -0.0974        -0.266                    -1.35 
##  5        0.846          -0.229         -0.173                    -0.710
##  6       -0.00127         1.74           2.17                     -1.29 
##  7       -0.171           0.0286        -0.508                    -0.823
##  8        0.0627         -0.104         -0.416                     0.375
##  9       -0.0689         -0.152          0.493                    -0.666
## 10       -0.196           1.49          -0.0568                   -1.25 
## # ℹ 990 more rows
## # ℹ 47 more variables: `CD3(Cd114)Di` <dbl>, `CD45(In115)Di` <dbl>,
## #   `CD19(Nd142)Di` <dbl>, `CD22(Nd143)Di` <dbl>, `IgD(Nd145)Di` <dbl>,
## #   `CD79b(Nd146)Di` <dbl>, `CD20(Sm147)Di` <dbl>, `CD34(Nd148)Di` <dbl>,
## #   `CD179a(Sm149)Di` <dbl>, `CD72(Eu151)Di` <dbl>, `IgM(Eu153)Di` <dbl>,
## #   `Kappa(Sm154)Di` <dbl>, `CD10(Gd156)Di` <dbl>, `Lambda(Gd157)Di` <dbl>,
## #   `CD24(Dy161)Di` <dbl>, `TdT(Dy163)Di` <dbl>, `Rag1(Dy164)Di` <dbl>, …

wand.scone # scone-generated data

## # A tibble: 1,000 × 34
##    `pCrkL(Lu175)Di.IL7.qvalue` pCREB(Yb176)Di.IL7.qvalu…¹ pBTK(Yb171)Di.IL7.qv…²
##                          <dbl>                      <dbl>                  <dbl>
##  1                       1                          0.602                  0.981
##  2                       0.875                      0.774                  1    
##  3                       1                          0.882                  0.982
##  4                       0.939                      0.937                  0.969
##  5                       0.800                      0.949                  0.981
##  6                       0.758                      0.882                  0.999
##  7                       1                          0.980                  0.847
##  8                       1                          0.712                  0.847
##  9                       1                          0.666                  1    
## 10                       1                          0.649                  0.999
## # ℹ 990 more rows
## # ℹ abbreviated names: ¹`pCREB(Yb176)Di.IL7.qvalue`,
## #   ²`pBTK(Yb171)Di.IL7.qvalue`
## # ℹ 31 more variables: `pS6(Yb172)Di.IL7.qvalue` <dbl>,
## #   `cPARP(La139)Di.IL7.qvalue` <dbl>, `pPLCg2(Pr141)Di.IL7.qvalue` <dbl>,
## #   `pSrc(Nd144)Di.IL7.qvalue` <dbl>, `Ki67(Sm152)Di.IL7.qvalue` <dbl>,
## #   `pErk12(Gd155)Di.IL7.qvalue` <dbl>, `pSTAT3(Gd158)Di.IL7.qvalue` <dbl>, …

wand.final # the data after post-processing

## # A tibble: 1,000 × 87
##    `CD3(Cd110)Di` `CD3(Cd111)Di` `CD3(Cd112)Di` `CD235-61-7-15(In113)Di`
##             <dbl>          <dbl>          <dbl>                    <dbl>
##  1       -0.0291         -0.152          0.0859                   -0.389
##  2       -0.0291         -0.210         -0.490                    -0.369
##  3       -0.241          -0.134         -0.256                     0.913
##  4       -0.158          -0.0974        -0.266                    -1.35 
##  5        0.846          -0.229         -0.173                    -0.710
##  6       -0.00127         1.74           2.17                     -1.29 
##  7       -0.171           0.0286        -0.508                    -0.823
##  8        0.0627         -0.104         -0.416                     0.375
##  9       -0.0689         -0.152          0.493                    -0.666
## 10       -0.196           1.49          -0.0568                   -1.25 
## # ℹ 990 more rows
## # ℹ 83 more variables: `CD3(Cd114)Di` <dbl>, `CD45(In115)Di` <dbl>,
## #   `CD19(Nd142)Di` <dbl>, `CD22(Nd143)Di` <dbl>, `IgD(Nd145)Di` <dbl>,
## #   `CD79b(Nd146)Di` <dbl>, `CD20(Sm147)Di` <dbl>, `CD34(Nd148)Di` <dbl>,
## #   `CD179a(Sm149)Di` <dbl>, `CD72(Eu151)Di` <dbl>, `IgM(Eu153)Di` <dbl>,
## #   `Kappa(Sm154)Di` <dbl>, `CD10(Gd156)Di` <dbl>, `Lambda(Gd157)Di` <dbl>,
## #   `CD24(Dy161)Di` <dbl>, `TdT(Dy163)Di` <dbl>, `Rag1(Dy164)Di` <dbl>, …
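
To make the merge and transform steps concrete, here is a minimal base R sketch on toy data. It is not the PostProcessing implementation: the column names and values are made up, the transform is assumed to be -log10 (as the plot title used in this vignette suggests), and the two tables are assumed to list cells in the same row order. In the real example the same logic accounts for the column counts above: 51 input columns plus 34 SCONE columns plus the two t-SNE dimensions gives 87.

```r
# Toy stand-ins for the real objects; all names and values are hypothetical.
set.seed(1)
cell.data <- data.frame(markerA = rnorm(5), markerB = rnorm(5))
scone.output <- data.frame(
  markerC.IL7.qvalue = c(1, 0.5, 0.1, 0.01, 0.001),
  markerC.IL7.change = rnorm(5)
)

# Transform only the q value columns (assumed here to be -log10),
# so that small q values become large, easy-to-see numbers.
q.cols <- grep("qvalue$", colnames(scone.output))
scone.output[q.cols] <- lapply(scone.output[q.cols], function(q) -log10(q))

# Column-bind the two tables; rows are assumed to be in the same cell order.
merged <- cbind(cell.data, scone.output)
dim(merged)
```

A q value of 1 maps to 0 and a q value of 0.001 maps to 3, which is why the transformed values are easier to read on a color scale.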

# tSNE map shows highly responsive population of interest
TsneVis(wand.final, 
        "pSTAT5(Nd150)Di.IL7.change", 
        "IL7 -> pSTAT5 change")

# tSNE map now colored by q value
TsneVis(wand.final, 
        "pSTAT5(Nd150)Di.IL7.qvalue", 
        "IL7 -> pSTAT5 -log10(qvalue)")

# tSNE map colored by KNN density estimation
TsneVis(wand.final, "density")

Subsampling your data prior to running t-SNE:

If the dataset contains a large number of cells (>100K), t-SNE can become time-consuming and produce results that are less clean. I therefore provide a wrapper that subsamples the final data and runs t-SNE on the subsample, producing a new tibble that contains the subsampled data with two t-SNE dimensions added to it. Note that the two added dimensions at the end of the tibble are called “bh-SNE11” and “bh-SNE21”. This is because the dimensions “bh-SNE1” and “bh-SNE2” are already in the data, since t-SNE was run during the post-processing step in this example. Realistically, a user would apply this function to a much larger number of cells, in which case the user would have set “tsne = FALSE” in the PostProcessing function described earlier in this vignette.

wand.final.sub <- SubsampleAndTsne(dat = wand.final, 
                                   input = input.markers, 
                                   numcells = 500)

## Read the 500 x 27 data matrix successfully!
## OpenMP is working. 1 threads.
## Using no_dims = 2, perplexity = 30.000000, and theta = 0.500000
## Computing input similarities...
## Building tree...
## Done in 0.04 seconds (sparsity = 0.239136)!
## Learning embedding...
## Iteration 50: error is 55.806149 (50 iterations in 0.11 seconds)
## Iteration 100: error is 54.675955 (50 iterations in 0.12 seconds)
## Iteration 150: error is 54.748604 (50 iterations in 0.12 seconds)
## Iteration 200: error is 54.609689 (50 iterations in 0.14 seconds)
## Iteration 250: error is 54.071519 (50 iterations in 0.14 seconds)
## Iteration 300: error is 0.893065 (50 iterations in 0.08 seconds)
## Iteration 350: error is 0.847295 (50 iterations in 0.06 seconds)
## Iteration 400: error is 0.831556 (50 iterations in 0.07 seconds)
## Iteration 450: error is 0.822483 (50 iterations in 0.06 seconds)
## Iteration 500: error is 0.819447 (50 iterations in 0.07 seconds)
## Iteration 550: error is 0.816091 (50 iterations in 0.07 seconds)
## Iteration 600: error is 0.812974 (50 iterations in 0.07 seconds)
## Iteration 650: error is 0.810991 (50 iterations in 0.06 seconds)
## Iteration 700: error is 0.808717 (50 iterations in 0.06 seconds)
## Iteration 750: error is 0.808218 (50 iterations in 0.06 seconds)
## Iteration 800: error is 0.807983 (50 iterations in 0.06 seconds)
## Iteration 850: error is 0.807091 (50 iterations in 0.06 seconds)
## Iteration 900: error is 0.805486 (50 iterations in 0.06 seconds)
## Iteration 950: error is 0.805247 (50 iterations in 0.06 seconds)
## Iteration 1000: error is 0.804152 (50 iterations in 0.06 seconds)
## Fitting performed in 1.62 seconds.

wand.final.sub

## # A tibble: 500 × 89
##    `CD3(Cd110)Di` `CD3(Cd111)Di` `CD3(Cd112)Di` `CD235-61-7-15(In113)Di`
##             <dbl>          <dbl>          <dbl>                    <dbl>
##  1        -0.0785       -0.00929       -0.00777                 -0.802  
##  2        -1.28         -1.09          -2.00                    -0.803  
##  3         0.0567        0.718          0.428                    0.899  
##  4        -0.624        -0.661         -0.499                   -1.33   
##  5        -0.183        -0.355         -1.02                    -1.04   
##  6         0.180        -0.104          1.48                     0.121  
##  7         0.864         0.324          1.63                    -0.324  
##  8        -1.42         -0.883         -1.43                    -1.02   
##  9        -0.0105       -0.173         -0.380                    0.00226
## 10        -0.147        -0.234         -0.0636                  -2.14   
## # ℹ 490 more rows
## # ℹ 85 more variables: `CD3(Cd114)Di` <dbl>, `CD45(In115)Di` <dbl>,
## #   `CD19(Nd142)Di` <dbl>, `CD22(Nd143)Di` <dbl>, `IgD(Nd145)Di` <dbl>,
## #   `CD79b(Nd146)Di` <dbl>, `CD20(Sm147)Di` <dbl>, `CD34(Nd148)Di` <dbl>,
## #   `CD179a(Sm149)Di` <dbl>, `CD72(Eu151)Di` <dbl>, `IgM(Eu153)Di` <dbl>,
## #   `Kappa(Sm154)Di` <dbl>, `CD10(Gd156)Di` <dbl>, `Lambda(Gd157)Di` <dbl>,
## #   `CD24(Dy161)Di` <dbl>, `TdT(Dy163)Di` <dbl>, `Rag1(Dy164)Di` <dbl>, …
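
The subsample-then-embed idea can be sketched in a few lines of R. This is an illustration under stated assumptions, not the SubsampleAndTsne source: the data, the marker names, and the output column names `tsne1` and `tsne2` are hypothetical, and the t-SNE step only runs if the Rtsne package is installed.

```r
set.seed(42)

# Hypothetical data: 200 cells, three input markers plus one extra column.
dat <- data.frame(m1 = rnorm(200), m2 = rnorm(200), m3 = rnorm(200),
                  density = runif(200))
input <- c("m1", "m2", "m3")
numcells <- 100

# Sample rows without replacement so each cell appears at most once.
keep <- sample(nrow(dat), numcells)
sub <- dat[keep, ]

# Run t-SNE on the input marker columns only, then append the two dimensions.
if (requireNamespace("Rtsne", quietly = TRUE)) {
  fit <- Rtsne::Rtsne(as.matrix(sub[, input]), dims = 2, perplexity = 30)
  sub$tsne1 <- fit$Y[, 1]
  sub$tsne2 <- fit$Y[, 2]
}
```

Subsampling before the embedding keeps the runtime manageable, and every non-marker column (here `density`) is carried along unchanged so it can still be used to color the map.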


Tyler J Burns

October 2, 2017
