Analyze a sharded dataset

import lamindb as ln
import lnschema_bionty as lb

ln.track()
💡 loaded instance: testuser1/test-facs (lamindb 0.54.3)
💡 notebook imports: lamindb==0.54.3 lnschema_bionty==0.31.2 scanpy==1.9.5
💡 Transform(id='zzJzdgJ763Dyz8', name='Analyze a sharded dataset', short_name='facs3', version='0', type=notebook, updated_at=2023-09-29 14:47:57, created_by_id='DzTjkKse')
💡 Run(id='LDiqMtRdidZshETUh5hp', run_at=2023-09-29 14:47:57, transform_id='zzJzdgJ763Dyz8', created_by_id='DzTjkKse')
ln.Dataset.filter().df()
| id | name | description | version | hash | reference | reference_type | transform_id | run_id | file_id | initial_version_id | updated_at | created_by_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WmQQuZQyOo5E4NkwzF1r | My versioned FACS dataset | None | 1 | Piw2n0vdnoNoAV7ZxgsW-g | None | None | OWuTtS4SAponz8 | XrTJbAKJspEWgiAhQCHj | WmQQuZQyOo5E4NkwzF1r | None | 2023-09-29 14:47:40 | DzTjkKse |
| WmQQuZQyOo5E4NkwzFCS | My versioned FACS dataset | None | 2 | dmrCH-OEK94Zbh7i51wn | None | None | SmQmhrhigFPLz8 | Eb4qSnYGiAUnPvezF0Ip | None | WmQQuZQyOo5E4NkwzF1r | 2023-09-29 14:47:50 | DzTjkKse |
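The two rows are versions of the same dataset: version 2 stores the id of version 1 in initial_version_id. As a minimal sketch, assuming every later version points back to the first one in this way and that version strings are integers (as they are here), the latest version of each family can be picked from such a DataFrame with pandas. The three columns are copied from the table; family and latest_id_per_family are illustrative names, not lamindb API.

```python
import pandas as pd

# Three columns copied from the two rows listed above; the others are omitted.
versions = pd.DataFrame(
    {
        "name": ["My versioned FACS dataset", "My versioned FACS dataset"],
        "version": ["1", "2"],
        "initial_version_id": [None, "WmQQuZQyOo5E4NkwzF1r"],
    },
    index=pd.Index(["WmQQuZQyOo5E4NkwzF1r", "WmQQuZQyOo5E4NkwzFCS"], name="id"),
)

# Assumption: the first version has no initial_version_id and anchors its own
# family; every later version points back to that first version.
family = versions["initial_version_id"].fillna(
    pd.Series(versions.index, index=versions.index)
)

# Versions are stored as strings; compare them numerically so that "10" > "9".
version_number = versions["version"].astype(int)
latest_id_per_family = version_number.groupby(family).idxmax()
print(latest_id_per_family.to_dict())
```

Comparing the versions as integers rather than strings avoids the lexicographic trap where "10" would sort before "9".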
dataset = ln.Dataset.filter(name="My versioned FACS dataset", version="2").one()
adata = dataset.load(join="inner")
/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/anndata/_core/anndata.py:1838: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
  utils.warn_names_duplicates("obs")

The loaded AnnData object records the source file of each observation in the file_id column of .obs:

adata.obs.file_id.cat.categories
Index(['8SS4VRw5xd4ihh9lMn2o', 'WmQQuZQyOo5E4NkwzF1r'], dtype='object')

With join="inner", only the features shared by all files (their intersection) are kept:

adata.var.index
Index(['CD3', 'CD28', 'CD8', 'Cd4', 'CD57', 'Cd14', 'Cd19', 'CD27', 'Ccr7',
       'CD127'],
      dtype='object')

Let us compute a PCA and plot the cells colored by CD14 expression:

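import scanpy as sc

# lookup object with auto-completable cell marker names
markers = lb.CellMarker.lookup()

# PCA coordinates are stored in adata.obsm["X_pca"]
sc.pp.pca(adata)
# save="_cd14" writes the figure to figures/pca_cd14.pdf
sc.pl.pca(adata, color=markers.cd14.name, save="_cd14")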
WARNING: saving figure to file figures/pca_cd14.pdf
[Figure: PCA of the loaded cells, colored by CD14]
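sc.pp.pca stores the cell coordinates in adata.obsm["X_pca"], and sc.pl.pca plots the first two of them. The numpy sketch below shows the idea behind those coordinates: center each feature, take the singular value decomposition, and project onto the leading right singular vectors. It is a simplified stand-in, not scanpy's implementation, which differs in solver choice and may flip component signs; pca_coordinates is a hypothetical helper and the input is random toy data.

```python
import numpy as np


def pca_coordinates(x: np.ndarray, n_comps: int = 2) -> np.ndarray:
    """Project the rows of x onto its top principal components (SVD sketch)."""
    centered = x - x.mean(axis=0)  # PCA operates on mean-centered features
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_comps].T


rng = np.random.default_rng(0)
toy = rng.normal(size=(100, 10))  # 100 hypothetical cells x 10 markers
coords = pca_coordinates(toy, n_comps=2)
print(coords.shape)  # (100, 2)
```

The first column captures the most variance, so it forms the horizontal axis of a PCA scatter plot.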
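# register the saved figure as a file record linked to the current run
file = ln.File("./figures/pca_cd14.pdf", description="My result on CD14")
file.save()
# visualize the data flow that produced this file
file.view_flow()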
[Figure: data flow graph of the saved file]
# clean up test instance
!lamin delete --force test-facs
!rm -r test-facs
💡 deleting instance testuser1/test-facs
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--test-facs.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-facs