Research
Research Interests
Cross-modal data integration
Modern technologies in the life sciences generate diverse types of data, such as omics, imaging, and physiological data, each revealing a different aspect of a biological system. We develop machine learning tools to establish relationships across these data modalities. These tools will contribute to the discovery of biological mechanisms that translate into biotechnology and medical innovations.
Current projects:
- A scalable cloud-based framework for multi-modal mapping across single neuron omics, morphology and electrophysiology. NIMH RF1MH133777
- As a part of the BRAIN Initiative Informatics program, this project aims to develop a broadly accessible cloud-based framework toward an integrative, multi-modal brain cell atlas using novel, scalable analytics tools, leveraging federated BRAIN Initiative resources and community engagement.
- Cancer Data Aggregator. NCI.
- This project builds a free NCI tool that unifies and standardizes metadata across multiple cancer data repositories, enabling researchers of all skill levels to easily search, access, and reuse cancer research data from a single interface.
Big data neuroscience
The human brain contains almost 100 billion neurons and a similar number of non-neuronal cells. Measured against this complexity, neuroscience data generation and analysis are still at an early stage. Building scalable systems and high-efficiency processing pipelines is crucial for neuroscience research now and in the future. We leverage tools and lessons from artificial intelligence to provide novel solutions for large-scale neuroscience data and to generate insights into the brain.
Current projects:
- NeMO Archive: SCORCH Support, Coordination and Outreach. NIDA UM1DA052244. (subaward)
- The NeMO-SCORCH data center serves as the coordination center for the SCORCH consortium, where it handles the management, standardization, and distribution of single-cell data. This center ensures the data adheres to FAIR principles and facilitates integrated analysis and public reuse.
- An Integrative Connectomics Coordination Center (IC3). NINDS U24NS139927. (subaward)
- IC3 is part of the BRAIN CONNECTS network. It works closely with the network's data generation teams to develop cutting-edge, highly scalable technology platforms that enable the generation of unprecedented volumes of data for creating comprehensive brain-wide connectivity maps in mouse, non-human primate, and human.
- Workshop: Streamlining Cross-Platform Data Integration: Processes and Solutions for Rapidly Developing an Integrated Workflow Across Independent Systems for the US BRAIN Initiative Cell Census. INCF Neuroinformatics Assembly 2023, Sep. 18-22.
A sustainable biomedical data ecosystem
This work focuses on constructing efficient, scalable life sciences data ecosystems that optimize the standardized management, storage, sharing, and interoperability of cross-modal, multi-scale data in support of multi-center collaborative research. To that end, we implement the FAIR (Findable, Accessible, Interoperable, Reusable) principles and develop standardized data processing workflows. We strongly advocate for communication and collaboration across ecosystems to build a sustainable environment that continuously fosters scientific discovery.
Current projects:
- The LungMAP Data Coordination Center for Next Gen Systems Biology of Respiration. NHLBI U24HL148865 (subaward)
- The LungMAP DCC integrates multi-omics data from the research centers to create comprehensive atlases of normal and diseased lung development while providing data harmonization, cloud-based analysis tools, and community resources through the LungMAP.net ecosystem.
- The Neuroscience Multi-omic Data Archive (NeMO). NIMH R24MH114788 (subaward)
- The NeMO Archive is a cloud-enabled, FAIR-compliant data repository that stores, integrates, and shares multi-omic data from the BRAIN Initiative and related neuroscience projects to accelerate brain research and discovery.