Aurel Prosz, PhD

Computational biologist | Machine learning engineer

About me

Computational biologist and machine learning engineer with a strong physics background. Experienced in using predictive modeling, data processing, and statistical data analysis to solve challenging problems in biology. Skilled in R and Python. Passionate about network science and its application to cancer research.

Resume

Proficiency

Data acquisition and understanding | Exploratory and statistical data analysis | Business understanding and applications

Code

Skilled in R and Python | PyTorch and TensorFlow | Bioinformatics pipelines | Docker

Domain knowledge

Bioinformatics and biophysics | Cancer genomics and the biology of aging | Basic financial modeling | Large Language Models

Blog posts and news

Introducing LLM-SCI-GEN, a collection of curated papers about scientific hypothesis generation with Large Language Models.

  • GitHub
  • LLMs

LLM-SCI-GEN is a repository for papers on scientific hypothesis generation using large language models (LLMs). Our aim is to curate and maintain a comprehensive collection of research in this dynamic field, so that the latest advancements and insights are accessible to researchers, practitioners, and enthusiasts alike.

Check it out

Introducing the Paurel-tools project: Docker images for bioinformatics and computational biology.

  • Dockerization
  • Python & R & Docker

More details are coming soon!

Check it out

My Projects and Portfolio

Generating novel scientific hypotheses in breast cancer research using NVIDIA cloud endpoints

  • LLMs
  • Python & Streamlit

This project uses large language models to help researchers, scientists, and enthusiasts generate innovative scientific ideas from the output of statistical learning models in breast cancer research. Built with NVIDIA NIM and LangChain.

Check it out

Get Me A Nobel Prize - Generate and visualize new scientific ideas with LLMs

  • LLMs
  • Python & Streamlit

A streamlined interface built on Streamlit that interacts with OpenAI's GPT-4 or GPT-3.5 models (custom models coming soon!) to generate scientific hypotheses. Users can input their scientific problems or the output of their machine learning algorithm of choice to generate potential solutions and visualizations. The visualization is powered by UMAP and shows the one-sentence embeddings of the generated hypotheses, so that semantically similar hypotheses appear closer together.

Check it out

Biologically informed deep neural networks in human aging

  • Deep learning
  • Python

Aging is defined by a steady buildup of damage and is a risk factor for chronic diseases. Epigenetic mechanisms like DNA methylation may play a role in organismal aging, but whether they are active drivers or consequences is unknown. Here we present XAI-AGE, a biologically informed, explainable deep neural network model for accurate biological age prediction across many tissues. We show that this approach can identify differentially activated pathways and biological processes from the latent layers of the neural network. The model is based on a recently published explainable model used in cancer research, called PNET, by Elmarakeby et al.

Check it out

Evolving networks

  • Network theory
  • Python & R

Many real-world systems, such as society, the Internet, and many biological phenomena, can be described as networks that evolve in time. By modelling their structure, one can understand, predict, and optimize the behaviour of dynamic systems. My aim with this project is to study these kinds of networks by reconstructing the telecommunication network of the provinces of Milan. The resulting model is a temporal, undirected, weighted network with a time resolution of 100 milliseconds. During the project I successfully identified periodicities in the daily and weekly communication exchange between the provinces and located outlier events.
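As a rough illustration of how a daily periodicity can be picked out of an activity time series, here is a minimal sketch based on autocorrelation. The project description does not say which method was actually used, so the technique, the function names, and the synthetic hourly series are all assumptions made for this example.

```python
import math


def autocorrelation(series, lag):
    """Biased sample autocorrelation of a series at a given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((value - mean) ** 2 for value in series)
    if variance_sum == 0:
        return 0.0  # a constant series carries no periodic signal
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


def dominant_period(series, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation."""
    if max_lag >= len(series):
        raise ValueError("max_lag must be smaller than the series length")
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical data: two weeks of hourly activity with a 24-hour cycle.
hourly_activity = [
    10.0 + 5.0 * math.sin(2.0 * math.pi * hour / 24.0) for hour in range(24 * 14)
]
print(dominant_period(hourly_activity, min_lag=2, max_lag=48))
```

The shortest lags are excluded because neighbouring samples of a smooth signal are trivially correlated. On this synthetic series the search should report a period of 24 hours; real traffic data would be noisier and would likely need detrending first.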

Check it out

Simulating single cell RNA-seq data using Generative Adversarial Neural Networks

  • Deep learning
  • TensorFlow

Recent advances have enabled gene expression profiling of single cells at lower cost. As more data is produced, there is an increasing need to integrate diverse datasets and better analyse underutilised data to gain biological insights. Generative adversarial networks (GANs) are deep neural net architectures composed of two nets that are pitted against each other (thus the “adversarial”). In my project I plan to develop and apply various GAN architectures on scRNA-seq data to uncover previously unknown gene regulatory relationships and regulators of epidermal cell state and cancer signaling. Using the trained generator neural network, I demonstrate that GANs can be used to predict the effect of cell state perturbations on unseen single cells. To make the results more interpretable, I plan to perform localized (cell subpopulation) gene coexpression network analysis on large-scale generated scRNA-seq profiles.

Check it out

Coupled predator-prey models and their application in investing in the stock market

  • R
  • Modeling chaotic systems

Predator-prey models are an important topic not only in biology but also in other fields such as plasma physics, economics, and even criminology. In this project I present the classical Lotka-Volterra (LV) equations and their modified versions. I also reproduce a financial model described in (Addison, Bhatt, and Owen 2016), which a stock market trader can use to eliminate some of the risks involved in trading specific stocks by buying the shares of so-called prey companies and selling them to a predator company. The stock market also has an equilibrium, so it behaves like an ecosystem, and understanding the ramifications of these mechanisms can provide the investor with insights that yield competitive advantages.
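For readers unfamiliar with the classical LV system, the sketch below integrates the textbook equations dx/dt = alpha*x - beta*x*y and dy/dt = delta*x*y - gamma*y with a fixed-step fourth-order Runge-Kutta scheme. It is written in Python for illustration (the project itself is tagged R), it covers only the classical model and not the financial model from the cited paper, and the parameter values and function names are assumptions chosen for this example.

```python
import math


def lv_derivatives(prey, predator, alpha, beta, delta, gamma):
    """Right-hand side of the classical Lotka-Volterra equations."""
    d_prey = alpha * prey - beta * prey * predator
    d_predator = delta * prey * predator - gamma * predator
    return d_prey, d_predator


def rk4_step(prey, predator, alpha, beta, delta, gamma, dt):
    """Advance the system by one fourth-order Runge-Kutta step of size dt."""
    params = (alpha, beta, delta, gamma)
    k1 = lv_derivatives(prey, predator, *params)
    k2 = lv_derivatives(prey + 0.5 * dt * k1[0], predator + 0.5 * dt * k1[1], *params)
    k3 = lv_derivatives(prey + 0.5 * dt * k2[0], predator + 0.5 * dt * k2[1], *params)
    k4 = lv_derivatives(prey + dt * k3[0], predator + dt * k3[1], *params)
    new_prey = prey + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    new_predator = predator + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return new_prey, new_predator


def simulate(prey0, predator0, alpha, beta, delta, gamma, dt, n_steps):
    """Return the trajectory as a list of (prey, predator) pairs, initial state included."""
    trajectory = [(prey0, predator0)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(*trajectory[-1], alpha, beta, delta, gamma, dt))
    return trajectory


def invariant(prey, predator, alpha, beta, delta, gamma):
    """Quantity conserved along exact solutions; useful as an accuracy check."""
    return (delta * prey - gamma * math.log(prey)
            + beta * predator - alpha * math.log(predator))


# Illustrative parameters only; they are not taken from the project.
trajectory = simulate(10.0, 5.0, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5,
                      dt=0.001, n_steps=20000)
print(trajectory[-1])
```

The system has a non-trivial equilibrium at prey = gamma/delta and predator = alpha/beta, and every other positive starting point orbits around it. Monitoring the conserved quantity is a simple way to check that the step size is small enough.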

Check it out

Simulating cryptocurrency price trajectories using jump-diffusion models

  • R
  • Applied physics
  • Blockchain

Blockchain technology opened up a whole new class of decentralized digital currencies, which can be traded without a central bank or a single administrator. The cryptocurrency market had a market capitalization of $184 billion as of November 2018. The market itself has unique properties, such as extremely high trading volume and high price volatility over relatively short periods of time. In this project I use jump-diffusion models, which originate from other fields such as condensed matter physics, to simulate the price trajectories of certain cryptocurrencies and indexes. I first calibrate the specific models using the historical time series of the cryptocurrency price and log returns. I then simulate the trajectory of new price values and use this information to price derivative market assets, for example options. I also plan to use this method to explore the dynamics of Initial Coin Offerings.
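To make the idea of a jump-diffusion price path concrete, here is a minimal sketch of a Merton-style model: geometric Brownian motion plus normally distributed log-jumps arriving as a Poisson process. The description above does not name the specific models used, so the Merton form is an assumption, as are the function names and parameter values. The sketch is in Python for illustration, although the project itself is tagged R.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_jump_diffusion(s0, mu, sigma, jump_rate, jump_mean, jump_std,
                            horizon, n_steps, seed=None):
    """Simulate one price path; returns n_steps + 1 prices starting at s0.

    mu and sigma are the annualized drift and volatility, jump_rate is the
    expected number of jumps per year, and each log-jump is drawn from
    Normal(jump_mean, jump_std).
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    # Expected relative jump size, used to compensate the drift.
    expected_jump = math.exp(jump_mean + 0.5 * jump_std ** 2) - 1.0
    drift = (mu - 0.5 * sigma ** 2 - jump_rate * expected_jump) * dt
    log_price = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        n_jumps = sample_poisson(rng, jump_rate * dt)
        jumps = sum(rng.gauss(jump_mean, jump_std) for _ in range(n_jumps))
        log_price += drift + diffusion + jumps
        path.append(math.exp(log_price))
    return path


# Illustrative, uncalibrated parameters: one year of daily steps.
path = simulate_jump_diffusion(s0=100.0, mu=0.05, sigma=0.8, jump_rate=12.0,
                               jump_mean=-0.02, jump_std=0.10,
                               horizon=1.0, n_steps=365, seed=42)
print(len(path), path[-1])
```

In practice the parameters would come from calibration against historical log returns, and many such paths would be averaged to price an option by Monte Carlo.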

Check it out

Developing a stacked predictor to predict cosmological redshift from photometric data

  • Python
  • Cosmology

To create a 3D map of the Universe we need to measure three coordinates for each galaxy. The celestial coordinates on the sky are easy to get, but measuring distance is much harder. Since the Universe is expanding, the photons from a faraway galaxy are also stretched during their long journey. This stretching is the redshift: well-known spectral features are shifted to redder colors. According to Hubble's law, the distance is approximately proportional to the redshift. Since galaxies are very faint, it takes a lot of observing time to spread their light in spectrographs and obtain a high-resolution spectrum. As a result, we have spectroscopic redshifts only for a limited set of galaxies. For the others, redshift may be estimated from broadband photometry, that is, the brightness of galaxies measured through a few color filters. Estimating the redshift from this limited information is called photometric redshift estimation.
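The relationship between a shifted spectral line, the redshift, and the distance can be sketched in a few lines. This example only illustrates the definitions above and is not the stacked predictor itself. The Hubble constant of 70 km/s/Mpc is an assumed round value, the function names are made up for this example, and the linear distance formula is only a reasonable approximation at small redshift.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in km/s


def redshift(observed_wavelength, rest_wavelength):
    """Redshift z from the observed and rest-frame wavelength of one spectral line."""
    if rest_wavelength <= 0:
        raise ValueError("rest_wavelength must be positive")
    return (observed_wavelength - rest_wavelength) / rest_wavelength


def hubble_distance_mpc(z, hubble_constant=70.0):
    """Approximate distance in Mpc from Hubble's law, d = c * z / H0.

    hubble_constant is in km/s/Mpc; 70.0 is an assumed value for illustration.
    The linear law is only meaningful for small z.
    """
    return SPEED_OF_LIGHT_KM_S * z / hubble_constant


# Hypothetical measurement: the H-alpha line (rest wavelength 656.28 nm)
# observed at 689.09 nm.
z = redshift(689.09, 656.28)
print(z, hubble_distance_mpc(z))
```

For this hypothetical line the redshift comes out at about 0.05, which corresponds to a distance of roughly 214 Mpc under the assumed Hubble constant.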

Check it out