Danilo Perez Rivera | Mount Sinai School of Medicine

Papers by Danilo Perez Rivera

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

arXiv (Cornell University), Oct 5, 2023

In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present the DeepSpeed4Science initiative (deepspeed4science.ai), which aims to build unique capabilities through AI system technology innovations to help domain experts unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.

GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics

We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pretraining on over 110 million prokaryotic gene sequences and finetuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLMs represent one of the first whole-genome-scale foundation models which can generalize to other prediction tasks. We demonstrate scaling of GenSLMs on GPU-based supercomputers and AI-hardware accelerators utilizing 1.63 Zettaflops in training runs with a sustained performance of 121 PFLOPS in mixed precision and peak of 850 PFLOPS. We present initial scientific insights from examining GenSLMs in tracking evolutionary dynamics of SARS-CoV-2...

Twitter under Storm: Social Media Response Sensitization in the Caribbean

Climate change has led to rising sea levels and warmer sea surface temperatures. These factors contribute greatly to the intensity of hurricanes and the floods they provoke. Projections estimate there will be an increase of 45% to 87% in the frequency of Category >4 hurricanes originating in the Atlantic Basin, which typically impact the Caribbean and Continental United States of America. During the 2019 Hurricane Season, there were 20 depressions, 18 storms, 6 hurricanes and 3 major hurricanes. Through this work, we explored the response on Social Media to these natural phenomena as a function of their trajectory, intensity, and previous exposure of the population to intense natural disasters. Data was collected through the Twitter API. The influences of hurricane proximity and intensity on the volume of Social Media production were explored. Hurricane Dorian, with its trajectory strongly threatening the previously exposed Puerto Rico, and eventually causing widespread damage in the Abac...

How differential privacy will affect our understanding of population growth in the United States

The application of a proposed differential privacy algorithm to 2020 US Census data releases and other census products has prompted discussion about the consistency and reliability of the data produced under the proposed disclosure avoidance system. We test the potential impact of this change in disclosure avoidance systems on the tracking of population growth and distribution using county-level population counts. We ask how population counts produced under the differential privacy algorithm might lead to different conclusions regarding population growth for the total population and three major racial/ethnic groups in comparison to counts produced using the traditional methods. Our results suggest that the implementation of differential privacy, as proposed, will impact our understanding of population changes in the US. We find potential for overstating and understating growth and decline, with these effects being more pronounced for non-Hispanic blacks and Hispanics, as w...

Engaging for Puerto Rico: #RickyRenuncia (and #RickySeQueda) during El Verano del 19 and digital identities

Between July 13 and 24, 2019, the people of Puerto Rico took to the streets after a series of corruption scandals shocked the political establishment. The social uprising resulted in the ousting of the Governor of Puerto Rico (Dr. Ricardo Rosselló, Ricky) and the resignation of the majority of his staff, something unprecedented in the history of Puerto Rico; this period has been called El Verano del 19 (Summer of 19). Social media played a crucial role in both the organization and dissemination of the protests, marches, and other activities that occurred within this period. Puerto Ricans on the island and around the world engaged in this social movement through the digital revolution, mainly under the hashtag #RickyRenuncia (Ricky Resign), with a small counter-movement under the hashtag #RickySeQueda (Ricky will stay). The purpose of this study is to illustrate the magnitude and grassroots nature of the political movement's social media presence, as well as the characteristics of the populati...

Development of a computational model of glucose toxicity in the progression of diabetes mellitus

Mathematical Biosciences and Engineering, 2016

Diabetes mellitus is a disease characterized by a range of metabolic complications involving an individual's blood glucose levels and their main regulator, insulin. These complications can vary largely from person to person depending on their current biophysical state. Biomedical research makes strides day by day to impact the lives of patients with a variety of diseases, including diabetes. One large stride that is being made is the generation of techniques to assist physicians to "personalize medicine". From available physiological data, biological understanding of the system, and dimensional analysis, a differential equation-based mathematical model was built in a sequential manner, to elucidate clearly how each parameter correlates to the patient's current physiological state. We developed a simple mathematical model that accurately simulates the dynamics between glucose, insulin, and pancreatic β-cells throughout disease progression with constraints to maintain biological relevance. The current framework is clearly capable of tracking the patient's current progress through the disease, dependent on factors such as latent insulin resistance or an attrite β-cell population. Further interests would be to develop tools that allow the direct and feasible testing of how effective a given plan of treatment would be at returning the patient to a desirable biophysical state.
