2022

Time-to-relapse Predictions for Alcohol Use Disorder Patients

Much of the research in the addiction sciences centers on opioid use or substance use disorder (SUD) in general. Alcohol use disorder (AUD), the most prevalent substance use disorder worldwide, is less commonly studied, and even less so at the intersection of social work and computer science. In this paper, we use a longitudinal cohort of n = 3290 individuals from a 12-month AUD treatment study to identify homogeneous subgroups within the heterogeneous population and determine how these subgroups differ with respect to relapse events. To do this, we use clustering analysis to determine the unique characteristics of patients undergoing AUD treatment, and survival modeling to estimate patients’ time to relapse and identify the predictors that contribute significantly to it. Based on these results, we can better understand the factors that affect relapse within each subgroup and inform treatment programs tailored to the individuals in that subgroup.
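As an illustration of the survival-modeling step, the sketch below computes a Kaplan-Meier estimate of the probability of remaining relapse-free over time. The abstract does not name the specific survival model used, so the choice of Kaplan-Meier is an assumption, and the function name `kaplan_meier` and the toy cohort are hypothetical. In practice the same estimate could be computed separately for each cluster to compare subgroups.

```python
from collections import Counter


def kaplan_meier(durations, relapsed):
    """Return [(time, survival_probability)] at each time with an observed relapse.

    durations: months until relapse or until the end of follow-up
    relapsed:  True if a relapse was observed, False if the patient was censored
    """
    events = Counter(t for t, e in zip(durations, relapsed) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who did not relapse at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort (not data from the study).
durations = [2, 3, 3, 5, 8, 12, 12]
relapsed = [True, True, False, True, False, False, False]
curve = kaplan_meier(durations, relapsed)
for month, prob in curve:
    print(f"month {month}: P(relapse-free) = {prob:.3f}")
```

Censored patients (those who leave the study without a recorded relapse) still count toward the risk set until their last observed month, which is what distinguishes this estimate from a simple relapse fraction.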


Towards Precise Detection of Toxic Pharmaceutical Drugs using Vision Transformers

As modern health research continues to produce more complex pharmacological treatments, the public receives an ever-growing list of prospective tablets that can aid in treating various ailments and symptoms. While these medical advancements bring several benefits, it is critical to recognize the hazards associated with medications and their chemical combinations, one of which is the risk of medication errors by patients. This work employs a Vision Transformer, a deep learning model, to identify medications. It uses fast data preprocessing, data augmentation techniques, and attention mechanisms to reliably recognize the type of medicine from a picture of a pill. Furthermore, the model is built using a real-world dataset and evaluated using multiple quality metrics.


Leveraging Machine Learning to predict Diabetes in Republic of Mali and Republic of Benin

Leveraging machine learning (ML) for medical diagnosis is a growing subfield. ML is driving a data-driven revolution that helps medical providers diagnose diseases. Prior work in the diabetes domain has focused on ML models trained on impoverished data sets. These data sets mostly contain healthcare variables and have at most 10-20 features. The current ML pipeline aims to build a comprehensive diabetes model that accounts for co-morbid conditions, food habits, addiction history, socio-demographic features, and living conditions. The model is intended to aid healthcare providers in Sub-Saharan African nations. The current case study focuses on the Republic of Mali and the Republic of Benin.


Relationship between Crimes and Socio-economic Factors in Los Angeles

In recent years, researchers have shown growing interest in studying the relationship between different types of crime and socio-economic factors across the United States, and in understanding its underlying intricacies. Over the past decade, the crime rate has increased by over 30% in the greater Los Angeles area, and it continues to increase at an unprecedented rate. It is paramount to identify the underlying causes and preconditions of crime, which play a crucial role in crime reduction and prevention. With recent advances in technology, access to crime and census data has become less challenging, which paves the way for population-based predictions at a fine-grained level. The aim of this research is to identify the correlation between several socio-economic factors (e.g., unemployment, poverty, educational background, and mean income) of the population and the corresponding crime occurrences in the city of Los Angeles. In addition, the study aims to identify crime hotspots and evaluate the major factors contributing to such crimes.
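The correlation analysis described above can be sketched with a Pearson correlation coefficient between one socio-economic factor and crime counts per area. The abstract does not state which correlation measure was used, so Pearson is an assumption here, and the per-area values below are hypothetical toy numbers rather than Los Angeles data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-area values: unemployment rate (%) and reported crimes.
unemployment = [4.1, 6.3, 5.2, 9.8, 7.4]
crimes = [210, 340, 260, 520, 400]
r = pearson(unemployment, crimes)
print(f"Pearson r = {r:.3f}")
```

A value of r near 1 or -1 indicates a strong linear association, but it does not by itself establish that the factor causes the crime occurrences.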


Predicting PM2.5 Concentration in the Aftermath of a Wildfire

Wildfires are becoming increasingly common in the United States and are being further exacerbated by the effects of climate change, such as severe drought and extreme heat. Particulate matter with a diameter of less than 2.5 micrometers (PM2.5) is the main pollutant emitted in wildfire smoke, and it poses significant health risks to humans. Typically, public caution regarding wildfires focuses on avoiding active wildfires and offers far fewer warnings about the resulting pollutants in their aftermath, despite their demonstrable danger to humans. PM2.5 is typically measured using air quality monitors, which can offer valuable information about the concentration of dangerous pollutants in specific localities. Unfortunately, these air quality monitors are few and far between. In this paper, we use machine learning techniques such as linear regression, decision trees, and random forests to predict the level of PM2.5 at the epicenter of a wildfire a week after it has been contained. Our predictions are based solely on the wildfire characteristics and land usage patterns at the site of the fire. We strongly believe that such PM2.5 predictions will benefit communities impacted by wildfires by helping them make informed decisions about when it is safe to venture out in the absence of actual PM2.5 monitors.


A Machine Learning Approach to Predict Prevalence of Type-2 Diabetes and Asthma in California state

Asthma and type 2 diabetes mellitus (T2DM) are significant public health concerns that cause substantial amounts of potentially avoidable healthcare spending. Many cases of T2DM and asthma go undiagnosed for years, which increases healthcare expenses and leads to worse outcomes. T2DM and asthma are also frequently undertreated, which results in ineffective disease management and increased costs. The social determinants of health (SDoH), which include factors in the environments in which we are born, grow, live, work, and age, are also known to affect diabetes and asthma. This paper focuses on predicting and mapping the prevalence of diabetes and asthma based on census tract-level data for the state of California. The prevalence rate of each disease can be thought of as a measure of the susceptibility to underdiagnosis of the people residing in that tract. Predictions were visualized on a California census tract map and compared against the true values. Feature importance was used to gain deeper insight into the prevalence of diabetes and asthma in a particular census tract.


Vaccine Detection - Predicting H1N1 and Seasonal Flu Vaccines based on Behavioral and Socioeconomic Factors by Geographic Regions across the US, Key Factors, and Correlation Analysis

Following the profound impact of the COVID-19 pandemic on our day-to-day lives, society was once again reminded of the necessity of vaccines and of how differently people perceive them. Today, it is undeniable that the quick development of vaccines, as well as the public’s readiness and education regarding these vaccines, helped life return to normal within a few years. This paper therefore explores and analyzes how various behavioral and socioeconomic factors have influenced the public’s uptake of past vaccines for the seasonal flu and H1N1 (swine flu), to better understand motivations and responses to vaccines in the past and going forward, especially across geographic regions in the United States. Based on this analysis, we aim to use ML strategies to predict the likelihood of either vaccine’s administration independently, codependently, and in conjunction with each other based on selected features. Ultimately, we aim to visualize the proportion of residents predicted to take the vaccine across 10 regions of the US. Ideally, this would help us optimize vaccine accessibility and administration per region by recommending which factors to allocate funding for, and would aid overall vaccine education. This report provides the current progress of this project and highlights the future tasks needed to accomplish our goal.


Predicting Foster Care Outcomes in the United States with the National Youth in Transition Database

The National Youth in Transition Database (NYTD) hosts one of the largest sources of information on youth aging out of the foster care system in the United States. It covers nearly 10 years and consists of basic demographic information coupled with outcome data and independent living service (ILS) utilization for thousands of individuals. While past work has identified ILS as having a meaningful impact on positive adult outcomes for foster youth, other work has shown that utilization of these services varies based on demographics and location. This project aims to identify to what extent independent living services influence youth outcomes with respect to demographic data. In this report, we focus on substance abuse referrals to analyze indicators of high-risk behavior in youth who may lack adequate support systems. Of our five classification models, we find that the random forest classifier has the highest performance for substance abuse referral prediction and maintains fairness across genders, races, and geographic regions. After feature selection, our random forest classifier achieves significant improvements in run time and in most scoring metrics. Across all five models, we found that having a connection to an adult and being currently enrolled in school increase the likelihood of substance abuse referral. On the other hand, educational aid, public food assistance, and other public financial assistance may deter referral, meaning youth who receive these forms of aid may have lower rates of substance abuse.


2021

Crop Classification Using Deep Learning on Satellite Imagery

Sound agricultural management is one of the defining issues for federal and local governments around the world. In low- and middle-income nations especially, accurate land use and land cover data can inform policy decisions that affect all levels of a country’s agricultural sector, potentially lifting millions of subsistence farmers out of poverty, as has been seen in the People’s Republic of China. Additionally, this land use data can direct more efficient distribution of aid to farmers, through both governments themselves and nongovernmental organizations (NGOs). In recent years, the increased availability and accessibility of remote sensing data has prompted the development of various machine learning approaches to more accurately estimate land usage across a region. In this paper we propose the use of two machine learning frameworks that use satellite data and time series analysis to classify crops within a growing season in the Republic of South Africa.


Disaster Response and Damage Assessment

During disasters, multimedia content on social media sites delivers vital information. During natural and man-made catastrophes, individuals use social media platforms like Twitter to report updates about injured or dead people, infrastructure damage, and missing or found people, among other types of information. According to studies, this online information can be immensely beneficial for humanitarian groups in gaining situational awareness and planning relief activities, provided it is handled quickly and properly. In this research, we propose employing state-of-the-art deep learning algorithms to create a joint representation from both the text and image modalities of social media data. We use convolutional neural networks for image processing and BERT for text processing to define a multimodal deep learning system. We employ early fusion and late fusion approaches to combine the results obtained from the text processing and image processing pipelines, which gives a substantial boost in results compared to baselines from earlier approaches. With best-in-class networks and efficient fusion techniques, we were able to surpass the existing baselines in both unimodal and multimodal setups by a large margin.
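The difference between the two fusion strategies can be sketched as follows. This is a minimal illustration under common definitions, not the authors' implementation: early fusion is taken to mean concatenating per-modality feature vectors before a shared classifier, and late fusion is taken to mean averaging per-modality class probabilities. The function names, the `text_weight` parameter, and the toy vectors are all hypothetical.

```python
def early_fusion(text_features, image_features):
    """Concatenate per-modality feature vectors into one joint vector.

    A single downstream classifier would then be trained on the joint vector.
    """
    return list(text_features) + list(image_features)


def late_fusion(text_probs, image_probs, text_weight=0.5):
    """Weighted average of the class probabilities from each modality's classifier."""
    image_weight = 1.0 - text_weight
    return [text_weight * t + image_weight * i
            for t, i in zip(text_probs, image_probs)]


# Hypothetical feature vectors, e.g. from a text encoder and an image encoder.
text_vec = [0.2, 0.7]
image_vec = [0.9, 0.1, 0.4]
joint = early_fusion(text_vec, image_vec)

# Hypothetical class probabilities for two classes from each unimodal classifier.
text_probs = [0.8, 0.2]
image_probs = [0.4, 0.6]
fused = late_fusion(text_probs, image_probs)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(joint, fused, predicted_class)
```

Early fusion lets one model learn interactions between modalities, while late fusion keeps the unimodal models independent and only merges their decisions.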


Predicting California High School Graduation and Post-Secondary School Enrollment Based on Socioeconomic and Geographic Factors

California is a large state with many high schools in rural, suburban, and urban areas that vary widely in poverty, population density, and income. These factors all contribute to gaps in high school graduation rates and in the pursuit of post-secondary enrollment. Along with these features, student-based factors such as test scores and absenteeism rates also contribute to graduation and post-secondary enrollment rates. The purpose of this project is to analyze the most important features contributing to students’ success in completing high school and pursuing higher education. Our project aims to provide school administrations with actionable and effective ways to improve graduation and post-secondary enrollment rates based on their school’s specific characteristics.


Investigating Bias & Demographic Distribution of Crime Prediction Models on Historically Red-Lined Communities

This paper aims to quantify the effect of predictive policing tactics in Los Angeles County. Since 2008, the PredPol algorithm has been used to determine police activity, and its use of historical data to generate potential crime hotspots has been a topic of debate. To quantify this effect, our group used the Los Angeles County Sheriff’s Office 2017 and 2018 federal crime dataset to train the PredPol algorithm on drug data. We used the results to generate simulations for the year 2018, and generated distributions of fairness against demographic Census data and geographic redlining data.


2020

Analyzing California K-12 School Performance and Funding

We are driven by the broader goal of developing computational methods for improving educational access, equity, quality, and accessibility in society. This project contributes to this broader goal by (1) analyzing financial, geographic, and demographic factors that contribute to student and school achievement gaps in California, (2) developing ML approaches for predicting student performance in California schools, and (3) using insights from ML to develop optimization strategies for allocating school funding to maximize student performance.


EV Chargers in the City of Los Angeles - Prediction and Optimization

We will create a predictive model to determine what factors are most important in determining a particular EV charging station’s usage. Then, we will utilize that model in an optimization algorithm to find suitable locations where EV charging stations will provide the most use to the Los Angeles community.


COVID-19 Vaccine Allocation Project

The goal of this project is to establish a framework for vaccine distribution by predicting which neighborhoods in Los Angeles are at risk. We want to ensure that the vaccine gets into the hands of the people it could help the most: front-line medical professionals, first responders, and people at high risk. If the vaccine is allocated to sites where it can quickly reach large numbers of this priority population, it will reduce the risk of further spread in the neighborhood and allow resources to be used in other high-risk neighborhoods.


Detecting the Influence of COVID-19 on Social Media Discourse

Twitter has reported a record increase in its daily user figures during the COVID-19 pandemic. Because Twitter is a global and free platform for discussion, analyzing and tracking the COVID-19 discourse on it can give public health scientists, economists, and policy makers insights into the impacts of COVID-19. This project focuses on understanding the influence of COVID-19, which may or may not be induced by various economic, social, political, and health-related factors.


2019
