Empowering Discovery,
Enhancing Knowledge
Latest News
Faulty or Ready? Handling Failures in Deep-Learning Computer Vision Models until Deployment: A Study of Practices, Challenges, and Needs
Handling failures in computer vision systems that rely on deep learning models remains a challenge. While an increasing number of methods for bug identification and correction are proposed, little is known about how practitioners actually search for failures in these models. We perform an empirical study to understand the goals and needs of practitioners, the workflows and artifacts they use, and the challenges and limitations in their process. We interview 18 practitioners by probing them with a carefully crafted failure handling scenario. We observe that there is a great diversity of failure handling workflows in which cooperation is often necessary, that practitioners overlook certain types of failures and bugs, and that they generally do not rely on potentially relevant approaches and tools originally stemming from research. These insights allow us to draw up a list of research opportunities, such as creating a library of …
Perspective: leveraging human understanding for identifying and characterizing image atypicality
High-quality data plays a vital role in developing reliable image classification models. Despite that, what makes an image difficult to classify remains an unstudied topic. This paper provides a first-of-its-kind, model-agnostic characterization of image atypicality based on human understanding. We consider the setting of image classification “in the wild”, where a large number of unlabeled images are accessible, and introduce a scalable and effective human computation approach for proactive identification and characterization of atypical images. Our approach consists of i) an image atypicality identification and characterization task that presents to the human worker both a local view of visually similar images and a global view of images from the class of interest and ii) an automatic image sampling method that selects a diverse set of atypical images based on both visual and semantic features. We demonstrate the …
Human-centered AI: Crowd computing
Human computation (HCOMP) and crowdsourcing (Law and von Ahn, 2011; Quinn and Bederson, 2011; Kittur et al., 2013; Lease and Alonso, 2018) have been instrumental to advances seen in artificial intelligence (AI) and machine learning (ML) over the past 15+ years. AI/ML has an insatiable hunger for human-labeled training data to supervise models, with training data scale playing a significant (if not dominant) role in driving the predictive performance of models (Halevy et al., 2009). The centrality of such human-labeled data to the success and continuing advancement of AI/ML is thus at the heart of today’s data-centric AI movement (Mazumder et al., 2022). Moreover, recent calls for data excellence (Aroyo et al., 2022) reflect growing recognition that AI/ML data scale alone does not suffice. The quality of human-labeled data also plays a tremendous role in AI/ML success, and ignoring this can be perilous to deployed AI/ML systems (Sambasivan et al., 2021), as prominent, public failures have shown.
HCOMP and crowdsourcing have also enabled hybrid, human-in-the-loop, crowd-powered computing (Demartini et al., 2017). When state-of-the-art AI/ML cannot provide sufficient capabilities or predictive performance to meet practical needs for real-world deployment, hybrid systems utilize HCOMP at run-time to deliver last-mile capabilities where AI/ML falls short (Gadiraju and Yang, 2020). This has enabled a new class of innovative and more capable applications, systems, and companies to be built (Barr and Cabrera, 2006). While work in HCOMP is centuries old (Grier, 2013), access to an increasingly Internet-connected and well-educated world …
Trend and co-occurrence network of COVID-19 symptoms from large-scale social media data: infoveillance study
Background
For an emergent pandemic, such as COVID-19, the statistics of symptoms based on hospital data may be biased or delayed due to the high proportion of asymptomatic or mild-symptom infections that are not recorded in hospitals. Meanwhile, the difficulty in accessing large-scale clinical data also limits many researchers from conducting timely research.
Objective
Given the wide coverage and promptness of social media, this study aimed to present an efficient workflow to track and visualize the dynamic characteristics and co-occurrence of symptoms for the COVID-19 pandemic from large-scale and long-term social media data.
Methods
This retrospective study included 471,553,966 COVID-19–related tweets from February 1, 2020, to April 30, 2022. We curated a hierarchical symptom lexicon for social media containing 10 affected organs/systems, 257 symptoms, and 1808 synonyms. The dynamic characteristics of COVID-19 symptoms over time were analyzed from the perspectives of weekly new cases, overall distribution, and temporal prevalence of reported symptoms. The symptom evolutions between virus strains (Delta and Omicron) were investigated by comparing the symptom prevalence during their dominant periods. A co-occurrence symptom network was developed and visualized to investigate inner relationships among symptoms and affected body systems.
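As a rough illustration of the lexicon matching and co-occurrence counting described in the Methods, the following Python sketch may help. It is not the study's pipeline: the three-entry lexicon, the example posts, and the function names (build_matchers, symptoms_in, cooccurrence) are all hypothetical, and the matching is simple case-insensitive whole-phrase search, which the original work may or may not have used.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical miniature lexicon: canonical symptom -> synonyms.
# The study's lexicon (257 symptoms, 1808 synonyms) is not reproduced here.
LEXICON = {
    "fever": ["fever", "high temperature"],
    "cough": ["cough", "coughing"],
    "anosmia": ["loss of smell", "can't smell"],
}


def build_matchers(lexicon):
    """Compile one case-insensitive whole-phrase pattern per canonical symptom."""
    matchers = {}
    for symptom, synonyms in lexicon.items():
        # Longer synonyms first so multi-word phrases are tried before short ones.
        ordered = sorted(synonyms, key=len, reverse=True)
        alternation = "|".join(re.escape(s) for s in ordered)
        matchers[symptom] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return matchers


def symptoms_in(text, matchers):
    """Return the set of canonical symptoms mentioned in one post."""
    return {symptom for symptom, pattern in matchers.items() if pattern.search(text)}


def cooccurrence(texts, lexicon):
    """Count per-symptom mentions (nodes) and same-post symptom pairs (edges)."""
    matchers = build_matchers(lexicon)
    node_counts = Counter()
    edge_counts = Counter()
    for text in texts:
        found = sorted(symptoms_in(text, matchers))
        node_counts.update(found)
        # Each unordered pair within a post contributes one unit of edge weight.
        edge_counts.update(combinations(found, 2))
    return node_counts, edge_counts


if __name__ == "__main__":
    posts = [
        "I have a fever and a cough",
        "Coughing all night, loss of smell too",
        "feeling fine",
    ]
    nodes, edges = cooccurrence(posts, LEXICON)
    print(nodes)
    print(edges)
```

The edge counts can be read as weights of an undirected network whose nodes are symptoms; grouping nodes by affected organ or system would require an extra symptom-to-system mapping that is omitted from this sketch.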
Results
This study identified 201 COVID-19 symptoms and grouped them into 10 affected body systems. There was a significant correlation between the weekly quantity of self-reported …

InLighta Patents
Academic Papers and Presentations by Dr. Jenny Yang

Explore Dr. Jenny Yang’s related academic papers, conference presentations, and more.