My research area is artificial intelligence, with an emphasis on machine learning foundations and applications. My work focuses on the exciting emerging field of computational media, and bridging the theory-practice gap in clustering.

Computational media and computational creativity are two related, emerging fields in artificial intelligence. Computational media offers many opportunities to utilize the power of machine learning in new domains, including visual art, dance, music, and computer games, while computational creativity challenges us to design systems that act as autonomous creative agents and assist humans in their creative endeavors. These exciting new areas of research have also been used to find innovative solutions in scientific discovery and education. My work in these areas combines my expertise in machine learning and training in the arts in order to transform the performing arts through interactive media.

Clustering is one of the most popular data mining tools, used in a wide range of applications. Yet, despite its popularity, there is a substantial gap between clustering theory and practice. My work focuses on bridging this gap through the integration of clustering theory and practice in the standard batch model, as well as the increasingly popular streaming model for clustering big data. For more on this line of research, have a look at my brief overview of formal foundations of clustering. I am also interested in information retrieval, game theory, automata theory, and bioinformatics.

My current research projects: Here is a complete list of my publications.

ALYSIA: Automated LYrical SongwrIting Application

While music was traditionally considered a purely human creative activity, the application of data analysis to this domain has seen much success in recent years, enabling the analysis of musical scores with unprecedented speed and accuracy, as well as the automatic generation of musical works in the style of specific human composers that the lay listener cannot distinguish from the original composer's work. Similarly, data analysis has seen much success in text generation, from poetry to metaphors to the increasingly popular Twitter bots. However, the combination of text and music generation remains virtually unexplored.

Combining my expertise in machine learning with my extensive vocal training and performance experience, I worked with David Loker and Christopher Cassion to develop ALYSIA, a system that creates melodies for user-provided lyrics. ALYSIA is fully data-driven and produces melodies and rhyme based on any given text.

To learn more about this project, and to listen to songs made using ALYSIA, check out ALYSIA's dedicated page.

Movement-Based Language for Interactive Dynamic Set Manipulation for Dance

In recent years, there has been a rise in the use of projection-mapped environments applied to dance performance. This revolutionary performance tool creates a computer-generated, projected environment that dancers appear to influence through their movement, resulting in stunning performances (see, for example, performances by erna). Unfortunately, the application of such systems is limited by their complexity, which necessitates a computing expert to act as an intermediary between the dancers and the program. This raises both the complexity and the cost of using these systems and, as such, limits their utility to larger dance companies. The price and complexity of these systems also keep them out of reach of dance students, making it challenging to learn this important aspect of their art. The systems are also often not truly interactive, requiring an extreme degree of accuracy from the dancers or relying on tracing the dancer's movement on a touchscreen. Can truly interactive sets be integrated into dance performance in a manner that is compatible with dancers' native practices and forms of creative expression? Can we further eliminate the need for computing expertise, making it easy for dancers to expand their domain through movement-triggered visualizations?

In collaboration with dancer and choreographer Jennifer Petuch, Dr. Gary Tyson, and our students Taylor Brockhoeft, James Bach, and Emil Djerekarov, we are creating a movement-based language that would allow an easy integration of technology into the dance creation process. In parallel, we are building a system that would then allow dancers to trigger projected effects through movement by relying on native rehearsal and performance practices. This would eliminate the need for a middle-man computing expert and make dynamic sets readily accessible to dance students as well as smaller dance companies. The system is entirely movement-driven, allowing it to become an integral part of the dance process, from choreography to rehearsals to performance, without the need for a tech expert. This work explores a little-studied space in computational creativity where computer systems aid in human creativity by providing an interface that is natural to the intended user despite the inherent complexity of the system, by leveraging complexity in the domain expertise of the intended user.

Here is a demonstration of our fully interactive system. In line with the goals of professional dance, our system requires no change to the dancer's attire, which could hinder movement. Body movement is tracked through infrared lights and directly affects the visualizations projected on the screen. This video is designed to reveal some of the inside workings of the system. More videos forthcoming!

Our work on augmented reality for dance recently appeared in the news! Check out the article and video on WTXL's website.

Clustering Bio-Acoustic Signals of Marine Animals

In many regions of the ocean, little is known about odontocete species assemblages, and many of the species involved are not well understood acoustically. Standard classification techniques rely on the use of labeled training data. Unfortunately, for this application, labeled data is often unavailable and difficult to obtain. Together with Dr. Marie Roch's team and Dr. Simone Baumann-Pickering from the Scripps Institution of Oceanography, we are studying the application of unsupervised clustering techniques to bio-acoustic signals in order to group data by species without requiring labeled data. This work is funded by the Office of Naval Research.

Practical Clusterability Evaluation

The goal of clustering is to discover cluster structure in data. However, such structure is not always present, in which case clustering cannot be applied meaningfully. As such, one of the most fundamental concepts in clustering is clusterability, which aims to quantify the degree to which data possesses cluster structure. As an integral part of the clustering pipeline, a measure of clusterability can inform a clustering user whether there is sufficient inherent cluster structure to successfully apply this data mining tool.
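
To make the idea of quantifying cluster structure concrete, here is a minimal sketch of one classical clusterability measure, the Hopkins statistic. It is offered purely as an illustration and is not necessarily among the evaluations developed in this project. The function name `hopkins`, the sample size, and the synthetic data are all assumptions made for this example. The statistic compares nearest-neighbor distances from uniformly random locations to the data against nearest-neighbor distances within the data: values near 0.5 suggest no cluster structure, while values near 1 suggest strong structure.

```python
import math
import random


def hopkins(points, m, rng):
    """Hopkins statistic of `points`, estimated from m sampled points.

    Illustrative sketch only: assumes no duplicate points and m <= len(points).
    """
    dim = len(points[0])
    lows = [min(p[a] for p in points) for a in range(dim)]
    highs = [max(p[a] for p in points) for a in range(dim)]

    # w: distance from each sampled data point to its nearest other data point.
    sampled = rng.sample(range(len(points)), m)
    w = [
        min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
        for i in sampled
    ]

    # u: distance from each uniformly random location (within the data's
    # bounding box) to its nearest data point.
    u = []
    for _ in range(m):
        probe = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        u.append(min(math.dist(probe, q) for q in points))

    su = sum(d ** dim for d in u)
    sw = sum(d ** dim for d in w)
    return su / (su + sw)


rng = random.Random(0)

# Hypothetical data: two tight, well-separated blobs versus uniform noise.
clustered = [(rng.gauss(cx, 0.05), rng.gauss(cy, 0.05))
             for cx, cy in [(0.0, 0.0), (5.0, 5.0)] for _ in range(50)]
uniform = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)) for _ in range(100)]

clustered_score = hopkins(clustered, 20, rng)
uniform_score = hopkins(uniform, 20, rng)
print(f"clustered: {clustered_score:.3f}, uniform: {uniform_score:.3f}")
```

Note that this measure does not depend on any clustering algorithm or objective function: it looks only at the distances in the data.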

Figure 3: The clustering pipeline, depicting the role of clusterability evaluation in the clustering process. Clusterability enables the user to determine whether the data possesses sufficient cluster structure to be meaningfully partitioned.

Unfortunately, clusterability of data is typically NP-hard to evaluate (Ackerman and Ben-David, AISTATS 2009), greatly limiting the utility of this concept in practice. In addition, clusterability notions that are not NP-hard to evaluate are too strict for use on real data. Lastly, clusterability notions often rely on specific algorithms or objective functions, effectively inverting the clustering pipeline by requiring that an algorithm be selected before it has been determined whether the data can be meaningfully clustered. Further, as different algorithms often produce radically different partitions on the same data, algorithm-dependent notions can detect only those cluster structures that their underlying algorithms can identify. Together with Dr. Naomi Brownstein and my student Andreas Adolfsson, we discovered several clusterability evaluations that are independent of any algorithm or objective function, computationally efficient, and successfully capture how real data sets are structured in practice.

Figure 4: An example of data sets on which different clustering algorithms produce different results. Algorithms based on the k-means objective tend to produce the clustering depicted on the right, while classical linkage-based methods, such as single-linkage and average-linkage, produce the partitioning on the left. This illustrates the need for algorithm-independent clusterability evaluation.
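
The disagreement between algorithms can be reproduced in a few lines. The sketch below uses hypothetical data (two long, parallel rows of points, not the data from the figure) and simple textbook implementations of single-linkage and of Lloyd's heuristic for k-means; all function names and the choice of initial centers are assumptions made for this example.

```python
from itertools import combinations


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def single_linkage(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist2(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    clusters = len(points)
    for _, i, j in edges:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(len(points))]


def kmeans(points, initial_centers, iterations=100):
    """Lloyd's heuristic for the k-means objective from given initial centers."""
    centers = list(initial_centers)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def kmeans_cost(points, labels):
    """Sum of squared distances from each point to its cluster mean."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    cost = 0.0
    for members in groups.values():
        mean = tuple(sum(v) / len(members) for v in zip(*members))
        cost += sum(dist2(p, mean) for p in members)
    return cost


# Hypothetical data: two parallel rows of 20 points, 1 apart within a row
# and 3 apart between rows.
points = ([(float(x), 0.0) for x in range(20)]
          + [(float(x), 3.0) for x in range(20)])

linkage_labels = single_linkage(points, 2)
kmeans_labels = kmeans(points, [points[0], points[-1]])

print("single-linkage keeps each row together:",
      linkage_labels[0] == linkage_labels[19])
print("k-means puts both rows' left ends together:",
      kmeans_labels[0] == kmeans_labels[20])
print("k-means cost of each partition:",
      kmeans_cost(points, linkage_labels), kmeans_cost(points, kmeans_labels))
```

On this data, single-linkage separates the two rows, while the k-means objective is lower for a partition that cuts both rows in half. Neither partition is wrong; they reflect different notions of what a cluster is.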
