Machine Learning Research
Below is a sampling of active ML research projects and labs. Additional research projects are described on the home pages of individual faculty.
The AUTON Lab
Our main research is into useful data structures and algorithms for making interesting statistical and learning approaches tractable on large volumes of data. We are very interested in the underlying computer science, mathematics, and statistics, and in practical applications of our work. We collaborate closely with food safety analysts, public health agencies, nuclear safety experts, managers of fleets of equipment, social networkers, astrophysicists, biologists, drug companies, exploration companies, and roboticists.
Brain Image Analysis Research Group
Our group develops statistical machine learning algorithms to analyze fMRI data. We are specifically interested in algorithms that can learn to identify and track the cognitive processes that give rise to observed fMRI data.
Cell Organizer
Databases Group
GraphLab
Designing and implementing efficient and provably correct parallel machine learning (ML) algorithms can be very challenging. Existing high-level parallel abstractions like MapReduce are often insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. http://graphlab.org/
Querendipity
Working scientists need to track an enormous amount of information: in addition to the scientific literature, which is currently growing at a rate of a million articles a year, biologists need to understand when new high-throughput experimental results have been obtained that might impact their work. The model traditionally used in biology to solve this problem is the creation of a manually curated community database of experimental results and literature. The Querendipity project aims to create a new model for managing and distributing scientific data. Querendipity is a personalized adaptive information system that works by loosely integrating data of many sorts (including unstructured text) into a single structure that can be queried using "schema-free similarity queries", which are similar to keyword queries but allow queries against structured data with few text annotations as well as against text. http://www.cs.cmu.edu/~wcohen/querendipity/
Read the Web
Can computers learn to read? We think so. "Read the Web" is a research project that attempts to create a computer system that learns over time to read the web. Since January 2010, our computer system called NELL (Never-Ending Language Learner) has been running continuously, attempting to perform two tasks each day:
SAILING Lab
Laboratory for Statistical Artificial InteLligence & INtegrative Genomics
Projects in:
Graphical models, Bayesian approaches, inference algorithms, and learning theories for analyzing and mining high-dimensional, longitudinal, and relational data
Computational and comparative genomic analysis of biological sequences, systems biological investigation of gene regulation, and statistical analysis of genetic variation, demography and linkage (to diseases)
Application of statistical learning in text/image mining, vision, and machine translation
SELECT Lab
Our main long-term research goal is developing efficient algorithms and methods for designing, analyzing, and controlling complex real-world systems. To achieve this goal, our research spans the entire spectrum from theoretical foundations to real-world applications.
Systems Biology Group
Our group develops computational methods for understanding the dynamics, interactions, and conservation of complex biological systems. As new high-throughput biological data sources become available, they hold the promise of revolutionizing molecular biology by providing a large-scale view of cellular activity. However, each type of data is noisy, contains many missing values, and measures only a single aspect of cellular activity. Our computational focus is on methods for large-scale data integration. We primarily rely on machine learning and statistical methods. Most of our work is carried out in close collaboration with experimentalists. Many of the computational tools we develop are available and widely used.