bioinformatics
"Don’t say you don’t have enough time. You have exactly the same number of hours per day that were given to Helen Keller, Pasteur, Michelangelo, Mother Teresa, Leonardo da Vinci, Thomas Jefferson, and Albert Einstein." H. Jackson Brown, Jr., writer

CSI 4900 Projects (Fall 2013 and Winter 2014)

For the following two projects, students will be supervised by, and work closely with, members of the Biodiversity Bioinformatics Group at Agriculture and Agri-Food Canada (AAFC), as well as research staff.

  1. Visual comparison of phylogenetic trees

    The goal of the project is to allow scientists to visually compare several taxonomic trees, see the differences, and correct one of the trees, generating a difference file.

    Background

    Taxonomy is a constantly evolving science. New organisms need to be classified and classified organisms need to be re-classified based on new discoveries in gene sequencing. In addition, there is no set standard in classification, so many scientists work with their own taxonomic trees. Therefore, it is important to be able to easily compare taxonomic trees for the purpose of publication or submission to common sequence databases.

    Project description

    The student will research available software for tree visualization and comparison. Based on the findings, the student will then create new visualization software, or improve existing software, so that it can highlight identical subtrees, move nodes on the same level, and collapse and rename nodes. The software should also be able to generate a file summarizing the differences between two trees.
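
    As a rough illustration of the "difference file" idea, the sketch below compares two trees stored as child-to-parent mappings and lists the nodes that were added, removed, or moved. The representation, the `diff_trees` function, and the taxon names are all hypothetical; the project description does not prescribe a data format.

```python
from typing import Dict, List, Optional

# Assumption: a tree is a child -> parent mapping; the root maps to None.
Tree = Dict[str, Optional[str]]


def diff_trees(first: Tree, second: Tree) -> List[str]:
    """Return one human-readable line per node that differs between the trees."""
    lines = []
    for node in sorted(set(first) | set(second)):
        if node not in second:
            lines.append(f"- {node} (only in first tree, parent: {first[node]})")
        elif node not in first:
            lines.append(f"+ {node} (only in second tree, parent: {second[node]})")
        elif first[node] != second[node]:
            lines.append(f"~ {node} (parent: {first[node]} -> {second[node]})")
    return lines


# Placeholder taxa, not real classifications.
tree_a = {"Root": None, "FamilyA": "Root", "GenusX": "FamilyA", "GenusY": "FamilyA"}
tree_b = {"Root": None, "FamilyA": "Root", "FamilyB": "Root",
          "GenusX": "FamilyA", "GenusY": "FamilyB"}

for line in diff_trees(tree_a, tree_b):
    print(line)
```

    A real tool would more likely read a standard tree format and also report renamed nodes; this sketch only detects differences in node membership and parentage.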

  2. Cloud computing project with Cloudman

    Background

    To run computationally intensive bioinformatics programs, the Bioinformatics Group at AAFC maintains and administers a cluster of 368 processing cores, called Biocluster. The Biocluster runs the Rocks Cluster Distribution, a Linux-based distribution, and uses Grid Engine to accept, dispatch, and manage jobs submitted by users.

    Project description

    Work with the team to install and configure a Eucalyptus private cloud. Investigate and install the Cloudman cloud manager (http://usecloudman.org) to manage compute clusters on the Eucalyptus cloud, instead of queue management with Grid Engine. In addition, integrate Cloudman with the Galaxy scientific workflow platform, which is used at AAFC for bioinformatics workflows.

The projects described below are derived from my research.

  1. Frequent Subgraph Mining for the Discovery of RNA Structures and Interactions

    Background

    Frequent subgraph mining is a useful method for extracting meaningful patterns from a set of graphs or a single large graph. In this work, the graph represents all possible RNA structures and interactions.

    Project

    Several projects are available. One of them consists of implementing an efficient stem enumeration module using a suffix tree, suffix array, or trie data structure. Another project consists of developing a Web interface for this software system (RiboFSM).
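
    To give a feel for the stem enumeration task, here is a minimal sketch that uses a suffix array to find pairs of regions that could form a stem. It makes several assumptions that do not come from RiboFSM: a stem is taken to be a length-`k` region whose reverse complement occurs downstream, only Watson-Crick pairs are allowed (no G-U wobble), and the suffix array is built naively rather than with an efficient construction algorithm. All function names are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(s: str) -> str:
    return "".join(COMPLEMENT[c] for c in reversed(s))


def build_suffix_array(seq: str) -> List[int]:
    # Naive construction by sorting suffixes; adequate for a sketch only.
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def find_occurrences(seq: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of pattern, located by binary search."""
    k = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # find the first suffix whose k-prefix is >= pattern
        mid = (lo + hi) // 2
        if seq[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(sa) and seq[sa[lo]:sa[lo] + k] == pattern:
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)


def enumerate_stems(seq: str, k: int, min_loop: int = 3) -> List[Tuple[int, int]]:
    """Return (i, j) pairs: seq[i:i+k] can pair with seq[j:j+k] downstream."""
    sa = build_suffix_array(seq)
    stems = []
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        for j in find_occurrences(seq, sa, target):
            if j >= i + k + min_loop:  # leave room for a loop between the arms
                stems.append((i, j))
    return stems


print(enumerate_stems("GGGAAAACCC", k=3))
```

    In this toy sequence the only stem of length 3 pairs the leading `GGG` with the trailing `CCC`.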

  2. Frequent Subgraph Mining (FSM), Description Logic (DL), and Inductive Logic Programming (ILP)

    Background

    Frequent subgraph mining is a useful method for extracting meaningful patterns from a set of graphs or a single large graph. Description Logic is the foundation of Semantic Web technologies. Inductive Logic Programming is an area of research at the intersection of logic programming and machine learning.

    Project

    This project is for students looking for a more theoretical research subject. It consists of researching the possibility of using Description Logic to represent background information, examples, and hypotheses in inductive logic programming systems. Furthermore, since the data represented using Description Logic formalisms is often very large, the project asks whether frequent subgraph mining algorithms can be used by ILP inference engines.

  3. Asymmetrical Substitution Matrices (Asym)

    Sequence alignment techniques require a substitution matrix, which tells the algorithm the cost of aligning two, possibly different, amino acid types in an alignment. These matrices are typically symmetrical, i.e. the cost of substituting a for b is the same as the cost of substituting b for a. Basic physico-chemistry tells us that these costs should not be the same. This project explores the possibility of developing a novel framework based on asymmetrical substitution matrices. One framework that can be used for developing these matrices is Markov chains. This will involve collecting data from an established database called Balibase so as to construct an M(1) matrix. The student will then develop a novel multiple sequence alignment method based on the new matrices and, finally, validate the approach.
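
    The asymmetry can be illustrated with a first-order Markov transition matrix estimated from directed substitution counts. The sketch below assumes that each observation is an ordered pair (residue before, residue after); how such directed pairs would actually be derived from Balibase alignments is not specified here and is left open. The function name, the pseudocount, and the toy observations are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def transition_matrix(pairs: Iterable[Tuple[str, str]],
                      alphabet: str,
                      pseudocount: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Estimate P(a -> b) from ordered (a, b) observations; each row sums to 1."""
    counts = Counter(pairs)
    matrix = {}
    for a in alphabet:
        # The pseudocount keeps unseen substitutions from getting probability 0.
        row_total = sum(counts[(a, b)] for b in alphabet) + pseudocount * len(alphabet)
        for b in alphabet:
            matrix[(a, b)] = (counts[(a, b)] + pseudocount) / row_total
    return matrix


# Toy observations over a two-letter alphabet (invented, not real data).
toy_pairs = [("L", "I"), ("L", "I"), ("L", "L"), ("I", "I")]
m1 = transition_matrix(toy_pairs, alphabet="LI")

print("P(L -> I) =", m1[("L", "I")])
print("P(I -> L) =", m1[("I", "L")])
```

    Because each row is normalized separately, P(L -> I) and P(I -> L) generally differ, which is the property a symmetrical matrix cannot express.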

  4. RNA-RNA Secondary Structure Motifs Editor (R2S2Ed)

    The project involves developing a graphical user interface for interactively building RNA-RNA secondary structure motifs. Contact me to get further information.

  5. Other ideas include exploring the possibilities of cloud computing (MapReduce, CloudBurst...) for bioinformatics applications [could possibly be done in partnership with industry], as well as graphics cards (OpenCL...).

Projects with practical and immediate applications

  1. Web Application for Managing Syllabi and Graduate Attribute Assessments

    The accreditation of Engineering and Computer Science programs requires the assessment of graduate attributes (akin to degree level expectations). In order to assist the work of program coordinators and professors, the project consists of developing a Web application for the management of two kinds of data: course syllabi and graduate attribute assessments.

  2. Booth Allocation Application (Booth)

    This project consists of writing an application to assist the allocation of space (booths) at conference events. One of the tasks in planning an event, such as the organization of a symposium, consists of allocating space for the exhibitors. Their preferences must be taken into consideration, for example: preference for a location near the entrance, near the buffet, not too close to competitors X, Y and Z, etc. These requirements can be formulated as a constraint satisfaction problem (CSP). In fact, a prototype implementation of this application has been written in SWI-Prolog using its CSP library. The objectives of this project are 1) to design a graphical user interface for representing the booths and entering the user's constraints, 2) to write an algorithm (using an existing CSP library) to find all the solutions, and 3) to display the solutions.
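
    To show how such preferences can be expressed as a CSP, the sketch below uses plain backtracking search in Python rather than the SWI-Prolog prototype or any particular CSP library. It assumes the booths are numbered along a single row with booth 1 nearest the entrance; the exhibitor names and the helper functions `near_entrance`, `not_adjacent`, and `solve` are hypothetical.

```python
from typing import Callable, Dict, List

Assignment = Dict[str, int]  # exhibitor -> booth number
# A constraint must return True when it cannot yet be judged (partial assignment).
Constraint = Callable[[Assignment], bool]


def near_entrance(who: str, max_booth: int) -> Constraint:
    return lambda a: who not in a or a[who] <= max_booth


def not_adjacent(x: str, y: str) -> Constraint:
    return lambda a: x not in a or y not in a or abs(a[x] - a[y]) > 1


def solve(exhibitors: List[str], booths: List[int],
          constraints: List[Constraint]) -> List[Assignment]:
    """Return every assignment of distinct booths that satisfies all constraints."""
    solutions: List[Assignment] = []

    def backtrack(assignment: Assignment) -> None:
        if len(assignment) == len(exhibitors):
            solutions.append(dict(assignment))
            return
        exhibitor = exhibitors[len(assignment)]
        for booth in booths:
            if booth in assignment.values():
                continue  # each booth holds at most one exhibitor
            assignment[exhibitor] = booth
            if all(check(assignment) for check in constraints):
                backtrack(assignment)
            del assignment[exhibitor]  # undo and try the next booth

    backtrack({})
    return solutions


solutions = solve(
    exhibitors=["Acme", "Bolt", "Core"],
    booths=[1, 2, 3, 4],
    constraints=[near_entrance("Acme", 1), not_adjacent("Bolt", "Core")],
)
for s in solutions:
    print(s)
```

    Checking constraints on partial assignments prunes the search early; a dedicated CSP library would add constraint propagation on top of this.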

Past Projects:

  1. D3.js visualisation of sequence and structure RNA motifs

    2014 F, Joseph Sleiman

  2. An Efficient and Effective Algorithm to Evolve Regular Expressions

    2012 F, Manuel Belmadani

  3. Application Web pour la gestion des ressources du 412e Escadron

    2011, Jean-Philippe Pellerin

  4. Iterative Maximum Parsimony Multiple Sequence Alignment (ParAli)

    2010 W, Derek O'Brien

  5. A Genetic Programming Approach to RNA-RNA Interaction Motif Discovery (GP-RNA^2)

    2009 F, Christopher Saunders

  6. Approximate Matching of RNA Secondary Structure Expressions Containing Pseudoknots (pkSeed)

    2006 F, Penny J.X. Pan

  7. Progressive Simultaneous Alignment and Structure Prediction of Multiple RNA Sequences (hD)

    2006 W, Luke Cen

  8. Implementation of Range Minimum Query Algorithm (RMQ)

    2006 W, Ayse Abacioglu

  9. Approximate Matching of RNA Secondary Structure Expressions (RNA Matching)

    2005 W, Sol Ackerman

  10. Implementing a Parallel Version of Dynalign for the SunFire V880 architecture (pD)

    2004 W, Philippe Desjardins

  11. A Genetic Programming Approach to RNA Secondary Structure Motif Discovery (GP)

    2003 F, Robert Collier

  12. Simulating Genetic Drift (Sim)

    2003 F, Alain Gagnon

  13. RNA Secondary Structure Viewer (RS2V)

    2003 F, Dina Bilenkis

  14. String Algorithms in Java (Suffix Trees)

    2003 W, Daniela Cernea

  15. Learning Representations of Protein Inter-Domain Linkers Using Inductive Logic Programming (Linkers)

    2003 W, Patrick Wisking

  16. Intelligent Agents for Updating Biological Databases (AgentDB)

    2003 W, Navneet Bhalla

  17. Simulator for the TC-1101 Computer (VM)

    2002 W, Yvgeniya Lozdernik

  18. Protein Viewer/Modeler Written with Java 3D (Java Protein 3D Viewer)

    2002 W, Andrew Henry, Elton Lum and Devin Kennedy