Objective
An in-depth study as a Ph.D. student in Computer Science or a related area that will utilize my research and hands-on experience.
Research Interests
HPC, Cloud Computing, System Architecture, Data Mining and Bioinformatics are the major areas I follow. In particular, my research interests focus on novel theories and technologies in data center and HPC infrastructure, including Data-intensive and Datacenter-Scale Computing, Modeling Supercomputer and Scientific Applications, MapReduce Framework, Distributed File System, High Performance Network, Realtime Stream Data Mining, Data Center Virtualization, Cyberinfrastructure for e-Science and Next Generation Sequencing.
Education
- Jan. 2012 - Dec. 2015 (expected)
School of Computing, Clemson University, USA
Ph.D. Student, Advisor: Dr. Amy Apon
- Sep. 2009 - Dec. 2011
School of Computing, Clemson University, USA
Master of Computer Science, Advisor: Dr. Feng Luo (GPA: 3.87)
Thesis Topic: Data-intensive Computing for Bioinformatics Using Virtualization Technologies and HPC Infrastructures
- Sep. 2005 - Jul. 2008
Graduate School of the Chinese Academy of Sciences, Beijing, China
Master of IT Project Management
- Sep. 1998 - Jul. 2002
Beijing Institute of Technology, Beijing, China
Bachelor of Science in Engineering
Selected Projects
- Eukaryotic Automated Structural Annotation Workflow - pic
Designed and implemented an automated genome structural annotation pipeline built on the Kepler workflow system and an HPC platform, including stages of transcript QA, repeat analysis, transcript mapping, gene prediction, combined consensus prediction and EST-based refinement.
Tools: Kepler, SeqClean, RepeatMasker, PASA, Augustus, GeneMark, EVidenceModeler, Java, Perl
- The Glycan Array QSAR Tool - website
Designed and implemented an online quantitative structure-activity relationship (QSAR) tool to analyze glycan array data based on PLS regression coefficients. The tool reports the sub-trees that determine the binding specificities of glycan-binding proteins.
Tools: Drupal, Webform, MATLAB, PHP
- Cotton Marker Database - website
Designed and implemented two new features for the CMD website: Train View and Primer Redundancy Analysis; Applied Xen and KVM virtualization solutions to the CMD system platform; Ported all computational and data-intensive jobs to the Clemson Palmetto HPC; Developed a suite of web-based bioinformatics and data mining tools that help biological scientists without a computer science background handle large sequence data easily. This project is funded by Cotton Incorporated, and the website has been accessed by users from 101 countries and 48 U.S. states.
Tools: mpiBLAST, mpiFASTA, BioPerl, CMap, CAP3, MySQL, PostgreSQL, Perl, CPAN, SQL, Bash, HTML, JavaScript
- Transposable Element De Novo Annotation - poster
Deployed two bioinformatics pipelines (TEdenovo and TEannot) on an HPC platform based on Sun Grid Engine (SGE); Detected, annotated and analyzed transposable element (TE) repeats in genomic sequences. Collaborated with Dr. Véronique Decroocq (Project Coordinator, INRA, France).
Tools: REPET, NCBI-BLAST, WU-BLAST, RECON, PILER, CENSOR, RepeatMasker, TRF, Mreps, hmmer3, SGE, Perl, Python
- Improvement of SSR Redundancy Identification with a Machine Learning Approach - poster
Improved the accuracy of SSR (microsatellite) redundancy detection and reduced the cost of expert intervention in polymorphism discovery using a Support Vector Machine (SVM) approach, for the Cotton Marker Database project (for Dr. Anna Blenda, PI, Cotton Marker Database) and the genetic map integration project (for Dr. Jean-Marc Lacape, Project Coordinator, UMR-DAP, France).
Tools: LIBSVM, Weka, Perl
- FHI Chinese Chestnut Assembly - website
Deployed different assembler programs on the Clemson Palmetto HPC based on MPI and MapReduce parallel models; Resolved performance and overflow problems caused by large data sets; Optimized the I/O performance between bioinformatics programs and the PVFS file system; Assembled the whole genome sequence of Chinese Chestnut using NGS data (20x 454 data, 47x Illumina data and limited Sanger sequences); Worked as an HPC specialist with Dr. Meg Staton (Bioinformaticist, Clemson University Genomics Institute, USA).
Tools: Celera Assembler, Abyss, Velvet, SOAPdenovo, Newbler, Quake
- VoIP Assessment Tool Based on Autonomous NAT Traversal (Course Project) - report, slides
Designed and deployed an experimental platform for VoIP assessment to capture different behaviors of the NAT traversal protocol; Evaluated and analyzed VoIP voice quality based on observed transport measurements.
Tools: Iperf, tc, Netfilter, C, Bash
- Hybrid Parallel Algorithms for Constructing bi-directed de Bruijn Graphs (Course Project) - report
Designed and implemented a set of NGS sequence assembly algorithms based on the bi-directed de Bruijn graph using the MapReduce programming framework; Evaluated the parallel performance with different numbers of computing nodes and two different network environments; Designed a solution to enable Hadoop to run on a traditional HPC platform.
Tools: Hadoop streaming, Perl, Java, Bash, PBS Script
- Compiler Design (Course Project)
Designed and implemented a compiler using C and Intel x86 assembly language, including handling and manipulating integer and real numbers, procedures, recursion, matrices, integer arithmetic, parameter passing, variable scoping, etc.
Tools: C
- Introduction to the HPC Toolkit OSCAR (Course Project) - slides
Gave an introduction to designing, building and maintaining HPC clusters using the OSCAR toolkit.
Work and Research Experiences
- 2012 - Present
School of Computing, Clemson University, USA
Research Assistant in Dr. Apon's Research Group
- 2010 - 2011
Clemson University Genomics Institute, USA
Research Assistant, Bioinformatics Application and HPC Specialist
- 2009 - 2011
School of Computing, Clemson University, USA
Research Assistant, System Administrator for Dr. Luo's Lab
- 2009 - 2011
Cotton Marker Database, USA
Research Assistant, Developer
- 2002 - 2008
Institute of Chemistry, Chinese Academy of Sciences, Beijing, China
IT Director, HPC and Datacenter System Administrator
- 2003 - 2008
Center for Molecular Sciences, Chinese Academy of Sciences, Beijing, China
HPC Specialist and System Administrator for Dr. ZhiGang Shuai's Research Group
- 2005 - 2007
Titan Grid Technology Co., Ltd., Beijing, China
Founder, Architect in HPC System
- 2004 - 2008
Jin Rui Jia Investment Consulting Ltd., Beijing, China
Architect in Web System
- 2001 - 2002
The Chemical Information Network, Beijing, China
Internship, System Developer
- 2000 - 2002
Zi Fang De Information Technology Co., Ltd., Beijing, China
Founder, Web System Developer
Other Certifications / Professional Training
- Microsoft Certified System Engineer (MCSE) Certification, 2002
- Microsoft Certified Database Administrator (MCDBA) Certification, 2002
- Cisco Certified Network Associate (CCNA) Training Courses, 2003
- Oracle 9i DBA/OCP Training Courses, 2005
Skills
- Operating Systems: Windows Server (12 years), Linux Server (11 years)
- Languages: C, Java, Perl, PHP, SPARC Assembly (Learning: Go, Scala)
- Parallelism Models: MapReduce, MPI and Multithreading
- Platforms/APIs: Hadoop, Kepler, GWT, Drupal, GMOD, LabVIEW, MATLAB and Coloured Petri Nets
- Development Tools: Vim, Eclipse, GNU GCC/GDB, Intel Cluster Toolkit
Papers/Posters
Pengfei Xuan, David Camak, Feng Luo, Don Jones, and Anna Blenda. CMD: a cotton marker database resource for Gossypium genetics and genomics research. 2011. (paper in preparation)
Pengfei Xuan, Yuehua Zhang, Tzuen-rong Jeremy Tzeng, Xiu-Feng Wan, Feng Luo. A quantitative structure-activity relationship (QSAR) study on glycan array data to determine the specificities of glycan-binding proteins. Oxford Journals Glycobiology. 2011. (paper)
Pengfei Xuan, Justin Bartanus, David Camak, Feng Luo, Don C. Jones, Anna Blenda. Recent Updates of the Cotton Marker Database (CMD). Plant & Animal Genomes XX Conference. 2012. (poster)
Pengfei Xuan, Feng Luo, Albert Abbott, Don Jones, Anna Blenda. Improvement of SSR redundancy identification with machine learning approach using dataset from cotton marker database. Plant & Animal Genomes XIX Conference. 2011. (poster)
Justin Bartanus, Pengfei Xuan, Anna Blenda. Collection, Annotation and Public Database Display of the Agronomically Important Traits and QTLs in Cotton. ASB Tri-Beta. 2011. (poster)
David Camak, Pengfei Xuan, Anna Blenda. Analysis of the Phenotypic Traits in Cotton Linked to the Genetically Mapped Molecular Markers. ASB Tri-Beta. 2010. (poster)
V. Decroocq, Pengfei Xuan, T. Zhebentyayeva, S. Scalabrin, I. Verde, B. Sosinski, A. Abbott. Transposable element annotation and the development of insertion site-based polymorphism markers in Prunus species. 5th International Rosaceae Genomics Conference (RGC5). 2010. (poster)
Lingyun Zhu, Xia Yang, Yuanping Yi, Pengfei Xuan, Zhigang Shuai, Jean-Luc Brédas. Three-photon absorption in anthracene-porphyrin-anthracene triads: A quantum-chemical study. The Journal of Chemical Physics. 2004, 121:11060. (paper)
Shiwei Yin, Liping Chen, Pengfei Xuan, Ke-Qiu Chen, Zhigang Shuai. Field effect on the singlet and triplet exciton formation in organic/polymeric light-emitting diodes. The Journal of Physical Chemistry B. 2004, 108(28):9608-13. (paper)