
Thesis/Dissertation Seminars

Dissertation Defense: Improving the Performance of Data-Intensive Computing on Cloud Platforms

HEC 450
May 23, 2017 @ 02:00 PM - 04:00 PM

Announcing the Final Examination of Wei Dai for the degree of Doctor of Philosophy

Big Data, in the form of terabyte- and petabyte-scale datasets, is rapidly becoming the new norm for organizations across a wide range of industries. Widespread data-intensive computing needs have inspired innovations in parallel and distributed computing, which has been an effective way to tackle massive computing workloads for decades. One significant example is MapReduce. Since it was originally proposed by Google, MapReduce has become the most popular technology for data-intensive computing. While Google maintains its own proprietary implementation of MapReduce, an open source implementation called Hadoop has gained wide adoption in the rest of the world. The combination of Hadoop and Cloud platforms has made data-intensive computing much more accessible and affordable than ever before.

This dissertation includes five contributions that address the performance of data-intensive computing on Cloud platforms from three different aspects: task assignment, replica placement, and straggler identification. Most of the research presented in this dissertation was conducted in the context of Hadoop running on Cloud platforms.

The first contribution presents an improved task assignment scheme based on an optimal minimum makespan algorithm. The scheme projects and compares the completion time of the next data block for every task slot, and explicitly strives to shorten the map phase completion time of MapReduce jobs. Extensive evaluation tests indicate that, compared with the Hadoop task assignment scheme, the proposed scheme remarkably reduces the map phase completion time and reduces the amount of remote processing to an even greater extent, which makes data processing much less vulnerable to both network congestion and disk contention.
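To make the idea of projecting completion times concrete, the following is a minimal Python sketch of a greedy assignment. It is not the optimal minimum makespan algorithm of the dissertation. The function name assign_blocks, the cost values local_cost and remote_cost, and the input layout are all hypothetical choices made for illustration.

```python
def assign_blocks(blocks, slots, local_cost=1.0, remote_cost=3.0):
    """Greedy sketch: give each block to the slot projected to finish it first.

    blocks: dict mapping block id -> set of nodes holding a replica
    slots:  list of (slot id, node) pairs
    The costs are assumed, illustrative processing times, not measured values.
    """
    node_of = dict(slots)
    finish = {slot_id: 0.0 for slot_id, _ in slots}  # projected busy-until time
    assignment = {}

    for block_id, replica_nodes in blocks.items():
        def projected(slot_id):
            # A slot on a node that stores the block reads it locally.
            is_local = node_of[slot_id] in replica_nodes
            return finish[slot_id] + (local_cost if is_local else remote_cost)

        # Pick the earliest projected completion; break ties by slot id.
        best = min(finish, key=lambda s: (projected(s), s))
        finish[best] = projected(best)
        assignment[block_id] = best

    makespan = max(finish.values())
    return assignment, makespan


blocks = {"b1": {"n1"}, "b2": {"n1"}, "b3": {"n2"}}
slots = [("s1", "n1"), ("s2", "n2")]
assignment, makespan = assign_blocks(blocks, slots)
print(assignment, makespan)
```

In this toy input every block ends up on a slot that holds a local replica, so no remote processing occurs. A greedy rule like this one does not guarantee a minimum makespan in general.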

The replica placement policy of the Hadoop Distributed File System (HDFS) has the drawback that it cannot generate a balanced replica assignment, and hence has to rely on a load balancing utility to balance replicas across the cluster nodes at the cost of extra system resources and running time. The second contribution presents an innovative replica placement policy that assigns replicas to nodes in homogeneous clusters as evenly as possible while meeting all replica placement requirements of HDFS. As a result, there is no need to run any load balancing utility to balance the replica assignment. The third contribution presents an improved replica placement policy that works in heterogeneous clusters where the nodes on the same rack have the same processing capability. A more advanced and general solution is presented in the fourth contribution, which works in any homogeneous or heterogeneous environment.
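As a rough illustration of balanced placement in a homogeneous cluster, the sketch below always places the next replica on the least-loaded eligible node. It is not the policy proposed in the dissertation, and it enforces only two simplified constraints that are assumptions here: replicas of a block go on distinct nodes, and each block spans at least two racks when possible. The function name place_replicas and the node and rack names are hypothetical.

```python
def place_replicas(num_blocks, nodes, replication=3):
    """Least-loaded placement sketch for a homogeneous cluster.

    nodes: dict mapping node name -> rack name
    Returns (placement, load): the nodes chosen per block, and replicas per node.
    """
    if len(nodes) < replication:
        raise ValueError("need at least as many nodes as replicas per block")

    load = {n: 0 for n in nodes}
    placement = []

    for _ in range(num_blocks):
        chosen = []
        for i in range(replication):
            candidates = [n for n in nodes if n not in chosen]
            # If every replica so far sits on one rack, steer the last
            # replica to a different rack when such a node exists.
            if i == replication - 1 and len({nodes[c] for c in chosen}) == 1:
                other_rack = [n for n in candidates if nodes[n] != nodes[chosen[0]]]
                candidates = other_rack or candidates
            # Least-loaded node wins; ties are broken by node name.
            pick = min(candidates, key=lambda n: (load[n], n))
            chosen.append(pick)
            load[pick] += 1
        placement.append(chosen)

    return placement, load


nodes = {"a1": "r1", "a2": "r1", "b1": "r2", "b2": "r2"}
placement, load = place_replicas(4, nodes)
print(load)
```

Because each replica goes to the currently least-loaded node, the per-node counts stay even in this example and no rebalancing pass is needed afterwards. Heterogeneous clusters would need weights per node, which this sketch does not model.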

The Standard Deviation (SD) method is a commonly used straggler identification scheme in parallel processing. In spite of its wide adoption, the SD method has certain inherent limitations. The fifth contribution presents an improved straggler identification scheme based on Tukey's method for outlier detection. Tukey's method has two unique features that make it more suitable for straggler identification than the SD method. The results of extensive evaluation tests confirm that the proposed scheme can identify stragglers and, more importantly, start speculative execution earlier than the SD method.
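The two approaches can be compared on a small example. The Python sketch below flags tasks whose running time exceeds the mean plus k standard deviations (the SD method) or exceeds Tukey's upper fence, Q3 + 1.5 × IQR. It is only an illustration of the two textbook rules, not the scheme evaluated in the dissertation. The function names, the multiplier k = 2 for the SD rule, and the sample durations are assumptions.

```python
import statistics


def sd_stragglers(durations, k=2.0):
    """Indices of tasks slower than mean + k * SD (k is an assumed multiplier)."""
    mean = statistics.mean(durations)
    sd = statistics.pstdev(durations)
    return [i for i, d in enumerate(durations) if d > mean + k * sd]


def tukey_stragglers(durations, k=1.5):
    """Indices of tasks above Tukey's upper fence, Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(durations, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [i for i, d in enumerate(durations) if d > upper_fence]


# Hypothetical task durations in seconds: twelve normal tasks, two slow ones.
durations = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 12, 25, 80]

print(sd_stragglers(durations))     # [13]
print(tukey_stragglers(durations))  # [12, 13]
```

In this sample the single extreme task inflates both the mean and the standard deviation, so the SD rule misses the 25-second task, while the quartile-based fence is barely affected and flags both slow tasks.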

Committee in Charge: Mostafa Bassiouni (Chair), Cliff (Changchun) Zou, Jun Wang, Mingjie Lin, Yuanli Bai



© 2015 University of Central Florida - College of Graduate Studies