Citing a Paper
Each paper below has a generated BibTeX record that you can use to produce a citation in whatever style you need.
Copy the BibTeX code and convert it to a plain citation string with a BibTeX parser. You can run a short script locally, as sketched below, or use a web converter such as the site linked after it.
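As a local alternative, the short script below shows one way to do the conversion. It is only a sketch: it assumes the third-party bibtexparser package (version 1.x) is installed and uses the MiCRO record from this page as input.

# A minimal sketch of converting a BibTeX record into a plain citation string.
# Assumes the third-party bibtexparser package, v1.x (pip install "bibtexparser<2").
import bibtexparser

bibtex_src = """
@conference{yoon2023micro,
  title     = {MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training},
  author    = {Daegun Yoon and Sangyoon Oh},
  booktitle = {30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2023)},
  year      = {2023}
}
"""

db = bibtexparser.loads(bibtex_src)   # parse the BibTeX string
entry = db.entries[0]                 # each entry becomes a dict of lowercase field names

# Assemble a simple plain-text citation from the parsed fields.
authors = entry["author"].replace(" and ", ", ")
citation = f'{authors}. "{entry["title"]}." In {entry["booktitle"]}, {entry["year"]}.'
print(citation)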
bibtex.online

2023
Yoon, Daegun; Oh, Sangyoon
MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training (International Conference)
30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2023), 2023.
Links | BibTeX | Tags: distributed deep learning, gradient sparsification
@conference{yoon2023micro,
title = {MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training},
author = {Daegun Yoon and Sangyoon Oh},
url = {https://ieeexplore.ieee.org/abstract/document/10487098},
year = {2023},
date = {2023-10-02},
urldate = {2023-10-02},
booktitle = {30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2023)},
keywords = {distributed deep learning, gradient sparsification},
pubstate = {published},
tppubtype = {conference}
}
Yoon, Daegun; Oh, Sangyoon
DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification (International Conference)
International Conference on Parallel Processing (ICPP) 2023, 2023.
Abstract | Links | BibTeX | Tags: distributed deep learning, gradient sparsification
@conference{yoon2023deft,
title = {DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification},
author = {Daegun Yoon and Sangyoon Oh},
url = {https://dl.acm.org/doi/10.1145/3605573.3605609},
year = {2023},
date = {2023-08-07},
urldate = {2023-08-07},
booktitle = {International Conference on Parallel Processing (ICPP) 2023},
abstract = {Gradient sparsification is a widely adopted solution for reducing the excessive communication traffic in distributed deep learning. However, most existing gradient sparsifiers have relatively poor scalability because of the considerable computational cost of gradient selection and/or increased communication traffic owing to gradient build-up. To address these challenges, we propose a novel gradient sparsification scheme, DEFT, that partitions the gradient selection task into subtasks and distributes them to workers. DEFT differs from existing sparsifiers, wherein every worker selects gradients among all gradients. Consequently, the computational cost can be reduced as the number of workers increases. Moreover, gradient build-up can be eliminated because DEFT allows workers to select gradients in partitions that are non-intersecting (between workers). Therefore, even if the number of workers increases, the communication traffic can be maintained as per user requirement. To avoid the loss of significance of gradient selection, DEFT selects more gradients in the layers that have a larger gradient norm than the other layers. Because every layer has a different computational load, DEFT allocates layers to workers using a bin-packing algorithm to maintain a balanced load of gradient selection between workers. In our empirical evaluation, DEFT shows a significant improvement in training performance in terms of speed in gradient selection over existing sparsifiers while achieving high convergence performance.},
keywords = {distributed deep learning, gradient sparsification},
pubstate = {published},
tppubtype = {conference}
}
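The DEFT abstract above turns on two ideas: each worker selects gradients only from its own non-intersecting partition, and layers are packed onto workers so that the gradient-selection load stays balanced. As a rough illustration of the second idea only, the sketch below uses a greedy least-loaded-worker assignment; the function name, the cost proxy, and the heuristic itself are assumptions made for this example, not the paper's actual bin-packing algorithm.

# A hypothetical sketch of balancing gradient-selection work across workers,
# in the spirit of the layer-to-worker assignment described in the DEFT abstract.
# The greedy least-loaded heuristic and all names here are illustrative assumptions.
import heapq

def assign_layers_to_workers(layer_sizes, num_workers):
    """Greedily assign each layer to the currently least-loaded worker.

    layer_sizes: dict mapping layer name -> number of gradient elements
                 (used here as a proxy for gradient-selection cost).
    Returns: dict mapping worker id -> list of layer names (non-intersecting).
    """
    heap = [(0, w) for w in range(num_workers)]   # min-heap of (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}

    # Place layers from largest to smallest so big layers are spread out first.
    for name, size in sorted(layer_sizes.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(name)
        heapq.heappush(heap, (load + size, worker))
    return assignment

# Example: four layers of different sizes spread over two workers.
print(assign_layers_to_workers({"conv1": 9408, "conv2": 36864, "fc1": 262144, "fc2": 4096}, 2))

Because each layer ends up on exactly one worker, the resulting partitions are disjoint, which is the property the abstract credits for eliminating gradient build-up.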
Yoon, Daegun; Jeong, Minjoong; Oh, Sangyoon
SAGE: toward on-the-fly gradient compression ratio scaling (International Journal Article)
In: The Journal of Supercomputing, pp. 1–23, 2023.
Abstract | Links | BibTeX | Tags: distributed deep learning, gradient sparsification
@article{yoon2023sage,
title = {SAGE: toward on-the-fly gradient compression ratio scaling},
author = {Daegun Yoon and Minjoong Jeong and Sangyoon Oh},
url = {https://link.springer.com/article/10.1007/s11227-023-05120-7},
doi = {10.1007/s11227-023-05120-7},
year = {2023},
date = {2023-02-25},
urldate = {2023-02-25},
journal = {The Journal of Supercomputing},
pages = {1--23},
abstract = {Gradient sparsification is widely adopted in distributed training; however, it suffers from a trade-off between computation and communication. The prevalent Top-k sparsifier has a hard constraint on computational overhead while achieving the desired gradient compression ratio. Conversely, the hard-threshold sparsifier eliminates computational constraints but fails to achieve the targeted compression ratio. Motivated by this trade-off, we designed a novel threshold-based sparsifier called SAGE, which achieves a compression ratio close to that of the Top-k sparsifier with negligible computational overhead. SAGE scales the compression ratio by deriving an adjustable threshold based on each iteration’s heuristics. Experimental results show that SAGE achieves a compression ratio closer to the desired ratio than a hard-threshold sparsifier without exacerbating the accuracy of model training. In terms of computation time for gradient selection, SAGE achieves a speedup of up to 23.62× over the Top-k sparsifier.},
keywords = {distributed deep learning, gradient sparsification},
pubstate = {published},
tppubtype = {article}
}
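The SAGE abstract above describes a threshold that is re-derived each iteration so that the achieved compression ratio tracks the desired one. The snippet below is only a hypothetical feedback-style illustration of that general idea; the update rule, constants, and names are assumptions made for the example and do not reproduce SAGE's actual derivation.

# A hypothetical illustration of an adjustable sparsification threshold that is
# nudged each iteration toward a target density (fraction of gradients kept).
# This is NOT SAGE's derivation; the multiplicative update rule is assumed purely
# for illustration.
import numpy as np

def threshold_sparsify(grad, threshold):
    """Keep only gradient entries whose magnitude exceeds the threshold."""
    mask = np.abs(grad) > threshold
    return grad * mask, mask.mean()   # sparsified gradient, achieved density

rng = np.random.default_rng(0)
target_density = 0.01   # e.g., keep roughly 1% of the gradients
threshold = 1e-3        # initial guess

for step in range(5):
    grad = rng.standard_normal(1_000_000).astype(np.float32) * 0.01
    sparse_grad, density = threshold_sparsify(grad, threshold)
    # Feedback: raise the threshold if too many gradients were kept, lower it otherwise.
    threshold *= (density / target_density) ** 0.5
    print(f"step {step}: density={density:.4f}, next threshold={threshold:.5f}")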
2022
Yoon, Daegun; Oh, Sangyoon
Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment (Conference)
The 8th International Conference on Next Generation Computing (ICNGC) 2022, 2022.
Abstract | Links | BibTeX | Tags: distributed deep learning, GPU, gradient sparsification
@conference{yoon2022empirical,
title = {Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment},
author = {Daegun Yoon and Sangyoon Oh},
doi = {10.48550/arXiv.2209.08497},
year = {2022},
date = {2022-09-19},
booktitle = {The 8th International Conference on Next Generation Computing (ICNGC) 2022},
abstract = {To train deep learning models faster, distributed training on multiple GPUs has become a very popular scheme in recent years. However, communication bandwidth is still a major bottleneck of training performance. To improve overall training performance, recent works have proposed gradient sparsification methods that significantly reduce communication traffic. Most of them require gradient sorting to select meaningful gradients, such as Top-k gradient sparsification (Top-k SGD). However, Top-k SGD is limited in speeding up overall training performance because gradient sorting is significantly inefficient on GPUs. In this paper, we conduct experiments that show the inefficiency of Top-k SGD and provide insight into its low performance. Based on observations from our empirical analysis, we plan to develop a high-performance gradient sparsification method as future work.},
keywords = {distributed deep learning, GPU, gradient sparsification},
pubstate = {published},
tppubtype = {conference}
}
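For context on the entry above: Top-k gradient sparsification (Top-k SGD) keeps only the k largest-magnitude gradient entries of each tensor before communication. The sketch below shows that selection step in PyTorch; the function name and the omission of error feedback and communication are simplifications assumed for illustration, not the paper's experimental setup.

# A minimal, generic sketch of the Top-k gradient selection step discussed in the
# abstract above (assumes PyTorch; names are illustrative, and error feedback and
# communication are omitted for brevity).
import torch

def topk_sparsify(grad: torch.Tensor, k: int):
    """Return the k largest-magnitude gradient values and their flat indices."""
    flat = grad.flatten()
    # The selection inside topk is the sorting-style work whose GPU cost the paper examines.
    _, idx = torch.topk(flat.abs(), k)
    return flat[idx], idx

grad = torch.randn(1_000_000)                    # stand-in for a layer's gradient
values, indices = topk_sparsify(grad, k=1_000)   # keep roughly 0.1% of the entries
print(values.shape, indices.shape)               # torch.Size([1000]) torch.Size([1000])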