Presto JVM Performance Tuning: GCLocker and Garbage Collection

qianmoQ

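A node in the Presto cluster stopped reporting heartbeats even though the service had not crashed. The node's log contained the following GC warnings: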

[510515.508s][warning][gc,alloc] http-worker-159944: Retried waiting for GCLocker too often allocating 11648 words
[510515.508s][warning][gc,alloc] page-buffer-client-callback-8474: Retried waiting for GCLocker too often allocating 19073 words
[510515.508s][warning][gc,alloc] http-worker-153295: Retried waiting for GCLocker too often allocating 14450 words
[510515.508s][warning][gc,alloc] http-worker-160657: Retried waiting for GCLocker too often allocating 15585 words
[510515.508s][warning][gc,alloc] http-worker-159719: Retried waiting for GCLocker too often allocating 20110 words
[510515.508s][warning][gc,alloc] http-worker-161471: Retried waiting for GCLocker too often allocating 13867 words
[510515.508s][warning][gc,alloc] http-worker-159190: Retried waiting for GCLocker too often allocating 7651 words
[510561.503s][warning][gc,alloc] task-processor-82635: Retried waiting for GCLocker too often allocating 18700 words
[510561.503s][warning][gc,alloc] task-processor-81526: Retried waiting for GCLocker too often allocating 18910 words
[510561.504s][warning][gc,alloc] task-processor-81315: Retried waiting for GCLocker too often allocating 15505 words
[510561.504s][warning][gc,alloc] task-processor-81319: Retried waiting for GCLocker too often allocating 15505 words
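To see which thread pools are affected and how large the failed allocations are, the warnings can be tallied by thread-name prefix. The following is a minimal C++ sketch, not part of Presto or the JVM: the names PoolStats and summarize_gclocker_warnings are made up for illustration, and the regular expression assumes the log line format shown above.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Summary of GCLocker allocation warnings for one thread-name prefix.
// (Hypothetical helper type, introduced only for this sketch.)
struct PoolStats {
    std::size_t warnings = 0;   // number of warning lines seen
    std::size_t max_words = 0;  // largest failed allocation, in heap words
};

// Groups "Retried waiting for GCLocker too often" warnings by thread-name
// prefix, e.g. "http-worker" for "http-worker-159944".
// Lines that do not match the assumed format are skipped.
std::map<std::string, PoolStats> summarize_gclocker_warnings(
        const std::vector<std::string>& lines) {
    // Capture 1 = thread-name prefix, capture 2 = allocation size in words.
    static const std::regex pattern(
        R"(\[gc,alloc\] (.+)-\d+: Retried waiting for GCLocker too often allocating (\d+) words)");

    std::map<std::string, PoolStats> stats;
    for (const std::string& line : lines) {
        std::smatch m;
        if (!std::regex_search(line, m, pattern)) {
            continue;
        }
        PoolStats& s = stats[m[1].str()];
        s.warnings += 1;
        s.max_words = std::max<std::size_t>(s.max_words, std::stoull(m[2].str()));
    }
    return stats;
}
```

Sizes are reported in heap words. Assuming a 64-bit HotSpot JVM, where a heap word is 8 bytes, the largest request in the log above (20110 words) is 160,880 bytes.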

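Judging from the log and the recent query history, large-object allocations were slowing garbage collection down. G1's logic for allocating and collecting large (humongous) objects is shown below; the full source file is on GitHub: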

for (uint try_count = 1, gclocker_retry_count = 0; /* we'll return */; try_count += 1) {
    bool should_try_gc;
    uint gc_count_before;

    {
      MutexLockerEx x(Heap_lock);

      result = humongous_obj_allocate(word_size);
      if (result != NULL) {
        size_t size_in_regions = humongous_obj_size_in_regions(word_size);
        g1_policy()->add_bytes_allocated_in_old_since_last_gc(size_in_regions * HeapRegion::GrainBytes);
        return result;
      }

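      // Only try a GC ourselves if the GCLocker has not already requested one;
      // otherwise wait for the GCLocker-initiated GC and then retry.
      should_try_gc = !GCLocker::needs_gc();
      // Read the GC count while still holding Heap_lock.
      gc_count_before = total_collections();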
    }

    if (should_try_gc) {
      bool succeeded;
      result = do_collection_pause(word_size, gc_count_before, &succeeded,
                                   GCCause::_g1_humongous_allocation);
      if (result != NULL) {
        assert(succeeded, "only way to get back a non-NULL result");
        log_trace(gc, alloc)("%s: Successfully scheduled collection returning " PTR_FORMAT,
                             Thread::current()->name(), p2i(result));
        return result;
      }

      if (succeeded) {
        log_trace(gc, alloc)("%s: Successfully scheduled collection failing to allocate "
                             SIZE_FORMAT " words", Thread::current()->name(), word_size);
        return NULL;
      }
      log_trace(gc, alloc)("%s: Unsuccessfully scheduled collection allocating " SIZE_FORMAT "",
                           Thread::current()->name(), word_size);
    } else {
      // Failed to schedule a collection.
      if (gclocker_retry_count > GCLockerRetryAllocationCount) {
        log_warning(gc, alloc)("%s: Retried waiting for GCLocker too often allocating "
                               SIZE_FORMAT " words", Thread::current()->name(), word_size);
        return NULL;
      }

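      // The GCLocker is active, or the GC it requested has not run yet:
      // block until it clears, then retry the allocation.
      GCLocker::stall_until_clear();
      gclocker_retry_count += 1;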
    }
  }

The check if (gclocker_retry_count > GCLockerRetryAllocationCount) is where the warning seen in the log is emitted.

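Note that GCLockerRetryAllocationCount defaults to 2: once an allocating thread has stalled waiting for the GCLocker more times than this threshold, the allocation fails immediately and returns NULL.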
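The effect of the threshold can be illustrated with a simplified model of the GCLocker branch of the loop. This is a sketch under stated assumptions, not HotSpot code: AllocationOutcome, allocate_with_gclocker_retries and the try_allocate callback are hypothetical names, the GC path is omitted, and the thread is assumed to find the GCLocker requesting a GC on every iteration.

```cpp
#include <functional>

// Outcome of the simplified allocation loop below (hypothetical type).
struct AllocationOutcome {
    bool succeeded;   // true if the allocation eventually went through
    unsigned stalls;  // how many times the thread waited on the GCLocker
};

// Simplified model of the GCLocker branch of the allocation loop.
// try_allocate stands in for the real allocation attempt and is called once
// per iteration; retry_limit plays the role of GCLockerRetryAllocationCount.
AllocationOutcome allocate_with_gclocker_retries(
        const std::function<bool()>& try_allocate, unsigned retry_limit) {
    unsigned gclocker_retry_count = 0;
    for (;;) {
        if (try_allocate()) {
            return {true, gclocker_retry_count};
        }
        // Same comparison as in the excerpt: give up once the number of
        // completed stalls exceeds the limit.
        if (gclocker_retry_count > retry_limit) {
            return {false, gclocker_retry_count};
        }
        // The real code blocks in GCLocker::stall_until_clear() here.
        gclocker_retry_count += 1;
    }
}
```

In this model, an allocation that would only succeed on its fifth attempt fails with a limit of 2, because the thread gives up on the fourth attempt after three stalls. With a limit of 100 the same allocation succeeds after four stalls.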

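To let allocations get through these stalls so that GC can proceed normally, raise this parameter: set it to 100, or to a value equal to the number of CPU cores.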

How to adjust: add the following options to the JVM configuration (etc/jvm.config in a Presto deployment):

-XX:+UnlockDiagnosticVMOptions
-XX:GCLockerRetryAllocationCount=100

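-XX:+UnlockDiagnosticVMOptions is mandatory: GCLockerRetryAllocationCount is a diagnostic option, and the JVM will not accept it unless diagnostic options are unlocked first.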
