Compact Off-Heap Structures/Tuples in Java
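In my last post I detailed why memory access patterns matter. Since then I've been thinking about what we can do in Java to make memory access more predictable. Some array-based patterns can help, and I will discuss them in another post. This post explores how to simulate a feature that Java sorely lacks: arrays of structures, similar to those in C.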
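Structures are very useful, both on the stack and on the heap. As far as I know, it is not possible to simulate this feature on the Java stack. This greatly limits the performance of some parallel algorithms, but that is a topic for another day.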
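In Java, all user-defined types have to live on the heap. In the general case the Java heap is managed by the garbage collector, but a Java process has a wider heap than that. Direct ByteBuffers allow memory to be allocated that the garbage collector does not track. This is possible because native code needs access to that memory, for example to avoid copying data to and from the kernel during IO. One reasonable approach to managing structures is therefore to fake them inside a ByteBuffer. This allows compact data representations, but it has performance and size limitations. For example, a ByteBuffer cannot be larger than 2GB, and every access is bounds checked, which hurts performance. An alternative is to use Unsafe, which is both faster and free of ByteBuffer's size limit.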
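As a rough illustration of faking a structure inside a direct ByteBuffer, the sketch below lays out fixed-size records back to back and addresses each field with absolute get/put calls at computed offsets. The class name and the three-field record layout are hypothetical and chosen for brevity. The buffer capacity is an int, which is where the 2GB limit comes from, and every absolute access is bounds checked.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: an "array of structs" faked inside one direct ByteBuffer.
// Record layout: tradeId (long, 8 bytes) | price (long, 8 bytes) | side (char, 2 bytes)
class ByteBufferTradeTable {
    static final int TRADE_ID_OFFSET = 0;
    static final int PRICE_OFFSET = 8;
    static final int SIDE_OFFSET = 16;
    static final int RECORD_SIZE = 18;

    private final ByteBuffer buffer;

    ByteBufferTradeTable(final int capacity) {
        // ByteBuffer sizes and indexes are ints, so the whole table must fit in < 2GB;
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        final int bytes = Math.multiplyExact(capacity, RECORD_SIZE);
        buffer = ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }

    // Byte offset of record 'index' within the buffer.
    private static int base(final int index) {
        return index * RECORD_SIZE;
    }

    // Absolute get/put calls do not move the buffer position, but each one is bounds
    // checked and throws IndexOutOfBoundsException when it falls outside the buffer.
    void setTradeId(final int index, final long v) { buffer.putLong(base(index) + TRADE_ID_OFFSET, v); }
    long getTradeId(final int index) { return buffer.getLong(base(index) + TRADE_ID_OFFSET); }
    void setPrice(final int index, final long v) { buffer.putLong(base(index) + PRICE_OFFSET, v); }
    long getPrice(final int index) { return buffer.getLong(base(index) + PRICE_OFFSET); }
    void setSide(final int index, final char v) { buffer.putChar(base(index) + SIDE_OFFSET, v); }
    char getSide(final int index) { return buffer.getChar(base(index) + SIDE_OFFSET); }

    public static void main(final String[] args) {
        final ByteBufferTradeTable table = new ByteBufferTradeTable(4);
        for (int i = 0; i < 4; i++) {
            table.setTradeId(i, i);
            table.setPrice(i, 100 + i);
            table.setSide(i, (i & 1) == 0 ? 'B' : 'S');
        }
        System.out.println(table.getTradeId(3) + " " + table.getPrice(3) + " " + table.getSide(3)); // 3 103 S
        try {
            table.getPrice(4); // one record past the end
        } catch (IndexOutOfBoundsException e) {
            System.out.println("bounds check caught out-of-range access");
        }
    }
}
```

The bounds check on every call is the cost mentioned above; the Unsafe-based listing later in the post drops it.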
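The approach I'm about to detail is not traditional Java. If you deal with large data sets or have demanding performance requirements, read on. If your data sets are small and performance is not an issue, run away now, before you get sucked into the dark arts of native memory management.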
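The specific benefits of this approach are:
- Significantly improved performance
- More compact data representation
- The ability to work with very large data sets while avoiding nasty GC pauses [1]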
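These techniques come with consequences. With the approach described below, you take on part of the memory management yourself. Get it wrong and you can leak memory or, worse, crash the JVM! Proceed with caution...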
A Case in Point: Trade Data
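A common challenge for financial applications is capturing and processing very large volumes of order and trade data. For this example I will build a large in-memory table of trade data that analytical queries can be run against. The table is built using two contrasting approaches. The first is the traditional Java approach: a large array whose elements are references to individual Trade objects. The second uses identical calling code, but replaces the large array and the Trade objects with an off-heap array of structures that is accessed through the Flyweight pattern.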
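Had I used other data structures in the traditional approach, such as a Map or a Tree, the memory footprint would have been even larger and the performance even lower.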
The Traditional Java Approach
```java
public class TestJavaMemoryLayout {
    private static final int NUM_RECORDS = 50 * 1000 * 1000;

    private static JavaMemoryTrade[] trades;

    public static void main(final String[] args) {
        for (int i = 0; i < 5; i++) {
            System.gc();
            perfRun(i);
        }
    }

    private static void perfRun(final int runNum) {
        long start = System.currentTimeMillis();

        init();

        System.out.format("Memory %,d total, %,d free\n",
                          Runtime.getRuntime().totalMemory(),
                          Runtime.getRuntime().freeMemory());

        long buyCost = 0;
        long sellCost = 0;

        // Query: total cost of all buys and of all sells
        for (int i = 0; i < NUM_RECORDS; i++) {
            final JavaMemoryTrade trade = get(i);

            if (trade.getSide() == 'B') {
                buyCost += (trade.getPrice() * trade.getQuantity());
            } else {
                sellCost += (trade.getPrice() * trade.getQuantity());
            }
        }

        long duration = System.currentTimeMillis() - start;
        System.out.println(runNum + " - duration " + duration + "ms");
        System.out.println("buyCost = " + buyCost + " sellCost = " + sellCost);
    }

    private static JavaMemoryTrade get(final int index) {
        return trades[index];
    }

    public static void init() {
        // An array of references plus one heap object per trade
        trades = new JavaMemoryTrade[NUM_RECORDS];

        final byte[] londonStockExchange = {'X', 'L', 'O', 'N'};
        final int venueCode = pack(londonStockExchange);

        final byte[] billiton = {'B', 'H', 'P'};
        final int instrumentCode = pack(billiton);

        for (int i = 0; i < NUM_RECORDS; i++) {
            JavaMemoryTrade trade = new JavaMemoryTrade();
            trades[i] = trade;

            trade.setTradeId(i);
            trade.setClientId(1);
            trade.setVenueCode(venueCode);
            trade.setInstrumentCode(instrumentCode);

            trade.setPrice(i);
            trade.setQuantity(i);

            trade.setSide((i & 1) == 0 ? 'B' : 'S');
        }
    }

    // Packs up to 4 ASCII bytes into an int, first byte in the most significant position
    private static int pack(final byte[] value) {
        int result = 0;
        switch (value.length) {
            case 4:
                result = (value[3]);
                // intentional fall-through
            case 3:
                result |= ((int)value[2] << 8);
            case 2:
                result |= ((int)value[1] << 16);
            case 1:
                result |= ((int)value[0] << 24);
                break;

            default:
                throw new IllegalArgumentException("Invalid array size");
        }

        return result;
    }

    private static class JavaMemoryTrade {
        private long tradeId;
        private long clientId;
        private int venueCode;
        private int instrumentCode;
        private long price;
        private long quantity;
        private char side;

        public long getTradeId() { return tradeId; }
        public void setTradeId(final long tradeId) { this.tradeId = tradeId; }

        public long getClientId() { return clientId; }
        public void setClientId(final long clientId) { this.clientId = clientId; }

        public int getVenueCode() { return venueCode; }
        public void setVenueCode(final int venueCode) { this.venueCode = venueCode; }

        public int getInstrumentCode() { return instrumentCode; }
        public void setInstrumentCode(final int instrumentCode) { this.instrumentCode = instrumentCode; }

        public long getPrice() { return price; }
        public void setPrice(final long price) { this.price = price; }

        public long getQuantity() { return quantity; }
        public void setQuantity(final long quantity) { this.quantity = quantity; }

        public char getSide() { return side; }
        public void setSide(final char side) { this.side = side; }
    }
}
```
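The pack() helper stores up to four ASCII characters in a single int, with the first character in the most significant byte, so that venue and instrument codes fit in a fixed-width 4-byte field. Below is a loop-based equivalent plus a hypothetical unpack() for reading codes back; neither is part of the original code. The sketch masks each byte with 0xFF, which only matters for non-ASCII input, where the original (int) cast would sign-extend and corrupt the higher bytes.

```java
// Hypothetical round-trip sketch for the 4-character code encoding used by pack().
class PackCodes {
    // Equivalent to the switch-based pack() for ASCII input.
    static int pack(final byte[] value) {
        if (value.length < 1 || value.length > 4) {
            throw new IllegalArgumentException("Invalid array size");
        }
        int result = 0;
        for (int i = 0; i < value.length; i++) {
            // Byte i goes to bit position 24, 16, 8, 0; mask to avoid sign extension.
            result |= (value[i] & 0xFF) << (24 - 8 * i);
        }
        return result;
    }

    // Reads characters from the most significant byte down, stopping at the first zero byte.
    static String unpack(final int code) {
        final StringBuilder sb = new StringBuilder(4);
        for (int shift = 24; shift >= 0; shift -= 8) {
            final int b = (code >>> shift) & 0xFF;
            if (b == 0) {
                break; // unused trailing bytes are zero
            }
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        final int venue = pack(new byte[] {'X', 'L', 'O', 'N'});
        final int instrument = pack(new byte[] {'B', 'H', 'P'});
        System.out.printf("%08X %s%n", venue, unpack(venue));           // 584C4F4E XLON
        System.out.printf("%08X %s%n", instrument, unpack(instrument)); // 42485000 BHP
    }
}
```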
The Compact Off-Heap Structures Approach
```java
import sun.misc.Unsafe;

import java.lang.reflect.Field;

public class TestDirectMemoryLayout {
    private static final Unsafe unsafe;
    static {
        try {
            // Unsafe has no public constructor; obtain the singleton via reflection
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            unsafe = (Unsafe)field.get(null);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static final int NUM_RECORDS = 50 * 1000 * 1000;

    private static long address;
    private static final DirectMemoryTrade flyweight = new DirectMemoryTrade();

    public static void main(final String[] args) {
        for (int i = 0; i < 5; i++) {
            System.gc();
            perfRun(i);
        }
    }

    private static void perfRun(final int runNum) {
        long start = System.currentTimeMillis();

        init();

        System.out.format("Memory %,d total, %,d free\n",
                          Runtime.getRuntime().totalMemory(),
                          Runtime.getRuntime().freeMemory());

        long buyCost = 0;
        long sellCost = 0;

        for (int i = 0; i < NUM_RECORDS; i++) {
            final DirectMemoryTrade trade = get(i);

            if (trade.getSide() == 'B') {
                buyCost += (trade.getPrice() * trade.getQuantity());
            } else {
                sellCost += (trade.getPrice() * trade.getQuantity());
            }
        }

        long duration = System.currentTimeMillis() - start;
        System.out.println(runNum + " - duration " + duration + "ms");
        System.out.println("buyCost = " + buyCost + " sellCost = " + sellCost);

        destroy();
    }

    private static DirectMemoryTrade get(final int index) {
        // Repoint the single flyweight at record 'index'; no object is allocated
        final long offset = address + (index * DirectMemoryTrade.getObjectSize());
        flyweight.setObjectOffset(offset);
        return flyweight;
    }

    public static void init() {
        // One contiguous native block holds all records back to back
        final long requiredHeap = NUM_RECORDS * DirectMemoryTrade.getObjectSize();
        address = unsafe.allocateMemory(requiredHeap);

        final byte[] londonStockExchange = {'X', 'L', 'O', 'N'};
        final int venueCode = pack(londonStockExchange);

        final byte[] billiton = {'B', 'H', 'P'};
        final int instrumentCode = pack(billiton);

        for (int i = 0; i < NUM_RECORDS; i++) {
            DirectMemoryTrade trade = get(i);

            trade.setTradeId(i);
            trade.setClientId(1);
            trade.setVenueCode(venueCode);
            trade.setInstrumentCode(instrumentCode);

            trade.setPrice(i);
            trade.setQuantity(i);

            trade.setSide((i & 1) == 0 ? 'B' : 'S');
        }
    }

    // Native memory is not garbage collected and must be freed explicitly
    private static void destroy() {
        unsafe.freeMemory(address);
    }

    private static int pack(final byte[] value) {
        int result = 0;
        switch (value.length) {
            case 4:
                result |= (value[3]);
                // intentional fall-through
            case 3:
                result |= ((int)value[2] << 8);
            case 2:
                result |= ((int)value[1] << 16);
            case 1:
                result |= ((int)value[0] << 24);
                break;

            default:
                throw new IllegalArgumentException("Invalid array size");
        }

        return result;
    }

    private static class DirectMemoryTrade {
        // Each field offset adds the size of the preceding field
        private static long offset = 0;
        private static final long tradeIdOffset = offset += 0;
        private static final long clientIdOffset = offset += 8;
        private static final long venueCodeOffset = offset += 8;
        private static final long instrumentCodeOffset = offset += 4;
        private static final long priceOffset = offset += 4;
        private static final long quantityOffset = offset += 8;
        private static final long sideOffset = offset += 8;
        private static final long objectSize = offset += 2; // 42 bytes per record

        // Absolute address of the record this flyweight currently points at
        private long objectOffset;

        public static long getObjectSize() { return objectSize; }

        void setObjectOffset(final long objectOffset) { this.objectOffset = objectOffset; }

        public long getTradeId() { return unsafe.getLong(objectOffset + tradeIdOffset); }
        public void setTradeId(final long tradeId) { unsafe.putLong(objectOffset + tradeIdOffset, tradeId); }

        public long getClientId() { return unsafe.getLong(objectOffset + clientIdOffset); }
        public void setClientId(final long clientId) { unsafe.putLong(objectOffset + clientIdOffset, clientId); }

        public int getVenueCode() { return unsafe.getInt(objectOffset + venueCodeOffset); }
        public void setVenueCode(final int venueCode) { unsafe.putInt(objectOffset + venueCodeOffset, venueCode); }

        public int getInstrumentCode() { return unsafe.getInt(objectOffset + instrumentCodeOffset); }
        public void setInstrumentCode(final int instrumentCode) { unsafe.putInt(objectOffset + instrumentCodeOffset, instrumentCode); }

        public long getPrice() { return unsafe.getLong(objectOffset + priceOffset); }
        public void setPrice(final long price) { unsafe.putLong(objectOffset + priceOffset, price); }

        public long getQuantity() { return unsafe.getLong(objectOffset + quantityOffset); }
        public void setQuantity(final long quantity) { unsafe.putLong(objectOffset + quantityOffset, quantity); }

        public char getSide() { return unsafe.getChar(objectOffset + sideOffset); }
        public void setSide(final char side) { unsafe.putChar(objectOffset + sideOffset, side); }
    }
}
```
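The listing pairs init() with destroy() by hand. That is exactly the manual memory management warned about earlier: an exception thrown between the two calls leaks the allocation. One way to make the release harder to forget is to tie the allocation to try-with-resources. The OffHeapRegion class below is a hypothetical sketch, not part of the original code. It only guards against use after free and, like the raw Unsafe calls, does no bounds checking.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: an Unsafe allocation whose release is tied to try-with-resources.
class OffHeapRegion implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            final Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private long address; // 0 once the memory has been freed

    OffHeapRegion(final long bytes) {
        address = UNSAFE.allocateMemory(bytes);
    }

    private long liveAddress() {
        if (address == 0) {
            throw new IllegalStateException("off-heap region already freed");
        }
        return address;
    }

    // No bounds checking: as with raw Unsafe, a bad offset corrupts memory or crashes the JVM.
    void putLong(final long offset, final long value) { UNSAFE.putLong(liveAddress() + offset, value); }
    long getLong(final long offset) { return UNSAFE.getLong(liveAddress() + offset); }

    @Override
    public void close() {
        if (address != 0) {
            UNSAFE.freeMemory(address);
            address = 0; // makes close() idempotent and later access fail fast
        }
    }

    public static void main(final String[] args) {
        try (OffHeapRegion region = new OffHeapRegion(16)) {
            region.putLong(0, 7L);
            region.putLong(8, 99L);
            System.out.println(region.getLong(0) + region.getLong(8)); // 106
        } // freeMemory runs here even if the block above throws
    }
}
```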
Results:
```
Intel i7-860 @ 2.8GHz, 8GB RAM DDR3 1333MHz, Windows 7 64-bit, Java 1.7.0_07
=============================================
java -server -Xms4g -Xmx4g TestJavaMemoryLayout
Memory 4,116,054,016 total, 1,108,901,104 free
0 - duration 19334ms
Memory 4,116,054,016 total, 1,109,964,752 free
1 - duration 14295ms
Memory 4,116,054,016 total, 1,108,455,504 free
2 - duration 14272ms
Memory 3,817,799,680 total, 815,308,600 free
3 - duration 28358ms
Memory 3,817,799,680 total, 810,552,816 free
4 - duration 32487ms

java -server TestDirectMemoryLayout
Memory 128,647,168 total, 126,391,384 free
0 - duration 983ms
Memory 128,647,168 total, 126,992,160 free
1 - duration 958ms
Memory 128,647,168 total, 127,663,408 free
2 - duration 873ms
Memory 128,647,168 total, 127,663,408 free
3 - duration 886ms
Memory 128,647,168 total, 127,663,408 free
4 - duration 884ms

Intel i7-2760QM @ 2.40GHz, 8GB RAM DDR3 1600MHz, Linux 3.4.11 kernel 64-bit, Java 1.7.0_07
=================================================
java -server -Xms4g -Xmx4g TestJavaMemoryLayout
Memory 4,116,054,016 total, 1,108,912,960 free
0 - duration 12262ms
Memory 4,116,054,016 total, 1,109,962,832 free
1 - duration 9822ms
Memory 4,116,054,016 total, 1,108,458,720 free
2 - duration 10239ms
Memory 3,817,799,680 total, 815,307,640 free
3 - duration 21558ms
Memory 3,817,799,680 total, 810,551,856 free
4 - duration 23074ms

java -server TestDirectMemoryLayout
Memory 123,994,112 total, 121,818,528 free
0 - duration 634ms
Memory 123,994,112 total, 122,455,944 free
1 - duration 619ms
Memory 123,994,112 total, 123,103,320 free
2 - duration 546ms
Memory 123,994,112 total, 123,103,320 free
3 - duration 547ms
Memory 123,994,112 total, 123,103,320 free
4 - duration 534ms
```
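The speedups can be recomputed directly from these durations. This trivial helper divides the fifth-run (run 4) time of the traditional approach by that of the off-heap approach on each machine; the class name is only for illustration.

```java
// Ratio of traditional to off-heap durations, using the run 4 times from the results above.
class SpeedupFromResults {
    static double speedup(final long javaMillis, final long directMillis) {
        return (double) javaMillis / directMillis;
    }

    public static void main(final String[] args) {
        // Windows 7, i7-860: 32487 ms vs 884 ms, exactly 36.75
        System.out.printf("i7-860 (Windows):  %.2fx%n", speedup(32487, 884));
        // Linux, i7-2760QM (Sandy Bridge): 23074 ms vs 534 ms, about 43.2
        System.out.printf("i7-2760QM (Linux): %.2fx%n", speedup(23074, 534));
    }
}
```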
Analysis
Let's compare the results against the three benefits promised above.
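- Significantly improved performance. The evidence here is clear cut: the off-heap structures approach is more than an order of magnitude faster. At the extreme, on the fifth run on the Sandy Bridge processor, the task completes 43.2 times faster. This also shows how well Sandy Bridge handles predictable data access patterns. Performance is not only much better but also more consistent. With the standard Java approach, performance degrades in the later runs as the heap becomes fragmented and access patterns therefore become more random.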
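- More compact data representation. In the off-heap representation each object requires 42 bytes, so storing 50 million of them, as in the example, requires 2,100,000,000 bytes. The memory required by the JVM heap is:
  memory required = total memory - free memory - base JVM needs
  2,883,248,712 = 3,817,799,680 - 810,551,856 - 123,999,112
  This implies the JVM needs about 40% more memory to represent the same data. The overhead comes from the array of references to the Java objects plus the object headers. I discussed Java object layout in a previous post. When working with very large data sets, this overhead can become a significant limiting factor.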
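- The ability to work with very large data sets while avoiding nasty GC pauses. The sample code above forces a GC cycle before each run, which can improve the consistency of the results. Feel free to remove the call to System.gc() and observe the implications for yourself. If you run the tests with the following command-line arguments, the garbage collector will report what happened in painful detail:

```
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintHeapAtGC
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintSafepointStatistics
```

From analyzing the output, I can see that the application went through a total of 29 GC cycles. The pause times below were extracted from the output lines that report how long the application threads were stopped: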
```
With System.gc() before each run
================================
Total time for which application threads were stopped: 0.0085280 seconds
Total time for which application threads were stopped: 0.7280530 seconds
Total time for which application threads were stopped: 8.1703460 seconds
Total time for which application threads were stopped: 5.6112210 seconds
Total time for which application threads were stopped: 1.2531370 seconds
Total time for which application threads were stopped: 7.6392250 seconds
Total time for which application threads were stopped: 5.7847050 seconds
Total time for which application threads were stopped: 1.3070470 seconds
Total time for which application threads were stopped: 8.2520880 seconds
Total time for which application threads were stopped: 6.0949910 seconds
Total time for which application threads were stopped: 1.3988480 seconds
Total time for which application threads were stopped: 8.1793240 seconds
Total time for which application threads were stopped: 6.4138720 seconds
Total time for which application threads were stopped: 4.4991670 seconds
Total time for which application threads were stopped: 4.5612290 seconds
Total time for which application threads were stopped: 0.3598490 seconds
Total time for which application threads were stopped: 0.7111000 seconds
Total time for which application threads were stopped: 1.4426750 seconds
Total time for which application threads were stopped: 1.5931500 seconds
Total time for which application threads were stopped: 10.9484920 seconds
Total time for which application threads were stopped: 7.0707230 seconds

Without System.gc() before each run
===================================
Test run times
0 - duration 12120ms
1 - duration 9439ms
2 - duration 9844ms
3 - duration 20933ms
4 - duration 23041ms

Total time for which application threads were stopped: 0.0170860 seconds
Total time for which application threads were stopped: 0.7915350 seconds
Total time for which application threads were stopped: 10.7153320 seconds
Total time for which application threads were stopped: 5.6234650 seconds
Total time for which application threads were stopped: 1.2689950 seconds
Total time for which application threads were stopped: 7.6238170 seconds
Total time for which application threads were stopped: 6.0114540 seconds
Total time for which application threads were stopped: 1.2990070 seconds
Total time for which application threads were stopped: 7.9918480 seconds
Total time for which application threads were stopped: 5.9997920 seconds
Total time for which application threads were stopped: 1.3430040 seconds
Total time for which application threads were stopped: 8.0759940 seconds
Total time for which application threads were stopped: 6.3980610 seconds
Total time for which application threads were stopped: 4.5572100 seconds
Total time for which application threads were stopped: 4.6193830 seconds
Total time for which application threads were stopped: 0.3877930 seconds
Total time for which application threads were stopped: 0.7429270 seconds
Total time for which application threads were stopped: 1.5248070 seconds
Total time for which application threads were stopped: 1.5312130 seconds
Total time for which application threads were stopped: 10.9120250 seconds
Total time for which application threads were stopped: 7.3528590 seconds
```
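The output shows that a significant proportion of the time is spent in the garbage collector, and while your threads are stopped your application is unresponsive. These tests were run with the default GC settings. The GC can be tuned for better results, but doing so takes considerable skill and effort. The only JVM I know of that copes well, without imposing long pause times even under high-throughput conditions, is one running the Azul concurrent compacting collector.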
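When profiling this application, I can see that most of the time is spent allocating the objects and promoting them to the old generation, because they do not fit in the young generation. The initialization cost could be excluded from the timing, but that would not be realistic. With the traditional Java approach the state has to be built up before any query can run, so the end user has to wait both for the state to be built and for the query to execute.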
This test really is quite trivial. Now imagine working with similar data sets at the 100 GB scale.
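Note: When the garbage collector compacts a region, objects that were next to each other can be moved far apart, which can cause more TLB and other cache misses.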
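To sanity-check the ~40% overhead quoted in the analysis, the per-record footprint of the on-heap approach can be estimated by hand. The sketch assumes a 64-bit HotSpot JVM with compressed oops, meaning a 12-byte object header, 4-byte references and 8-byte object alignment, and assumes no padding between fields beyond the final alignment. These are assumptions that vary with JVM version and flags, not measurements. Under them, each JavaMemoryTrade costs 56 bytes plus a 4-byte slot in the trades array, about 60 bytes against 42 off-heap. That is in the same ballpark as the roughly 57.7 bytes per record implied by the measured 2,883,248,712 bytes.

```java
// Back-of-the-envelope footprint estimate. ASSUMPTIONS (not measured): 64-bit HotSpot with
// compressed oops, i.e. 12-byte object header, 4-byte references, 8-byte object alignment.
class FootprintEstimate {
    // tradeId, clientId, venueCode, instrumentCode, price, quantity, side
    static final int FIELD_BYTES = 8 + 8 + 4 + 4 + 8 + 8 + 2; // 42, same as DirectMemoryTrade

    // Rounds size up to the next multiple of alignment.
    static long alignUp(final long size, final long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // One trade object (header + fields, padded) plus its reference slot in the array.
    static long onHeapBytesPerRecord(final int headerBytes, final int referenceBytes, final int alignment) {
        return alignUp(headerBytes + FIELD_BYTES, alignment) + referenceBytes;
    }

    public static void main(final String[] args) {
        final long records = 50_000_000L;
        final long offHeap = records * FIELD_BYTES;                   // 42 bytes per record
        final long onHeap = records * onHeapBytesPerRecord(12, 4, 8); // (56 + 4) bytes per record
        System.out.printf("off-heap: %,d bytes%n", offHeap);
        System.out.printf("on-heap:  ~%,d bytes%n", onHeap);
        System.out.printf("overhead: ~%.0f%%%n", 100.0 * (onHeap - offHeap) / offHeap); // about 43%
    }
}
```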
A Side Note on Serialization
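A huge benefit of using off-heap structures this way is that they can be serialized very easily, to the network or to storage, with a simple memory copy, as I showed in a previous post. This lets us bypass intermediate buffers and object allocation entirely.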
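As a minimal sketch of serialization by memory copy, assume the records live at an address returned by Unsafe.allocateMemory, as in the off-heap listing. A whole block can then be copied into a byte[] with a single Unsafe.copyMemory call and copied back into fresh off-heap memory on the receiving side, with no per-object encoding. The byte[] stands in for whatever network or storage buffer is actually used, and toBytes/fromBytes are hypothetical helper names. The bytes are in the machine's native byte order, so both ends must agree on endianness as well as on the record layout.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: serialize a block of off-heap records with one bulk copy each way.
class OffHeapCopy {
    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            final Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies 'bytes' bytes starting at an absolute off-heap address into a new byte[].
    static byte[] toBytes(final long address, final long bytes) {
        final byte[] out = new byte[Math.toIntExact(bytes)];
        UNSAFE.copyMemory(null, address, out, Unsafe.ARRAY_BYTE_BASE_OFFSET, bytes);
        return out;
    }

    // Allocates fresh off-heap memory and copies the bytes into it; the caller must free it.
    static long fromBytes(final byte[] in) {
        final long address = UNSAFE.allocateMemory(in.length);
        UNSAFE.copyMemory(in, Unsafe.ARRAY_BYTE_BASE_OFFSET, null, address, in.length);
        return address;
    }

    public static void main(final String[] args) {
        final long src = UNSAFE.allocateMemory(16);
        long dst = 0;
        try {
            UNSAFE.putLong(src, 7L);
            UNSAFE.putLong(src + 8, 99L);
            final byte[] wire = toBytes(src, 16);
            dst = fromBytes(wire);
            System.out.println(UNSAFE.getLong(dst) + " " + UNSAFE.getLong(dst + 8)); // 7 99
        } finally {
            UNSAFE.freeMemory(src);
            if (dst != 0) {
                UNSAFE.freeMemory(dst);
            }
        }
    }
}
```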
Conclusion
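If you are willing to do some C-style programming for large data sets, you can control memory layout in Java by going off-heap. If you do, the benefits in performance, compactness, and avoiding GC issues are significant. However, this approach is not suitable for every application. Its benefits only become noticeable for very large data sets, or at the extremes of throughput and/or latency.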
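I hope the Java community collectively realizes the importance of supporting structures on both the heap and the stack. John Rose has done some great work in this area, defining how tuples could be added to the JVM. His talk on Arrays 2.0 at this year's JVM Language Summit is well worth watching. In it, John discusses the options of arrays of structures and structures of arrays. If tuples as proposed by John were available, the test described here could achieve comparable performance with a more pleasant programming style. The whole array of structures could be allocated in a single action, bypassing the copying of individual objects across generations, and it would be stored in a compact, contiguous way. This would remove the significant GC issues for this class of problem.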
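Until something like those tuples exists, the structure-of-arrays option can at least be approximated in today's Java with one primitive array per field. The TradeColumns class below is a hypothetical sketch of that layout, not John Rose's proposal. For 50 million trades the GC would see three array objects instead of 50 million Trade objects, and the query only scans the columns it needs, each sequentially. The trade-offs are that each Java array is still limited to about 2^31 elements and that a logical record is spread across several arrays.

```java
// Hypothetical structure-of-arrays sketch: one primitive array per field, no per-trade objects.
class TradeColumns {
    final long[] price;
    final long[] quantity;
    final char[] side;

    TradeColumns(final int n) {
        price = new long[n];
        quantity = new long[n];
        side = new char[n];
    }

    // Same query as the listings: total cost of buys and of sells.
    // Returns {buyCost, sellCost}.
    long[] buySellCost() {
        long buy = 0;
        long sell = 0;
        for (int i = 0; i < side.length; i++) {
            final long cost = price[i] * quantity[i];
            if (side[i] == 'B') {
                buy += cost;
            } else {
                sell += cost;
            }
        }
        return new long[] {buy, sell};
    }

    public static void main(final String[] args) {
        final TradeColumns t = new TradeColumns(4);
        for (int i = 0; i < 4; i++) {
            t.price[i] = i;
            t.quantity[i] = i;
            t.side[i] = (i & 1) == 0 ? 'B' : 'S';
        }
        final long[] r = t.buySellCost();
        System.out.println("buyCost = " + r[0] + " sellCost = " + r[1]); // buyCost = 4 sellCost = 10
    }
}
```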
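I recently compared standard data structures in Java and .NET. In some cases, such as maps and dictionaries, .NET was 6 to 10 times faster than Java when using its native structure support. Let's get this into Java as soon as possible!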
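The results also clearly show that if we are to use Java for real-time analytics on big data, our standard garbage collectors need to improve significantly and support truly concurrent operation.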
[1] As far as I know, the only JVM that can cope with very large heaps is Azul Zing.
Original article: javacodegeeks. Translation: Wld5.com - norwind. Translation link: http://www.wld5.com/10567.html
[ When reposting, please keep the original source, translator, and translation link. ]