AWS Java SDK @ scale

  • date post

    17-Jul-2015
  • Category

    Software

Transcript of AWS Java SDK @ scale

  • AWS JAVA SDK @ SCALE

    BASED MOSTLY ON EXPERIENCES WITH S3

    image source: http://xkcd.com/

  • C R E D E N T I A L SO U R

  • ENDPOINTS

    REST API for everyone

    Great documentation

    http://aws.amazon.com/documentation/

  • AWS JAVA SDK

    One monolithic jar before 1.9.0

    Currently split into ~48 smaller modules dedicated to individual Amazon services

    All depend on aws-java-sdk-core module

    Other runtime dependencies:

    commons-logging

    Apache HttpClient (4.3.4)

    Joda-Time

  • CREDENTIALS

    Manually provide accessKey and secretKey (generated by IAM)

    Manual key management

    No automatic rotation

    Leaked keys will lose you serious $$$

    new AmazonS3Client(new BasicAWSCredentials(accessKey, secretKey));

  • CREDENTIALS

    I only had S3 keys on my GitHub and they where gone within 5 minutes!

    Turns out through the S3 API you can actually spin up EC2 instances, and my key had been spotted by a bot that continually searches GitHub for API keys. Amazon AWS customer support informed me this happens a lot recently, hackers have created an algorithm that searches GitHub 24 hours per day for API keys. Once it finds one it spins up max instances of EC2 servers to farm itself bitcoins.

    Boom! A $2375 bill in the morning.

    http://www.devfactor.net/2014/12/30/2375-amazon-mistake/

  • CREDENTIALS

    Use a credentials provider

    Default behaviour when the zero-argument constructor is invoked; providers are tried in order:

    EnvironmentVariableCredentialsProvider

    SystemPropertiesCredentialsProvider

    ProfileCredentialsProvider

    InstanceProfileCredentialsProvider

    All but the last one share the security problems of manual access/secret key management

    new AmazonS3Client();
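
    The chain behaviour described on this slide (try each source in order, use the first one that yields keys) can be sketched in plain Java without the SDK. `Credentials`, `CredentialsChain` and `fromMap` are hypothetical names made up for this illustration; they are not SDK types, and the real chain may differ in detail.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical value type for this sketch; not an SDK class.
final class Credentials {
    final String accessKey;
    final String secretKey;

    Credentials(String accessKey, String secretKey) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
    }
}

// Hypothetical chain: asks each source in order and returns the first hit.
final class CredentialsChain {
    private final List<Supplier<Optional<Credentials>>> sources;

    CredentialsChain(List<Supplier<Optional<Credentials>>> sources) {
        this.sources = sources;
    }

    Credentials resolve() {
        for (Supplier<Optional<Credentials>> source : sources) {
            Optional<Credentials> found = source.get();
            if (found.isPresent()) {
                return found.get();
            }
        }
        throw new IllegalStateException("No source in the chain provided credentials");
    }

    // Builds a source that reads both keys from a map such as System.getenv().
    static Supplier<Optional<Credentials>> fromMap(
            Map<String, String> map, String accessName, String secretName) {
        return () -> {
            String access = map.get(accessName);
            String secret = map.get(secretName);
            return (access == null || secret == null)
                    ? Optional.<Credentials>empty()
                    : Optional.of(new Credentials(access, secret));
        };
    }
}
```

    A source backed by the instance metadata service would go last in the list, which mirrors the order above.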

  • CREDENTIALS

    Use InstanceProfileCredentialsProvider

    Needs the server's IAM role to be configured with the permissions required by the service using this provider.

    Calls EC2 Instance Metadata Service to get current security credentials.

    http://169.254.169.254/latest/meta-data/iam/security-credentials/

    Automatic management and rotation of keys.

    Stored only in memory of calling process

  • CREDENTIALS

    Use InstanceProfileCredentialsProvider

    Credentials are reloaded under a lock, which may cause latency spikes (every hour).

    Instantiate with refreshCredentialsAsync == true

    Problems when starting on developers' machines

    Use AdRoll's Hologram to create a fake environment locally

    https://github.com/AdRoll/hologram
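
    The idea behind asynchronous refresh can be sketched without the SDK: a background task replaces the current value, so request threads only do a lock-free read. `AsyncRefreshingValue` is a hypothetical class written for this illustration, and the refresh period is an assumption; this is not how the SDK provider is implemented.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder, not an SDK class: readers never block on a refresh.
final class AsyncRefreshingValue<T> implements AutoCloseable {
    private final AtomicReference<T> current;
    private final Supplier<T> loader;
    private final ScheduledExecutorService scheduler;

    AsyncRefreshingValue(Supplier<T> loader, long period, TimeUnit unit) {
        this.loader = loader;
        // Load once synchronously so get() never returns null.
        this.current = new AtomicReference<>(loader.get());
        this.scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "value-refresh");
            thread.setDaemon(true);
            return thread;
        });
        scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
    }

    // Lock-free read of the most recently loaded value.
    T get() {
        return current.get();
    }

    // Keeps the old value if loading fails, so one bad call does not break readers.
    void refresh() {
        try {
            current.set(loader.get());
        } catch (RuntimeException e) {
            // Assumption: a stale value is better than none; real code should log this.
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

    Here the loader would be whatever call fetches fresh credentials; request threads only ever call `get()`.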

  • BUILT-IN MONITORING

    amazonS3Client.addRequestHandler(new RequestHandler2() {
        @Override
        public void beforeRequest(Request request) {
        }

        @Override
        public void afterResponse(Request request, Response response) {
            request.getAWSRequestMetrics()...
        }

        @Override
        public void afterError(Request request, Response response, Exception e) {
        }
    });

  • BUILT-IN MONITORING

    AmazonS3Client amazonS3 = new AmazonS3Client(
        new StaticCredentialsProvider(credentials),
        new ClientConfiguration(),
        new RequestMetricCollector() {
            @Override
            public void collectMetrics(Request request, Response response) {
            }
        }
    );

  • TESTING WITH S3

    Use buckets located close to testing site

    Use fake S3 process:

    https://github.com/jubos/fake-s3

    https://github.com/tkowalcz/fake-s3

    same thing but with a few bug fixes

    Not scalable enough

    Write your own :(

    Not that hard

    // look out for issue 414
    amazonS3.setEndpoint("http://localhost...");

  • SCARY STUFF

    #333 SDK can't list bucket nor delete S3 object with characters in range [0x00 - 0x1F]

    According to the S3 objects naming scheme, [0x00 - 0x1F] are valid characters for the S3 object. However, it's not possible to list bucket with such objects using the SDK (XML parser chokes on them) and also, they can't be deleted thru multi objects delete (also XML failure). What is interesting, download works just fine.

    #797 S3 delete_objects silently fails with object names containing characters in the 0x00-0x1F range

    Bulk delete of more than 1000 objects (the documented limit per multi-object delete request) fails with an unrelated exception
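
    Both problems on this slide can be guarded against before calling the SDK. The sketch below uses only the JDK; `S3KeyGuards` is a hypothetical helper, and the batch size of 1000 is taken from the documented S3 limit for multi-object delete rather than from the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for this sketch; not part of the SDK.
final class S3KeyGuards {
    // Documented S3 limit of keys per multi-object delete request.
    static final int MAX_KEYS_PER_DELETE = 1000;

    // True if the key contains a character in the range 0x00-0x1F,
    // which the issues quoted above report as breaking list and bulk delete.
    static boolean hasControlCharacter(String key) {
        for (int i = 0; i < key.length(); i++) {
            if (key.charAt(i) <= 0x1F) {
                return true;
            }
        }
        return false;
    }

    // Splits keys into consecutive batches of at most MAX_KEYS_PER_DELETE.
    static List<List<String>> deleteBatches(List<String> keys) {
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += MAX_KEYS_PER_DELETE) {
            int end = Math.min(start + MAX_KEYS_PER_DELETE, keys.size());
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }
}
```

    Each batch would then be sent as its own delete request, and keys flagged by `hasControlCharacter` would be rejected at upload time or deleted one by one.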

  • ASYNCHRONOUS VERSIONS

    There is no truly asynchronous mode in the AWS SDK

    Async versions of clients use synchronous, blocking HTTP calls but run them on a thread pool

    S3 has TransferManager (we have no experience here)
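
    What "async" means here can be shown with a few lines of JDK code: the call is handed to a pool, but a pool thread still blocks for its whole duration. `BlockingCallWrapper` is a hypothetical class for illustration; the SDK's async clients have their own API and return types.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical wrapper: the caller gets a future, but the work is still a
// blocking call that occupies one pool thread until it returns.
final class BlockingCallWrapper implements AutoCloseable {
    private final ExecutorService pool;

    BlockingCallWrapper(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    <T> CompletableFuture<T> submit(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, pool);
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

    With this design the number of requests in flight is capped by the pool size, so it adds no concurrency beyond what the same number of plain threads would give.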

  • BASIC S3 PERFORMANCE TIPS

    A pseudo-random key prefix spreads files evenly among S3 partitions

    Listing is usually the bottleneck. Cache list results.

    Or write your own microservice to eliminate lists
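
    Caching list results can be as simple as remembering the last listing per prefix for a fixed time. The sketch below is JDK-only; `ListingCache`, the injected `lister` function and the time-to-live are assumptions for illustration, and in real use `lister` would wrap the actual S3 list call.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical time-based cache of listings, keyed by prefix.
final class ListingCache {
    private static final class Entry {
        final List<String> keys;
        final long loadedAtMillis;

        Entry(List<String> keys, long loadedAtMillis) {
            this.keys = keys;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, List<String>> lister;
    private final long ttlMillis;
    private final LongSupplier clock;

    ListingCache(Function<String, List<String>> lister, long ttlMillis, LongSupplier clock) {
        this.lister = lister;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached listing, or lists again once the entry is older than the TTL.
    // Concurrent callers may occasionally list the same prefix twice; that is accepted here.
    List<String> list(String prefix) {
        long now = clock.getAsLong();
        Entry entry = entries.get(prefix);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry(lister.apply(prefix), now);
            entries.put(prefix, entry);
        }
        return entry.keys;
    }
}
```

    Passing `System::currentTimeMillis` as the clock gives wall-clock expiry; injecting the clock keeps the cache easy to test.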

  • SDK PERFORMANCE

    Creates tons of short-lived objects

    Many locks guarding internal state

    Profiled with Java Mission Control (if it does not crash)

    Or YourKit

    Then test on production data

  • public XmlResponsesSaxParser() throws AmazonClientException {
        // Ensure we can load the XML Reader.
        try {
            xr = XMLReaderFactory.createXMLReader();
        } catch (SAXException e) {
            throw new AmazonClientException("Couldn't initialize a SAX driver to create an XMLReader", e);
        }
    }

  • @Override
    protected final CloseableHttpResponse doExecute(final HttpHost target, final HttpRequest request,
            final HttpContext context) throws IOException, ClientProtocolException {
        Args.notNull(request, "HTTP request");
        // a null target may be acceptable, this depends on the route planner
        // a null context is acceptable, default context created below
        HttpContext execContext = null;
        RequestDirector director = null;
        HttpRoutePlanner routePlanner = null;
        ConnectionBackoffStrategy connectionBackoffStrategy = null;
        BackoffManager backoffManager = null;
        // Initialize the request execution context making copies of
        // all shared objects that are potentially threading unsafe.
        synchronized (this) {

  • public synchronized final ClientConnectionManager getConnectionManager() {
        if (connManager == null) {
            connManager = createClientConnectionManager();
        }
        return connManager;
    }

    public synchronized final HttpRequestExecutor getRequestExecutor() {
        if (requestExec == null) {
            requestExec = createRequestExecutor();
        }
        return requestExec;
    }

    public synchronized final AuthSchemeRegistry getAuthSchemes() {
        if (supportedAuthSchemes == null) {
            supportedAuthSchemes = createAuthSchemeRegistry();
        }
        return supportedAuthSchemes;
    }

    public synchronized void setAuthSchemes(final AuthSchemeRegistry registry) {
        supportedAuthSchemes = registry;
    }

    public synchronized final ConnectionBackoffStrategy getConnectionBackoffStrategy() {
        return connectionBackoffStrategy;
    }

  • OLD APACHE HTTP CLIENT (4.3.4)

    Riddled with locks

    Reusing the same client can save resources, but at the cost of performance

    different code paths may not target the same sites

    open sockets are not that costly

    better to use many client instances (e.g. per-thread)

    Make sure the number of threads using one client instance is less than the maximum number of connections in its pool

    severe contention on returning connections to the pool

    recent versions got better

  • BASIC CONFIGURATION

  • CLIENT POOL

    com.amazonaws.services.s3.AmazonS3

    int index = ThreadLocalRandom.current().nextInt(getMaxSize());
    return clients[index];
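
    The two lines above are only the selection step. A fuller sketch of such a pool, under the assumption that clients are created once up front and are safe to share between threads, could look like this; `ClientPool` and its factory parameter are hypothetical and generic so the example compiles without the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; C would be com.amazonaws.services.s3.AmazonS3 in the slide.
final class ClientPool<C> {
    private final List<C> clients;

    ClientPool(int size, Supplier<C> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<C> created = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            created.add(factory.get());
        }
        this.clients = created;
    }

    int getMaxSize() {
        return clients.size();
    }

    // Random selection needs no shared counter, so callers do not contend on the pool itself.
    C get() {
        int index = ThreadLocalRandom.current().nextInt(getMaxSize());
        return clients.get(index);
    }
}
```

    Random choice spreads threads across clients on average, which reduces contention inside any single client's connection pool.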

  • WHAT TO DO WITH THIS?

    Hardcore approach (classpath overrides of the following classes)

    Our own AbstractAWSSigner that uses a third-party, lock-free HmacSHA1 signing algorithm

    ResponseMetadataCache without locks (send metadata to /dev/null)

    AmazonHttpClient to remove the call to System.getProperty

    DateUtils using Joda-Time (now fixed in the SDK itself)
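
    The slide does not say which third-party HmacSHA1 implementation was used. As a different, JDK-only way to sign without a lock shared between threads, the sketch below keeps one `Mac` instance per thread. `PerThreadHmacSha1` is a hypothetical class, and this is not the speaker's signer.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: each thread owns its Mac, so signing shares no mutable state.
final class PerThreadHmacSha1 {
    private static final String ALGORITHM = "HmacSHA1";

    private static final ThreadLocal<Mac> MAC = ThreadLocal.withInitial(() -> {
        try {
            return Mac.getInstance(ALGORITHM);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is not available", e);
        }
    });

    // Signs UTF-8 encoded data with the given key; the key must not be empty.
    static byte[] sign(String data, byte[] key) {
        Mac mac = MAC.get();
        try {
            // init() resets the instance, so reuse across calls on one thread is safe.
            mac.init(new SecretKeySpec(key, ALGORITHM));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Invalid signing key", e);
        }
        return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
    }
}
```

    The trade-off is one `Mac` object per thread in exchange for no synchronization on the signing path.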

  • PERFORMANCE ACHIEVED

    Dstat output. User-mode CPU usage mostly related to data processing.

    [Figure: dstat screenshot with columns CPU (user, system, idle), network transfer (in/out), IRQ/CNTX]

  • OPTIMISATIONS RESULT

    com.amazonaws.services.s3.model.AmazonS3Exception:

    Please reduce your request rate.

    (Service: Amazon S3; Status Code: 503; Error Code: SlowDown)
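
    A common reaction to a 503 SlowDown is to retry with exponential backoff and jitter. The sketch below shows only the delay arithmetic; `SlowDownBackoff`, the base delay and the cap are assumptions for illustration. The SDK has its own retry handling, which this does not replace.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: computes how long to wait before retry number `attempt` (0-based).
final class SlowDownBackoff {
    // Upper bound of the wait: base * 2^attempt, limited by the cap.
    // Assumes attempt >= 0 and a base small enough that base << 20 does not overflow.
    static long maxDelayMillis(int attempt, long baseMillis, long capMillis) {
        int shift = Math.min(attempt, 20);
        return Math.min(capMillis, baseMillis << shift);
    }

    // "Full jitter": a uniformly random wait between 0 and the upper bound, inclusive,
    // so that many clients do not retry at the same instant.
    static long jitteredDelayMillis(int attempt, long baseMillis, long capMillis) {
        long max = maxDelayMillis(attempt, baseMillis, capMillis);
        return ThreadLocalRandom.current().nextLong(max + 1);
    }
}
```

    A caller would sleep for `jitteredDelayMillis(attempt, ...)` before each retry and give up after a fixed number of attempts.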

  • HENRY PETROSKI

    "The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry."