Architecting a flexible, scalable application or service can be a significant challenge. It's hard to decide on where the stress points are in your application until you've actually seen it fail, at least once. Yet more and more businesses are building applications which require high load tolerance and greater up-time. Not only that, but they want to be able to grow their user base with a minimum of fuss, thus putting more load on the application and forcing it to cope with ever increasing demand.
What are the Keys to a Scalable Architecture?
At 1:20 AM on Sep 14, 2007, Daniel Spiewak wrote:
For such applications, a solid architecture is essential. Thought needs to be put into every aspect of the design, considering ways to reduce the runtime of areas you know will be hit hard, or ease the memory footprint by making the caching just a bit smarter. In short, it's a tough nut to crack, and really more of an art than a science.
This is relevant to desktop applications every bit as much as server-side. In fact, it may be of even greater importance. Users are willing to wait an extra second for a page load, sometimes more. But if a desktop application freezes for even half a second, panic alarms sound. Granted, server-side applications get traffic from diverse sources, but desktop applications have to deal with other problems such as perfect real-time user input, sometimes fronting for extremely complex applications.
What factors do you consider when designing performant (I just had to sneak that word in there) and scalable applications? How do you test your applications to ensure they perform well under load? Are there any dos or don'ts to avoid?
11 replies so far
Re: What are the Keys to a Scalable Architecture?
http://pragmaticprogrammer.com/titles/mnee/index.html
Re: What are the Keys to a Scalable Architecture?
1) Cache points: consider what kinds of objects should be cached.
2) Clear data flow.
3) Key points are testable.
4) Distributed, load-balanced.
Re: What are the Keys to a Scalable Architecture?
A scalable architecture has to be:
1) almost linear
2) cheap
3) based on clonable components/servers
4) transparent to the user
5) transparent to the application developer
6) self-scaling
7) self-healing
8) self-balanced
This is a wish list, of course. No such system exists. Or does it?
Re: What are the Keys to a Scalable Architecture?
Reading this now, it's very good.
Re: What are the Keys to a Scalable Architecture?
The problem isn't the architecture. That's a solved problem.
No, really. Anyone who has been in this industry at all for any length of time, even just as an interested observer, knows the solution to this problem.
The scalable architecture is a stateless, asynchronous one.
That's it. That's your silver bullet.
People like to point to Google and say "Look, they do it right, let's do what they do." But the real problem is that the Google system punts on a pretty major requirement for most every other system.
That requirement is freshness of the data. Google works on stale data. It's not up to date. It's recent, but hardly up to date.
If I change my website just before Google crawls it, and then hit Google right after, we all know that the change won't be reflected for some period of time.
A better example is to look at something like eBay. There, it's obviously quite important that when you hit refresh, you see the changes made to the item.
But eBay, while large, isn't as scalable as Google precisely for that reason.
But the problem with the stateless, asynchronous architecture is applying it to mundane systems.
The classic N-tier, deep-stack, RPC-based system won't scale as well as a stateless, async system, but it is SO much easier to write, and so applicable to MOST situations, that this is where most of the experience in the community lies.
Any Joe off the street can download NetBeans and pound out a useful, functional, and performant application easily using standard JEE patterns and practices.
But it won't scale to huge traffic, not without a lot of work.
The infrastructure of large systems does not "evolve" very well. It really needs to be designed up front.
But the keys are: stateless requests for scalability across an asynchronous messaging structure for reliability. But this architecture is a pain to write for, and overkill for most applications.
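To make "stateless requests across an asynchronous messaging structure" a bit more concrete, here is a minimal sketch. It assumes an in-process `BlockingQueue` standing in for a real message broker, and the names `BidMessage`, `StatelessFrontEnd`, and `BidWorker` are made up for illustration. The point is only that the front end keeps nothing between calls and hands the work off instead of doing it inline.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical message: carries everything a worker needs, so no
// server-side session has to be consulted later.
final class BidMessage {
    final String itemId;
    final String userId;
    final long amountCents;

    BidMessage(String itemId, String userId, long amountCents) {
        this.itemId = itemId;
        this.userId = userId;
        this.amountCents = amountCents;
    }
}

final class StatelessFrontEnd {
    private final BlockingQueue<BidMessage> queue;

    StatelessFrontEnd(BlockingQueue<BidMessage> queue) {
        this.queue = queue;
    }

    // Keeps no per-user state between calls, so any clone of this
    // front end can serve any request. Returns as soon as the message
    // is queued; the actual work happens later, elsewhere.
    boolean accept(String itemId, String userId, long amountCents) {
        return queue.offer(new BidMessage(itemId, userId, amountCents));
    }
}

final class BidWorker {
    // Processes one queued message, or returns null if the queue is empty.
    static String processNext(BlockingQueue<BidMessage> queue) {
        BidMessage m = queue.poll();
        if (m == null) {
            return null;
        }
        return m.userId + " bid " + m.amountCents + " on " + m.itemId;
    }
}
```

In a real deployment the queue would presumably be a durable broker rather than an in-memory `LinkedBlockingQueue`, which is where the reliability part comes from.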
Re: What are the Keys to a Scalable Architecture?
Google runs far more services than just general search, and plenty of them operate on up-to-date data: Gmail, Docs, Spreadsheets, Blogger, Photos, Base, etc. Google shards like crazy, shares nothing when they can, and pretty much does everything right, as well as having half a million machines to play with.
If you look at the way people are scaling Rails and PHP, you'll see that pretty much anything can be scaled if you are willing to shard, cache, and replicate as much as possible.
Granted, most Web 2.0 'twitter' style sites don't really require the kind of ACID properties you'd expect in a system that deals with financial transactions.
Re: What are the Keys to a Scalable Architecture?
> If you look at the way people are scaling Rails and
> PHP, you'll see that pretty much anything can be
> scaled if you are willing to shard, cache, and
> replicate as much as possible.
"Shard" is a "new" word that keeps popping up. Anyone have a reference as to what it is?
Re: What are the Keys to a Scalable Architecture?
"Scalable" is a tricky word. We use it like there's one single definition. We speak as if it's binary: this architecture is scalable, that one isn't.
The first really tough thing about scalability is finding a useful definition. Here's the one I use:
"Marginal revenue / transaction > Marginal cost / transaction"
The cost per transaction has to account for all cost factors: bandwidth, server capacity, physical infrastructure, administration, operations, backups, and the cost of capital.
(BTW, it's even better when the ratio of revenue to cost per transaction grows as the volume increases.)
The second really tough thing about scalability and architecture is that there isn't one that's right.
An architecture may work perfectly well for a range of transaction volumes, but fail badly as one variable gets large.
Don't treat "scalability" as either a binary issue or a moral failing. Ask instead, "how far will this architecture scale before the marginal cost deteriorates relative to the marginal revenue?" Then, follow that up with, "What part of the architecture will hit a scaling limit, and what can I incrementally replace to remove that limit?"
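A minimal sketch of that test, assuming you can measure total cost and total revenue at two transaction volumes. The class name `ScalingCheck` and all figures in the usage note are invented for illustration; the cost totals are meant to include every factor listed above.

```java
final class ScalingCheck {
    // Marginal value per transaction: change in a total (cost or revenue)
    // divided by the change in transaction volume.
    static double marginalPerTransaction(double totalBefore, double totalAfter,
                                         long txBefore, long txAfter) {
        if (txAfter <= txBefore) {
            throw new IllegalArgumentException("transaction volume must grow");
        }
        return (totalAfter - totalBefore) / (txAfter - txBefore);
    }

    // The definition from the post: marginal revenue per transaction
    // must exceed marginal cost per transaction.
    static boolean stillScales(double marginalRevenue, double marginalCost) {
        return marginalRevenue > marginalCost;
    }
}
```

For example, if growing from 100,000 to 200,000 transactions raises total cost from 1,000 to 1,400 and total revenue from 2,000 to 2,500, marginal cost is 0.004 and marginal revenue is 0.005 per transaction, so the architecture still scales over that range. Rerunning the check at higher volumes is one way to see where it stops.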
michael@michaelnygard.com
www.michaelnygard.com
Author of Release It!: Design and Deploy Production-Ready Software
Re: What are the Keys to a Scalable Architecture?
> > If you look at the way people are scaling Rails and
> > PHP, you'll see that pretty much anything can be
> > scaled if you are willing to shard, cache, and
> > replicate as much as possible.
>
> "Shard" is a "new" word that keeps popping up. Anyone
> have a reference as to what it is?
It's a term that, IIRC, Google invented, that is a synonym for partition. It's usually used to talk about partitioning a database across multiple instances. Google recently contributed a Shards module to Hibernate that automatically connects to multiple database instances and will partition your data based off of a provided hashing function.
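To illustrate the "provided hashing function" idea, here is a minimal sketch. It is not the Hibernate Shards API; `ShardSelector` is a made-up name, and using `String.hashCode()` as the hash is just an assumption for the example.

```java
final class ShardSelector {
    private final int shardCount;

    ShardSelector(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardCount = shardCount;
    }

    // Maps a key to a shard index in [0, shardCount).
    // Math.floorMod keeps the result non-negative even when
    // hashCode() is negative.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }
}
```

The same key always lands on the same database instance, which is what lets each instance hold only a slice of the data. Note that a plain modulo like this reshuffles most keys when the shard count changes, so a real system would likely need something smarter.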
Re: What are the Keys to a Scalable Architecture?
> > > If you look at the way people are scaling Rails and
> > > PHP, you'll see that pretty much anything can be
> > > scaled if you are willing to shard, cache, and
> > > replicate as much as possible.
> >
> > "Shard" is a "new" word that keeps popping up. Anyone
> > have a reference as to what it is?
>
> It's a term that, IIRC, Google invented, that is a
> synonym for partition. It's usually used to talk
> about partitioning a database across multiple
> instances. Google recently contributed a Shards
> module to Hibernate that automatically connects to
> multiple database instances and will partition your
> data based off of a provided hashing function.
Wild! I've never heard of this. Seems like the logical next step really, though putting it in the ORM seems a bit odd. Can't this be handled at a lower level, preferably in a JDBC driver or something?
ActiveObjects: an Easier Java ORM; Fuse: Resource Injection for Java
Re: What are the Keys to a Scalable Architecture?
> Wild! I've never heard of this. Seems like the
> logical next step really, though putting it in the
> ORM seems a bit odd. Can't this be handled at a
> lower level, preferably in a JDBC driver or something?
There is no technical limitation to putting it in the JDBC driver, but there is a very real design issue. Normally partitioning (or sharding) is effective because there are easily identifiable boundaries in your dataset. For example, in a blog, you might never do a join across different users' blog postings. It makes sense to shard that type of data, since within a shard you don't lose anything (you can still have all your database-level foreign keys, perform joins, etc.) but you can still load balance across many database instances.
Knowledge of how and where to partition data lives way up in the business rules of an application. So putting the sharding code in the ORM is a kind of happy medium between making the business logic deal with multiple databases and having the JDBC driver, which knows nothing of your business rules, try to guess how best to partition your data.
Another potential issue with putting it in the JDBC drivers is database vendors aren't exactly big fans of sharding. Some of them charge big money for similar clustering technologies.
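The blog example above could be sketched roughly like this. It assumes the business rule "all of a user's postings live on one shard"; `BlogShardRouter` and the JDBC URLs are hypothetical, and none of this is the Hibernate Shards API.

```java
import java.util.ArrayList;
import java.util.List;

final class BlogShardRouter {
    private final List<String> jdbcUrls;

    BlogShardRouter(List<String> jdbcUrls) {
        if (jdbcUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        // Defensive copy so the shard list cannot change underneath us.
        this.jdbcUrls = new ArrayList<String>(jdbcUrls);
    }

    // Business rule: postings are partitioned by the owning user, so every
    // query for one user's blog goes to a single database instance and
    // joins within that user's data stay local.
    String urlForUser(long userId) {
        int index = (int) Math.floorMod(userId, (long) jdbcUrls.size());
        return jdbcUrls.get(index);
    }
}
```

A JDBC driver sees only SQL and has no way to know that the user ID is the right boundary; the application (or an ORM configured by it) does, which is the design issue described above.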