www.cs.wisc.edu/condor 1
What does Condor have?
› …lots of core technology for building a
distributed system
› …lots of core technology for monitoring the
status of a machine
› …lots of core technology for managing a
work load of tasks
› …lots of truly skilled and experienced
developers and researchers who build
distributed systems. Some of the
best. Standout state employees. Honest.
Email for Wisconsin Gov Scott McCallum:
wisgov@gov.state.wi.us
One day an avid Condor user asked:
“Say, could Condor technology be used for
distributed system administration?”
Time to think…
› We gathered up our experiences with our
own management tasks, looked at the
mature Condor technology available to
us, and the HawkEye effort was born.
› Completely separate from Condor
from the end user's perspective.
Can install HawkEye, or Condor, or both
First Component:
MONITORING
› Sysadmins first need information
about what is happening on the
machines they are responsible for.
Both Current and Past
Information must be consolidated and
easily accessible
Information must be dynamic
Condor ClassAds
› Technology for an entity to describe
itself
› Simple attribute-value pairs
[
load_average = 1.3
free_Swap_space_mb = 140
number_of_processes = 92
keyboard_idle_secs = 6
ram = 128
total_swap = 512
total_memory = ram + total_swap
busy = load_average > 1.0
]
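The ad above mixes plain values with expressions that refer to other attributes. As a rough illustration of how such an ad could be evaluated, here is a minimal Python sketch. It is a toy model, not the Condor ClassAd library: the `evaluate` helper, the dict layout, and the use of Python's `eval` on expression strings are all assumptions made for this example.

```python
import re

# Toy model of the ad above: each attribute holds either a literal value
# or an expression (stored as a string) over other attributes.
# NOTE: `evaluate` is a hypothetical helper, not a Condor API.
ad = {
    "load_average": 1.3,
    "free_Swap_space_mb": 140,
    "number_of_processes": 92,
    "keyboard_idle_secs": 6,
    "ram": 128,
    "total_swap": 512,
    "total_memory": "ram + total_swap",  # expression
    "busy": "load_average > 1.0",        # expression
}


def evaluate(ad, name, _seen=frozenset()):
    """Return the value of attribute `name`, evaluating expressions on demand."""
    if name in _seen:
        raise ValueError(f"circular reference through {name!r}")
    value = ad[name]
    if not isinstance(value, str):
        return value  # plain literal, nothing to evaluate
    # Find the identifiers used in the expression and resolve those
    # that are attributes of this ad.
    identifiers = set(re.findall(r"[A-Za-z_]\w*", value))
    scope = {
        ident: evaluate(ad, ident, _seen | {name})
        for ident in identifiers
        if ident in ad
    }
    # Evaluate with no builtins so only the ad's own attributes are visible.
    return eval(value, {"__builtins__": {}}, scope)


print(evaluate(ad, "total_memory"))  # 640
print(evaluate(ad, "busy"))          # True
```

Because expressions are evaluated on demand, `busy` always reflects the current `load_average` in the ad rather than a value computed when the ad was built.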
Condor ClassAds, cont.
› No fixed schema
› Attributes can contain values or
expressions
› Serialize Ads in XML
› Open source libraries in C++ and Java to:
Manipulate Ads and Ad attributes
Store Ads
Query collections of Ads
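To make "no fixed schema" and "query collections of Ads" concrete, the following Python sketch queries a small in-memory collection in which not every ad carries every attribute. The `query` helper, the dict representation, and the host names are hypothetical; the actual C++ and Java libraries have their own interfaces, which are not shown here.

```python
# Hypothetical in-memory collection of ads; host names and values are made up.
ads = [
    {"name": "node1", "load_average": 1.3, "ram": 128},
    {"name": "node2", "load_average": 0.2, "ram": 256, "os": "Linux"},
    {"name": "node3", "ram": 512},  # no load_average: ads share no fixed schema
]


def query(collection, constraint):
    """Return the ads for which `constraint(ad)` is true.

    A constraint that touches an attribute the ad does not have raises
    KeyError, which this sketch treats as "does not match".
    """
    matches = []
    for ad in collection:
        try:
            if constraint(ad):
                matches.append(ad)
        except KeyError:
            pass  # attribute missing from this ad
    return matches


busy = query(ads, lambda ad: ad["load_average"] > 1.0)
print([ad["name"] for ad in busy])  # ['node1']
```

The design choice to illustrate is that a missing attribute is not an error for the collection as a whole: the ad simply fails to match that particular query.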
www.cs.wisc.edu/condor 11
HawkEye Monitoring Agent
[Figure: on each machine, the HawkEye Monitoring Agent (Hawkeye_Startup_Agent and Hawkeye_Monitor, reading /proc, kstat…) sends ClassAd updates via secure UDP to the HawkEye Manager]
Monitor Agent, cont.
› Updates are sent periodically
Information does not get stale
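One way periodic updates can keep information fresh is for the receiving side to record when each ad arrived and to ignore ads that have not been refreshed recently. The Python sketch below illustrates that idea only; the slides do not say how the HawkEye Manager handles this, and the `AdStore` class, the 60-second interval, and the three-interval lifetime are assumptions chosen for the example.

```python
import time

UPDATE_INTERVAL = 60           # seconds between updates (assumed value)
MAX_AGE = 3 * UPDATE_INTERVAL  # ads older than this count as stale (assumed)


class AdStore:
    """Keeps the most recent ad per machine together with its arrival time."""

    def __init__(self):
        self._ads = {}  # machine name -> (arrival time, ad)

    def update(self, machine, ad, now=None):
        """Record a new ad, replacing any earlier ad from the same machine."""
        arrival = time.time() if now is None else now
        self._ads[machine] = (arrival, ad)

    def fresh(self, now=None):
        """Return only the ads that arrived within the last MAX_AGE seconds."""
        current = time.time() if now is None else now
        return {
            machine: ad
            for machine, (arrival, ad) in self._ads.items()
            if current - arrival <= MAX_AGE
        }
```

With this layout a machine that stops reporting drops out of `fresh()` after `MAX_AGE` seconds, so a silent agent is itself visible as a change in the monitoring data.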
What if I want to monitor something you didn’t think about?
Custom Attributes
[Figure: Hawkeye_Startup_Agent, Hawkeye_Monitor, and the HawkEye Manager; data sources /proc, kstat…]
Role of HawkEye Manager
[Figure: HawkEye Manager]
Running tasks on behalf of
the sysadmin
› Submit your sysadmin tasks to HawkEye
Tasks are stored in a persistent queue by the
Manager
Tasks can leave the queue upon completion, or
repeat after specified intervals
Tasks can have complex interdependencies via
DAGMan
Records are kept on which task ran where
› Sounds like Condor, eh?
Yes, but simpler…
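The queue behaviour listed above (tasks that leave on completion or repeat after an interval, plus a record of which task ran where) can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions: the `TaskQueue` class and its method names are hypothetical, persistence and DAGMan interdependencies are left out, and "running" a task is reduced to writing a log record.

```python
import heapq
import itertools


class TaskQueue:
    """In-memory stand-in for the Manager's task queue (persistence omitted)."""

    def __init__(self):
        self._heap = []                  # entries: (due, tie-breaker, name, repeat_every)
        self._order = itertools.count()  # tie-breaker keeps submission order
        self.log = []                    # records: (time, task name, machine)

    def submit(self, name, due=0, repeat_every=None):
        """Queue a task; `repeat_every` (seconds) makes it recur."""
        if repeat_every is not None and repeat_every <= 0:
            raise ValueError("repeat_every must be positive")
        heapq.heappush(self._heap, (due, next(self._order), name, repeat_every))

    def run_due(self, now, machine):
        """Run every task due at `now` on `machine`; requeue repeating ones."""
        while self._heap and self._heap[0][0] <= now:
            _, _, name, repeat_every = heapq.heappop(self._heap)
            self.log.append((now, name, machine))  # record which task ran where
            if repeat_every is not None:
                self.submit(name, now + repeat_every, repeat_every)


queue = TaskQueue()
queue.submit("rotate_logs")                  # runs once, then leaves the queue
queue.submit("check_disk", repeat_every=10)  # repeats every 10 seconds
queue.run_due(now=0, machine="node1")
queue.run_due(now=10, machine="node2")
print(queue.log)
```

A one-shot task is simply never requeued, so "leave the queue upon completion" and "repeat after specified intervals" differ only in the `repeat_every` argument.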
Run Tasks in response to
monitoring information
› ClassAd “Requirements” Attribute
› Example: Send email if a machine is low on
disk space or low on swap space
Submit an email task with an attribute:
Requirements = free_disk < 5 || free_swap < 5
› Example with task interdependency: if load
average is high, OS = Linux, and the console is
idle, submit a task that runs “top”; if top
sees Netscape, submit a task to kill Netscape
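The disk and swap example can be read as: evaluate the task's Requirements against each machine's ad, and run the task wherever it is true. A small Python sketch of that matching step follows. The `requirements` and `machines_needing_email` functions and the sample values are hypothetical, and the ClassAd expression is transcribed by hand into Python rather than parsed.

```python
def requirements(ad):
    """Hand-written equivalent of: free_disk < 5 || free_swap < 5."""
    return ad["free_disk"] < 5 or ad["free_swap"] < 5


def machines_needing_email(machine_ads):
    """Return the names of machines whose ad satisfies the Requirements."""
    return [name for name, ad in machine_ads.items() if requirements(ad)]


# Hypothetical monitoring data; the slide does not state the units.
machine_ads = {
    "node1": {"free_disk": 40, "free_swap": 100},
    "node2": {"free_disk": 3, "free_swap": 100},   # low on disk
    "node3": {"free_disk": 40, "free_swap": 2},    # low on swap
}

print(machines_needing_email(machine_ads))  # ['node2', 'node3']
```

In this reading the email task itself does not need to know about disks or swap; the condition lives entirely in the Requirements attribute attached to the task.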
HawkEye Design Goals
› Monitoring
Reliable presence
Get Data off the node in an extensible, consistent
manner
› Run Tasks
In response to probe information
Repeat or once-only semantics
Audit Log
Current Status
› Just Beginning this project
› Initial release early summer
› Prototypes already running –
Stop in and see initial HawkEye Work
Rm 3385 on Weds 9am – 12pm
Thank you!
I was an overworked sysadmin. Now I have more free time thanks to HawkEye!